Important Note 1:
As mentioned before, please do not submit a zip folder. Submit two files: a Word report and the SAS code. The Word report must meet the requirements stated before.
Important Note 2:
Please be specific when you write your answers. There are 6 parts, so you need to specify exactly which part you are answering.
The SAS dataset HeinzHunts has data on grocery store purchases of Hunts and Heinz ketchup.
Each observation corresponds to one purchase occasion (of one of these brands) and consists of the following variables:
- Heinz: =1 if Heinz was purchased, =0 if Hunts was purchased
- PriceHeinz: Price of Heinz
- PriceHunts: Price of Hunts
- DisplHeinz: = 1 if Heinz had a store display, =0 if Heinz did not have a store display
- DisplHunts: = 1 if Hunts had a store display, =0 if Hunts did not have a store display
- FeatureHeinz: = 1 if Heinz had a store feature, =0 if Heinz did not have a store feature
- FeatureHunts: = 1 if Hunts had a store feature, =0 if Hunts did not have a store feature
1. Create a variable LogPriceRatio = log(PriceHeinz/PriceHunts).
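/* Assumes the libref da already points to the library containing HeinzHunts */
data data;
    set da.Heinzhunts;
    /* Log of the Heinz-to-Hunts price ratio */
    LogPriceRatio = log(PriceHeinz / PriceHunts);
run;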
2. Randomly select 80% of the data set as the training sample and use the remaining 20% as the test sample.
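/* Flag a simple random sample of 80% of the rows; OUTALL keeps every row and adds Selected (1/0) */
proc surveyselect data=data method=srs seed=123 outall samprate=0.8 out=splitdata;
run;

/* Selected = 1: training sample */
data training;
    set splitdata;
    if Selected = 1;
    drop Selected;
run;

/* Selected = 0: test sample */
data test;
    set splitdata;
    if Selected = 0;
    drop Selected;
run;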
3. Estimate a logit probability model for the probability that Heinz is purchased, using LogPriceRatio, DisplHeinz, FeatureHeinz, DisplHunts, FeatureHunts as the explanatory variables. Include interaction terms between display and feature for a particular brand (e.g., DisplHeinz * FeatureHeinz).
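/* Interaction terms between display and feature for each brand.
   The code refers to the feature variables as FeatHeinz and FeatHunts. */
data training;
    set training;
    interHeinz = DisplHeinz * FeatHeinz;
    interHunts = DisplHunts * FeatHunts;
run;

/* DESCENDING makes Heinz = 1 the modeled event */
proc logistic data=training;
    model Heinz(descending) = LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts interHeinz interHunts;
run;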
4. Interpret the results.
What promotional methods (feature / display) are effective for Hunts? For Heinz? How would you interpret the results for the interaction effects?
The estimates show that DisplHeinz and FeatHeinz both have a significantly positive effect on the probability of purchasing Heinz at the 10% level. Feature and display are therefore both effective for Heinz. Similarly, DisplHunts and FeatHunts have significantly negative coefficients at the 10% level: they lower the probability of buying Heinz, which is equivalent to raising the probability of buying Hunts.
So feature and display are also effective for Hunts. The coefficient of the interaction term is negative, meaning that the effect of combining the two promotional methods is less than the sum of their individual effects. However, the p-values of the interaction terms are higher than 0.1, so the interaction effects are not statistically significant.
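To make the reading of the interaction terms concrete, the fitted model can be written on the log-odds scale. The coefficient labels \beta_0 to \beta_7 are notation assumed here for illustration; they do not come from the SAS output.

```latex
\begin{aligned}
\log\frac{P(\text{Heinz}=1)}{1-P(\text{Heinz}=1)} ={}& \beta_0 + \beta_1\,\text{LogPriceRatio}
  + \beta_2\,\text{DisplHeinz} + \beta_3\,\text{FeatHeinz} \\
& + \beta_4\,\text{DisplHunts} + \beta_5\,\text{FeatHunts}
  + \beta_6\,\text{interHeinz} + \beta_7\,\text{interHunts}
\end{aligned}
```

Under this notation, running a Heinz display and a Heinz feature together shifts the log-odds by \beta_2 + \beta_3 + \beta_6, and a joint Hunts display and feature shifts it by \beta_4 + \beta_5 + \beta_7. The interaction coefficient is therefore exactly the gap between the combined effect and the sum of the two individual effects.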
5. Based on the estimated model, and using the logit probability formula, calculate the change in predicted probability that Heinz is purchased if LogPriceRatio changes from 0.5 to 0.6 and Heinz does not use a feature or display, while Hunts uses a feature and a display.
Recall that in the logit model: $P(Y = 1 \mid X) = \dfrac{\exp(X\hat{\beta})}{1 + \exp(X\hat{\beta})}$, where $Y$ is the outcome variable, $X$ are the predictor variables, and $\hat{\beta}$ are the estimated model coefficients.
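/* Two scenarios: LogPriceRatio = 0.5 and 0.6; Heinz has no display or feature, Hunts has both */
data pred;
    input LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts interHeinz interHunts;
    cards;
0.5 0 0 1 1 0 1
0.6 0 0 1 1 0 1
;
run;

/* Refit on the training sample and score the two scenarios */
proc logistic data=training;
    model Heinz(descending) = LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts interHeinz interHunts;
    score data=pred out=estimates;
run;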
The scored output shows that the predicted probability that Heinz is purchased falls from 0.1559905324 to 0.0903286148, a decrease of about 0.0657.
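Since the question asks for the logit probability formula to be used, the same two probabilities can also be computed by hand from the estimated coefficients. The following is a minimal sketch, assuming the training dataset already contains interHeinz and interHunts as created in part 3; the dataset names betas and manual and the variables xb_05, xb_06, p_05, p_06 and change are hypothetical names introduced here.

```sas
/* Save the coefficient estimates; OUTEST= stores one variable per model term */
proc logistic data=training outest=betas noprint;
    model Heinz(descending) = LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts interHeinz interHunts;
run;

data manual;
    set betas;
    /* Scenario: DisplHeinz = FeatHeinz = interHeinz = 0,
       DisplHunts = FeatHunts = interHunts = 1 */
    xb_05 = Intercept + LogPriceRatio*0.5 + DisplHunts + FeatHunts + interHunts;
    xb_06 = Intercept + LogPriceRatio*0.6 + DisplHunts + FeatHunts + interHunts;
    /* Logit probability formula: exp(xb) / (1 + exp(xb)) */
    p_05 = exp(xb_05) / (1 + exp(xb_05));
    p_06 = exp(xb_06) / (1 + exp(xb_06));
    change = p_06 - p_05;
    keep xb_05 xb_06 p_05 p_06 change;
run;

proc print data=manual;
run;
```

If the model and data are the same, p_05 and p_06 should agree with the probabilities produced by the SCORE statement, which gives a simple check on the hand calculation.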
6. The estimated model is to be used for targeting customers for Hunts coupons to build loyalty for the brand.
Coupons are to be sent to customers who are likely to buy Hunts, and not to customers who are likely to buy Heinz. Therefore, the coupons should be sent to customers whose predicted probability of buying Heinz is below a certain threshold level that needs to be determined based on the costs of misclassifications (incorrectly sending / not sending a coupon).
The following information about the costs of incorrect classification is available: The cost of incorrectly sending a coupon to a customer who would have bought Heinz is $1 per customer, and the cost of incorrectly failing to send a coupon to a customer who would have bought Hunts is $0.25 per customer.
Based on these costs, what is the optimal threshold probability level that should be used with the estimated model to decide which consumers should receive coupons?
(HINT: Step 1: Using the appropriate SAS command, create an ROC table for the test data from the estimated model. The ROC table provides the number of false positive and false negative classifications for each possible probability threshold.
Step 2: Using the cost information, calculate the total cost of misclassification for each probability threshold.
Total Cost = # of False Positives * False Positive Cost + # of False Negatives * False Negative Cost
Think carefully as to what is false positive and negative in this context.
Step 3: Choose the probability threshold that leads to the lowest total cost.)
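/* Add the interaction terms to the test sample. Hunts is not among the listed
   dataset variables, so it is derived here as 1 - Heinz (an assumption). */
data test;
    set test;
    interHeinz = DisplHeinz * FeatHeinz;
    interHunts = DisplHunts * FeatHunts;
    Hunts = 1 - Heinz;
run;

/* Hunts = 1 is the event; OUTROC= writes one row per probability threshold */
proc logistic data=test;
    model Hunts(descending) = LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts interHeinz interHunts / outroc=rocscore ctable;
run;

/* Total misclassification cost: $1 per false positive, $0.25 per false negative */
data rocscore;
    set rocscore;
    cost = _falpos_*1 + _falneg_*0.25;
run;

/* The lowest-cost threshold sorts to the top */
proc sort data=rocscore;
    by cost;
run;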
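Because the model above treats a Hunts purchase as the event, a false positive is incorrectly sending a coupon to a customer who would have bought Heinz ($1 each), and a false negative is incorrectly failing to send a coupon to a customer who would have bought Hunts ($0.25 each). The lowest total cost is 15, reached at a probability threshold of 0.7860743617. This threshold applies to the predicted probability of buying Hunts, so it corresponds to sending coupons to customers whose predicted probability of buying Heinz is at or below 1 - 0.7860743617 = 0.2139256383.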
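The hint asks for an ROC table for the test data from the estimated model. One way to read that is to fit the model on the training sample and score the test sample, so that the threshold is expressed directly on the predicted probability of buying Heinz. The following is a sketch under that assumption; roc_test and roc_cost are hypothetical dataset names, and the feature variables are assumed to be named FeatHeinz and FeatHunts as in the earlier code. Its output need not reproduce the figures reported above, which come from a different fit.

```sas
/* Fit on the training sample and build the ROC table on the test sample.
   Writing the interactions as products in MODEL means the test data
   needs no extra columns. */
proc logistic data=training;
    model Heinz(descending) = LogPriceRatio DisplHeinz FeatHeinz DisplHunts FeatHunts
                              DisplHeinz*FeatHeinz DisplHunts*FeatHunts;
    score data=test outroc=roc_test;
run;

/* Event = Heinz, so the cost roles swap relative to a Hunts-event model:
   _FALNEG_ = Heinz buyers classified as Hunts -> coupon wrongly sent, $1 each
   _FALPOS_ = Hunts buyers classified as Heinz -> coupon wrongly withheld, $0.25 each */
data roc_cost;
    set roc_test;
    cost = _FALNEG_*1 + _FALPOS_*0.25;
run;

proc sort data=roc_cost;
    by cost;
run;

/* First row = threshold on P(Heinz) with the lowest total cost */
proc print data=roc_cost(obs=1);
    var _PROB_ _FALPOS_ _FALNEG_ cost;
run;
```

With Heinz as the event, customers whose predicted probability falls below the selected _PROB_ value are the ones classified as Hunts buyers and would receive the coupon.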