To predict future sales of the supermarket, I built a multiple regression model whose dependent variable was “Sales” from the historical data provided. First, I built scatter diagrams and a correlation analysis to select the independent variables to incorporate in the model. From the scatter plots, the variables that were found to have a strong linear relationship with the variable sales were Wages $m, No. Staff, Adv.$’000, and Car Spare. The correlation matrix indicated that the variables that had a strong relationship with sales were ‘Wages $m’, ‘No. Staff’, ‘Adv.$’000’, ‘Competitors’, ‘SundayD’, ‘Mng-Exp’, and ‘Car Spares’. Therefore, I choose to include these variables in the regression model for further analysis.
Using seven independent variables for the multiple regression model, R2 = 84.12% which means that the joint variation in ‘Wages $m’, ‘No. Staff’, ‘Adv.$’000’, ‘Competitors’, ‘SundayD’, ‘Mng-Exp’, and ‘Car Spares’ accounts for 84.12% of the variation in “Sales”. I used the F test to verify whether there was any substantial relationship between the dependent factor and the set of the independent factors.
H0: b1 = b2 = . . . = b7 = 0 (there is no significant relationship between the dependent variable and the independent variables)
H1: at least one b1≠ 0 (there exists a strong relationship between the dependent variable and at least one of the independent variables)
From the regression output, ‘Wages $m’, ‘Adv.$’000’, ‘Competitors’, and ‘Mng-Exp’: p-value = 0.000 < 0.05, Reject H0. These variables are all important in the model
For ‘SundayD’: p-value = 0.010 < 0.05, Reject H0. The variable is significant in the model.
For, ‘No. Staff’ and ‘Car Spares’: p -value = 0.371 > 0.05 and 0.578>0.05, Accept H0. The variables are not significant in the model
Therefore, we form another regression model with five independent variables. For this model, R2 = 84.02% which means that the joint variation in ‘Wages $m’, ‘Adv.$’000’, ‘Competitors’, ‘SundayD’, and ‘Mng-Exp’ accounts for 84.02% of the variation in “Sales” . The F test for these five variables, indicated that all the variables were significant in the model. However, the was an issue of multicollinearity between the variables ‘Wages $m’, and ‘Adv.$’000’.
I decided to remove the variable from the regression model which reduced R2 to 79.10%. However, since the model is for predictive purpose only, multicollinearity should not pose any problems.
The final regression model for predicting sales was found to be:
Sales $m = 2.152 +2.145 Wages $m + 0.02 Adv.$’000 – 0.293 Competitors + 0.74 SundayD + 0.192 Mng-Exp
The sales will be calculated as:
Sales $m = 2.152 +2.145 * $2.5m + 0.02* $150,000 – 0.293*2 + 0.74*1+ 0.192 *5 = $11.6 million
Part B: Forecasting AusShampoo Sales
We used Excel to find the equation suitable for predicting sales. The best equation was exponential, y = 138.38e0.383x with an R2 of 0.743. this was higher than that of a linear equation, y = 12.079x + 89.137 at R2 = 0.7301.
The forecasted sales for the following month, 2018-04, were 570.84 using the exponential equation.
Dielman, T.E., 2001. Applied regression analysis for business and economics. Pacific Grove, CA: Duxbury/Thomson Learning.
Levine, D.M., Berenson, M.L., Stephan, D. and Lysell, D., 1999. Statistics for managers using Microsoft Excel (Vol. 660). Upper Saddle River, NJ: Prentice Hall.
Stolzenberg, R.M., 2004. Multiple regression analysis. Handbook of data analysis, 165, p.208.