In all lecture examples involving the use of EXCEL, as well as the solutions to tutorial questions, I have been very particular about how to format the data and clean up the output (e.g. adjust to four decimal places, label everything, edit the charts, etc.). This is because this ‘EXCEL hygiene’ is essential in the workplace and highly valued. Generating output is easy; consistently ensuring it is clearly labelled and easy to identify, understand and track is more difficult, simply because it takes more time. This time is worth the investment and will be expected in your report.
QUESTION ONE:
Begin by using the Random Sampling procedure demonstrated in both the lecture and tutorials in Week 10 to select a RANDOM SAMPLE of 35 observations from your data, then copy and paste all three variables (Movie ID, Twitter Activity, Revenue) into a separate worksheet labelled ‘Sample_35’, in columns B, C and D. In column A, number the rows (1-35) and label this column ‘Observation’.
(a) Include a screenshot of this Sample_35 worksheet here to demonstrate you have sampled correctly. Label this as EXHIBIT 1 and include a relevant title.
(b) In order to investigate if a linear relationship between Twitter Activity and Revenue is a reasonable assumption, use EXCEL’s scatterplot option to produce a graph of these two variables. Include the line of best fit (DO NOT INCLUDE R² – it is not to be discussed here). Label this graph as EXHIBIT 2 with a relevant title and remember to optimise its presentation via the various formatting options available.
(c) Based ONLY on the scatterplot you produced as Exhibit 2, does a linear relationship seem reasonable? If so, is the slope positive or negative? Provide evidence for your answer and interpret what this means in the context of this question.
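The sampling step at the start of this question can also be cross-checked outside EXCEL. The Python sketch below is illustrative only: it assumes sampling WITHOUT replacement (the Week 10 procedure may differ), and the `population` list, its Movie IDs and its values are hypothetical stand-ins, not the assignment data. It shows the intended layout: 35 randomly chosen rows, numbered 1-35 in a leading ‘Observation’ column.

```python
import random


def draw_sample(rows, n=35, seed=None):
    """Pick n rows at random without replacement and number them from 1."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    chosen = rng.sample(rows, n)       # no row can be selected twice
    return [(i, *row) for i, row in enumerate(chosen, start=1)]


# Hypothetical stand-in data: (Movie ID, Twitter Activity, Revenue)
population = [(f"M{k:03d}", 1000 * k, 2.5 * k) for k in range(1, 201)]

sample_35 = draw_sample(population, n=35, seed=1)

print(("Observation", "Movie ID", "Twitter Activity", "Revenue"))
for row in sample_35[:3]:              # show the first three sampled rows
    print(row)
```

Your submitted sample must still come from the EXCEL procedure; this only illustrates what a correct ‘Sample_35’ worksheet contains.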
QUESTION TWO:
Regardless of your answer in Question One, now assume that a linear relationship is reasonable.
(a) Using the Regression Analysis procedure in EXCEL, produce a simple linear regression model with the following requirements:
• Select 99% Confidence Level in the Output Options.
• Report all values to 4 decimal places where relevant.
• Provide the Summary Output labelled as EXHIBIT 3 with an appropriate title.
(b) Based on this output, state the equation of this regression model (correct to 4 decimal places), remembering to define the variables.
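As a rough guide to what the ‘Coefficients’ column of the Summary Output represents, the sketch below computes the least-squares intercept and slope by hand in Python. The function name `fit_line` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow; they are not the assignment data.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line: predicted Y = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross-products about the means
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx                 # slope
    b0 = y_bar - b1 * x_bar        # Y intercept
    return b0, b1


# Hypothetical toy data (X = Twitter Activity, Y = Revenue)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
print(f"Predicted Revenue = {b0:.4f} + {b1:.4f} (Twitter Activity)")
```

The printed line mirrors the form expected in (b): the equation to 4 decimal places, with both variables named.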
QUESTION THREE:
Before interpreting this model, it is first essential to determine whether or not it is a true representation of the relationship that exists between Twitter Activity and Revenue in the population. To do this, a hypothesis test of significance is required.
Using a 5% level of significance, determine whether or not the linear relationship between Twitter Activity and Revenue is statistically significant. Remember to include ALL steps, show ALL working and interpret your conclusion IN THE CONTEXT of this question.
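For reference, one common way to set out this test is the t test for the slope, sketched below. The notation is an assumption: β1 is the population slope, b1 the sample slope and S_b1 its standard error (the ‘Standard Error’ column of the Summary Output). Follow the step layout used in lectures if it differs.

```latex
\begin{align*}
H_0 &: \beta_1 = 0 \quad \text{(no linear relationship)} \\
H_1 &: \beta_1 \neq 0 \quad \text{(a linear relationship exists)} \\
t_{\text{STAT}} &= \frac{b_1 - 0}{S_{b_1}}, \qquad \text{df} = n - 2 = 35 - 2 = 33 \\
\text{Reject } H_0 &\text{ if } |t_{\text{STAT}}| > t_{0.025,\,33} \text{ or, equivalently, if the } p\text{-value} < 0.05
\end{align*}
```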
QUESTION FOUR:
Report to Management
Assuming now that the model you have identified is statistically significant, provide the following in a short report to management:
(a) State and interpret the coefficient of determination for this model.
(b) State and provide an interpretation of the Y intercept, b0, and the slope coefficient, b1.
(c) Use the regression model developed in Question Two (a) to predict the revenue expected if Twitter Activity hits 1,000,000. Provide an interpretation of your answer.
(d) Is it appropriate to use this model to make this prediction? Explain.
(e) From the Summary Output provided in EXHIBIT 3, state and interpret the 95% confidence interval estimate of the population slope beta, correct to 4 decimal places.
i) State (WITHOUT interpretation) the 99% confidence interval estimate of the slope, also provided in EXHIBIT 3.
ii) Describe how this interval differs from your answer in (e).
iii) Explain what that difference means.
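To see where the two intervals in (e) come from, the sketch below builds a confidence interval for the slope as b1 ± (critical t) × (standard error of b1). It is a minimal illustration under stated assumptions: the function `slope_interval` and the five data points are hypothetical, and the critical values are the standard two-tailed t table values for 3 degrees of freedom (n - 2 with n = 5). For your sample of 35, the degrees of freedom are 33 and EXCEL reports both intervals directly.

```python
import math


def slope_interval(x, y, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * S_b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals, then the standard error of the slope
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / ssx)
    half_width = t_crit * s_b1
    return b1 - half_width, b1 + half_width


# Hypothetical toy data (n = 5, so df = 3)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

T_95 = 3.1824   # two-tailed critical t, df = 3, 95% confidence
T_99 = 5.8409   # two-tailed critical t, df = 3, 99% confidence

ci_95 = slope_interval(x, y, T_95)
ci_99 = slope_interval(x, y, T_99)
print(f"95% CI for slope: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
print(f"99% CI for slope: ({ci_99[0]:.4f}, {ci_99[1]:.4f})")
```

Both intervals are centred on the same b1; only the critical t value changes, which is what drives the difference you are asked to describe in ii) and iii).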