Question 1 (Total of 25 marks)
In preparation for an international 100m backstroke swimming event, Michael Phelps tested two different types of swimsuit from his major sponsor.
A sample of 20 swimming times was recorded for each swimsuit, the X-Glide and the X-Fly, as shown below:
X-Glide 100m backstroke (seconds)
52.52 50.78 50.93 51.02 51.48 51.60 51.75 52.97 53.93 55.45
50.53 50.80 50.97 51.05 51.48 51.60 52.85 53.10 54.15 56.32
X-Fly 100m backstroke (seconds)
50.08 50.52 50.60 50.65 51.10 51.53 51.92 53.30 53.75 52.23
50.10 50.58 50.60 50.72 51.48 51.65 52.10 53.75 54.02 57.55
(a) (10 marks) Use EXCEL to obtain a histogram and a descriptive summary for each swimsuit (X-Glide and X-Fly).
(b) (6 marks) Based on the histograms and descriptive summaries in (a), briefly describe the shape (symmetry, modality and outliers) of the data for each swimsuit.
Instructions for identifying outliers:
Whether an observation is an outlier is a matter of judgement. One rule commonly used for identifying outliers is the so-called 1.5 × IQR rule. An observation is suspected to be an outlier if it lies more than 1.5 × IQR below the first quartile Q1 or above the third quartile Q3.
Apply this rule to the data from each swimsuit. Identify suspected outliers (if any) by their exact value(s).
(c) (5 marks) Nominate appropriate measures of centrality and dispersion for the distribution of swimming times for each swimsuit. Give reason(s) for your choice. For each distribution, give and interpret the values of the summary measures you have chosen.
(d) (4 marks) On the basis of your results in (a)-(c), are there any differences between the two swimsuits? Write a report based on your findings to Michael in preparation for his upcoming event (at most one page, double-spaced, at least 2cm margins, 12pt Times New Roman or equivalent).
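The 1.5 × IQR rule described in part (b) can be sketched in a few lines of Python. This is only an illustration, not a substitute for the Excel output requested in part (a). The function name `iqr_outliers` is hypothetical, and the choice of `method="inclusive"` is an assumption: it should correspond to the interpolation used by Excel's QUARTILE.INC, while other quartile conventions can give slightly different fences.

```python
import statistics

# Swimming times exactly as listed in the question (seconds).
x_glide = [52.52, 50.78, 50.93, 51.02, 51.48, 51.60, 51.75, 52.97, 53.93, 55.45,
           50.53, 50.80, 50.97, 51.05, 51.48, 51.60, 52.85, 53.10, 54.15, 56.32]
x_fly = [50.08, 50.52, 50.60, 50.65, 51.10, 51.53, 51.92, 53.30, 53.75, 52.23,
         50.10, 50.58, 50.60, 50.72, 51.48, 51.65, 52.10, 53.75, 54.02, 57.55]


def iqr_outliers(values):
    """Return (q1, q3, lower_fence, upper_fence, suspected_outliers)."""
    # Assumption: "inclusive" interpolation, chosen to mirror QUARTILE.INC.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # An observation is suspected if it lies outside either fence.
    suspected = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return q1, q3, lower_fence, upper_fence, suspected


for name, times in (("X-Glide", x_glide), ("X-Fly", x_fly)):
    q1, q3, lower, upper, suspected = iqr_outliers(times)
    print(f"{name}: Q1={q1:.3f}, Q3={q3:.3f}, "
          f"fences=({lower:.3f}, {upper:.3f}), suspected={suspected}")
```

Any value the function reports is only a suspected outlier; as the instructions note, the final call remains a matter of judgement.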
Question 2 (Total of 15 marks)
A study examining brand awareness amongst children was conducted. A convenience sample of 442 letters written to Santa Claus was analysed, with letters classified as Brand Obsessed (multiple brand gifts requested), Single Brand (one brand gift requested) and No Brand (no brand gifts requested). The results were summarized as follows:
Gender    No Brand    Single Brand    Brand Obsessed    Total
Boy       48          26              99                173
Girl      71          73              125               269
Total     119         99              224               442
(a) (2.5 marks) What is the probability a randomly selected child is a Boy and requests No Brand? [Include an appropriate probability statement.]
(b) (2.5 marks) What is the probability a randomly selected child is a Girl or requested a Single Brand? [Include an appropriate probability statement.]
(c) (7 marks) Are Brand Obsessed and Gender statistically independent? Show all calculations that support your answer. [Include appropriate probability statements.]
(d) (3 marks) Explain whether the chosen sampling method is appropriate for probability calculations. If necessary, suggest an alternative sampling method.
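The probability calculations in parts (a) to (c) all follow from the counts in the table. A minimal Python sketch, assuming the table values as given, is shown here; the helper names (`p_gender`, `p_brand`, `p_joint`, `p_union`, `independence_gap`) are hypothetical and chosen only for illustration. Independence is checked by comparing the joint probability with the product of the marginal probabilities.

```python
# Counts copied from the table in the question.
counts = {
    "Boy": {"No Brand": 48, "Single Brand": 26, "Brand Obsessed": 99},
    "Girl": {"No Brand": 71, "Single Brand": 73, "Brand Obsessed": 125},
}
total = sum(sum(row.values()) for row in counts.values())


def p_gender(gender):
    """Marginal probability of a gender, P(G)."""
    return sum(counts[gender].values()) / total


def p_brand(brand):
    """Marginal probability of a brand category, P(B)."""
    return sum(row[brand] for row in counts.values()) / total


def p_joint(gender, brand):
    """Joint probability, P(G and B)."""
    return counts[gender][brand] / total


def p_union(gender, brand):
    """Addition rule: P(G or B) = P(G) + P(B) - P(G and B)."""
    return p_gender(gender) + p_brand(brand) - p_joint(gender, brand)


def independence_gap(gender, brand):
    """P(G and B) - P(G) * P(B); zero if the two events are independent."""
    return p_joint(gender, brand) - p_gender(gender) * p_brand(brand)


print(f"P(Boy and No Brand)     = {p_joint('Boy', 'No Brand'):.4f}")
print(f"P(Girl or Single Brand) = {p_union('Girl', 'Single Brand'):.4f}")
print(f"Independence gap (Boy, Brand Obsessed) = "
      f"{independence_gap('Boy', 'Brand Obsessed'):.4f}")
```

A gap that differs from zero indicates that the sample proportions do not satisfy the multiplication rule for independent events; the written answer should still state the probability statements explicitly.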
Question 3 (Total of 20 marks)
Are TV commercials too loud? Channel ROAR! is a television channel specialising in playing loud commercials. They claim they are an exciting alternative to the quieter and ‘boring’ television channels, which play commercials at an average volume of 60 decibels. A sample of 100 TV commercial volumes from Channel ROAR! was collected and summarised as below.
TV Commercial Volume (decibels)
Standard Error 0.16
Std Deviation 15.93
Sample Variance 253.76
(a) (6 marks) Describe the symmetry and modality of the distribution. Demonstrate the existence of outliers using the 1.5 × IQR rule and provide the maximum possible number of outliers in this distribution. [Hint: remember you will need to compute the Interquartile Range from the output above for the outlier test.]
(b) (4 marks) Which measures of central tendency and dispersion should be used to describe this distribution? Give (brief) reasons. Give and interpret their values.
(c) (10 marks) Test at the 0.01 level of significance whether there is evidence that the average commercial volume on Channel ROAR! is different from the average commercial volume for other TV stations.
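The mechanics of the two-sided test in part (c) can be sketched as follows. The function name `two_sided_z_test` is hypothetical, and the sample mean used in the example call (65.0 decibels) is a made-up placeholder, not a value from the output above. The sketch uses the standard Normal distribution as a large-sample approximation for n = 100; a t distribution with 99 degrees of freedom would give a very similar result.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, s, n, alpha):
    """Test H0: mu = mu0 against H1: mu != mu0 (large-sample approximation)."""
    standard_error = s / sqrt(n)           # estimated standard error of the mean
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha     # True in last slot means reject H0


# HYPOTHETICAL sample mean, for illustration only: replace with the value
# from the actual summary output.
hypothetical_mean = 65.0
z, p_value, reject = two_sided_z_test(hypothetical_mean, mu0=60.0,
                                      s=15.93, n=100, alpha=0.01)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The function derives the standard error from the standard deviation and the sample size, so the same call works for any summary output that reports those two quantities and the mean.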
Question 4 (Total of 20 marks)
Radio host Shock Jake encourages listeners to call in to express their views on topical issues. Radio station YouSayFM has decided to place these calls on a 7 second broadcast delay. This means Shock Jake can censor a call if inappropriate language is used.
It is believed that reaction times to cut off a caller are Normally distributed. YouSayFM decided to test Shock Jake’s reaction time, with a sample of 32 trials yielding a mean of exactly 5 seconds and a standard deviation of 2.2 seconds. YouSayFM would like to know whether Shock Jake’s reaction time is fast enough for the proposed broadcast delay.
(a) (6 marks) Construct and interpret a 99% confidence interval for the population mean reaction time for Shock Jake. Will a 7 second delay be enough? Explain briefly.
(b) (2 marks) Would you advise YouSayFM to reduce the confidence level from 99% to 95% to better determine whether a 7 second delay is enough? Explain briefly [Hint: consider the impact on the width of the confidence interval when changing confidence level]. Do not perform any calculations.
(c) (6 marks) Assuming 99% confidence, what sample size is required to estimate the population mean reaction time of Shock Jake to within 1 second? Has YouSayFM taken a large enough sample?
(d) (6 marks) Radio standards report an average reaction time of 4.5 seconds and a standard deviation of 2.2 seconds. What is the probability that for a sample of 32 reaction times, Shock Jake will produce an average reaction time less than 5 seconds?
Note: If you are using a Normal distribution for this calculation, identify the quantity that follows a Normal distribution and use a diagram!
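The three calculations in parts (a), (c) and (d) share the same ingredient, the standard error of the sample mean, and can be sketched together. The function names are hypothetical. Two assumptions are worth flagging: the t critical value for 31 degrees of freedom is hard-coded from a t-table and should be verified, and the sample-size formula uses the sample standard deviation of 2.2 seconds as a planning value for the population standard deviation together with a Normal critical value.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Assumed table value for t with 31 degrees of freedom and 0.005 in each
# tail; verify against a t-table or statistical software.
T_CRIT_99_DF31 = 2.744


def mean_confidence_interval(mean, sd, n, critical_value):
    """Interval: mean +/- critical_value * sd / sqrt(n)."""
    margin = critical_value * sd / sqrt(n)
    return mean - margin, mean + margin


def required_sample_size(sd, margin, confidence):
    """Smallest n with z * sd / sqrt(n) <= margin, i.e. n >= (z * sd / margin)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)


def prob_sample_mean_below(threshold, mu, sigma, n):
    """P(sample mean < threshold), where the sample mean ~ Normal(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(threshold)


low, high = mean_confidence_interval(5.0, 2.2, 32, T_CRIT_99_DF31)
print(f"(a) 99% CI for the mean: ({low:.3f}, {high:.3f}) seconds")
print(f"(c) required sample size: {required_sample_size(2.2, 1.0, 0.99)}")
print(f"(d) P(sample mean < 5): {prob_sample_mean_below(5.0, 4.5, 2.2, 32):.4f}")
```

In part (d) the quantity that follows a Normal distribution is the sample mean, not an individual reaction time, which is why the standard deviation is divided by the square root of the sample size.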
Question 5 (Total of 20 marks)
Choc D’Lite is a recipe book advertising desserts as being full on flavour but low on fat. A sample of 42 recipes was selected, and the scatterplot of Amount of Chocolate (grams) versus Calories, together with the equation of a fitted regression line, is shown below.
(a) (1 mark) Which is the independent and which is the dependent variable?
(b) (2.5 marks) What is the value of the intercept and what does it mean in this example? Is it meaningful? Explain.
(c) (2.5 marks) What is the value of the slope and what does it mean in this example? Explain.
(d) (6 marks) What are the values of the correlation coefficient and the coefficient of determination? What do they measure in this example? Interpret the two values.
(e) (6 marks) Use the linear regression model described above to predict Calories if the Amount of Chocolate used is 25 grams. Is this prediction likely to be accurate? Explain briefly.
(f) (2 marks) The legal team behind the Choc D’Lite recipe book is concerned about misleading information being reported. To demonstrate their concern, they investigate a recipe in the cookbook that requires 30 grams of chocolate. By looking at the scatterplot and the regression line fitted through the data, would you expect the regression model to underestimate or overestimate the average number of calories in this case? Should the legal team be concerned? Explain briefly. [No calculations are required for this question.]