Problem 1: Appropriateness of Inference
Answer the questions below for the following scenario. The underlined text is the name of the StatCrunch data set to be used for that part. Please note, do not conduct inference in this problem; just answer each question.
Heights of Fathers and Sons. To test the claim that sons are taller than their fathers on average, a researcher randomly selected 13 fathers who have adult male children. She recorded the height of both the father and the son in inches.
a)What is (are) the parameter(s) of interest? Choose one of the following symbols: μ (the population mean), μ_D (the mean difference from paired (dependent) data), or μ1 − μ2 (the difference of two independent means), and describe the parameter in context of this question in one sentence.
b)Depending on your answer to part (a), construct one or two relative frequency histograms. Remember to properly title and label the graph(s). Copy and paste these graphs into your document.
c)Describe the shape of the histogram(s) in one sentence.
d)Depending on your answer to part (a), construct one or two boxplots and copy and paste these graphs into your document.
e)Does the boxplot (or do the boxplots) show any outliers? Answer this question in one sentence and identify any outliers if they are present.
f)Considering your answers to parts (c) and (e), is inference appropriate in this case? Why or why not? Defend your answer using the graphs in two to three sentences.
Problem 2: Appropriateness of Inference
Answer the questions below for the following scenario. The underlined text is the name of the StatCrunch data set to be used for that part. Please note, do not conduct inference in this problem; just answer each question.
Bacteria Counts. Researchers wanted to determine if carpeted rooms contained more bacteria than uncarpeted rooms. To determine the amount of bacteria in a room, researchers pumped the air from the room over a Petri dish at the rate of 1 cubic foot per minute for a random selection of eighteen carpeted rooms and another random sample of eighteen uncarpeted rooms in a very large hospital. Colonies of bacteria were allowed to form in the Petri dishes. The data are presented as the count of bacteria per cubic foot.
a)What is (are) the parameter(s) of interest? Choose one of the following symbols: μ (the population mean), μ_D (the mean difference from paired (dependent) data), or μ1 − μ2 (the difference of two independent means), and describe the parameter in context of this question in one sentence.
b)Depending on your answer to part (a), construct one or two relative frequency histograms. Remember to properly title and label the graph(s). Copy and paste these graphs into your document.
c)Describe the shape of the histogram(s) in one sentence.
d)Depending on your answer to part (a), construct one or two boxplots and copy and paste these graphs into your document.
e)Does the boxplot (or do the boxplots) show any outliers? Answer this question in one sentence and identify any outliers if they are present.
f)Considering your answers to parts (c) and (e), is inference appropriate in this case? Why or why not? Defend your answer using the graphs in two to three sentences.
Problem 3: Difference in Pours
Researchers randomly selected participants to take part in a study. The participants were randomly assigned either a tall, thin “highball” glass or a short, wide “tumbler,” each of which held 355 ml. The participants were asked to pour a shot (1.5 oz. = 44.3 ml) of water into their glass. Did the shape of the glass make a difference in how much liquid they poured? Assume all conditions for conducting inference are satisfied. Use the following sample statistics to complete this problem.
Statistics/Drinkware | Tumbler | Highball
Sample size, n | 99 | 99
Sample mean, x̄ | 60.9 ml | 42.2 ml
Sample standard deviation, s | 17.9 ml | 16.2 ml
a)Define the population parameter of interest in context of this question in one sentence.
b)Construct the 95% confidence interval in StatCrunch using Stat > T Stats > Two Sample > With Summary. Copy and paste the output table into your document.
c)Imagine you were using a hypothesis test to determine if a significant difference exists in the pours. State the null and alternative hypotheses you would use for this test.
d)What decision and conclusion can be made in this case? Provide an answer and a reason for your choice in one or two sentences. Please only use your confidence interval to answer this question (do not run this hypothesis test in this part).
e)Produce hypothesis test output using Stat > T Stats > Two Sample > With Summary and verify your decision in part (d).
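The interval in part (b) can also be sketched outside StatCrunch from the summary statistics alone. The following is a minimal illustration, assuming the unpooled (Welch) two-sample t procedure; the function name `welch_interval` and the critical value `t_star = 1.972` (a t-table value for 95% confidence at roughly 194 degrees of freedom) are assumptions made for this sketch and are not part of the assignment.

```python
from math import sqrt

def welch_interval(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Return (estimate, standard error, Welch df, lower, upper)."""
    v1 = sd1 ** 2 / n1            # squared standard error of sample 1
    v2 = sd2 ** 2 / n2            # squared standard error of sample 2
    estimate = mean1 - mean2      # point estimate of mu1 - mu2
    se = sqrt(v1 + v2)            # unpooled standard error
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_star * se
    return estimate, se, df, estimate - margin, estimate + margin

# Summary statistics from the table (Tumbler first, Highball second).
# t_star is an assumed table lookup, not computed here.
estimate, se, df, lower, upper = welch_interval(
    60.9, 17.9, 99, 42.2, 16.2, 99, t_star=1.972
)
print(f"estimate = {estimate:.1f} ml, SE = {se:.4f}, df = {df:.1f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) ml")
```

If the interval excludes 0, a two-sided test at the 5% level would reject the hypothesis of no difference, which is the link part (d) asks for. StatCrunch output may differ slightly in the last decimal place because it computes the critical value exactly.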
Problem 4: Flexible Work Schedules
A particular county’s Health Department experimented with a flexible four-day workweek. For a year, the department recorded the mileage driven by 11 field workers on an ordinary five-day workweek. Then, it changed to a flexible four-day workweek and recorded the mileage for another year for the same 11 field workers. Test the hypothesis that the five-day workweek has a greater average mileage. Assume all conditions are satisfied in this problem. The data set used for this problem is called “Flexible Work Schedule.” Use a significance level of 0.10.
a)Define the population parameter in context in one sentence.
b)State the null and alternative hypotheses using correct notation.
c)Calculate the difference between miles by subtracting (5 Day – 4 Day). List the difference for each of the 11 pairs in your document. You may calculate these differences in StatCrunch.
d)Obtain the mean of these differences and the standard deviation of these differences in StatCrunch. You may copy and paste the box that you obtain from StatCrunch or list the values. Please round these values to four decimal places.
e)Calculate the test statistic. Please do this “by hand” using the formula and showing your work (please type your work).
f)Use StatCrunch Stat > T Stats > Paired and enter 5 Day for Sample 1 and 4 Day for Sample 2 to verify your test statistic value and to obtain your p-value. Please correctly present your p-value using probability notation. Copy and paste your output into your document and make sure it includes only the mean and standard deviation.
g)State whether you reject or do not reject the null hypothesis and the reason for your decision in one sentence.
h)State your conclusion in context of the problem (i.e. interpret your results and/or answer the question being posed) in one or two complete sentences.
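For reference, the "by hand" calculation in part (e) normally uses the standard paired t statistic, sketched below. The symbols are assumptions of this note rather than notation given in the problem: \bar{d} is the mean of the 11 differences (5 Day minus 4 Day), s_d is their standard deviation, n = 11 is the number of pairs, and 0 is the hypothesized mean difference under the null hypothesis.

```latex
\[
  t = \frac{\bar{d} - 0}{s_d / \sqrt{n}},
  \qquad d_i = (\text{5 Day})_i - (\text{4 Day})_i,
  \qquad \text{df} = n - 1 = 10 .
\]
```

Because the differences are taken as 5 Day minus 4 Day, a claim that the five-day workweek has greater average mileage corresponds to large positive values of t.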
Problem 5: Sleep Habits
A poll randomly surveyed 1508 US residents aged 13 – 65 asking about their sleep habits, and, in particular, their use of technology around the time they try to go to sleep. Research shows that people who regularly use their computers in the hour before trying to go to sleep are less likely to report getting a good night’s sleep. The poll found that of the 19 – 29 year olds sampled, 205 out of 293 reported using a computer in the hour before trying to go to sleep. In contrast, of the 30 – 45 year olds sampled, 313 out of 469 reported computer use in the hour before trying to sleep.
a)Define the population parameter in context in one sentence.
b)Only check the large sample conditions for this inference. Show your work for the numerical calculations.
c)Calculate the two sample proportions separately and then calculate the estimate of the parameter by subtracting (younger group – older group).
d)Construct the 99% confidence interval in StatCrunch using Stat > Proportion Stats > Two Sample > With Summary.
e)Does your confidence interval capture 0? Answer this question in one sentence and discuss what that means in relation to a two-sided hypothesis test.
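Parts (b) through (e) can be cross-checked with a short calculation. The sketch below assumes the common large-sample rule of at least 10 successes and 10 failures in each group (some textbooks use a different threshold) and the standard unpooled standard error for a two-proportion z interval; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the problem: younger group (19-29) and older group (30-45).
x1, n1 = 205, 293
x2, n2 = 313, 469

# Large-sample check: successes and failures in each group.
# The threshold of 10 is an assumed rule of thumb.
counts = [x1, n1 - x1, x2, n2 - x2]
conditions_met = all(c >= 10 for c in counts)

p1 = x1 / n1                      # sample proportion, younger group
p2 = x2 / n2                      # sample proportion, older group
diff = p1 - p2                    # estimate (younger - older)

# Unpooled standard error used for a confidence interval.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.995)   # critical value for 99% confidence
lower, upper = diff - z_star * se, diff + z_star * se
captures_zero = lower <= 0 <= upper

print(f"counts = {counts}, conditions met: {conditions_met}")
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, diff = {diff:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f}), captures 0: {captures_zero}")
```

When the interval captures 0, a two-sided test at the matching significance level (here 0.01) would not reject the hypothesis that the two population proportions are equal.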
Problem 6: Physical Fitness
Data from a physical fitness program were collected on 31 men who were asked to run 1.5 miles. Variables measured include oxygen uptake in ml/min (Oxy), their resting pulse rate before the run in beats per minute (RstPulse), their time to run 1.5 miles in minutes (Runtime), the pulse rate at the end of the run in beats per minute (RunPulse), their maximum pulse rate during the run in beats per minute (MaxPulse), their weight in kg (Weight), and their age in years (Age). The StatCrunch data set is called Physical Fitness. Investigate the relationship between the explanatory variable “Runtime” and response variable “Oxy” by doing the following:
a)Make a scatterplot using “Runtime” and “Oxy” and copy and paste it in your solutions (use Graph > Scatter Plot in StatCrunch).
b)Calculate the correlation coefficient (use Stat > Summary Stats > Correlation in StatCrunch). Provide this value in your document.
c)Interpret the scatterplot and correlation coefficient in terms of trend, strength, and shape (form) in one complete sentence.
d)Using the “Runtime” variable as the explanatory variable, run a Simple Linear Regression analysis in StatCrunch. Use Stat > Regression > Simple Linear. Copy and paste only the StatCrunch results output (no tables).
e)Add the fitted line plot to your document. This graph appears on page 2 of your output.
f)Type the regression equation into your document.
g)Interpret the slope of the regression line (in context of this data set).
h)Is it meaningful to interpret the y-intercept? Why or why not?
i)State r-squared (i.e., the coefficient of determination) and explain what this value means in context of the data set.
j)Use the regression equation from part (f) to predict the Oxygen uptake of a runner who completed the 1.5 mile run in 17 minutes. State your predicted value in a sentence that is in context of the data. Do not forget to mention the units. Note: You can do this calculation “by hand” or using StatCrunch.
k)Is your prediction in part (j) an example of extrapolation? Why or why not?
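The mechanics behind parts (b), (f), (i), (j), and (k) can be sketched in a few lines. The helper names `fit_line` and `is_extrapolation` are hypothetical, and the numbers in `toy_runtime` and `toy_oxy` are made-up placeholders chosen only so the sketch runs; they are not the Physical Fitness data and say nothing about the answers to this problem.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope, intercept, and correlation r for paired lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

def is_extrapolation(x, x_new):
    """True when x_new lies outside the observed range of x."""
    return x_new < min(x) or x_new > max(x)

# Made-up placeholder values (NOT the Physical Fitness data set).
toy_runtime = [8.0, 9.0, 10.0, 11.0, 12.0]
toy_oxy = [60.0, 55.0, 52.0, 47.0, 44.0]

slope, intercept, r = fit_line(toy_runtime, toy_oxy)
predicted = intercept + slope * 17          # prediction at Runtime = 17
print(f"Oxy-hat = {intercept:.2f} + ({slope:.2f}) * Runtime")
print(f"r = {r:.4f}, r-squared = {r ** 2:.4f}")
print(f"prediction at 17 min: {predicted:.2f}")
print(f"extrapolation: {is_extrapolation(toy_runtime, 17)}")
```

With the real data, the same check applies: compare 17 minutes with the smallest and largest Runtime values in the data set to decide whether the prediction falls outside the observed range.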