Stats Qs: Sampling - Location & Hypothesis Testing essay.

Question 1

You wish to study safety standards in commercial fishing companies. You have a list of those fishing companies which have chosen to be members of a national association of commercial fishing companies (not all fishing companies choose to join this organisation). You suspect that large fishing companies will have stricter safety standards than small fishing companies. Your list of fishing companies includes a measure of the size of each company and the company’s address. You decide to select your sample from the list of the national association’s members. You will distribute a questionnaire to your sample by mail.

1.What type of sampling method is most appropriate for this study:

A.Simple random sampling

B.Clustered random sampling

C.Convenience sampling

D.Stratified random sampling

E.Stratified convenience sampling.

2.In this study, what is the list of those fishing companies which have chosen to be members of a national association of commercial fishing companies?

A.It is the target population.

B.It is the study sample.

C.It is the sampling frame.

D.It is the stratification set.

E.It is the clustering set

3.You have selected a specific part of a fishing company for closer study. This is the part of the factory where the fish are processed prior to transport to the fish markets. You have a full list of employees who work in this part of the factory, but they are all on the same pay scale and are listed as having the same job type and the list gives you no other information about them. You will select a sample of these employees from this list. What type of sampling method is most appropriate for this part of the study?

A.Simple random sampling

B.Clustered random sampling

C.Convenience sampling

D.Stratified random sampling

E.Stratified convenience sampling.
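The two random designs named in the options above differ only in whether the sampling frame is split into groups before drawing. The following is a minimal sketch using Python's standard library; the company and employee lists, the size bands and the function names are hypothetical and are not taken from the assessment.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

def stratified_random_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, then draw a simple random sample inside each."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(units, min(n_per_stratum, len(units)))
        for name, units in strata.items()
    }

# Hypothetical sampling frames: (company id, size band) pairs and an employee list.
companies = [(i, "large" if i % 4 == 0 else "small") for i in range(1, 41)]
employees = [f"employee_{i}" for i in range(1, 31)]

by_size = stratified_random_sample(companies, lambda c: c[1], n_per_stratum=5, seed=1)
staff = simple_random_sample(employees, 6, seed=1)
```

Stratifying needs a variable recorded on the frame (here the size band); when the list carries no such information, only the first function can be applied.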

6.These data can be best described as:

A.Left skewed

B.Symmetric

C.Right skewed

D.Approximately normal

E.Approximately a z score

Use the following information to answer questions 7 to 9

The National Assessment Program - Literacy and Numeracy (NAPLAN) tests are an annual assessment for all Australian students in Years 3, 5, 7 and 9. The tests cover skills in reading, writing, spelling, grammar and punctuation, and numeracy. The assessments are undertaken every year in the second full week in May.

7.Which of the following statements best applies to the NAPLAN scores presented in figure 1?

A.The NAPLAN scores are left (negatively) skewed

B.The NAPLAN scores are approximately symmetric

C.The NAPLAN scores are right (positively) skewed

D.The large sample size means that these data do not have a skew.

8.What is the best measure of location for these data?

A.The mean

B.The median

C.The mode

D.The upper quartile

E.The lower quartile

9.For these data, which statement best describes the relationship between measures of location?

A.The mean is larger than the median.

B.The mean is approximately equal to the median.

C.The mean is smaller than the median.

D.We can’t tell from the graph whether the mean is greater than, equal to or less than the median.

E.The data are skewed, so we can’t calculate a mean.

Question 2

10.Which of the following statements best describes simple random sampling?

A.Each member of the population has an equal chance of being chosen.

B.Each possible sample of a given size has an equal chance of being chosen.

C.Adjacent members of the population must not be chosen.

D.Likely errors cannot be estimated.

11.Suppose you have a sample of birthweights of 80 new born babies (in kg). Which of the following best describes an appropriate way to present these data?

A.We should plot the data in a histogram.

B.We should plot the data in a pie chart.

C.We should plot the data in a bar chart.

D.We should present the data in a table.

E.We should plot the data in a qnorm plot.

12.You have two samples of women. The 95% confidence interval for the mean weight (in kg) in the first group is (65, 89) and for the second group is (69, 92). Which of the following statements best describes the relationship between the mean weights in the two population groups?

A.The mean weight for the first population group is the same as the mean weight for the second population group.

B.The mean weight for the first population group is probably the same as the mean weight for the second population group.

C.The range of plausible values for the means of the two population groups overlap.

D.The mean of the first population group lies in the range 65kg to 89kg.

E.Both B. and C. are true.

13.Suppose we calculate an estimate of the mean overall mark for HSH746 for a sample of students in 2018. Which of the following statements best applies to our estimate of the mark.

A.The sampling distribution is approximately a normal distribution if the sample is large enough.

B.A confidence interval for this estimate will only contain the true population mean if the sample is large enough.

C.We can only calculate a confidence interval for this estimate if the sample size is larger than 30.

D.The standard deviation of the sample of marks is a measure of the accuracy of our sample mean.

E.Both A. and B. are true.

14.Which of the following statements best describes what we are attempting to do when we perform a hypothesis test.

A.We are attempting to prove the null hypothesis.

B.We are attempting to disprove the null hypothesis.

C.We are looking to see if our data provide evidence for the null hypothesis.

D.We are looking to see if our data provide evidence against the null hypothesis.

E.None of the above because our test is focussed on the alternative hypothesis.

15.You conduct a small trial of a new treatment for lung cancer. The mortality in the group with the new treatment was half that in the group with the current standard treatment. However, the difference was not statistically significant. Which of the following statements best describes your conclusion from this trial.

A.The new treatment is no better than the old treatment.

B.Further study of the new treatment would be unethical as it has been shown to be no better than the old treatment.

C.The reduction in mortality is so great that we can overlook the lack of statistical significance and conclude that the new treatment is better than the old treatment.

D.The reduction in mortality is so great that we can overlook the lack of statistical significance and focus our future research on the new treatment.

E.We should carry out a new trial with a greater sample size before we can make reliable conclusions about comparisons between the old and new treatments.

16.You are conducting a study of the relationship between a person being a smoker and being a regular coffee drinker. You have a sample of 500 people and you calculate a contingency table for coffee drinking and smoking. The results are presented below:

You wish to know whether or not you can run a chi square test on these data to assess the relationship between coffee drinking and smoking.

Which of the following statements best describes whether or not you can run a chi square test?

A.You can NOT run a chi square test because there is one table cell with a count of 0.

B.You can NOT run a chi square test because smoking and coffee drinking are likely to be correlated and the chi square test requires statistical independence.

C.You can NOT run a chi square test because not all the table cells have counts greater than 5.

D.You CAN run a chi square test because all the table cells have expected counts greater than 5.

E.Both A. and C. are true.
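The usual rule of thumb for the chi square test concerns the expected counts under independence, not the observed counts. The sketch below shows how expected counts are obtained from the margins of a table. The table in the question is not reproduced in this document, so the counts used here are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table for 500 people (not the table from the question).
# Rows: smoker / non-smoker. Columns: coffee drinker / non-drinker.
observed = [[120, 30], [230, 120]]
expected = expected_counts(observed)

# Rule-of-thumb check applied to the expected counts.
all_above_five = all(cell > 5 for row in expected for cell in row)
```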

17.The following graph is a scatterplot for two variables y and x. The estimated correlation between y and x is 0.86.

Which of the following statements best describes the relationship between the variables y and x.

A.There is a strong, positive linear relationship between y and x.

B.There is a weak positive linear relationship between y and x.

C.There is a weak negative linear relationship between y and x.

D.There is a strong, negative linear relationship between y and x.

E.None of the above.

18.Suppose we have selected a sample from a population using simple random sampling.

Which of the following statements most applies to our sample.

A.The standard error is an estimate of the true population standard deviation.

B.The sample mean is an estimate of the true population mean.

C.The mean of the sampling distribution is an estimate of the true population mean.

D.None of the above are true unless the sample size is greater than 30.

E.Both B. and C. are true.

Use the following information to answer questions 19 and 20.

19.You have set a target of 50% for the proportion of children in a school to be vaccinated. You take a sample of 50 children and identify which of the children in your sample is vaccinated – so the variable vaccine is set to 1 for those children who are vaccinated and set to 0 for those children who are not vaccinated. The following is the output of a Stata prtest procedure run on your sample.

Which of the following statements best describes your conclusions from the results of this prtest.

A.You have not yet met your target because the proportion of children in your sample who are vaccinated is 46%, which is less than the target value of 50%.

B.You have met your target because the 95% confidence interval contains 50%.

C.You have not yet met your target because the p value is 0.5716, so your result is not statistically significant.

D.You have met your target because the z value is negative (-0.5657).

E.The data do not show that you have failed to meet your target because the 95% confidence interval contains 50%.
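The Stata output itself is not reproduced here, but the quantities quoted in the options can be recomputed from the figures in the question (46% of 50 children, target 50%). This is a sketch, assuming a two-sided z test with the standard error evaluated under the null hypothesis and a Wald confidence interval; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_test(successes, n, p0, level=0.95):
    """Two-sided z test of H0: p = p0, plus a Wald confidence interval."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)            # SE under the null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)   # SE at the estimate
    return p_hat, z, p_value, (p_hat - half_width, p_hat + half_width)

# 46% of 50 children corresponds to 23 vaccinated children.
p_hat, z, p_value, ci = one_sample_proportion_test(23, 50, 0.5)
```

With these inputs the z value and p value agree with the -0.5657 and 0.5716 quoted in the options.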

20.A hypothesis test comparing the mean birthweight for babies between a group of mothers who smoke and a group of mothers who do not smoke gives rise to a t statistic of 2.61 with 20 degrees of freedom giving rise to a p value of 0.008. Which of the following statements best describes your conclusion from this test.

A.We have proved that the means of the two groups are the same.

B.We have proved that the means of the two groups are different.

C.We accept the null hypothesis that the means of the two groups are the same.

D.We reject the null hypothesis that the means of the two groups are the same.

E.The p value is too small for us to draw any conclusion from this test.

Question 1

What type of sampling method is most appropriate for this study:

1. Simple random sampling
2. Clustered random sampling
3. Convenience sampling
4. Stratified random sampling
5. Stratified convenience sampling.

In this study, what is the list of those fishing companies which have chosen to be members of a national association of commercial fishing companies?

1. It is the target population.
2. It is the study sample.
3. It is the sampling frame.
4. It is the stratification set.
5. It is the clustering set

You have selected a specific part of a fishing company for closer study. This is the part of the factory where the fish are processed prior to transport to the fish markets. You have a full list of employees who work in this part of the factory, but they are all on the same pay scale and are listed as having the same job type and the list gives you no other information about them. You will select a sample of these employees from this list. What type of sampling method is most appropriate for this part of the study?

1. Simple random sampling
2. Clustered random sampling
3. Convenience sampling
4. Stratified random sampling
5. Stratified convenience sampling.

Use the following information to answer questions 4 to 6

You have the following sample of five numbers

The mean of this sample to 1 decimal place is
1. 0
2. 2
3. 0
4. 4
5. 1

The median of this sample to 1 decimal place is

Use the following information to answer questions 7 to 9

The figure below shows the results of the NAPLAN test for 2015.

Which of the following statements best applies to the NAPLAN scores presented in figure 1?

1. The NAPLAN scores are left (negatively) skewed
2. The NAPLAN scores are approximately symmetric
3. The NAPLAN scores are right (positively) skewed
4. The large sample size means that these data do not have a skew.
5. Both B. and D. are true

What is the best measure of location for these data?

1. The mean
2. The median
3. The mode
4. The upper quartile
5. The lower quartile

For these data, which statement best describes the relationship between measures of location?

1. The mean is larger than the median.
2. The mean is approximately equal to the median.
3. The mean is smaller than the median.
4. We can’t tell from the graph whether the mean is greater than, equal to or less than the median.
5. The data are skewed, so we can’t calculate a mean.

Which of the following statements best describes simple random sampling?

1. Each member of the population has an equal chance of being chosen.
2. Each possible sample of a given size has an equal chance of being chosen.
3. Adjacent members of the population must not be chosen.
4. Likely errors cannot be estimated.
5. Both A. and B.

Suppose you have a sample of birthweights of 80 new born babies (in kg). Which of the following best describes an appropriate way to present these data?

1. We should plot the data in a histogram.
2. We should plot the data in a pie chart.
3. We should plot the data in a bar chart.
4. We should present the data in a table.
5. We should plot the data in a qnorm plot.

You have two samples of women. The 95% confidence interval for the mean weight (in kg) in the first group is (65, 89) and for the second group is (69, 92). Which of the following statements best describes the relationship between the mean weights in the two population groups?

1. The mean weight for the first population group is the same as the mean weight for the second population group.
2. The mean weight for the first population group is probably the same as the mean weight for the second population group.
3. The range of plausible values for the means of the two population groups overlap.
4. The mean of the first population group lies in the range 65kg to 89kg.
5. Both B. and C. are true.

Suppose we calculate an estimate of the mean overall mark for HSH746 for a sample of students in 2018. Which of the following statements best applies to our estimate of the mark.

1. The sampling distribution is approximately a normal distribution if the sample is large enough.
2. A confidence interval for this estimate will only contain the true population mean if the sample is large enough.
3. We can only calculate a confidence interval for this estimate if the sample size is larger than 30.
4. The standard deviation of the sample of marks is a measure of the accuracy of our sample mean.
5. Both A. and B. are true.

Which of the following statements best describes what we are attempting to do when we perform a hypothesis test.

1. We are attempting to prove the null hypothesis.
2. We are attempting to disprove the null hypothesis.
3. We are looking to see if our data provide evidence for the null hypothesis.
4. We are looking to see if our data provide evidence against the null hypothesis.
5. None of the above because our test is focussed on the alternative hypothesis.

Question 2

You conduct a small trial of a new treatment for lung cancer. The mortality in the group with the new treatment was half that in the group with the current standard treatment. However, the difference was not statistically significant. Which of the following statements best describes your conclusion from this trial.

1. The new treatment is no better than the old treatment.
2. Further study of the new treatment would be unethical as it has been shown to be no better than the old treatment.
3. The reduction in mortality is so great that we can overlook the lack of statistical significance and conclude that the new treatment is better than the old treatment.
4. The reduction in mortality is so great that we can overlook the lack of statistical significance and focus our future research on the new treatment.
5. We should carry out a new trial with a greater sample size before we can make reliable conclusions about comparisons between the old and new treatments.

You are conducting a study of the relationship between a person being a smoker and being a regular coffee drinker. You have a sample of 500 people and you calculate a contingency table for coffee drinking and smoking. The results are presented below:

You wish to know whether or not you can run a chi square test on these data to assess the relationship between coffee drinking and smoking.

Which of the following statements best describes whether or not you can run a chi square test?

1. You can NOT run a chi square test because there is one table cell with a count of 0.
2. You can NOT run a chi square test because smoking and coffee drinking are likely to be correlated and the chi square test requires statistical independence.
3. You can NOT run a chi square test because not all the table cells have counts greater than 5.
4. You CAN run a chi square test because all the table cells have expected counts greater than 5.
5. Both A. and C. are true.

The following graph is a scatterplot for two variables y and x. The estimated correlation between y and x is 0.86.

Which of the following statements best describes the relationship between the variables y and x.

1. There is a strong, positive linear relationship between y and x.
2. There is a weak positive linear relationship between y and x.
3. There is a weak negative linear relationship between y and x.
4. There is a strong, negative linear relationship between y and x.
5. None of the above.

Suppose we have selected a sample from a population using simple random sampling.

Which of the following statements most applies to our sample.

1. The standard error is an estimate of the true population standard deviation.
2. The sample mean is an estimate of the true population mean.
3. The mean of the sampling distribution is an estimate of the true population mean.
4. None of the above are true unless the sample size is greater than 30.
5. Both B. and C. are true.

Use the following information to answer questions 19 and 20.

You have set a target of 50% for the proportion of children in a school to be vaccinated. You take a sample of 50 children and identify which of the children in your sample is vaccinated – so the variable vaccine is set to 1 for those children who are vaccinated and set to 0 for those children who are not vaccinated. The following is the output of a Stata prtest procedure run on your sample.

Which of the following statements best describes your conclusions from the results of this prtest.

1. You have not yet met your target because the proportion of children in your sample who are vaccinated is 46%, which is less than the target value of 50%.
2. You have met your target because the 95% confidence interval contains 50%.
3. You have not yet met your target because the p value is 0.5716, so your result is not statistically significant.
4. You have met your target because the z value is negative (-0.5657).
5. The data do not show that you have failed to meet your target because the 95% confidence interval contains 50%.

A hypothesis test comparing the mean birthweight for babies between a group of mothers who smoke and a group of mothers who do not smoke gives rise to a t statistic of 2.61 with 20 degrees of freedom giving rise to a p value of 0.008. Which of the following statements best describes your conclusion from this test.

1. We have proved that the means of the two groups are the same.
2. We have proved that the means of the two groups are different.
3. We accept the null hypothesis that the means of the two groups are the same.
4. We reject the null hypothesis that the means of the two groups are the same.
5. The p value is too small for us to draw any conclusion from this test.

Please answer the following questions using the data set provided with this assessment.

You are conducting a study of systolic blood pressure. You have a data set for 500 people with a measure of systolic blood pressure before and after an intervention. You have also collected information on your study subjects’ coffee drinking and cigarette smoking. The description of the variables on this data set are in the word document:

assessmentsectionB data description

Subject’s systolic blood pressure was measured using a mercury sphygmomanometer with the subject in a seated position and the arm flexed. Cigarettes smoked per day and habitual number of coffee cups consumed per day were self-reported. The average caffeine content of a cup of coffee (40 ml) is approximately 100mg. The mean number of coffee cups consumed per day was 5.3 (SD 3.2) and the mean number of cigarettes smoked per day was 6.5 (SD 5.8).

The data were collected in such a way as to ensure that they are statistically independent.

You are interested in whether or not the blood pressure lowering intervention had a measurable effect on mean systolic blood pressure. You are also interested in a possible effect of coffee drinking on blood pressure.

The data are in the comma separated file sysbp_study.csv

These are synthetic data, but you may refer to the source of these data as
‘supplementary assessment: sysbp study’.

Question 3

Conduct a test of whether or not blood pressure changed from before to after the intervention based on the mean systolic blood pressure. You should report and interpret the effect size and confidence interval as well as hypothesis test results.

As part of your answer, you should state

The hypothesis that you are testing
What test you will use and why you can use it.
The effect size and its 95% confidence interval
Any inference you can draw from this confidence interval.
The full results of the hypothesis test
Whether or not you reject the null hypothesis and at what level.

Hypothesis

The hypotheses to be tested in this case are:

H₀: The mean systolic blood pressure is the same before and after the intervention.

H_A: The mean systolic blood pressure is different before and after the intervention.

This will be tested at the 5% level of significance.

Test statistics

The test to be used is the paired t-test, since the same individuals were measured before and after the intervention and the comparison is therefore based on within-subject differences.

The results show that the mean systolic blood pressure was 140.58 before the intervention and 135.55 after it. The p-value was reported as 0.000, i.e. p < 0.001, which is below the 5% level of significance. We therefore reject the null hypothesis and conclude that the mean systolic blood pressure differs significantly between before and after the intervention.

The 95% confidence interval shows that the mean difference lies between 4.86 and 5.20. This interval does not contain zero, which implies that the mean difference is statistically different from zero and hence significant.
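The calculation behind a paired comparison can be sketched as follows. This is an illustration only: the readings are hypothetical and are not the sysbp_study.csv data, the function name is made up, and the interval uses a normal critical value as a stand-in for the t quantile, which is an assumption that is only reasonable for large samples such as the 500 subjects in the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_mean_difference(before, after, level=0.95):
    """Mean of the paired differences, its standard error, test statistic and CI."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)      # standard error of the mean difference
    t_stat = d_bar / se              # compare with t on n - 1 degrees of freedom
    # Assumption: normal critical value, close to the t quantile when n is large.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d_bar, se, t_stat, (d_bar - crit * se, d_bar + crit * se)

# Hypothetical readings for six subjects, used only to show the arithmetic.
before = [142, 138, 150, 145, 139, 147]
after = [137, 134, 144, 141, 133, 142]
d_bar, se, t_stat, ci = paired_mean_difference(before, after)
```

For a sample as small as this toy example, the t quantile on n - 1 degrees of freedom should replace the normal critical value.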

Effect size

Cohen's d is 2.128, which indicates a very large effect size for the difference in means (values above 0.8 are conventionally described as large).
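The answer does not say which definition of Cohen's d produced 2.128. Two common variants for before/after data are sketched below, on hypothetical readings rather than the study data: one standardises by the standard deviation of the paired differences, the other by the pooled standard deviation of the two sets of readings. The function names are illustrative.

```python
from statistics import mean, stdev

def cohens_d_paired(before, after):
    """d_z: mean paired difference divided by the SD of the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)

def cohens_d_pooled(before, after):
    """d using the pooled SD of the two sets of readings (equal group sizes)."""
    pooled_sd = ((stdev(before) ** 2 + stdev(after) ** 2) / 2) ** 0.5
    return (mean(before) - mean(after)) / pooled_sd

# Hypothetical readings for six subjects, not the sysbp_study.csv data.
before = [142, 138, 150, 145, 139, 147]
after = [137, 134, 144, 141, 133, 142]
d_z = cohens_d_paired(before, after)
d_pooled = cohens_d_pooled(before, after)
```

The two versions can differ a good deal when the before and after readings are strongly correlated, so a report should state which one it uses.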

Decision and conclusion

Since we reject the null hypothesis, we conclude that the mean systolic blood pressure differs significantly between before and after the intervention. The effect size further confirms that the difference is large. As the mean was lower after the intervention than before it, we conclude that the intervention was effective in lowering systolic blood pressure.

Using the risk ratio, test whether the risk of high blood pressure before the intervention differs between people who do drink coffee and people who don’t drink coffee. You should report and interpret the effect size and confidence interval as well as hypothesis test results. (6 marks)

As part of your answer, you should state

The hypothesis that you are testing
The effect size and its 95% confidence interval
Any inference you can draw from this confidence interval.
The full results of the hypothesis test
Whether or not you reject the null hypothesis and at what level.

The hypotheses being tested are:

H₀: Risk of high blood pressure before the intervention is the same for people who do drink coffee and people who don’t drink coffee.

H_A: Risk of high blood pressure before the intervention differs between people who do drink coffee and people who don’t drink coffee.

The results are as follows.

The 95% confidence interval for the risk ratio comparing people who drink coffee with those who do not runs from 1.4123 to 6.8730. The null value for a ratio is 1, not zero, and the interval does not contain 1, which implies that the risk differs significantly between the two groups.

The risk ratio is 3.1156, so the risk of high blood pressure before the intervention is about three times as high among people who drink coffee as among those who do not. The p-value was reported as 0.000, i.e. p < 0.001, which is below the 5% level of significance. We therefore reject the null hypothesis and conclude that the risk of high blood pressure before the intervention differs significantly between people who drink coffee and people who do not. Those who drink coffee have a higher risk of high blood pressure than those who do not.

Yule's Q is 0.7252, which indicates a strong association between coffee drinking and high blood pressure.
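The risk ratio, its confidence interval and Yule's Q can all be computed from the four cells of a 2x2 table. The sketch below uses the standard log-scale interval for a risk ratio and hypothetical counts, since the table behind the reported figures is not shown in this document; the function names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def risk_ratio(a, b, c, d, level=0.95):
    """Risk ratio for a 2x2 table with a log-scale confidence interval.

    a, b: exposed with / without the outcome; c, d: unexposed with / without.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, (exp(log(rr) - crit * se_log), exp(log(rr) + crit * se_log))

def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc), which equals (OR - 1) / (OR + 1)."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts: coffee drinkers 90 high BP / 310 not; non-drinkers 8 / 92.
rr, (lower, upper) = risk_ratio(90, 310, 8, 92)
q = yules_q(90, 310, 8, 92)

# A ratio is compared with 1: the interval excludes the null value if lower > 1.
excludes_null = lower > 1 or upper < 1
```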

Do the data suggest that coffee consumption would be a suitable predictor variable for systolic blood pressure before the intervention? State your reason(s) for your answer. (2 marks)

Yes, the data suggest that coffee consumption would be a suitable predictor variable for systolic blood pressure before the intervention. This is based on the finding that those who drink coffee have a significantly higher risk of high blood pressure before the intervention than those who do not.

Calculate the regression with coffee consumption (in cups) predicting systolic blood pressure before the intervention. What do you observe? What do you conclude?

From the results, we observe that coffee consumption (in cups) is a significant predictor of systolic blood pressure before the intervention (p < 0.05). The value of R-squared is 0.3762, implying that 37.62% of the variation in systolic blood pressure before the intervention is explained by coffee consumption (in cups).

The coefficient of coffee consumption (in cups) is 0.5236; this suggests that each additional cup of coffee per day is associated with an average increase of 0.5236 in systolic blood pressure before the intervention.

We conclude that coffee consumption (in cups) is an important and significant predictor of systolic blood pressure before the intervention.
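The slope and R-squared interpreted above come from an ordinary least-squares fit with one predictor. A minimal sketch of that calculation follows, using hypothetical values rather than the sysbp_study.csv data; the function name is illustrative.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares intercept, slope and R-squared for y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation between x and y
    return intercept, slope, r_squared

# Hypothetical values: cups of coffee per day and systolic blood pressure.
cups = [0, 2, 4, 6, 8]
sysbp = [136, 139, 139, 142, 144]
intercept, slope, r_squared = simple_linear_regression(cups, sysbp)
```

The slope is the average change in the outcome per additional cup, and R-squared is the share of the outcome's variation that the fitted line accounts for.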

Add number of cigarettes per day to the regression equation. What do you observe? What do you conclude?

When we added the number of cigarettes per day to the regression equation, we observed an increase in the value of adjusted R-squared from 0.3750 to 0.4690; this means that with two predictor variables, the proportion of variation in the dependent variable (systolic blood pressure before the intervention) explained by the model is now 46.90%, up from 37.50%. We also observe that both predictor variables are significant in the model (p < 0.05). However, surprisingly, the coefficient of coffee consumption (in cups) is now negative (-0.08), implying a negative relationship between coffee consumption and systolic blood pressure before the intervention once smoking is taken into account. So, holding the number of cigarettes per day constant, each additional cup of coffee is associated with a decrease of 0.08 in systolic blood pressure before the intervention.

The coefficient of cigarettes per day is 0.3664; this implies that, holding coffee consumption constant, each additional cigarette per day is associated with an increase of 0.3664 in systolic blood pressure before the intervention.
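A coefficient can change sign when a second predictor is added if the two predictors are related to each other. The sketch below shows this on hypothetical data constructed for the purpose (heavier coffee drinkers also smoke more); it is not the study data, the function name is made up, and it does not establish why the sign changed in the analysis above.

```python
from statistics import mean

def two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2 via the normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2           # zero if the predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Hypothetical data built so that heavier coffee drinkers also smoke more.
cups = [1, 2, 3, 4, 5, 6]
cigs = [1, 3, 4, 7, 8, 10]
sysbp = [130 + 1.0 * g - 0.5 * c for c, g in zip(cups, cigs)]

# Slope for coffee when cigarettes are left out of the model.
m_c, m_y = mean(cups), mean(sysbp)
slope_alone = (sum((c - m_c) * (v - m_y) for c, v in zip(cups, sysbp))
               / sum((c - m_c) ** 2 for c in cups))

# Coefficients when both predictors are in the model.
b0, b_cups, b_cigs = two_predictor_ols(cups, cigs, sysbp)
```

In this constructed example the coffee slope is positive on its own and negative once cigarettes enter the model, because coffee was standing in for smoking in the one-predictor fit.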

Cite This Work

My Assignment Help. (2021). Statistics Practice Questions: Sampling Methods, Measures Of Location, Hypothesis Testing Essay.. Retrieved from https://myassignmenthelp.com/free-samples/hsh746-biostatistics-1/systolic-blood-pressure-the-intervention.html.

"Statistics Practice Questions: Sampling Methods, Measures Of Location, Hypothesis Testing Essay.." My Assignment Help, 2021, https://myassignmenthelp.com/free-samples/hsh746-biostatistics-1/systolic-blood-pressure-the-intervention.html.

My Assignment Help (2021) Statistics Practice Questions: Sampling Methods, Measures Of Location, Hypothesis Testing Essay. [Online]. Available from: https://myassignmenthelp.com/free-samples/hsh746-biostatistics-1/systolic-blood-pressure-the-intervention.html
[Accessed 29 March 2025].

My Assignment Help. 'Statistics Practice Questions: Sampling Methods, Measures Of Location, Hypothesis Testing Essay.' (My Assignment Help, 2021) <https://myassignmenthelp.com/free-samples/hsh746-biostatistics-1/systolic-blood-pressure-the-intervention.html> accessed 29 March 2025.

My Assignment Help. Statistics Practice Questions: Sampling Methods, Measures Of Location, Hypothesis Testing Essay. [Internet]. My Assignment Help. 2021 [cited 29 March 2025]. Available from: https://myassignmenthelp.com/free-samples/hsh746-biostatistics-1/systolic-blood-pressure-the-intervention.html.