Part 1: Proportion of Females and Confidence Interval
Please answer each question in the template document provided and submit via Turnitin on or before the due date. The marks allocated to each question are shown in the assignment. A total of 30 marks are available and this assignment is worth 30% of your overall grade. Some of the questions in this assignment ask you to analyse the data set assigned to you for assignments. This is the same data set which you used for Assignment 1. Read ‘Description of your data set.docx’ for the descriptions of the variables.
Question 1
Note: Each student will get different answers as the data sets differ. Use the assignment data set assigned to you. Variable to analyse: ‘sex’
a. Calculate the point estimate and 95% confidence interval for the proportion of females in the population of NSW 17-year-olds using the random sample of NSW 17-year-olds assigned to you.
b. Carefully write in words, what the confidence interval in part a. is telling us.
c. Are the results in part b. consistent with the statement: “50% of 17-year-olds in NSW are female”? Explain why or why not.
Question 2
Note: Each student will get different answers as the data sets differ.
Research question: Is average self-reported hours of moderate to vigorous physical activity (MVPA) per week equal between males and females in the population of NSW 17-year-olds?
Use the assignment data set assigned to you. Variables to analyse: ‘MVPA’ and ‘sex’
a. Use appropriate charts and/or statistics to describe the shape of the distribution of self-reported hours of MVPA per week for 17-year-old males and females in the sample.
b. Use an appropriate non-parametric test and R Commander to test the hypothesis that the average self-reported hours of MVPA per week is equal between males and females in the population of NSW 17-year-olds. Use R Commander for all calculations but write your answers according to the 5-step method.
Question 3
A researcher is questioning whether or not the introduction of new laws intended to limit the emission of polycyclic aromatic hydrocarbons (PAH) in the gasses emitted from aluminium smelting plants have been effective. She has compiled emission measures for a random sample of six aluminium smelters. For each smelter she has recorded emissions at one year before and two years after the introduction of the new legislation. The PAH concentrations are continuous variables. Results are shown in the following table.
Smelter | PAH concentration prior to introduction of new laws | PAH concentration after introduction of new laws
A | 103 | 21
B | 27 | 19
C | 407 | 320
D | 221 | 47
E | 7,230 | 550
F | 339 | 28
a. The researcher wishes to test her hypothesis that the concentration of PAH in gaseous emissions from aluminium smelters has decreased since the introduction of the new laws. Is this a one-sided or a two-sided hypothesis test? Explain why.
b. Name an appropriate statistical test to address this hypothesis (that the concentration of PAH in gaseous emissions from aluminium smelters had decreased since the introduction of the new laws). Justify your choice of test. DO NOT perform any analysis.
Question 4
Note: Each student will get different answers as the data sets differ.
Research question: Does mode of driver’s licence status differ by gender in the population of NSW 17-year-olds?
Use the assignment data set assigned to you. Variables to analyse: ‘licence’ and ‘sex’
a. Show the relationship between driver’s licence status and gender in the sample of NSW 17-year-olds using a two-way contingency table. Include either row or column percentages. Type and label the table yourself: an R Commander screenshot will not be accepted.
b. Looking at the results in part a) only, is there any evidence of association between gender and licence status in this sample of NSW 17-year-olds? Explain why or why not.
c. Are the requirements for a Chi-square test met? Explain why.
d. Irrespective of your answer in part c) address the research question using a Chi-square test on the provided data. Please use R Commander but format your answer according to the 5-step method.
Question 5
a. Give one reason why different research studies require different sample sizes. Why not use the same sample size for every research study?
b. Dr Smith asks you to estimate the minimum sample size required to detect a difference of 0.5 hour in mean self-reported sedentary hours per week between 17-year-old NSW boys and girls with α = 0.05 and power = 0.90 (β = 0.10). (He suggests this 0.5 hour difference could, for example, be a mean of 9.5 hours compared to a mean of 10 hours.) He is confident from his previous reading that the population standard deviation is σ = 3.0 and he wishes to use equal group sizes for maximum efficiency. Estimate the minimum sample size required for Dr Smith’s study. Present your answer to Dr Smith as a sentence which summarises the required sample size to achieve what power subject to what conditions.
c. Suppose despite the answer in part b. Dr Smith decided to run his study with a sample size of n=20 per group (n=40 in total). What impact would this have on the project’s ability to answer the research question?
Question 1
a. Calculate the point estimate and the 95% confidence interval for the proportion of females in the population of NSW 17-year-olds using the random sample of NSW 17-year-olds assigned.
 Point estimate for proportion of females in the population
Based on the sample data set given, there are 97 males and 98 females, so n = 195. The point estimate of the proportion (p) of females is given by
$\hat{p} = \frac{98}{195} = 0.5026$
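The 95% confidence interval for the proportion of females in the population
The 95% confidence interval is $\hat{p} \pm z\sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$ and $z = 1.96$ (from the standard normal distribution):
$0.50256 \pm 1.96\sqrt{\frac{0.50256 \times 0.49744}{195}} = 0.50256 \pm 0.07018$
$0.4324 < p < 0.5727$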
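The calculation above is the Wald (normal-approximation) interval, and it can be checked with a few lines of base R. This is a minimal sketch. The object names (`n_female`, `n_total`, `p_hat`, `ci`) are illustrative and are not R Commander output. It uses `qnorm(0.975)` instead of the rounded 1.96.

```r
# Counts from the assigned sample (Question 1)
n_female <- 98
n_total  <- 97 + 98          # 97 males + 98 females = 195

# Point estimate of the population proportion of females
p_hat <- n_female / n_total
q_hat <- 1 - p_hat

# Wald 95% confidence interval: p_hat +/- z * sqrt(p_hat * q_hat / n)
z  <- qnorm(0.975)           # approximately 1.96
se <- sqrt(p_hat * q_hat / n_total)
ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)

round(p_hat, 4)  # 0.5026
round(ci, 4)     # lower 0.4324, upper 0.5727
```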
b. What the confidence interval obtained in part (a) means (2 marks).
The sample proportion of females is 0.5026. The confidence interval tells us that we can be 95% confident that the true proportion of females in the population of NSW 17-year-olds lies between 0.4324 and 0.5727, that is, between about 43% and 57%. The point estimate lies at the centre of this interval.
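c. The result in part (a) is consistent with the statement “50% of 17-year-olds in NSW are female”. The reason is that the hypothesised value of 0.50 lies inside the 95% confidence interval (0.4324, 0.5727). The sample therefore provides no evidence that the population proportion of females differs from 50%.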
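Part (c) can also be framed as a test of H0: p = 0.5. The sketch below uses base R's `prop.test()` as an illustration and is not the method required by the assignment. With `correct = FALSE`, its interval is a Wilson score interval rather than the Wald interval used in part (a), so its limits differ slightly from 0.4324 and 0.5727.

```r
# One-sample test of H0: p = 0.5 for the proportion of females (98 of 195)
res <- prop.test(x = 98, n = 195, p = 0.5, correct = FALSE)
res$p.value      # large p-value: no evidence against p = 0.5
res$conf.int     # Wilson score 95% CI (not identical to the Wald CI)

# Equivalent check against the Wald interval from part (a)
0.5 > 0.4324 && 0.5 < 0.5727   # TRUE: 0.5 lies inside the interval
```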
Question 2
a. A histogram is an appropriate chart for showing the shape of the distribution of self-reported hours of MVPA. Figures 1 and 2 below are the histograms of the male and female data, plotted in R. Table 1 below shows the number of hours of MVPA per week for each sex.
Table 1: Self-reported hours of MVPA per week by sex
Male | Female
16 | 14
18 | 18
23 | 20
26 | 22
30 | 25
30 | 25
30 | 25
30 | 25
30 | 25
35 | 30
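R code used to plot the histograms:
# Histogram of weekly MVPA hours for males
MALE <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
hist(MALE, col = "darkmagenta", border = "white", main = "Males", xlab = "Hours of MVPA per week")
# Histogram of weekly MVPA hours for females
FEMALE <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)
hist(FEMALE, col = "blue", border = "red", main = "Females", xlab = "Hours of MVPA per week")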
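Description of the histograms: The histograms show the distribution of weekly MVPA hours for each sex. For males the most frequent value (mode) is 30 hours per week, which occurs in 5 of the 10 observations. For females the mode is 25 hours per week, also in 5 of the 10 observations. In both groups the values cluster at the upper end, with a longer tail towards lower values. The mean is also below the median in both groups (males: mean 26.8, median 30; females: mean 22.9, median 25). Both distributions therefore appear negatively (left) skewed, and the female distribution sits at lower values than the male distribution.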
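The numerical summaries quoted in the description can be reproduced in base R. This is a minimal sketch that reuses the vectors from Table 1. Reading "mean below median" as a sign of left skew is a rule of thumb, not a formal test.

```r
# MVPA hours per week from Table 1
MALE   <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
FEMALE <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# Five-number summaries plus mean, to accompany the histograms
summary(MALE)
summary(FEMALE)

# Mean below median suggests a longer tail towards low values (left skew)
c(mean = mean(MALE),   median = median(MALE))    # 26.8 and 30
c(mean = mean(FEMALE), median = median(FEMALE))  # 22.9 and 25

# Frequency of each value, confirming the modes (30 for males, 25 for females)
table(MALE)
table(FEMALE)
```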
b. Hypothesis testing
The appropriate non-parametric test is the Wilcoxon rank-sum test, also called the Mann-Whitney U test. We are comparing the self-reported MVPA hours of two independent groups (males and females) and assessing whether their distributions, and hence their average ranks, differ in the population.
Step 1: State the null and alternative hypotheses and the significance level
H0: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is equal between males and females in the population of NSW 17-year-olds.
H1: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is not equal between males and females in the population of NSW 17-year-olds.
Significance level: α = 0.05
Step 2: Selection of an appropriate test statistic
We apply the Wilcoxon rank-sum test. All observations from both groups are ranked together, and tied values receive the average of their ranks. R reports the statistic W: the sum of the ranks in the first group (males) minus $n_1(n_1+1)/2$, where $n_1 = 10$ is the number of males.
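The W value reported by R can be reproduced by hand from the pooled ranks. This is a minimal sketch. It assumes R's convention for `wilcox.test()`, in which W is the rank sum of the first sample minus $n_1(n_1+1)/2$. The variable names `r`, `R1` and `n1` are illustrative.

```r
Male   <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
Female <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# Rank all 20 observations together; ties get the average rank (R's default)
r  <- rank(c(Male, Female))
n1 <- length(Male)

# Rank sum of the male group, minus its smallest possible value n1(n1 + 1)/2
R1 <- sum(r[seq_len(n1)])        # 128
W  <- R1 - n1 * (n1 + 1) / 2     # 73, the W reported by wilcox.test()
W
```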
Step 3: Calculations (R code and output)
The test was run in R on the data in Table 1, giving the following output:
> Male <- c(16,18,23,26,30,30,30,30,30,35)
> Female <- c(14,18,20,22,25,25,25,25,25,30)
> wilcox.test(Male,Female,alternative="two.sided")
Wilcoxon rank sum test with continuity correction
data: Male and Female
W = 73, p-value = 0.08224
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(Male, Female, alternative = "two.sided") :
cannot compute exact p-value with ties
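The warning appears because ties prevent R from computing an exact p-value, so it uses a normal approximation instead. The sketch below reproduces p = 0.08224 under that approximation, with a tie-corrected variance and a continuity correction. It follows the formula that R's `wilcox.test()` applies when it cannot use exact p-values. Passing `exact = FALSE` requests the same approximation directly and suppresses the warning.

```r
Male   <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
Female <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)
n1 <- length(Male); n2 <- length(Female); N <- n1 + n2

# Rank-sum statistic as reported by R
W <- sum(rank(c(Male, Female))[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Variance of W under H0, reduced to account for tied values
ties  <- table(c(Male, Female))
sigma <- sqrt(n1 * n2 / 12 * ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))

# Two-sided normal approximation with a 0.5 continuity correction
z <- (W - n1 * n2 / 2 - sign(W - n1 * n2 / 2) * 0.5) / sigma
p <- 2 * pnorm(-abs(z))
round(p, 3)   # 0.082

# Same approximation from wilcox.test(), without the ties warning
wilcox.test(Male, Female, alternative = "two.sided", exact = FALSE)$p.value
```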
Step 4: Decision on the hypothesis
The p-value (0.082) is greater than α = 0.05, so we fail to reject the null hypothesis at the 5% level of significance.
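Step 5: Conclusion
There is insufficient evidence at the 5% level of significance to conclude that the average self-reported hours of moderate to vigorous physical activity (MVPA) per week differ between males and females in the population of NSW 17-year-olds. This does not prove that the averages are equal.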
Question 3
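a. This is a one-sided hypothesis test. The researcher wants to know specifically whether PAH emissions from aluminium smelters have decreased since the introduction of the new laws, not merely whether they have changed.
b. The appropriate statistical test is the Wilcoxon signed-rank test. The data are paired, because each smelter is measured both before and after the new laws, so the analysis is based on the within-smelter differences. There are only six smelters, and one difference (smelter E: 7,230 - 550 = 6,680) is far larger than the others. The differences therefore cannot be assumed to be normally distributed, so a paired t-test is not appropriate and the non-parametric signed-rank test is used instead.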
Question 4
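a. The following contingency table shows driver’s licence status by gender, with row percentages in brackets.
Table 2: Driver’s licence status by gender in the sample of NSW 17-year-olds
Gender | Valid | Revoked | Suspended | Total
Male | 49 (50.5%) | 32 (33.0%) | 16 (16.5%) | 97 (100%)
Female | 50 (51.0%) | 33 (33.7%) | 15 (15.3%) | 98 (100%)
Total | 99 (50.8%) | 65 (33.3%) | 31 (15.9%) | 195 (100%)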
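The counts and row percentages in the table can be produced in base R. This is a minimal sketch in which the object name `tab` and the dimension labels are illustrative. `prop.table(tab, 1)` divides each cell by its row total.

```r
# Counts from the contingency table (rows = gender, columns = licence status)
tab <- as.table(matrix(c(49, 32, 16,
                         50, 33, 15),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Gender  = c("Male", "Female"),
                                       Licence = c("Valid", "Revoked", "Suspended"))))

addmargins(tab)                       # counts with row and column totals
rp <- round(100 * prop.table(tab, 1), 1)
rp                                    # row percentages within each gender
```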
b. Looking only at the table in part (a), there is no evidence of an association between gender and licence status in this sample of NSW 17-year-olds. The row percentages for males and females are almost identical in every category (valid 50.5% vs 51.0%, revoked 33.0% vs 33.7%, suspended 16.5% vs 15.3%). The distribution of licence status therefore does not appear to depend on gender.
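c. The requirements for a Chi-square test are met. The observations come from a random sample and each person appears in only one cell, so they are independent. In addition, every expected cell count is at least 5: the smallest is for males with a suspended licence, $97 \times 31 / 195 \approx 15.4$.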
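The expected-count check in part (c) can be sketched in base R. Under independence, each expected count is (row total × column total) / N. The object names are illustrative. The final line checks the hand-computed matrix against the `$expected` component returned by `chisq.test()`.

```r
tab <- matrix(c(49, 32, 16, 50, 33, 15), nrow = 2, byrow = TRUE,
              dimnames = list(Gender  = c("Male", "Female"),
                              Licence = c("Valid", "Revoked", "Suspended")))

# Expected count for each cell under independence: row total * column total / N
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)

# Requirement: all expected counts at least 5 (smallest is about 15.42)
all(expected >= 5)

# chisq.test() stores the same matrix; compare values only, ignoring labels
all.equal(expected, chisq.test(tab)$expected, check.attributes = FALSE)
```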
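d. Chi-square test of the research question
Step 1: Setting up the hypotheses
H0: There is no association between gender and driver’s licence status in the population of NSW 17-year-olds.
H1: There is an association between gender and driver’s licence status in the population of NSW 17-year-olds.
Significance level: α = 0.05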
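Step 2: Selection of appropriate test statistic
We apply the Pearson Chi-square test of independence, with test statistic $X^2 = \sum \frac{(O - E)^2}{E}$, where O and E are the observed and expected counts in each cell. The test has $(r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$ degrees of freedom.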
Step 3: Decision rule
The null hypothesis will be rejected if the computed p-value is less than 0.05.
Step 4: Computation of the test statistics in R
> Male <- c(49,32,16)
> Female <- c(50,33,15)
> gender.survey <- data.frame(rbind(Male,Female))
> names(gender.survey) <- c('valid','revoked','suspended')
> chisq.test(gender.survey)
Pearson's Chi-squared test
data: gender.survey
X-squared = 0.052617, df = 2, p-value = 0.974
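The statistic in the output can be reproduced directly from the formula in Step 2. This is a minimal sketch with illustrative object names. It uses `pchisq()` for the upper-tail p-value and compares the result with `chisq.test()`, which applies no continuity correction to a 2 × 3 table.

```r
tab <- matrix(c(49, 32, 16, 50, 33, 15), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
X2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)        # (2 - 1) * (3 - 1) = 2
p  <- pchisq(X2, df = df, lower.tail = FALSE)  # upper-tail probability

round(c(X2 = X2, df = df, p = p), 4)   # X2 about 0.0526, df 2, p about 0.974
```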
Step 5: Conclusion
The p-value of 0.974 is greater than 0.05, so we fail to reject the null hypothesis. There is no evidence that driver’s licence status differs by gender in the population of NSW 17-year-olds. The data are consistent with no association between gender and licence status.
Question 5
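a. Different studies require different sample sizes because the required sample size depends on each study's aims. It depends on the size of the difference or effect the researchers want to detect, the variability of the outcome in the target population, and the chosen significance level and power. A study aiming to detect a small difference in a highly variable outcome needs a much larger sample than one looking for a large difference. Using one sample size for every study would leave some studies underpowered and others unnecessarily large and costly.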
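b. For two equal-sized groups compared on a continuous outcome, the minimum sample size per group is
$n = \frac{2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$
Step 1: The difference to be detected is Δ = 0.5 hours per week.
Step 2: The population standard deviation is σ = 3.0 hours.
Step 3: For a two-sided test with α = 0.05, $z_{1-\alpha/2} = 1.96$; for power = 0.90 (β = 0.10), $z_{1-\beta} = 1.2816$.
Step 4: $n = \frac{2 \times 3.0^2 \times (1.96 + 1.2816)^2}{0.5^2} = \frac{18 \times 10.5080}{0.25} \approx 756.6$, which is rounded up to 757 per group.
Step 5: Dr Smith, you will need at least 757 boys and 757 girls (1,514 participants in total). This sample gives 90% power to detect a difference of 0.5 hours in mean self-reported sedentary hours per week between 17-year-old NSW boys and girls. It assumes a two-sided test at the 5% significance level and a population standard deviation of 3.0 hours.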
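The two-sample sample-size calculation for α = 0.05, power 0.90, σ = 3.0 and a difference of 0.5 hours can be sketched in base R. The normal-approximation formula gives 757 per group. Base R's `power.t.test()` solves the same problem with the t distribution and returns a slightly larger n. The variable names are illustrative.

```r
alpha        <- 0.05
target_power <- 0.90
sd_pop       <- 3.0   # assumed population SD (hours per week)
delta        <- 0.5   # smallest difference worth detecting (hours per week)

# Normal approximation, equal groups:
# n per group = 2 * sd^2 * (z_{1 - alpha/2} + z_{1 - beta})^2 / delta^2
z_a <- qnorm(1 - alpha / 2)   # about 1.96
z_b <- qnorm(target_power)    # about 1.28
n_per_group <- 2 * sd_pop^2 * (z_a + z_b)^2 / delta^2
ceiling(n_per_group)          # 757 per group

# t-based equivalent from base R (n is per group)
pt_res <- power.t.test(delta = delta, sd = sd_pop, sig.level = alpha,
                       power = target_power)
pt_res$n
```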
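c. With only 20 participants per group (40 in total), the study would be severely underpowered. Under the same assumptions (σ = 3.0, α = 0.05, two-sided test), the power to detect a 0.5-hour difference would be well under 10%. The 95% confidence interval for the difference in means would have a half-width of roughly $1.96 \times 3.0 \times \sqrt{2/20} \approx 1.86$ hours, far wider than the 0.5-hour difference of interest. The study would therefore very probably miss a real 0.5-hour difference (a Type II error), and a non-significant result could not be taken as evidence that boys and girls do not differ. A small sample does not by itself bias the estimates, but it makes them imprecise.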
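The impact of the smaller sample can be quantified with base R's `power.t.test()`, keeping the assumptions from part (b). This is a sketch. The half-width line uses the normal critical value 1.96, which slightly understates a t-based interval with 38 degrees of freedom.

```r
# Power with only 20 per group, same assumptions as part (b)
res <- power.t.test(n = 20, delta = 0.5, sd = 3.0, sig.level = 0.05)
res$power   # well under 10%

# Approximate half-width of a 95% CI for the difference in means
1.96 * 3.0 * sqrt(2 / 20)   # about 1.86 hours, wider than the 0.5-hour target
```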
My Assignment Help. (2021). Statistical Study: Proportion, Distribution, And Hypothesis Testing Analysis Essay. Retrieved from https://myassignmenthelp.com/freesamples/401077introductiontobiostatistics/alternativehypotheses.html.