Stats essay: proportion - distribution

Part 1: Proportion of Females and Confidence Interval

Please answer each question in the template document provided and submit via Turnitin on or before the due date. The marks allocated to each question are shown in the assignment. A total of 30 marks are available and this assignment is worth 30% of your overall grade. Some of the questions in this assignment ask you to analyse the data set assigned to you for assignments. This is the same data set which you used for Assignment 1. Read ‘Description of your data set.docx’ for the descriptions of the variables.

Question 1

Note: Each student will get different answers as the data sets differ.
Use the assignment data set assigned to you. Variable to analyse: ‘sex’
a. Calculate the point estimate and 95% confidence interval for the proportion of females in the population of NSW 17-year-olds using the random sample of NSW 17-year-olds assigned to you.
b. Carefully write in words, what the confidence interval in part a. is telling us.
c. Are the results in part b. consistent with the statement: “50% of 17-year-olds in NSW are female”? Explain why or why not.
Question 2

Note: Each student will get different answers as the data sets differ.
Research question: Is average self-reported hours of moderate to vigorous physical activity (MVPA) per week equal between males and females in the population of NSW 17-year-olds?
Use the assignment data set assigned to you. Variables to analyse: ‘MVPA’ and ‘sex’
a. Use appropriate charts and/or statistics to describe the shape of the distribution of self-reported hours of MVPA per week for 17-year-old males and females in the sample.
b. Use an appropriate non-parametric test and R Commander to test the hypothesis that the average self-reported hours of MVPA per week is equal between males and females in the population of NSW 17-year-olds. Use R Commander for all calculations but write your answers according to the 5-step method.
Question 3

A researcher is questioning whether or not the introduction of new laws intended to limit the emission of polycyclic aromatic hydrocarbons (PAH) in the gasses emitted from aluminium smelting plants have been effective. She has compiled emission measures for a random sample of six aluminium smelters. For each smelter she has recorded emissions at one year before and two years after the introduction of the new legislation. The PAH concentrations are continuous variables. Results are shown in the following table.

Smelter	PAH concentration prior to introduction of new laws	PAH concentration after introduction of new laws
A	103	21
B	27	19
C	407	320
D	221	47
E	7,230	550
F	339	28
a. The researcher wishes to test her hypothesis that the concentration of PAH in gaseous emissions from aluminium smelters have decreased since the introduction of the new laws. Is this a one-sided or a two-sided hypothesis test? Explain why.
b. Name an appropriate statistical test to address this hypothesis (that the concentration of PAH in gaseous emissions from aluminium smelters had decreased since the introduction of the new laws). Justify your choice of test. DO NOT perform any analysis.
Question 4

Note: Each student will get different answers as the data sets differ.
Research question: Does mode of driver’s licence status differ by gender in the population of NSW 17-year-olds?
Use the assignment data set assigned to you. Variables to analyse: ‘licence’ and ‘sex’
a. Show the relationship between driver’s licence status and gender in the sample of NSW 17-year-olds using a two-way contingency table. Include either row or column percentages. Type and label the table yourself: an R Commander screenshot will not be accepted.
b. Looking at the results in part a) only, is there any evidence of association between gender and licence status in this sample of NSW 17-year-olds? Explain why or why not.
c. Are the requirements for a Chi-square test met? Explain why.
d. Irrespective of your answer in part c) address the research question using a Chi-square test on the provided data. Please use R Commander but format your answer according to the 5-step method.
Question 5
a. Give one reason why different research studies require different sample sizes. Why not use the same sample size for every research study?
b. Dr Smith asks you to estimate the minimum sample size required to detect a difference of 0.5 hour in mean self-reported sedentary hours per week between 17-year-old NSW boys and girls with α = 0.05 and power = 0.90 (β = 0.10). (He suggests this 0.5 hour difference could, for example, be a mean of 9.5 hours compared to a mean of 10 hours.) He is confident from his previous reading that the population standard deviation is σ = 3.0 and he wishes to use equal group sizes for maximum efficiency. Estimate the minimum sample size required for Dr Smith’s study. Present your answer to Dr Smith as a sentence which summarises the required sample size to achieve what power subject to what conditions.
c. Suppose despite the answer in part b. Dr Smith decided to run his study with a sample size of n=20 per group (n=40 in total). What impact would this have on the project’s ability to answer the research question?

Part 1: Proportion of Females and Confidence Interval

Question 1

Calculate the point estimate and the 95% confidence interval for the proportion of females in the population of NSW 17-year-olds using the random sample of NSW 17-year-olds assigned.

Point estimate for proportion of females in the population

Based on the sample data set given, there are 97 males and 98 females, so the sample size is n = 195. The point estimate of the proportion of females is the sample proportion, $\hat{p} = 98/195 \approx 0.5026$.

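The 95% confidence interval for the proportion of females in the population

The 95% confidence interval is $\hat{p} \pm z_{0.975}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{p} = 98/195$ is the sample proportion, $\hat{q} = 1 - \hat{p}$, $n = 195$ and $z_{0.975} = 1.96$ (from the standard normal distribution).

$0.5026 \pm 1.96\sqrt{(0.5026)(0.4974)/195} = 0.5026 \pm 0.0702$

$0.4324 < p < 0.5727$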
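As a cross-check, the point estimate and interval can be reproduced in a few lines of base R. This is a minimal sketch that assumes the counts reported above (98 females out of 195) and the normal-approximation (Wald) formula; the variable names are illustrative and not part of the assignment data set.

```r
# Counts taken from the sample described above
females <- 98
n <- 97 + 98                       # total sample size

p_hat <- females / n               # point estimate of the proportion of females
z <- qnorm(0.975)                  # critical value for 95% confidence (about 1.96)
se <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(round(c(estimate = p_hat, ci), 4))
```

The Wald interval is used here because it matches the hand calculation; other intervals for a proportion (for example the Wilson interval) would give slightly different limits.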

What the confidence interval obtained in part a means (2 marks).

From part (a), the sample proportion of females is 0.5026. The confidence interval tells us that we are 95% confident that the true proportion of females in the population of NSW 17-year-olds lies between 0.4324 and 0.5727. In other words, if many random samples of this size were drawn and an interval calculated from each, about 95% of those intervals would contain the true population proportion.

The result is consistent with the statement “50% of 17-year-olds in NSW are female”, because the value 0.50 lies inside the 95% confidence interval (0.4324 to 0.5727).

Question 2

An appropriate chart to show the distribution of self-reported hours of MVPA is the histogram. Figures 1 and 2 are the histograms plotted in R for males and females respectively. Table 1 below shows the hours of MVPA per week by sex.

MALE	16	18	23	26	30	30	30	30	30	35
FEMALE	14	18	20	22	25	25	25	25	25	30

R code used to plot the histogram for males:

> MALE=c(16,18,23,26,30,30,30,30,30,35)

> hist(MALE,col="darkmagenta",border="red")

> hist(MALE,col="darkmagenta",border="white")

R code used to plot the histogram for females:

> FEMALE=c(14,18,20,22,25,25,25,25,25,30)

> hist(FEMALE,col="blue",border="red")

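Description of the histograms: The histograms show the distribution of hours of MVPA per week for each sex. For males the most frequent value is 30 hours per week (5 of the 10 observations; with R's default bins the 25 to 30 hour bar has a frequency of 6), while for females it is 25 hours per week (5 of the 10 observations). In both groups most values sit at the upper end with a tail of lower values, and the mean is below the median (26.8 against 30 for males, 22.9 against 25 for females), so both distributions are skewed to the left.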
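The shape of the two distributions can also be summarised numerically. The sketch below assumes the ten values per group listed in Table 1 and uses explicit 5-hour bins (a choice made here for illustration, not taken from the original figures) so that the counts do not depend on R's default break selection.

```r
MALE <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
FEMALE <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# A mean below the median is a simple numerical hint of left (negative) skew
shape <- rbind(
  male = c(mean = mean(MALE), median = median(MALE)),
  female = c(mean = mean(FEMALE), median = median(FEMALE))
)
print(shape)

# Counts in the bins [10,15], (15,20], (20,25], (25,30], (30,35]
bins <- seq(10, 35, by = 5)
male_counts <- hist(MALE, breaks = bins, plot = FALSE)$counts
female_counts <- hist(FEMALE, breaks = bins, plot = FALSE)$counts
print(rbind(male = male_counts, female = female_counts))
```

With only ten observations per group these summaries are rough, but they point the same way as the histograms.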

Hypothesis Testing

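The appropriate non-parametric test in this case is the Wilcoxon rank-sum test (also known as the Mann-Whitney U test), since we want to compare two independent samples (males and females) to assess whether their distributions differ in location. The Wilcoxon signed-rank test would not be appropriate here because the male and female observations are not paired.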

Step 1: Stating the null and alternative hypotheses

H0: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is equal between males and females in the population of NSW 17-year-olds

H1: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is not equal between males and females in the population of NSW 17-year-olds

The level of significance is α = 0.05

Step 2: Selection of an appropriate test statistic

Because the two groups are independent and the data are skewed, we apply the Wilcoxon rank-sum test. As reported by R, the test statistic W is the sum of the ranks of the first sample (Male) in the combined data, minus its smallest possible value n1(n1 + 1)/2, where n1 is the size of the first sample.

Step 3: Components of the calculations

Male		16	18	23	26	30	30	30	30	30	35
Female		14	18	20	22	25	25	25	25	25	30

Testing the hypothesis in R gives the following code and output:

> Male<-c(16,18,23,26,30,30,30,30,30,35)

> Female<-c(14,18,20,22,25,25,25,25,25,30)

> wilcox.test(Male,Female,alternative="two.sided")

Wilcoxon rank sum test with continuity correction

data: Male and Female

W = 73, p-value = 0.08224

alternative hypothesis: true location shift is not equal to 0

Warning message:

In wilcox.test.default(Male, Female, alternative = "two.sided") :

cannot compute exact p-value with ties.
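The value W = 73 in the output above can be reproduced by hand from the pooled ranks. This is a small sketch using the same two vectors; it relies on the definition R uses for the rank-sum statistic (rank sum of the first sample minus n1(n1 + 1)/2), and the helper names are illustrative.

```r
Male <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
Female <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# Rank all 20 observations together; tied values share their average rank
ranks <- rank(c(Male, Female))
n1 <- length(Male)

# Rank sum of the first sample, then subtract its minimum possible value
rank_sum_male <- sum(ranks[seq_len(n1)])
W <- rank_sum_male - n1 * (n1 + 1) / 2
print(W)
```

The many tied values (five 30s among males, five 25s among females) are also why R falls back to the normal approximation and prints the warning about exact p-values.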

Step 4: Decision on the hypothesis

Since the p-value of 0.082 is greater than the significance level α = 0.05, we fail to reject the null hypothesis.

Step 5: Conclusion

We conclude that there is insufficient evidence that the average self-reported hours of moderate to vigorous physical activity (MVPA) per week differs between males and females in the population of NSW 17-year-olds.

Question 3

This is a one-sided hypothesis test, because the researcher is interested only in whether the emissions from aluminium smelters have decreased since the introduction of the new laws, not in whether they have changed in either direction.
The appropriate statistical test to address this hypothesis is the Wilcoxon signed-rank test. The before and after measurements are paired, since both are taken on the same six smelters, so a test for related samples is needed. With only six pairs and an extreme value for smelter E (7,230), the normality assumption of a paired t-test is doubtful, which makes the non-parametric signed-rank test the better choice.

Question 4

The following is a contingency table of gender by licence status (counts).

	LICENCE STATUS
GENDER	Valid	Revoked	Suspended	Total
Male	49	32	16	97
Female	50	33	15	98
Total	99	65	31	195
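Part a. of the question also asks for row or column percentages. A minimal sketch of how they could be obtained in base R from the counts in the table is shown here; the object names are illustrative.

```r
# Observed counts from the contingency table (rows: gender, columns: licence status)
licence <- as.table(matrix(
  c(49, 32, 16,
    50, 33, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("Male", "Female"),
                  status = c("Valid", "Revoked", "Suspended"))
))

# Counts with row and column totals
print(addmargins(licence))

# Row percentages: share of each licence status within each gender
row_pct <- round(100 * prop.table(licence, margin = 1), 1)
print(row_pct)
```

Row percentages are used so that males and females can be compared directly even though the two groups differ slightly in size.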

Testing the hypothesis in R gives the following code and output:

R output

> Male<-c(49,32,16)

> Female<-c(50,33,15)

> gender.survey<-data.frame(rbind(Male,Female))

> names(gender.survey)<-c('valid','revoked','suspended')

> chisq.test(gender.survey)

Pearson's Chi-squared test

data: gender.survey

X-squared = 0.052617, df = 2, p-value = 0.974

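There is no evidence of association between gender and licence status in this sample of NSW 17-year-olds. The distribution of licence status is almost identical for males and females in the table (49 of 97 males and 50 of 98 females hold a valid licence, about 50.5% and 51.0% respectively), and the p-value of 0.974 is much higher than 0.05, so we fail to reject the null hypothesis that licence status does not differ by gender in the population of NSW 17-year-olds.
The requirements for a Chi-square test are met: the observations come from a random sample in which each person falls into exactly one cell, and every expected cell count is at least 5 (the smallest, for males with a suspended licence, is 97 × 31/195 ≈ 15.4).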
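The expected cell counts that the Chi-square requirement refers to can be checked directly. The sketch below assumes the observed counts from the contingency table and applies the common rule of thumb that every expected count should be at least 5.

```r
observed <- matrix(
  c(49, 32, 16,
    50, 33, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Male", "Female"), c("valid", "revoked", "suspended"))
)

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))

# Rule of thumb for the Chi-square approximation
print(all(expected >= 5))
```

The same matrix is returned in the `expected` component of the object produced by `chisq.test`.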

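Step 1: Setting up the hypotheses

H0: Licence status does not differ by gender (the two variables are independent) in the population of NSW 17-year-olds

H1: Licence status differs by gender (the two variables are associated) in the population of NSW 17-year-olds

The level of significance is α = 0.05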

Step 2: Selection of appropriate test statistics

Since both variables are categorical, we apply the Pearson Chi-square test of independence, which has (2 - 1)(3 - 1) = 2 degrees of freedom for this 2 by 3 table.

Step 3: Decision on the hypothesis

The null hypothesis will be rejected if the computed P-value is less than 0.05

Step 4: Computation of the test statistics in R

> Male<-c(49,32,16)

> Female<-c(50,33,15)

> gender.survey<-data.frame(rbind(Male,Female))

> names(gender.survey)<-c('valid','revoked','suspended')

> chisq.test(gender.survey)

Pearson's Chi-squared test

data: gender.survey

X-squared = 0.052617, df = 2, p-value = 0.974

Step 5: Conclusion

The p-value obtained in this case is 0.974, which is higher than 0.05, so we fail to reject the null hypothesis. We conclude that there is no evidence that licence status differs by gender in the population of NSW 17-year-olds; that is, there is no evidence of an association between gender and licence status.

Question 5

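Different research studies require different sample sizes because the required sample size depends on features specific to each study: the size of the effect to be detected, the variability of the outcome, the chosen significance level and the desired power. A study looking for a small difference in a highly variable outcome needs many more participants than one looking for a large difference, so a single fixed sample size would be wastefully large for some studies and too small to be informative for others.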

Using the Online calculator the following five steps are applied;

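Step 1: The difference to be detected is $\Delta = 0.5$ hours

Step 2: The population standard deviation is $\sigma = 3.0$

Step 3: For a two-sided test with $\alpha = 0.05$ we use $z_{1-\alpha/2} = 1.96$, and for power = 0.90 ($\beta = 0.10$) we use $z_{1-\beta} = 1.2816$

Step 4: Therefore the minimum required sample size per group is $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2 = 2(3.0)^2(1.96 + 1.2816)^2/(0.5)^2 \approx 756.6$, which is rounded up to 757

Step 5: Hence a sample of about 757 boys and 757 girls (1,514 in total) is required to detect a difference of 0.5 hours in mean self-reported sedentary hours per week with 90% power, using a two-sided test at α = 0.05 and assuming a population standard deviation of 3.0 hours and equal group sizes.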
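For reference, the question's stated inputs (α = 0.05, power = 0.90, σ = 3.0, a difference of 0.5 hours, equal group sizes) can be put through the standard normal-approximation formula for comparing two means. The following is a sketch under those assumptions, not the output of the online calculator mentioned above; the variable names are illustrative.

```r
alpha <- 0.05   # two-sided significance level
power <- 0.90   # desired power (1 - beta)
sigma <- 3.0    # assumed population standard deviation (hours per week)
delta <- 0.5    # difference in means to detect (hours per week)

z_alpha <- qnorm(1 - alpha / 2)   # about 1.96
z_beta <- qnorm(power)            # about 1.28

# Sample size per group for a two-sample comparison of means
n_per_group <- 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2
print(ceiling(n_per_group))
```

Base R's `power.t.test` performs a similar calculation using the t distribution instead of the normal, so it can be expected to return a slightly larger sample size.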

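A sample size of 20 per group (40 in total) is far smaller than the sample size required. The study would have very low power to detect a 0.5 hour difference, so a real difference of that size would most likely be missed (a Type II error), and the estimated difference would have a wide confidence interval. A small sample does not by itself bias the data, but a non-significant result from such a study could not be taken as evidence that boys and girls have the same mean sedentary hours.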
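To give a sense of scale, the approximate power with n = 20 per group can be sketched using the normal approximation and the same assumed inputs as the question (σ = 3.0, a true difference of 0.5 hours, two-sided α = 0.05). The variable names are illustrative.

```r
n <- 20         # participants per group
sigma <- 3.0    # assumed population standard deviation
delta <- 0.5    # true difference in means assumed for the calculation
alpha <- 0.05   # two-sided significance level

se_diff <- sigma * sqrt(2 / n)     # standard error of the difference in means
z_crit <- qnorm(1 - alpha / 2)

# Probability of a significant result in either direction
power_n20 <- pnorm(delta / se_diff - z_crit) + pnorm(-delta / se_diff - z_crit)
print(round(power_n20, 3))
```

Under these assumptions the power comes out below 10%, compared with the 90% requested in part b.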
