a) Create a normally distributed ‘population’ of 100000 cases with a mean (μ) of 50 and a standard deviation (σ) of 8, and call it ‘pop’. This is your pretend population.
b) Take a sample of size N = 30 (without replacement) from the population, and call it ‘samp’.
c) Pretend to give your sample an intervention. More specifically, add a normally distributed variable with μ = 5 and σ = 5 (call it ‘inter’) to your sample (‘samp’). Call this new variable ‘inter_samp’.
d) Use null hypothesis testing (all steps, α = .05) to determine if your intervention was effective at increasing scores. Be sure to interpret (even if it is purely subjective) the magnitude of the effect size and the confidence interval width, and include a general summary of the results.
Dr. White asked 400 students to choose between five characteristics of an instructor that they find most important. The characteristics that they had to choose among were (dataset codes in parentheses): 1) enthusiasm (enthusiasm); 2) humour (humour); 3) level of difficulty (difficulty); 4) clarity (clarity); or 5) caring attitude (caring). The dataset (available on Moodle) is entitled ‘ass2_q2.csv’. Use α = .01 for the questions below.
a) Use null hypothesis testing (all steps) to determine if enthusiasm, humour, level of difficulty, clarity and caring attitude are not equally important in terms of being the most important characteristic of a good instructor. Be sure to interpret the effect size (again, even if it is purely subjective) and include a general summary regarding the results.
b) What Dr. Black was really interested in was whether enthusiasm or level of difficulty differed in terms of the frequency with which they are chosen as a characteristic that students value in an instructor. Using only students who chose one of these two options (enthusiasm, difficulty), use R to generate a table of frequencies for each of the two options. Then, BY HAND, use null hypothesis testing (all steps) to determine if students differed in their preference for enthusiasm or level of difficulty as their most important characteristic. Calculate the p-value and/or critical value using R. Be sure to (subjectively) interpret the effect size and include a general summary statement regarding your results.
Generating a normally distributed population in R
R Code
# Create a population of 100000 cases with a mean of 50 and a standard deviation of 8
pop <- rnorm(100000, mean = 50, sd = 8)
# Sample N = 30 cases from "pop" without replacement (the default for sample())
samp <- sample(pop, 30)
# Generate the intervention effect "inter": 30 values with mean = 5 and sd = 5
inter <- rnorm(30, mean = 5, sd = 5)
# Add the intervention effect to each sampled score
inter_samp <- samp + inter
# Hypothesis statement and hypothesis test
# Null hypothesis H0: mu = mu0; alternative hypothesis H1: mu > mu0
t.test.right <- function(data, mu0, alpha)
{
  # t statistic: (sample mean - mu0) / standard error of the mean
  t.stat <- (mean(data) - mu0) / sqrt(var(data) / length(data))
  # degrees of freedom
  dof <- length(data) - 1
  # one-tailed critical value of t
  # e.g. alpha = 0.05 gives 1.64 when df = Inf
  t.critical <- qt(1 - alpha, df = dof)
  # one-tailed (upper-tail) p-value
  p.value <- pt(t.stat, df = dof, lower.tail = FALSE)
  # decision based on the test results
  if (t.stat > t.critical)
  {
    print("Reject H0")
  }
  else
  {
    print("Fail to reject H0")
  }
  print('T statistic')
  print(t.stat)
  print('T critical value')
  print(t.critical)
  print('P value')
  print(p.value)
  return(t.stat)
}
t.test.right(inter_samp, mu0 = 50, alpha = 0.05)
# summary statistics
summary(inter_samp)
# margin of error for the 95 percent confidence interval
# (two-sided, so the 0.975 quantile of t is used)
error <- qt(0.975, df = length(inter_samp) - 1) * sd(inter_samp) / sqrt(length(inter_samp))
error
# lower bound of the confidence interval
Lower <- mean(inter_samp) - error
Lower
# upper bound of the confidence interval
Upper <- mean(inter_samp) + error
Upper
#End of program
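As a cross-check on the hand-written function, the same right-tailed one-sample test can be run with R's built-in t.test(). The sketch below is only an illustration under assumptions that are not in the original: a fixed seed (set.seed(1)) and freshly simulated data, so its numbers will not match the output reported in this essay.

```r
# Cross-check sketch: built-in t.test() versus the manual t formula.
# The seed and the simulated data are assumptions for illustration only.
set.seed(1)
pop <- rnorm(100000, mean = 50, sd = 8)
samp <- sample(pop, 30)                  # N = 30, without replacement
inter <- rnorm(30, mean = 5, sd = 5)     # one intervention effect per case
inter_samp <- samp + inter

# Manual t statistic: (sample mean - mu0) / standard error
t_manual <- (mean(inter_samp) - 50) / sqrt(var(inter_samp) / length(inter_samp))

# Built-in right-tailed one-sample t test against mu0 = 50
builtin <- t.test(inter_samp, mu = 50, alternative = "greater")

# Two-sided 95% confidence interval for the mean
ci95 <- t.test(inter_samp, mu = 50, conf.level = 0.95)$conf.int

print(c(manual = t_manual, builtin = unname(builtin$statistic)))
print(builtin$p.value)
print(ci95)
```

If the two t statistics agree, the manual formula is doing the same calculation as t.test(); the built-in function also reports the p-value and the confidence interval directly.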
Program Output
> t.test.right(inter_samp,mu0=50,alpha= 0.05)
[1] "Accept H0"
[1] "T statistic"
[1] -3.043226
[1] "T critical value"
[1] 1.75305
[1] "P value"
[1] 0.9958919
[1] -3.043226
> #summary
> summary(inter_samp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
11.32 39.54 44.94 42.33 48.95 52.16
> #calculation of 95 percent confidence interval
> error<-qt(0.995,df=length(inter_samp)-1)*sd(inter_samp)/sqrt(length(inter_samp))
> error
[1] 7.424564
> #Lower bound confidence interval calculation
> Lower<-mean(inter_samp)-error
> Lower
[1] 34.9077
> #Upper bound confidence interval calculation
> Upper<-mean(inter_samp)+error
> Upper
[1] 49.75682
> #End of program
Interpretation
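The output above comes from a run with n = 16 (df = 15, hence the critical value of 1.753): 15 cases were sampled and a single intervention value was appended, rather than 30 cases each receiving the intervention as the task specifies. The results below describe that run.
The sample mean is 42.33. The printed interval (34.91, 49.76) has a margin of error of 7.42 (this is the margin of error, not the standard error). Because the command used qt(0.995, ...), this is a 99% rather than a 95% confidence interval. The interval is wide, about 15 points, which reflects the small sample.
For this right-tailed test, the null hypothesis is rejected if the observed t statistic is greater than the critical value of t (Gentleman, 2009; Braun & Murdoch, 2012; Baker & Trietsch, 2009). The test statistic, t(15) = -3.04, is less than the critical value (1.75), and the one-tailed p-value is .996, so we fail to reject the null hypothesis (printed by the program as "Accept H0"). There is not sufficient evidence that the intervention increased scores above 50; in this run the sample mean is in fact below 50.
The summary statistics are: minimum 11.32, first quartile 39.54, median 44.94, mean 42.33, third quartile 48.95, and maximum 52.16.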
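The task also asks for an effect size and a comment on the confidence interval width. One common choice for a one-sample design is Cohen's d, the mean difference from the hypothesised value divided by the sample standard deviation. The sketch below is a minimal illustration; the helper name one_sample_summary and the made-up scores vector are assumptions, not part of the original analysis.

```r
# Sketch: one-sample Cohen's d and a two-sided confidence interval.
# `one_sample_summary` is a hypothetical helper, not part of the original code.
one_sample_summary <- function(x, mu0, conf = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                              # standard error of the mean
  margin <- qt(1 - (1 - conf) / 2, df = n - 1) * se  # margin of error
  list(
    d = (mean(x) - mu0) / sd(x),                     # standardised mean difference
    se = se,
    ci = c(lower = mean(x) - margin, upper = mean(x) + margin)
  )
}

# Small made-up vector, used only to show the call
scores <- c(52, 58, 49, 61, 55, 57, 50, 63)
res <- one_sample_summary(scores, mu0 = 50)
print(res)
```

As a rough convention, values of d near 0.2, 0.5 and 0.8 are often described as small, medium and large, although any such reading remains subjective.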
Null hypothesis: Enthusiasm, humour, level of difficulty, clarity and caring attitude are chosen equally often as the most important characteristic of a good instructor
Alternative hypothesis: At least one of the characteristics is chosen with a different frequency from the others
The variable is categorical, so a chi-square goodness-of-fit test is the most appropriate statistical test for this problem. The frequency table is as follows:
Characteristic | Observed, O | Expected, E | O − E | (O − E)²/E
Enthusiasm | 121 | 80 | 41 | 21.0125
Humour | 45 | 80 | -35 | 15.3125
Difficulty | 92 | 80 | 12 | 1.8
Clarity | 87 | 80 | 7 | 0.6125
Caring | 55 | 80 | -25 | 7.8125
Total | 400 | | | 46.55
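k = 5 categories
Expected frequency, $E = N/k = 400/5 = 80$
Observed chi-square value, $\chi^2 = \sum (O - E)^2 / E = 46.55$
Degrees of freedom = k - 1 = 5 - 1 = 4
Chi-square critical value, $\chi^2_{.01}(4) = 13.277$ (from the chi-square table at α = .01)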
Interpretation
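The observed chi-square statistic (46.55) is greater than the critical value at α = .01 (13.277 for df = 4), so the result is statistically significant. We therefore reject the null hypothesis and conclude that the five characteristics (enthusiasm, humour, difficulty, clarity and caring) are not chosen equally often as the most important characteristic of a good instructor. Enthusiasm was chosen more often than expected (121 vs. 80), while humour (45) and caring (55) were chosen less often.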
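The hand calculation can be checked in R with chisq.test(), which performs a goodness-of-fit test with equal expected proportions when given a vector of counts. The sketch below uses the observed counts reported above; Cohen's w, computed as sqrt(chi-square / N), is an added effect size measure that does not appear in the original working.

```r
# Sketch: goodness-of-fit test on the observed counts reported above.
observed <- c(enthusiasm = 121, humour = 45, difficulty = 92,
              clarity = 87, caring = 55)

# With a vector of counts, chisq.test() assumes equal expected proportions
fit <- chisq.test(observed)

chi_sq <- unname(fit$statistic)
df <- unname(fit$parameter)
critical <- qchisq(1 - 0.01, df = df)   # critical value at alpha = .01
w <- sqrt(chi_sq / sum(observed))       # Cohen's w effect size (added here)

print(fit)
print(c(critical = critical, w = w))
```

By the usual rough benchmarks for w (0.1 small, 0.3 medium, 0.5 large), the value obtained here would be read as a medium effect, though that reading is subjective.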
R code
# Load and attach the dataset (read_excel() comes from the readxl package)
library(readxl)
R_order <- read_excel("C:/Users/User/Downloads/Desktop/R order.xlsx")
View(R_order)
attach(R_order)
# Create the frequency table
table<-table(enthusiasm,difficulty)
table
R output
>attach(R_order)
> table<-table(enthusiasm,difficulty)
> table
difficulty
enthusiasm 1
121 92
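The subsetting step the task describes (keep only students who chose enthusiasm or difficulty, then tabulate) can be sketched as follows. The layout of the dataset is an assumption: one row per student with the chosen characteristic in a column called choice, which may not match the real column name in ass2_q2.csv. The stand-in data are rebuilt from the counts reported in this essay rather than read from the file.

```r
# Sketch: frequency table for the two options of interest.
# Assumption: one row per student, with the chosen characteristic in a column
# called `choice`; the real column name in ass2_q2.csv may differ.
responses <- data.frame(
  choice = rep(c("enthusiasm", "humour", "difficulty", "clarity", "caring"),
               times = c(121, 45, 92, 87, 55)),
  stringsAsFactors = FALSE
)

# Keep only students who chose enthusiasm or difficulty
two_options <- subset(responses, choice %in% c("enthusiasm", "difficulty"))

# Tabulate the remaining choices
freq <- table(two_options$choice)
print(freq)
```

With the real file, the data.frame() call would be replaced by something like read.csv("ass2_q2.csv"), adjusted to the actual column names.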
Null hypothesis: Students did not differ in their preference for enthusiasm or level of difficulty (the two options are chosen equally often)
Alternative hypothesis: Students differed in their preference for enthusiasm or level of difficulty
Chi-square calculation by hand
Characteristic | Observed, O | Expected, E | O − E | (O − E)²/E
Enthusiasm | 121 | 106.5 | 121 − 106.5 = 14.5 | 1.974
Difficulty | 92 | 106.5 | 92 − 106.5 = -14.5 | 1.974
Total | 213 | | | 3.948
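k = 2 categories
Expected frequency, $E = N/k = 213/2 = 106.5$
Observed chi-square value, $\chi^2 = \sum (O - E)^2 / E = 3.948$
Degrees of freedom = k - 1 = 2 - 1 = 1
Chi-square critical value, $\chi^2_{.01}(1) = 6.635$ (from the chi-square table at α = .01)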
# R code to calculate the p-value (to be compared with alpha = 0.01)
pchisq(3.948, df=1, lower.tail=FALSE)
[1] 0.04692711
The observed chi-square statistic (3.948) is less than the critical value at α = .01 (6.635 for df = 1), so the result is not statistically significant. We therefore fail to reject the null hypothesis: there is not sufficient evidence that students differed in their preference for enthusiasm or level of difficulty. The p-value approach gives the same decision, since the observed p-value (.047) is greater than .01. Note that the result would have been significant at α = .05, so the conclusion depends on the stricter α set for this question.
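The hand calculation for the two-category comparison can be verified in R. The sketch below uses the counts from the frequency table (121 and 92); the effect size, Cohen's w = sqrt(chi-square / N), is an added measure that the original working does not report.

```r
# Sketch: two-category goodness-of-fit check at alpha = .01.
observed <- c(enthusiasm = 121, difficulty = 92)

# Equal expected proportions are the default for a vector of counts
fit <- chisq.test(observed)

chi_sq <- unname(fit$statistic)
critical <- qchisq(1 - 0.01, df = 1)               # critical value at alpha = .01
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
w <- sqrt(chi_sq / sum(observed))                  # Cohen's w effect size (added here)

print(c(chi_sq = chi_sq, critical = critical, p_value = p_value, w = w))
print(chi_sq > critical)                           # TRUE would mean "reject H0"
```

A w this close to 0.1 would usually be read as a small effect, which fits the non-significant result at α = .01.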
References
Baker, K., & Trietsch, D. (2009). Principles of sequencing and scheduling.
Braun, J., & Murdoch, D. (2012). A first course in statistical programming with R.
Gentleman, R. (2009). R programming for bioinformatics. Boca Raton: CRC Press.