1. On average, do students enrolled in Unit X and students enrolled in Unit 1X spend the same amount of time on the unit MySCU sites?
2. Explain the choice of independent and dependent variables.
Answers:
This response aims to clarify each of the major issues you highlighted.
Sample data on the hours spent on the MySCU site have been collected for students who sat the final exam. These students were selected from two units, namely Unit 1 and Unit 11, and each sample contains 65 students. Based on inferential statistics, there is a statistically significant difference in the mean hours that students from Unit 1 and Unit 11 spend on the MySCU site.
In addition, a correlation and linear regression analysis was performed on the sample I data. The session score served as the independent variable and the final exam mark as the dependent variable. The scatter diagram was useful in showing that a strong positive relationship exists between the chosen variables, as is apparent from the upward slope. The regression model supports the same observation: the session mark is statistically significant as an independent variable and explains a majority of the variation in the dependent variable.
To try to improve the predictive power of the regression model, a multiple regression model was fitted with time spent on the MySCU site as a second independent variable alongside the session mark. However, this does not yield any improvement, as the added variable (time spent on MySCU) is statistically insignificant and has a low correlation with the dependent variable. Therefore, the linear regression model based only on session marks is the better of the two regression models.
The objective is to perform hypothesis testing for ascertaining the given claim regarding the time spent by students.
Null Hypothesis (H0): µ1 = µ11
Alternative Hypothesis (H1): µ1 ≠ µ11
The level of significance is taken as 5%.
The population standard deviation is not known in this case, so a t test is preferred over a z test. The two samples have similar standard deviations and the same number of observations.
The two-tailed p value is used, and it is reported as 0.00. Since the p value is smaller than the assumed significance level, the null hypothesis is rejected in favour of the alternative hypothesis. It is therefore fair to conclude that the time spent on the unit MySCU site differs in a statistically significant manner between students who sat the final exam and were enrolled in Unit 1 and those enrolled in Unit 11.
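As a rough illustration of the test described above, the following sketch computes a two-sample t statistic with a pooled variance using only the Python standard library. The hours are simulated placeholder values, not the assignment data, and the function names `pooled_t_statistic` and `approx_two_tailed_p` are hypothetical. The equal-variance form is an assumption made here because the two samples are said to have similar standard deviations. Since the standard library has no t distribution, the two-tailed p value is approximated with the standard normal distribution, which is reasonable only for large degrees of freedom such as the 128 (65 + 65 - 2) in this case.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, df) for a two-sample t test that assumes equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_error
    return t, df


def approx_two_tailed_p(t):
    """Normal approximation to the two-tailed p value (large df only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Simulated hours on the MySCU site; NOT the assignment data.
rng = random.Random(42)
unit_1_hours = [rng.gauss(12, 3) for _ in range(65)]
unit_11_hours = [rng.gauss(15, 3) for _ in range(65)]

t, df = pooled_t_statistic(unit_1_hours, unit_11_hours)
p = approx_two_tailed_p(t)
alpha = 0.05  # 5% level of significance, as in the text

print(f"t = {t:.3f}, df = {df}, approximate p = {p:.6f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

With the real data, the two lists would simply be replaced by the 65 observed values from each unit. A statistics package with a proper t distribution would give an exact p value rather than this approximation.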
The relevant scatter plot between the two variables of interest is illustrated as follows.
The scatter diagram above indicates that session marks and exam marks have a positive relationship, which also appears strong given the clear linear trend.
The equation of the regression model is given below.
Exam mark = 6.33 + 0.77 * Session mark
The coefficient of session mark (the slope) is 0.77, which indicates that as the session mark increases by 1 mark, the exam mark is expected to increase by 0.77 marks.
The intercept is 6.33, which indicates that a student who scores zero in the session would still be expected to score 6.33 marks in the final exam.
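The interpretation of the slope and intercept can be checked with a short sketch that evaluates the fitted equation. The coefficients are the ones reported in the text; the function name `predict_exam_mark` and the example session marks are hypothetical and chosen only for illustration.

```python
INTERCEPT = 6.33  # intercept reported in the text
SLOPE = 0.77      # slope on session mark reported in the text


def predict_exam_mark(session_mark):
    """Predicted exam mark from the simple regression equation."""
    return INTERCEPT + SLOPE * session_mark


# Hypothetical session marks, used only to show how the equation behaves.
for session_mark in (0, 50, 51):
    print(session_mark, round(predict_exam_mark(session_mark), 2))

# Raising the session mark by 1 raises the prediction by the slope.
step = predict_exam_mark(51) - predict_exam_mark(50)
print(round(step, 2))
```

The prediction at a session mark of zero equals the intercept, and the gap between the predictions at 50 and 51 equals the slope. If no student in the sample scored near zero in the session, the intercept is an extrapolation and should be read with caution.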
The R2 value is 0.5654, which indicates that the independent variable, i.e. session mark, explains only 56.54% of the variation in exam mark. The remaining variation may be caused by other independent variables.
Applicable correlation coefficient = 0.5654^0.5 = 0.7519
The correlation coefficient is moderate to strong and suggests that session mark is a critical independent variable in determining exam marks.
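For a regression with a single independent variable, the magnitude of the correlation coefficient is the square root of R2, and its sign matches the sign of the slope. A minimal sketch of that arithmetic, using the R2 and slope values reported in the text, is shown below.

```python
from math import copysign, sqrt

r_squared = 0.5654  # R2 of the simple regression, as reported in the text
slope = 0.77        # slope on session mark, as reported in the text

# |r| = sqrt(R2) holds only for simple (one-predictor) linear regression;
# the sign of r is taken from the sign of the slope.
r = copysign(sqrt(r_squared), slope)
print(round(r, 4))
```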
Multiple Regression Output
The regression model can be expressed using the following equation.
Exam mark = 5.56 + 0.10 * Time Spent MySCU Site Hours (Unit5) + 0.73 * Session Mark
From the above equation, the slope of time spent on MySCU is 0.10, which implies that for each additional hour a student spends on MySCU, he/she would be expected to score 0.10 marks higher in the final exam, holding the session mark constant.
The slope corresponding to session mark is 0.73, which implies that for each additional mark a student scores in the session, he/she would be expected to score 0.73 marks higher in the final exam, holding the time spent on MySCU constant.
The regression line has an intercept of 5.56, which indicates that a student who did not spend a single hour on the MySCU site and also scored zero as the session mark would still be expected to score 5.56 marks in the final exam if he/she sits it.
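The same reading of the coefficients can be illustrated for the multiple regression equation. The sketch below uses the coefficients reported in the text; the function name `predict_exam_mark_multi` and the input values are hypothetical.

```python
def predict_exam_mark_multi(hours_on_myscu, session_mark):
    """Predicted exam mark from the two-predictor equation in the text."""
    return 5.56 + 0.10 * hours_on_myscu + 0.73 * session_mark


# Hypothetical student: 20 hours on the MySCU site, session mark of 50.
base = predict_exam_mark_multi(20, 50)

# Change one predictor at a time while holding the other fixed.
extra_hour = predict_exam_mark_multi(21, 50) - base
extra_session_mark = predict_exam_mark_multi(20, 51) - base

print(round(base, 2))
print(round(extra_hour, 2), round(extra_session_mark, 2))
```

Each difference recovers the corresponding slope, which is what "holding the other variable constant" means in practice.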
The R2 value is 0.5709, which indicates that the independent variables, i.e. session mark and time spent on MySCU in hours, together explain only 57.09% of the variation in exam mark. The remaining variation may be caused by other independent variables.
The regression model containing only one independent variable, i.e. session mark, is the better of the two models. Although the multiple regression model shows a marginal improvement in R2, this arises simply because R2 does not decrease when an additional predictor is added, and it cannot be attributed to any real explanatory power of time spent on MySCU. The evidence for this is that the adjusted R2 value has decreased for the multiple regression, making it a worse model than the simple linear regression. Also, the slope of time spent on MySCU is not significant, and the correlation coefficient between time spent and the final exam marks is only 0.45.
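The comparison of adjusted R2 values can be reproduced from the reported R2 figures with the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where k is the number of independent variables. The sketch below assumes n = 65 observations, taken from the sample size stated earlier; the text does not report n for the regressions explicitly, so this is an assumption. The function name `adjusted_r_squared` is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R2, which penalises R2 for each added predictor."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 65  # ASSUMPTION: regression sample size equals the stated sample of 65

adj_simple = adjusted_r_squared(0.5654, N_OBS, 1)    # session mark only
adj_multiple = adjusted_r_squared(0.5709, N_OBS, 2)  # session mark + MySCU hours

print(round(adj_simple, 4), round(adj_multiple, 4))
print("Adjusted R2 falls when MySCU hours is added:", adj_multiple < adj_simple)
```

Under that assumption the adjusted R2 falls slightly when time spent on MySCU is added, which agrees with the conclusion that the extra predictor does not earn its place in the model.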