# BSB123 Data Analysis

## Question:

1. (a) Construct a side-by-side boxplot of GPA for male and female science students, and compare their distributions (central location, spread and skewness).
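
The comparison asked for in (a) rests on the quantities a boxplot displays. As a rough sketch, assuming the GPA values have been split by gender into plain lists, they could be computed as follows. The function name `five_number_summary`, the `box_asymmetry` measure and the sample values are illustrative assumptions, and quartile conventions differ between tools, so the figures may not match Excel exactly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the quantities a boxplot displays, using inclusive quartiles."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,          # central location
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,            # spread of the middle 50%
        # Positive when the upper half of the box is longer than the lower
        # half, which hints at right skew; negative hints at left skew.
        "box_asymmetry": (q3 - median) - (median - q1),
    }


# Made-up GPA values for illustration only; NOT the assignment dataset.
gpa_by_gender = {
    "Male": [4.1, 4.8, 5.0, 5.3, 5.9, 6.4],
    "Female": [4.5, 5.1, 5.4, 5.6, 6.0, 6.6],
}

for gender, gpas in gpa_by_gender.items():
    print(gender, five_number_summary(gpas))
```

Comparing the two medians addresses central location, the two IQRs (and ranges) address spread, and the sign of `box_asymmetry` gives a first indication of skewness.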

(b) Recent research showed that in mathematics “on average for 15-year-old Australian students, females achieved at a significantly lower level than male students” (p.2, Buckley 2016)[1]. However, there is also research evidence that, “despite the stereotype that boys do better in math and science, girls have made higher grades than boys throughout their school years for nearly a century” (APA 2014)[2]. So it is not certain whether one gender performs better than the other at university. Test, at the 5% level of significance, whether mean GPA differs between male and female students in the Science Department.
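
For readers who want to see what sits behind the Excel output, the sketch below computes a two-sample t statistic and its degrees of freedom. It assumes unequal variances (Welch's version, which corresponds to Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool); the question does not say which variant to use, so treat that choice, the function name `welch_t` and the sample values as assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Made-up GPA values for illustration only; NOT the assignment dataset.
male_gpa = [4.1, 4.8, 5.0, 5.3, 5.9, 6.4]
female_gpa = [4.5, 5.1, 5.4, 5.6, 6.0, 6.6]

t_stat, df = welch_t(male_gpa, female_gpa)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The p-value and critical value still come from the t distribution, which Excel reports directly; for a two-tailed test the two-tail p-value is the one to compare with 0.05.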

2. Education literature has shown that students with higher socio-economic status (SES) tend to have stronger academic achievement.
• Test at the 5% level of significance whether students whose parents have a post-graduate qualification have a higher GPA than students whose parents have only an undergraduate qualification.
• Test at the 5% level of significance whether students whose parents have an undergraduate qualification have a higher GPA than students whose parents have only a secondary or below qualification.

Note: When conducting a test in Q1 and Q2, you should briefly discuss whether it is a one- or two-tailed test, state the test statistic and any assumptions made, and draw a conclusion based on the Excel output.
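
One way to make the one- versus two-tailed distinction concrete is to write the hypotheses out. The formulation below is a sketch that follows the wording of the questions ("any difference" versus "a higher GPA than"); the subscripts $M$, $F$, $PG$, $UG$ and $S$ (male, female, post-graduate, undergraduate, secondary or below) are notation assumed here, not taken from the dataset.

$$
\begin{aligned}
\text{Q1(b), two-tailed:} \quad & H_0: \mu_{M} = \mu_{F} && H_1: \mu_{M} \neq \mu_{F} \\
\text{Q2, first test, one-tailed:} \quad & H_0: \mu_{PG} \le \mu_{UG} && H_1: \mu_{PG} > \mu_{UG} \\
\text{Q2, second test, one-tailed:} \quad & H_0: \mu_{UG} \le \mu_{S} && H_1: \mu_{UG} > \mu_{S}
\end{aligned}
$$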

You plan to develop a regression model to further investigate how various factors influence students’ GPA in the Science Department.

3. Before you conduct any regression analysis, you use Excel to construct a correlation matrix of all the quantitative variables in the dataset. Based on the correlation matrix, comment briefly on the direction and strength of linear associations between GPA and other quantitative variables (viz. HS_SCI, HS_ENG, HS_MATH and ATAR).
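
The entries of such a correlation matrix are pairwise Pearson coefficients. As a minimal sketch, assuming each variable is held as a list of equal length, the matrix could be built as below. The function names and the sample values are illustrative assumptions; only the column names come from the question.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {
        row: {col: pearson_r(columns[row], columns[col]) for col in names}
        for row in names
    }


# Made-up values for illustration only; NOT the assignment dataset.
data = {
    "GPA": [4.5, 5.0, 5.5, 6.0, 6.5],
    "HS_SCI": [60, 72, 70, 85, 90],
    "ATAR": [70, 75, 82, 88, 95],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {col: round(r, 3) for col, r in row.items()})
```

The sign of each coefficient gives the direction of the linear association with GPA, and its absolute value (between 0 and 1) gives the strength.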
4. Based on the correlation matrix obtained in Question 3, would you say that:
• (i) HS_SCI is a predictor of GPA
• (ii) HS_ENG is a predictor of GPA
• (iii) HS_MATH is a predictor of GPA
• (iv) ATAR is a predictor of GPA

Conduct a simple regression for each variable to support your answers to parts (i) to (iv).
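
A simple regression of GPA on a single predictor can be sketched with the least-squares formulas below. This is an illustration of the calculation rather than a replacement for the Excel regression output (it does not produce standard errors or p-values); the function name `simple_regression` and the sample values are assumptions.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R-squared equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Made-up values for illustration only; NOT the assignment dataset.
hs_sci = [60, 72, 70, 85, 90]
gpa = [4.5, 5.0, 5.5, 6.0, 6.5]

slope, intercept, r2 = simple_regression(hs_sci, gpa)
print(f"GPA = {intercept:.3f} + {slope:.4f} * HS_SCI, R^2 = {r2:.3f}")
```

Because R-squared here is the square of the correlation coefficient, each simple regression can be checked directly against the corresponding entry of the correlation matrix.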

5. You conduct a stepwise regression according to the following procedure:

Step 1: HS_SCI only

Step 2: HS_SCI and HS_ENG

Step 3: HS_SCI, HS_ENG and HS_MATH

Step 4: HS_SCI, HS_ENG, HS_MATH and PARENT EDUC

Step 5: HS_SCI, HS_ENG, HS_MATH, PARENT EDUC and GENDER

Step 6: HS_SCI, HS_ENG, HS_MATH, PARENT EDUC, GENDER and ATAR

Present the regression output for each of the six steps.

Choice of reference category: It is recommended that you choose U (undergraduate education) as the reference category for the categorical variable: PARENT EDUC.
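
Choosing U as the reference category means PARENT EDUC enters the regression as dummy (0/1) variables for the other categories, with U represented by all dummies being zero. The sketch below shows that coding. The category codes `P` (post-graduate) and `S` (secondary or below) and the column names `PARENT_P` and `PARENT_S` are hypothetical; only `U` is named in the text, so check the codes actually used in the dataset.

```python
REFERENCE = "U"            # undergraduate: the recommended reference category
LEVELS = ["P", "U", "S"]   # hypothetical codes; only "U" appears in the text


def dummy_code(level, levels=LEVELS, reference=REFERENCE):
    """Return 0/1 dummy variables for every level except the reference."""
    if level not in levels:
        raise ValueError(f"unknown PARENT EDUC level: {level!r}")
    return {
        f"PARENT_{lv}": int(level == lv) for lv in levels if lv != reference
    }


for level in LEVELS:
    print(level, dummy_code(level))
```

With this coding, the coefficient on each dummy is read as the estimated difference in GPA relative to students whose parents have an undergraduate qualification, holding the other variables constant.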

6. For each of the independent variables contained in the regression model in Step 5, fully interpret the regression (slope) coefficients and comment on their statistical significance.

In discussing the statistical significance of a regression coefficient, you have to justify your choice of a one- or two-tailed test.

7. In stepwise regression Step 6 you noticed that the regression coefficient of ATAR is negative (!). Does this result surprise you, given the correlation of ATAR and GPA? Why/Why not? Does the inclusion of ATAR improve overall model fit? Discuss fully.
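
When judging whether an added variable improves overall fit, one commonly used measure is adjusted R-squared, which penalises extra predictors (plain R-squared never decreases when a variable is added). The sketch below shows the standard formula; the function name and the numbers in the example are made up for illustration and are not results from this dataset.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up figures for illustration only: a tiny gain in R-squared from one
# extra predictor can still lower adjusted R-squared.
before = adjusted_r2(0.500, n=101, k=5)
after = adjusted_r2(0.501, n=101, k=6)
print(f"before: {before:.4f}, after: {after:.4f}")
```

Excel's regression output reports this quantity as "Adjusted R Square", so the values for Step 5 and Step 6 can be compared directly.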

As the chief project officer, you are asked to write a summary report (300 words) detailing all the findings from your data analysis. The issues you can discuss in your report should include (but are not limited to) the following:

• There has been research evidence that SES and academic achievement are closely related[3]. Based on your analysis in this study, critically examine education researchers’ claim that students’ SES has an impact on their academic performance.
• Do you think there is any gender difference in academic achievement of Science students?
• In stepwise regression from Steps 1 to 3 where HS_SCI, HS_ENG and HS_MATH are added to the model progressively, observe the changes to the regression (slope) coefficient of the independent variable, HS_SCI. Do you think HS_SCI is a good predictor of students’ GPA?
• Is ATAR a good predictor of GPA? Should ATAR be included in the model?
• Based on this study, what is the final model you would recommend to the Head of the Science Department?
• Comment on the overall adequacy of the final model.
• Are there any other important factors influencing academic performance that were not included in this study? What are they?

