# BSB123: Data Analysis

## Question:

1) (a) Construct separate boxplots of salaries for male and female academics, and compare their distributions (central location, spread and skewness).

(b) Test, at the 1% significance level, whether male academics on average earn more than their female counterparts.
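The three features named in question 1(a) can each be read off a boxplot: the median for central location, the interquartile range for spread, and the relative lengths of the two halves of the box for skewness. The sketch below computes those quantities with Python's standard library. The salary figures and the helper name `box_summary` are hypothetical and are not taken from the assignment dataset, and the quartile-based (Bowley) skewness measure is only one of several possible choices.

```python
from statistics import quantiles


def box_summary(values):
    """Return the quantities a boxplot displays, plus a quartile-based skewness measure."""
    # method="inclusive" interpolates the quartiles within the observed range of the data.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Bowley skewness: positive when the upper half of the box is longer than the lower half.
    skew = (q3 + q1 - 2 * q2) / iqr if iqr else 0.0
    return {
        "min": min(values), "q1": q1, "median": q2, "q3": q3,
        "max": max(values), "iqr": iqr, "bowley_skew": skew,
    }


# Hypothetical salaries (in $000), for illustration only.
male = [62, 70, 75, 81, 90, 104, 130]
female = [58, 64, 69, 73, 80, 88, 97]

for label, sample in (("Male", male), ("Female", female)):
    print(label, box_summary(sample))
```

Comparing the two dictionaries side by side mirrors comparing the two boxplots: a higher median indicates a higher central location, a larger `iqr` indicates more spread, and a positive `bowley_skew` suggests right skew.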

2) (a) Considering assistant professors only, test, at the 1% significance level, whether male assistant professors on average earn more than female assistant professors.

(b) Considering associate professors only, test, at the 1% significance level, whether male associate professors on average earn more than female associate professors.

(c) Considering professors only, test, at the 1% significance level, whether male professors on average earn more than female professors.
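The comparisons in questions 1(b) and 2 are upper-tail tests of the difference between two means, with the alternative hypothesis that the male mean exceeds the female mean. The sketch below computes the test statistic and degrees of freedom for each rank separately. It assumes unequal variances (Welch's version of the test); the assignment does not state which variance assumption to use, so this is a choice made here for illustration. The records and the rank codes `ASST` and `PROF` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for H1: mean_a > mean_b."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, df


# Hypothetical records (gender, rank, salary in $000); not the assignment data.
records = [
    ("M", "ASST", 62), ("M", "ASST", 66), ("M", "ASST", 71),
    ("F", "ASST", 60), ("F", "ASST", 65), ("F", "ASST", 67),
    ("M", "PROF", 118), ("M", "PROF", 125), ("M", "PROF", 140),
    ("F", "PROF", 112), ("F", "PROF", 121), ("F", "PROF", 129),
]

for rank in ("ASST", "PROF"):
    men = [s for g, r, s in records if g == "M" and r == rank]
    women = [s for g, r, s in records if g == "F" and r == rank]
    t_stat, df = welch_t(men, women)
    print(f"{rank}: t = {t_stat:.3f}, df = {df:.1f}")
```

The null hypothesis is rejected at the 1% level when the t statistic exceeds the upper-tail critical value for the computed degrees of freedom, which can be looked up in a t table or obtained in Excel with `T.INV(0.99, df)`.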

You plan to develop a regression model to investigate how various factors influence academic salaries.

3) Before you conduct any regression analysis, you use Excel to construct a correlation matrix of all the quantitative variables in the dataset. Based on the correlation matrix, comment briefly on the associations between Salary and other quantitative variables.
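A correlation matrix holds the Pearson correlation coefficient for every pair of quantitative variables, so the Salary row (or column) shows the strength and direction of each linear association with Salary. The sketch below builds such a matrix by hand. The column names follow the variables mentioned in this brief, but the values are hypothetical, and the real dataset may contain other quantitative columns.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values; the real dataset's columns and figures may differ.
data = {
    "Salary": [60, 72, 85, 98, 120],
    "YearsOfService": [2, 6, 11, 15, 24],
    "Age": [29, 35, 44, 47, 58],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in data])
```

Values near +1 or -1 indicate a strong linear association, and values near 0 indicate a weak one. A high correlation between two predictors (for example, Age and Years of Service) is also worth noting before they are entered into the same regression.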

4) You conduct a stepwise regression according to the following procedure:

Step 1: Gender only

Step 2: Gender and School

Step 3: Gender, School and Rank

Step 4: Gender, School, Rank and Years of Service

Step 5: Gender, School, Rank, Years of Service and Age

Choice of reference category: It is recommended that you choose Health and ASSO as the reference categories for the categorical variables School and Rank, respectively.

Present the regression output for each of the five steps.
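Categorical variables enter a regression as 0/1 dummy variables, with one level left out as the reference category. The sketch below shows how the predictor columns could be assembled for each of the five steps, with Health and ASSO omitted as the reference categories. Only `Health` and `ASSO` come from this brief. The other school and rank names, the field names, and the coding of Gender as `Male` = 1 are assumptions made for illustration.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference category."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


# "Health" and "ASSO" come from the assignment; the other level names are hypothetical.
SCHOOL_LEVELS = ["Health", "Business", "Science"]
RANK_LEVELS = ["ASST", "ASSO", "PROF"]


def design_row(record, step):
    """Build the predictor values for one academic at a given step (1 to 5)."""
    row = {"Male": int(record["Gender"] == "M")}  # assumes female is the baseline
    if step >= 2:
        row.update(dummy_code(record["School"], SCHOOL_LEVELS, "Health"))
    if step >= 3:
        row.update(dummy_code(record["Rank"], RANK_LEVELS, "ASSO"))
    if step >= 4:
        row["YearsOfService"] = record["YearsOfService"]
    if step >= 5:
        row["Age"] = record["Age"]
    return row


# Hypothetical academic, for illustration only.
example = {"Gender": "F", "School": "Science", "Rank": "PROF", "YearsOfService": 12, "Age": 45}
for step in range(1, 6):
    print(step, design_row(example, step))
```

An academic in the reference categories (Health, ASSO) has zeros in every dummy column, so each dummy coefficient is read as the expected salary difference from that baseline, holding the other predictors constant.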

5) Based on the regression output obtained in Step 4, answer the following:

(a) Which summary measure in the regression output is used to assess the overall adequacy of the model? Comment on the overall adequacy of the model obtained in Step 4.

(b) For each of the four independent variables, fully interpret the regression coefficients and comment on their statistical significance. (In discussing the statistical significance of a regression coefficient, you must justify your choice of a one-tailed or two-tailed test.)

6) Based on the correlation between Age and Salary, did you expect Age to have a statistically significant effect on Salary? In Step 5, is the statistical significance of the regression coefficient of Age as expected? Discuss fully.

