You must provide your own unique solution. You may work with others, but each of you is responsible for submitting your own problem set solution. Each question is worth 30 marks, and each part is of equal value. A single file knit from R Markdown is preferred, but other formats are acceptable.
1. Re-estimate the models from assignment 3, question 2, but now use the log(wage) as the dependent variable. Include the same set of covariates/explanatory variables: age, age2, sex, educational attainment, sector of employment, collective agreement status, firm size, immigrant status and province. In assessing the question, here are items to consider:
a. Should age be in log form? Note that none of the other explanatory variables are numeric, so only consider the log transformation for age.
b. Which model format fits best, with or without the log transformed wage as dependent variable?
c. What does the model predict for wages across provinces?
d. Does immigrant status interact with the other explanatory variables in explaining wage differences? To analyze, first estimate the model from assignment 3, question 2, part c. Then generate predicted wages using emmeans() with the "type = 'response'" argument, which converts the log-transformed values back into dollars. Do this for each variable interacted with immigrant status; there is no need to include age, age2 or province. Plots really help visualize these effects.
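As an illustration of parts b and d, here is a minimal sketch on simulated data. The data frame and the column names (wage, age, sex, immig) are hypothetical stand-ins, not the assignment dataset. It assumes the emmeans and ggplot2 packages are installed. AIC values from a level model and a log model are not directly comparable, so the sketch adds the Jacobian term 2 * sum(log(wage)) to put the log model's AIC on the wage scale.

```r
# Sketch only: simulated data with hypothetical variable names.
library(emmeans)

set.seed(1)
n <- 500
d <- data.frame(
  age   = sample(20:64, n, replace = TRUE),
  sex   = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  immig = factor(sample(c("Non-immigrant", "Immigrant"), n, replace = TRUE),
                 levels = c("Non-immigrant", "Immigrant"))
)
d$wage <- exp(2.5 + 0.04 * d$age - 0.0004 * d$age^2 +
                0.10 * (d$sex == "Male") -
                0.05 * (d$immig == "Immigrant") +
                rnorm(n, sd = 0.3))

# Level and log specifications with the same right-hand side.
fit_lvl <- lm(wage ~ age + I(age^2) + sex * immig, data = d)
# Writing log(wage) inside the formula lets emmeans detect the transformation.
fit_log <- lm(log(wage) ~ age + I(age^2) + sex * immig, data = d)

# Put both AIC values on the wage scale before comparing them:
# the log model's log-likelihood needs the Jacobian term -sum(log(wage)).
aic_lvl <- AIC(fit_lvl)
aic_log <- AIC(fit_log) + 2 * sum(log(d$wage))
print(c(level = aic_lvl, log = aic_log))

# Predicted wages in dollars for each immigrant status-sex pair.
emm <- emmeans(fit_log, ~ immig * sex, type = "response")
emm_df <- as.data.frame(emm)
print(emm_df)

# Interaction plot on the dollar scale (returns a ggplot object).
p <- emmip(fit_log, immig ~ sex, type = "response")
print(p)
```

Note that exponentiating a predicted mean of log(wage) gives a geometric mean wage, which is generally lower than the arithmetic mean.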
2. Predict the impact of immigrant status on the probability of unemployment.
• You need to define the unemployment variable. The dataset contains the variable lfsstat which has four categories, two identifying employed (at work, or absent from work), one for unemployed, and one for those not in the labour force. Recode this variable by defining a new variable, unemploy, as:
• Estimate your model using a subset of the variables from question 1: age, age2, sex, educational attainment, sector of employment, immigrant status and province.
• For ease of analysis, recode the variable cowmain into a new variable sector with three categories: public, private, and self-employed. Drop the category "Unpaid family worker" by setting it to NA.
• Since the variable unemploy takes on two values: TRUE and FALSE, estimate a logit model. Use the glm() function with the option "family = binomial" which yields a logit prediction model.
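The sector recode and the logit fit described in the bullets above could look roughly like the sketch below. It uses simulated data: apart from "Unpaid family worker", the cowmain labels are assumptions and should be replaced with the labels that actually appear in the dataset. The unemploy variable is simulated directly as TRUE/FALSE here rather than derived from lfsstat.

```r
# Sketch only: simulated data; cowmain labels other than
# "Unpaid family worker" are hypothetical.
set.seed(2)
n <- 1000
cow_levels <- c("Public sector employee", "Private sector employee",
                "Self-employed", "Unpaid family worker")
d <- data.frame(
  age     = sample(20:64, n, replace = TRUE),
  immig   = factor(sample(c("Non-immigrant", "Immigrant"), n, replace = TRUE),
                   levels = c("Non-immigrant", "Immigrant")),
  cowmain = factor(sample(cow_levels, n, replace = TRUE,
                          prob = c(0.25, 0.55, 0.18, 0.02)))
)

# Map cowmain to three sectors; labels missing from the map become NA,
# which drops "Unpaid family worker".
sector_map <- c("Public sector employee"  = "public",
                "Private sector employee" = "private",
                "Self-employed"           = "self-employed")
d$sector <- factor(unname(sector_map[as.character(d$cowmain)]),
                   levels = c("public", "private", "self-employed"))

# Simulated logical outcome (TRUE = unemployed).
d$unemploy <- rbinom(n, 1, plogis(-2 + 0.4 * (d$immig == "Immigrant"))) == 1

# family = binomial uses the logit link by default.
fit <- glm(unemploy ~ age + I(age^2) + immig + sector,
           family = binomial, data = d)

# Predicted probabilities of unemployment at age 40 in the private sector.
newd <- data.frame(
  age    = 40,
  immig  = factor(c("Non-immigrant", "Immigrant"), levels = levels(d$immig)),
  sector = factor("private", levels = levels(d$sector))
)
p_hat <- predict(fit, newdata = newd, type = "response")
print(p_hat)
```

Rows with sector set to NA are dropped by glm() automatically under the default na.action.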
a. Should age (and age2) be in log form? Note that none of the other explanatory variables are numeric, so only consider the log transformation for age.
b. What does the model predict for unemployment rates across provinces?
c. Does immigrant status interact with the other explanatory variables in explaining differences in the probabilities of unemployment? To analyze, first estimate a model with immigrant status interacted with the other categorical explanatory variables (not age or province), then generate predicted probabilities of unemployment using emmeans() with the "type = 'response'" argument, which converts log-odds into probabilities. Do this for each of the pairs immigrant status-educational attainment, immigrant status-sex, and immigrant status-sector of employment. Plots really help visualize these effects.
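For part c, one possible shape of the interaction analysis is sketched below on simulated data. The column names (unemploy, immig, sex, educ) and the education categories are hypothetical, and the sketch assumes the emmeans and ggplot2 packages are installed. Only the immigrant status-education pair is shown; the other pairs follow the same pattern.

```r
# Sketch only: simulated data with hypothetical names and categories.
library(emmeans)

set.seed(3)
n <- 2000
d <- data.frame(
  age   = sample(20:64, n, replace = TRUE),
  sex   = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  immig = factor(sample(c("Non-immigrant", "Immigrant"), n, replace = TRUE),
                 levels = c("Non-immigrant", "Immigrant")),
  educ  = factor(sample(c("High school", "College", "University"), n,
                        replace = TRUE),
                 levels = c("High school", "College", "University"))
)
eta <- -1.5 - 0.4 * (d$educ == "University") +
  0.3 * (d$immig == "Immigrant") +
  0.3 * (d$immig == "Immigrant") * (d$educ == "University")
d$unemploy <- rbinom(n, 1, plogis(eta)) == 1

# Immigrant status interacted with the other categorical variables.
fit_int <- glm(unemploy ~ age + I(age^2) + immig * sex + immig * educ,
               family = binomial, data = d)

# Predicted probabilities (not log-odds) for each immig-educ pair.
emm <- emmeans(fit_int, ~ immig * educ, type = "response")
emm_df <- as.data.frame(emm)
print(emm_df)

# Interaction plot with confidence intervals on the probability scale.
p <- emmip(fit_int, immig ~ educ, type = "response", CIs = TRUE)
print(p)
```

Lines that are roughly parallel in such a plot suggest little interaction, while diverging or crossing lines suggest that the effect of immigrant status differs across categories.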