1) Suppose that Pfizer has hired you, a brilliant econometrician, to evaluate the effectiveness of its COVID-19 vaccine. The company has already completed its randomized controlled trial, in which it randomly gave 1,000 people the vaccine and 1,000 people a placebo. You have data indicating who got the vaccine, who got the placebo, and the level of an antibody in each person's blood that indicates whether the person contracted COVID-19 (as opposed to having antibodies produced by the vaccine).
a. Write down a linear regression model that you could use to estimate the difference in COVID antibody levels between vaccinated and unvaccinated people. Explain the variables you would use and the parameters of the model.
b. Explain how you would estimate your model, and any assumptions you would need for your estimator to deliver an unbiased estimate of the effect of the vaccine.
c. Suppose you heard on a podcast that the COVID-19 vaccine does not prevent infection. Explain how you would test this claim with your data.
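A minimal sketch of parts (a) through (c) on simulated data, not the trial's actual data: regress a hypothetical antibody level on a vaccine dummy (1 = vaccine, 0 = placebo) with an intercept. With a single binary regressor, the OLS slope equals the difference in mean antibody levels between the two groups, and its t-statistic tests whether that difference is zero. The baseline level, effect size, and noise in the hypothetical `simulate_trial` helper are illustrative assumptions.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional standard errors for y = X b + u."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)           # error variance with df correction
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return b, se

def simulate_trial(n_per_arm=1000, effect=-2.0, seed=0):
    """Hypothetical RCT: 1000 vaccinated, 1000 placebo; effect and noise are assumptions."""
    rng = np.random.default_rng(seed)
    vaccine = np.repeat([1.0, 0.0], n_per_arm)  # random assignment, so ordering is irrelevant
    antibody = 5.0 + effect * vaccine + rng.normal(0.0, 3.0, size=2 * n_per_arm)
    return vaccine, antibody

vaccine, antibody = simulate_trial()
X = np.column_stack([np.ones_like(vaccine), vaccine])    # intercept + treatment dummy
b, se = ols(antibody, X)
diff_in_means = antibody[vaccine == 1].mean() - antibody[vaccine == 0].mean()
t_stat = b[1] / se[1]    # t-statistic for H0: b1 = 0 (no difference between groups)
print(f"b1 = {b[1]:.3f}, difference in means = {diff_in_means:.3f}, t = {t_stat:.2f}")
```

Random assignment is what makes the vaccine dummy uncorrelated with the error term here; in observational data the same regression would not carry that guarantee.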
2) For each statement below, evaluate whether it is true, false, or uncertain, and fully explain your answer.
a. In the simple linear regression model, if $R^2 = 0$ in an OLS estimation, then the slope estimate $\hat{\beta}_1$ equals zero.
b. If the OLS estimator of the regression slope is unbiased, that means the estimator equals the true slope parameter.
c. If the true linear regression model is $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, but when you estimate the model you leave out $X_{2i}$, the OLS estimate of $\beta_1$ will always be biased.
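Statements (a) and (c) can be checked numerically. In simple regression with an intercept, $R^2$ equals the squared sample correlation between x and y, and the slope equals their sample covariance divided by the sample variance of x, so both are zero exactly when the sample covariance is zero. Statement (c) concerns leaving a regressor out of the estimated model; the simulation below, with made-up coefficients and a hypothetical helper `mean_short_slope`, averages the short-regression slope across many samples.

```python
import numpy as np

def simple_ols(x, y):
    """Slope, intercept, and R-squared from regressing y on x with an intercept."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2

# (a) In simple regression, R-squared equals corr(x, y)^2.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)
slope, intercept, r2 = simple_ols(x, y)
print(r2, np.corrcoef(x, y)[0, 1] ** 2)    # equal up to floating-point rounding

# (c) Omitting x2: the average x1 slope depends on corr(x1, x2) and on beta2.
def mean_short_slope(rho, beta2=2.0, reps=2000, n=200, seed=2):
    """Average x1 slope across samples when x2 is left out (hypothetical DGP, true beta1 = 3)."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)   # corr(x1, x2) = rho
        y = 1.0 + 3.0 * x1 + beta2 * x2 + rng.normal(size=n)
        slopes[r] = simple_ols(x1, y)[0]
    return slopes.mean()

print(mean_short_slope(rho=0.0))   # close to 3.0: omitted x2 is uncorrelated with x1
print(mean_short_slope(rho=0.5))   # close to 3.0 + 2.0 * 0.5 = 4.0: biased upward
```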
3) Suppose you open a Stata do-file with the following code:
clear
set obs 500
set seed 12345
gen x = rnormal(15,4)
gen u = rnormal(0,10)
gen y = 3 + 4*x + u
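For readers working outside Stata, a rough Python analogue of the do-file above is sketched here. Stata's `rnormal(m, s)` takes a mean and a standard deviation, which matches NumPy's `normal(loc, scale)`. The two programs use different random-number generators, so the same seed (12345) will not reproduce Stata's draws, and the slope printed here need not match the estimate in (b).

```python
import numpy as np

def simulate(n=500, seed=12345):
    """Mimic the do-file: x ~ N(15, 4^2), u ~ N(0, 10^2), y = 3 + 4x + u."""
    rng = np.random.default_rng(seed)    # not the same stream as Stata's set seed
    x = rng.normal(15, 4, size=n)
    u = rng.normal(0, 10, size=n)
    y = 3 + 4 * x + u
    return x, y

def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    xd = x - x.mean()
    return xd @ (y - y.mean()) / (xd @ xd)

x, y = simulate()
print(ols_slope(x, y))   # near the true slope of 4, but not exactly 4
```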
a. If you estimated the slope in a regression of y on x, would it be unbiased? Explain how you know.
b. Suppose you estimate the slope using the data created above, and find that the estimate is 4.06. Is it a problem that your estimate is not exactly equal to the true slope of 4? Explain.
c. Imagine that you drew 10,000 samples from the population using the code above and estimated the OLS slope for each one separately. You then compute the following summary statistics for those 10,000 estimated slopes:
Variable | Obs | Mean | Std. dev. | Min | Max
Using your estimate from (b) and the summary statistics in this question, test the null hypothesis that the slope equals zero versus the alternative that it does not equal zero at the 5% level.
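The Monte Carlo exercise in (c) can be sketched as follows, reusing the data-generating process from the do-file. The standard deviation of the 10,000 simulated slopes approximates the standard error of the OLS slope estimator, so the t-statistic for H0: slope = 0 is the estimate from (b) divided by that standard deviation, compared with the two-sided 5% critical value of about 1.96. Because the Python draws differ from Stata's, these summary statistics are not the ones in the table above.

```python
import numpy as np

def mc_slopes(reps=10_000, n=500, seed=12345):
    """OLS slopes from `reps` independent samples drawn from the do-file's DGP."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(15, 4, size=n)
        y = 3 + 4 * x + rng.normal(0, 10, size=n)
        xd = x - x.mean()
        slopes[r] = xd @ (y - y.mean()) / (xd @ xd)
    return slopes

def t_test_zero(estimate, std_error, crit=1.96):
    """Two-sided test of H0: slope = 0 at the 5% level, using a normal critical value."""
    t = estimate / std_error
    return t, abs(t) > crit

slopes = mc_slopes()
print(f"mean = {slopes.mean():.4f}, sd = {slopes.std(ddof=1):.4f}")
t, reject = t_test_zero(4.06, slopes.std(ddof=1))   # 4.06 is the estimate from (b)
print(f"t = {t:.1f}, reject H0: {reject}")
```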
4) Suppose you model the relationship between the dependent variable, test score (testscr), and the independent variables student-teacher ratio (str) and the percentage of students who receive a free meal at school (mealpct), and you estimate the relationship by OLS in Stata. The result is as follows:
a. Precisely interpret the coefficient estimates from the model.
b. Suppose that the errors of this model are heteroskedastic. Would that cause any problems in this specific context? Explain.
c. Construct a 90% confidence interval for the effect of a 10 percentage point change in mealpct on test scores.
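Because the coefficient on mealpct is the change in testscr per one-percentage-point change in mealpct, the effect of a 10-point change is 10 times that coefficient, with a standard error 10 times as large; 1.645 is the large-sample two-sided 90% critical value. The sketch below also computes HC1 heteroskedasticity-robust standard errors (the version Stata reports with `regress ..., vce(robust)`) alongside conventional ones. The simulated data and coefficients are hypothetical, not the data behind the regression output.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients with conventional and HC1 (robust) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se_conv = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = (X * e[:, None] ** 2).T @ X             # sum of e_i^2 * x_i x_i'
    V_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    return b, se_conv, np.sqrt(np.diag(V_hc1))

def ci_for_change(beta, se, delta=10.0, z=1.645):
    """90% CI for the effect of a `delta`-unit change: delta*beta +/- z*|delta|*se."""
    center = delta * beta
    half = z * abs(delta) * se
    return center - half, center + half

# Illustration on simulated, hypothetical data with heteroskedastic errors.
rng = np.random.default_rng(0)
n = 400
str_ = rng.uniform(14, 26, size=n)
mealpct = rng.uniform(0, 100, size=n)
u = rng.normal(0, 1 + mealpct / 20, size=n)        # error spread grows with mealpct
testscr = 700 - 1.0 * str_ - 0.6 * mealpct + u
X = np.column_stack([np.ones(n), str_, mealpct])
b, se_conv, se_hc1 = ols_with_se(testscr, X)
print(ci_for_change(b[2], se_hc1[2]))
```

With the actual output, `ci_for_change(beta_meal, se_meal)` would give the interval, where `beta_meal` and `se_meal` are placeholder names for the reported coefficient and standard error on mealpct.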
5) Suppose you are trying to relate a person’s wages to their years of schooling and their parents’ income. You propose the following regression model:
a. Explain the assumptions required for OLS to produce unbiased estimates of the slopes.
b. Suppose you added a variable to this regression that measures a person’s years of experience. Would you expect the adjusted $R^2$ to be higher or lower than in the regression from (a)? Explain.
c. Suppose you discover that everyone in your sample has the same level of experience. Does this cause any problems with including experience in the regression?
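Two mechanical facts behind (b) and (c) can be checked on simulated data. Adjusted $R^2$ is $1 - (1 - R^2)(n - 1)/(n - k - 1)$ with $k$ slopes, so adding a regressor raises it only if $R^2$ rises enough to offset the lost degree of freedom. If experience is the same for everyone, its column is a constant multiple of the intercept column, so the regressor matrix is rank-deficient and OLS cannot estimate both coefficients (Stata omits one of them because of collinearity). The variables and coefficients below are hypothetical.

```python
import numpy as np

def adj_r2(y, X):
    """Adjusted R-squared; X includes the intercept column, so p = k + 1 columns."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p)

rng = np.random.default_rng(3)
n = 300
educ = rng.normal(13, 2, size=n)       # hypothetical years of schooling
parinc = rng.normal(60, 15, size=n)    # hypothetical parents' income
exper = rng.normal(10, 5, size=n)      # hypothetical years of experience
wage = 5 + 1.5 * educ + 0.05 * parinc + 0.4 * exper + rng.normal(0, 3, size=n)

ones = np.ones(n)
X_base = np.column_stack([ones, educ, parinc])
X_exp = np.column_stack([ones, educ, parinc, exper])
print(adj_r2(wage, X_base), adj_r2(wage, X_exp))   # experience matters here, so it rises

# Constant experience: the column equals 10 * intercept, so X'X is singular.
X_const = np.column_stack([ones, educ, parinc, np.full(n, 10.0)])
print(np.linalg.matrix_rank(X_const), X_const.shape[1])   # rank 3 < 4 columns
```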