Regression analysis of house price based on square feet

Data and Methodology

Question 1

Consider a regression of the price of a house (price) on its surface area, measured in square feet (sqft), using the dataset br.rdata. Exclude the last 10 observations of the dataset and save them for an out-of-sample study. The observations that are retained form the in-sample data.

1. Produce a scatter plot of the two variables. Discuss your findings.

Hint: There is no need to sort the original data.

2. Estimate the linear model
$\text{price} = \beta_1 + \beta_2\,\text{sqft} + e$ (Model 1),

and add the regression line to your scatter plot.

3. Comment on your results.

• Are the coefficients statistically significant at a 5% level?
• How do you interpret the slope and intercept parameters? [04 points]
• Does the model fit the data well?
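
The quantities these questions refer to (coefficient estimates, standard errors, t-ratios, and R^2) can be sketched as follows. This is a minimal illustration on synthetic data, not on br.rdata: the data-generating values, the sample size, and the helper name `ols_simple` are assumptions made for the example. In a large sample, a coefficient is significant at the 5% level when its absolute t-ratio exceeds roughly 1.97; the exact critical value comes from the t distribution with n - 2 degrees of freedom.

```python
import math
import random


def ols_simple(x, y):
    """Least-squares fit of y = b1 + b2*x with standard errors, t-ratios and R^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # slope estimate
    b1 = y_bar - b2 * x_bar             # intercept estimate
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - 2)              # unbiased error-variance estimate
    se_b1 = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    se_b2 = math.sqrt(sigma2 / sxx)
    return {
        "b1": b1, "b2": b2,
        "se_b1": se_b1, "se_b2": se_b2,
        "t_b1": b1 / se_b1, "t_b2": b2 / se_b2,
        "r2": 1.0 - sse / sst,
    }


# Synthetic stand-in data (assumption: NOT the br.rdata observations).
random.seed(0)
sqft = [random.uniform(800, 4000) for _ in range(200)]
price = [-50000 + 90 * s + random.gauss(0, 30000) for s in sqft]

fit = ols_simple(sqft, price)
for name in ("b1", "b2"):
    print(name, round(fit[name], 2),
          "se", round(fit["se_" + name], 2),
          "t", round(fit["t_" + name], 2))
print("R^2", round(fit["r2"], 3))
```

With the real data, the same quantities would normally be read from the regression summary of whatever statistical package is used.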

4. Compute and interpret the elasticity $\varepsilon$ of the price with respect to sqft.
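
For a linear model, the elasticity is not constant, so it has to be evaluated at a chosen point. A sketch of the usual formula is given below. Evaluating at the sample means is one common convention and is an assumption here, since the question does not specify the point; $b_2$ denotes the least-squares estimate of $\beta_2$.

```latex
\begin{align*}
\varepsilon
  &= \frac{\partial\,\text{price}}{\partial\,\text{sqft}} \cdot \frac{\text{sqft}}{\text{price}}
   = \beta_2 \cdot \frac{\text{sqft}}{\text{price}}, \\
\hat{\varepsilon}
  &= b_2 \cdot \frac{\overline{\text{sqft}}}{\overline{\text{price}}}
  \quad \text{(evaluated at the sample means)}.
\end{align*}
```

The estimate reads as the approximate percentage change in price associated with a 1% increase in surface area, at the chosen point.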

5. In a table, report the lower and upper bounds of the 95% confidence interval of the predicted value of the price for each of the ten observations that you excluded. Next, add to this table a new column that contains the true observations (the last ten observations that you excluded). For how many observations is the true value included in the confidence interval?

6. Compute the out-of-sample $R^2$ for the 10 excluded observations. How does it compare to the in-sample $R^2$ for the same 10 observations?

Hint: To compute the in-sample $R^2$, you need to run the regression using the full dataset br and then use the $R^2$ formula to compute the $R^2$ for the subsample of the last 10 observations. The idea is to compare the model fit to this subsample when it is included in the regression sample vs when it is excluded.
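
The comparison described in the hint can be sketched as follows. This is an illustration on synthetic data rather than br.rdata, and the helper names `fit_line` and `r_squared` are hypothetical. It also assumes that the subsample R^2 is defined as 1 - SSE/SST, with SST measured around the mean of the 10 held-out prices; other definitions exist, so the convention used in the course should be checked.

```python
import random


def fit_line(x, y):
    """Return (b1, b2) from a least-squares fit of y = b1 + b2*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def r_squared(x, y, b1, b2):
    """R^2 = 1 - SSE/SST on (x, y) for a line that may have been fitted elsewhere."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic stand-in data (assumption: NOT the br.rdata observations).
random.seed(1)
sqft = [random.uniform(800, 4000) for _ in range(200)]
price = [-50000 + 90 * s + random.gauss(0, 20000) for s in sqft]

# Split: everything except the last 10 observations is in-sample.
x_in, y_in = sqft[:-10], price[:-10]
x_out, y_out = sqft[-10:], price[-10:]

# Out-of-sample: coefficients estimated WITHOUT the last 10 observations.
b1_in, b2_in = fit_line(x_in, y_in)
r2_out = r_squared(x_out, y_out, b1_in, b2_in)

# In-sample (per the hint): coefficients estimated on the FULL dataset,
# with R^2 still evaluated on the last 10 observations only.
b1_full, b2_full = fit_line(sqft, price)
r2_in = r_squared(x_out, y_out, b1_full, b2_full)

print("out-of-sample R^2:", round(r2_out, 4))
print("in-sample R^2 on the same 10 observations:", round(r2_in, 4))
```

Because the held-out line is not fitted to these 10 points, this R^2 is not bounded below by zero; a negative value would mean the model predicts them worse than their own mean does.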

7. Estimate this alternative model

$\ln(\text{price}) = \beta_1 + \beta_2 \ln(\text{sqft}) + e$ (Model 2),
and add the fitted line to the scatter plot.

Hint: For this graphical representation, you may sort the original data.

8. Compute and interpret the elasticity of the price with respect to sqft. How does this elasticity estimate compare to the one you found in part 4? Does Model 2 fit the data better than Model 1 in-sample and out-of-sample?
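
As a reminder of why the log-log specification is convenient for this question, the short derivation below shows that its slope coefficient is itself the elasticity. It uses only the symbols of Model 2.

```latex
\begin{align*}
\ln(\text{price}) &= \beta_1 + \beta_2 \ln(\text{sqft}) + e, \\
\varepsilon
  &= \frac{\partial\,\text{price}}{\partial\,\text{sqft}} \cdot \frac{\text{sqft}}{\text{price}}
   = \frac{\partial \ln(\text{price})}{\partial \ln(\text{sqft})}
   = \beta_2 .
\end{align*}
```

The elasticity is therefore constant across all house sizes, unlike in a linear model, where it depends on the point of evaluation.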

Question 2

The dataset bangla.rdata contains information on sugar cane supply.

1. Estimate the following model with one lag

$\ln(\text{AREA}_t) = \beta_1 + \beta_2 \ln(\text{PRICE}_t) + e_t$

2. Plot the correlogram of the residuals. Which autocorrelations are significantly different from zero?
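
The numbers behind a correlogram can be sketched as follows. The example uses a simulated AR(1) series as a stand-in for the regression residuals; the series, its parameter 0.6, and the helper name `autocorrelations` are assumptions for illustration only. The bounds of plus or minus 1.96 divided by the square root of T are the usual large-sample approximation for testing whether a single autocorrelation is zero at the 5% level.

```python
import math
import random


def autocorrelations(e, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series e."""
    T = len(e)
    mean = sum(e) / T
    d = [v - mean for v in e]           # deviations from the mean
    denom = sum(v * v for v in d)
    return [
        sum(d[t] * d[t - k] for t in range(k, T)) / denom
        for k in range(1, max_lag + 1)
    ]


# Simulated stand-in for regression residuals (assumption: AR(1), rho = 0.6).
random.seed(2)
T = 200
resid = []
prev = 0.0
for _ in range(T):
    prev = 0.6 * prev + random.gauss(0, 1)
    resid.append(prev)

acf = autocorrelations(resid, 6)
bound = 1.96 / math.sqrt(T)             # approximate 5% significance bound

for lag, r in enumerate(acf, start=1):
    flag = "significant" if abs(r) > bound else "not significant"
    print("lag", lag, "r =", round(r, 3), flag)
```

Plotting these values as bars against the lag, with horizontal lines at the two bounds, gives the correlogram.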

3. Perform an LM test for autocorrelated errors using one lagged residual and a 5% significance level.

4. Find two 95% confidence intervals for the elasticity of supply, one using least squares standard errors and one using HAC standard errors. What are the consequences for interval estimation when serially correlated errors are ignored?

5. Estimate the model under the assumption that the error is an AR(1) process. Is the estimate for the autocorrelation parameter $\rho$ significantly different from zero at a 5% significance level? Compute a 95% confidence interval for the elasticity of supply. How does it compare with those obtained in part 4?

6. Estimate an ARDL(1,1) model for sugar supply response. What restrictions are necessary on the coefficients of this model to make it equivalent to that in part 5? Do these restrictions seem to be satisfied? Do the residuals from this model show any evidence of serial correlation?
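
The link between the AR(1) error model and the ARDL(1,1) model can be sketched with the derivation below. The shorthand $y_t = \ln(\text{AREA}_t)$, $x_t = \ln(\text{PRICE}_t)$ and the ARDL coefficient names $\delta, \theta_1, \delta_0, \delta_1$ are notational assumptions, not taken from the assignment.

```latex
\begin{align*}
y_t &= \beta_1 + \beta_2 x_t + e_t, \qquad e_t = \rho\, e_{t-1} + v_t, \\
e_{t-1} &= y_{t-1} - \beta_1 - \beta_2 x_{t-1}, \\
y_t &= \beta_1 (1 - \rho) + \rho\, y_{t-1} + \beta_2 x_t - \rho \beta_2 x_{t-1} + v_t
  && \text{(AR(1) error model)}, \\
y_t &= \delta + \theta_1 y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + v_t
  && \text{(ARDL(1,1))}.
\end{align*}
```

Matching terms gives $\theta_1 = \rho$, $\delta_0 = \beta_2$ and $\delta_1 = -\rho \beta_2$, so the two models coincide under the single nonlinear restriction $\delta_1 = -\theta_1 \delta_0$.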
