This assignment requires a considerable amount of computer work and written comment. You may need to seek guidance from your tutor along the way. Do not leave the assignment until the last minute. Each question describes what you are required to do, so please follow the instructions carefully. Your answer to each question should begin with the number of the question. Refer to the Assessment Criteria and General Marking Guide below for details.
As anyone who has looked at house prices knows, house prices depend on the local market. To control for that, we will restrict our attention to a single market. We have a random sample of 211 home sales for January 2015 from a metro region around the city of Chicago; the data were obtained from Zillow.com, a real estate research site.
The first thing often mentioned in describing a house for sale is the number of bedrooms. In this assignment you will examine the relationship between house prices and the number of bedrooms, and develop a potential statistical regression model to predict house prices from the variable number of bedrooms.
The data are contained in a file called ‘HouseSales.xls’, whose columns contain the following information:
Column    Name               Description
A         House ID Number    Number to identify each house sold
B         Price              Price of the house (in thousands of $US)
C         Bedrooms           Number of bedrooms in the house
Before you begin any analysis you must take a random sample of 180 records from the 211 provided in the file HouseSales.xls. Use the Random Sample Generator, available on Moodle in the Lab Bundle, to do this. Your answers to the assignment tasks below are to be based on your sample of 180 records. Make sure you keep a safe copy of your sample, since you cannot use the Random Sample Generator to reproduce the first sample.
To prepare your data file ready for analysis, you must take the following steps:
1. Use the file Random Sample Generator, in the Lab Resources Bundle on Moodle, to generate a random sample of 180 records from the file.
2. Copy your sample to another spreadsheet for working on your assignment and save it with another file name. Remember to save another copy of your sample under a different name as a backup.
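The sampling step draws 180 of the 211 records without replacement. The assignment requires the Random Sample Generator from Moodle, so the Python sketch below only illustrates the idea. It assumes the house ID numbers run from 1 to 211 (the file may number them differently), and the helper name `draw_sample` and the seed value are hypothetical choices made for this example.

```python
import random

def draw_sample(population_size=211, sample_size=180, seed=2015):
    """Return sorted record IDs drawn without replacement.

    Assumes IDs are 1..population_size; the seed value is arbitrary.
    """
    rng = random.Random(seed)  # private generator, so the draw is repeatable
    ids = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(ids)

sample_ids = draw_sample()
print(len(sample_ids), "records selected; first five IDs:", sample_ids[:5])
```

Fixing the seed makes this draw repeatable. The spreadsheet generator cannot reproduce a sample, which is why a backup copy matters.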
For each task below, you must answer all the questions in sequential order and submit all of the required printouts, graphs, tables and summaries.
NB: Each graph and table should have a heading and each axis should have a label!!
1. Introduction and Variable List: Give a brief introduction to your report. Describe the nature of the data. Read questions 3 to 7 and briefly describe the specific data and relationships which will be examined here.
2. Data: Provide a printout of the data in your sample, sorted in ascending order based on House ID Number.
3. Produce a histogram showing the distribution of the price of ALL houses in your data sample. Provide your comments on the graph shape, and the most suitable measures of centre and spread for this data.
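As a rough illustration of what a histogram summarises, the Python sketch below counts values per fixed-width bin and compares the mean with the median. The prices are hypothetical placeholders, not values from HouseSales.xls, and the bin width of 100 is an arbitrary choice for the example.

```python
import statistics
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))  # empty bins are simply absent

# Hypothetical prices in thousands of $US, for illustration only.
prices = [150, 180, 210, 220, 260, 300, 310, 450, 900]

for start, count in histogram_counts(prices, 100).items():
    print(f"[{start}, {start + 100}): {'#' * count}")

print("mean  :", round(statistics.mean(prices), 1))
print("median:", statistics.median(prices))
```

In this made-up example the mean sits well above the median because of one very expensive house. That pattern suggests a right-skewed distribution, for which the median and interquartile range are usually the safer summaries.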
4. Produce a side-by-side box plot of house prices against bedrooms. Discuss what this box plot implies in context of the problem.
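A side-by-side box plot draws one five-number summary per bedroom count. The sketch below computes those summaries in Python for hypothetical data. The `sales` pairs are invented placeholders, and `five_number_summary` is a helper name chosen for this example. The "inclusive" quartile method is used on the assumption that it matches the interpolation behind Excel's QUARTILE.INC; other tools may give slightly different quartiles.

```python
import statistics
from collections import defaultdict

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical (bedrooms, price in thousands of $US) pairs, for illustration only.
sales = [(2, 150), (2, 180), (2, 210), (3, 220), (3, 260),
         (3, 300), (3, 340), (4, 310), (4, 450), (4, 900)]

# Group the prices by number of bedrooms.
by_bedrooms = defaultdict(list)
for bedrooms, price in sales:
    by_bedrooms[bedrooms].append(price)

for bedrooms in sorted(by_bedrooms):
    print(bedrooms, "bedrooms:", five_number_summary(by_bedrooms[bedrooms]))
```

Comparing the medians and box heights across groups shows whether typical price and price variability change with the number of bedrooms.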
5. T-Test and CI:
a) Obtain appropriate descriptive statistics, and calculate a 95% confidence interval for the mean price of the houses, based on your sample.
b) Assume that the average price of all houses sold in the region in January 2014 is $330,000. Conduct a statistical hypothesis test to determine if the average price of houses sold in January 2015 (from your sample) is significantly different from the average price of the houses sold in January 2014. Mention any assumptions, include relevant hypotheses and report the results and conclusion in the conventional manner.
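The arithmetic behind a one-sample t interval and a two-sided test of H0: mean = 330 against H1: mean ≠ 330 (in thousands of $US) can be sketched in Python as follows. The five prices are hypothetical placeholders and `mean_ci_and_t` is a helper name invented for this example. The critical value must be looked up for n - 1 degrees of freedom: 2.776 applies to the five-value example here, while a sample of 180 would need the value for 179 degrees of freedom, which is approximately 1.97.

```python
import math
import statistics

def mean_ci_and_t(prices, mu0, t_crit):
    """Return (mean, (ci_low, ci_high), t_stat) for a one-sample t procedure.

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    looked up from a t table or statistical software by the caller.
    """
    n = len(prices)
    mean = statistics.mean(prices)
    std_err = statistics.stdev(prices) / math.sqrt(n)  # sample SD / sqrt(n)
    margin = t_crit * std_err
    t_stat = (mean - mu0) / std_err
    return mean, (mean - margin, mean + margin), t_stat

# Hypothetical prices in thousands of $US, for illustration only.
prices = [300, 320, 340, 360, 380]
mean, ci, t_stat = mean_ci_and_t(prices, mu0=330, t_crit=2.776)  # df = 4, 95%
print(f"mean = {mean}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f}), t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > 2.776 else "do not reject H0")
```

The two parts agree by construction: H0 is rejected at the 5% level exactly when 330 falls outside the 95% confidence interval.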
6. Scatter plot with trend lines: Obtain a scatterplot comparing the relationship between house prices and number of bedrooms. Think carefully about which variable should go on the vertical axis. Remember, it is the independent variable that goes on the horizontal axis (i.e. the x-axis). Include trend lines, their equations and R-squared values on the graph. Make sure you label axes properly and that your graph has an appropriate title.
Briefly compare the nature of the relationship between these two variables.
[5 marks]
7. Use Excel to carry out a regression analysis on the two variables: house prices (in thousands of dollars) and the number of bedrooms.
a) Copy the output into your assignment and use it to determine the answers to the following questions.
b) Write down the regression equation.
c) State the R-squared value and the standard error and explain what they mean with respect to the data.
d) Write down the value of the gradient of the regression line and explain what it means for this data.
e) Are the values for the constant and the gradient (slope) significant (i.e. significantly different from zero) in this case? Justify your answer.
f) Do you think this regression model is a good model? Justify your answer using the regression output.
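To show where the quantities in a regression output come from, the Python sketch below fits a least-squares line and reports the slope, intercept, R-squared, standard error of the regression, and the t statistics for both coefficients. The assignment asks for Excel, so this is only an illustration. The data are hypothetical placeholders, and `simple_regression` is a helper name invented for this example.

```python
import math

def simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    std_error = math.sqrt(sse / (n - 2))  # standard error of the regression
    se_slope = std_error / math.sqrt(sxx)
    se_intercept = std_error * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {
        "slope": slope, "intercept": intercept,
        "r_squared": 1 - sse / sst, "std_error": std_error,
        "t_slope": slope / se_slope, "t_intercept": intercept / se_intercept,
    }

# Hypothetical data: bedrooms and price in thousands of $US, for illustration only.
bedrooms = [1, 2, 3, 4, 5]
prices = [200, 400, 500, 400, 500]
fit = simple_regression(bedrooms, prices)
print(f"price = {fit['intercept']:.1f} + {fit['slope']:.1f} * bedrooms")
print(f"R-squared = {fit['r_squared']:.2f}, standard error = {fit['std_error']:.1f}")
```

In this made-up example the slope of 60 would mean each extra bedroom is associated with a price about $60,000 higher on average. A coefficient is judged significant by comparing its t statistic with the critical value for n - 2 degrees of freedom, or equivalently by checking its p-value in the regression output.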
8. Using the information obtained from your analyses, write a short conclusion about what you found from the study above.