Aim of the Study
Aim: For this coursework you will act as a junior business analytics consultant working for an estate agency in the hometown of one of your group members, with the aim of carrying out a study of the housing market in his or her hometown.
The manager of the estate agency wants to obtain a general view of the housing market, focusing on the prices, types and sizes of houses for sale.
The Data Set
You will collect your data from a reliable source using the first part of your hometown’s postcode or a UK postcode of your preference (e.g. if your postcode is B17 8GH, you should use only B17 to collect the data). Select information for a sample of around 80 to 100 houses, making sure that you have a representative sample of different types of houses (flat, terraced, detached, semi-detached) and sizes (i.e. number of bedrooms). The total number of houses in your postcode (i.e. the population) should be more than 100. If this is not the case, you need to extend your search to a neighbouring postcode to ensure that you have a large enough population from which to draw the sample of 80 to 100 houses.
For each house, collect information on the price and a maximum of four characteristics such as house type, number of bedrooms, number of bathrooms, and distance from the nearest railway station (if you think that distance is not a good choice of characteristic, you can use another, but please justify your choice). With the exception of price, the above characteristics are just suggestions. You need to think about which features (characteristics) in your postcode are important and could influence people’s purchasing behaviour.
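One possible way to draw a sample that covers every house type is proportional stratified sampling, sketched below in Python. This is only an illustration under stated assumptions: the listings, the field names ("id", "type") and the function name stratified_sample are hypothetical, and the brief does not prescribe this or any other sampling method.

```python
import random
from collections import defaultdict

def stratified_sample(listings, key, sample_size, seed=0):
    """Draw a proportional stratified sample of listings, grouped by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    groups = defaultdict(list)
    for row in listings:
        groups[row[key]].append(row)
    total = len(listings)
    sample = []
    for rows in groups.values():
        # Proportional share: at least one house per group,
        # and never more than the group actually contains.
        share = max(1, round(sample_size * len(rows) / total))
        sample.extend(rng.sample(rows, min(share, len(rows))))
    return sample

# Hypothetical population of 200 listings (made-up data, for illustration only).
house_types = (["flat"] * 80 + ["terraced"] * 60
               + ["semi-detached"] * 40 + ["detached"] * 20)
population = [{"id": i, "type": t} for i, t in enumerate(house_types)]

sample = stratified_sample(population, key="type", sample_size=90)
```

Because each group's share is rounded, the final sample size can differ slightly from the requested size for other populations. Whichever method you use, you still need to justify in the report why the resulting sample is representative.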
Once you have completed your statistical analysis, write a report that provides the manager of the company with the information he/she needs. However, remember that managers are numerate but not statisticians! You therefore need to produce a professionally written report separated into two parts: the main part and an appendix. The main part (around 5 pages) needs to present information without any statistical terminology, so that managers who are not experts in statistics can understand it. All technical information and analysis needs to be included in the appendix.
The report should contain, amongst other things, the following information:
1. A description of the problem, the source of your data, its limitations and any problems encountered in using the data. Discuss your sampling method and justify why the sample is representative of the population.
Data Set Collection
2. Produce at least three visualisations of different types, for either the population or the sample, to describe the data. At least one of them must include more than one house characteristic. A clear summary of the information obtained from each visualisation is required, as well as a justification of the type chosen. Make sure that you evaluate the visualisations by critiquing their adequacy, and ensure that appropriate principles and guidelines are followed when drawing them. At least one visualisation should reflect the housing market in that location as a whole.
3. A clear summary and table of the descriptive statistics and the information which can be obtained from these statistics.
4. Assuming that house prices are normally distributed, present your manager with the 95% or 99% confidence interval for the average house price of each house type and explain its meaning.
5. Undertake the necessary analysis to provide your manager with a summary of whether the average price of the different types of houses in your sample is in line with the average price in the UK, your region or your county.
6. Carry out a correlation analysis (i.e. a correlation matrix) between price (dependent variable) and all the other house characteristics you think affect the price of the house (independent variables). Interpret the results. It is usually useful to know which variables are highly correlated, but it is also sometimes useful to draw attention to correlations that are unexpectedly low.
7. Carry out regression analysis and derive the most parsimonious model. Comment on the significance of the effect of the independent variables (e.g. size, house type, number of bedrooms etc.) on the dependent variable (price). Comment on the magnitude of the effect of the independent variables on the dependent variable. Provide the reasoning behind the steps taken to identify the most parsimonious model and the reason for choosing your model (i.e. model selection criterion).
8. Carry out the residual analysis for the final (most parsimonious) model only. Do not show any residual analysis for any other model, since it is a waste of time. Comment on the final model with regard to its adequacy and goodness of fit. If the model is inadequate, explain what this means and comment on what should be done to address the problem. (Hint: residual analysis refers to checking the five regression assumptions.)
9. Write out the derived statistical model and give an example of its usage.
10. Any analysis undertaken should be justified by answering the following questions: Why is this method used? What does this method do? What information is obtained?
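As a rough illustration of the calculations behind items 4, 6, 7 and 9, the Python sketch below computes a confidence interval for a mean, a correlation coefficient, and a one-variable least-squares line with its residuals and a prediction. It is a simplified sketch under stated assumptions, not the required method: the data are made up, the function names are hypothetical, the interval uses a normal (z) critical value where a t critical value would usually be preferred for small groups, and the regression has a single predictor, whereas the coursework calls for a model with several characteristics, normally fitted in a statistics package.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m - half_width, m + half_width

def simple_regression(x, y):
    """Least-squares slope, intercept and correlation for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r

# Made-up data for one house type: bedrooms and price in thousands of pounds.
bedrooms = [1, 2, 2, 3, 3, 4, 4, 5]
prices = [120, 150, 165, 200, 210, 260, 275, 330]

# Item 4: interval for the average price (use level=0.99 for a 99% interval).
low, high = mean_confidence_interval(prices, level=0.95)

# Items 6 and 7: correlation and fitted line price = intercept + slope * bedrooms.
slope, intercept, r = simple_regression(bedrooms, prices)

# Item 8 starts from the residuals (observed minus fitted price).
residuals = [p - (intercept + slope * b) for b, p in zip(bedrooms, prices)]

# Item 9: example of usage, the predicted price of a three-bedroom house.
predicted_3_bed = intercept + slope * 3
```

In the report itself, each of these numbers would still need the plain-language explanation asked for in item 10.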