A cost estimator for a construction company has collected the data found in the source file Estimation.xlsx describing the total cost (Y) of 97 different projects and the following three independent variables thought to influence the total cost: total units of work required (X1), contracted units of work per day (X2), and city/location of work (X3). The cost estimator would like to develop a regression model to predict the total cost of a project as a function of these three independent variables.
a. Prepare two scatter plots showing the relationship between the total cost of the projects and each of the two numeric independent variables (X1 and X2). What sort of relationship does each plot suggest?
b. Suppose the estimator wants to use the total units of work required (X1), contracted units of work per day (X2), and city/location of work (X3) as the independent variables to predict total cost. What should be the regression function between Y and X1, X2, and X3? What is the adjusted R-squared value of this model? What conclusions can you make? (Note that X3 is a categorical variable and must be converted into dummy variables, as I showed you in the class lecture. You should expand X3 into Location 1, Location 2, …, Location 5 to differentiate the six locations; the sixth location serves as the baseline when all five dummy variables are 0.)
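The dummy-variable step in part b can be sketched in Python as a cross-check on the Excel work. This is only an illustration under stated assumptions: the location codes 1 to 6, the choice of the last location as the baseline, and the R-squared value of 0.90 are all made up here and do not come from Estimation.xlsx. With five dummy variables, the model has the general form Y = b0 + b1*X1 + b2*X2 + b3*Location1 + ... + b7*Location5, so it has k = 7 predictors.

```python
def dummy_encode(locations, baseline=None):
    """Expand a categorical column with m levels into m-1 indicator columns.

    The baseline level gets no column of its own: a row belonging to the
    baseline has 0 in every indicator column.
    """
    levels = sorted(set(locations))
    if baseline is None:
        baseline = levels[-1]  # assumption: treat the last level as the baseline
    kept = [lv for lv in levels if lv != baseline]
    rows = [[1 if loc == lv else 0 for lv in kept] for loc in locations]
    return kept, rows


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical location codes, NOT taken from Estimation.xlsx.
locations = [1, 2, 3, 4, 5, 6, 3]
kept, rows = dummy_encode(locations)

# 97 projects, 2 numeric predictors + 5 dummies = 7 predictors.
# The R-squared of 0.90 is a placeholder, not a result from the data.
adj = adjusted_r2(0.90, n=97, k=7)
```

Here `rows[5]` is all zeros because location 6 is the baseline, and `adj` comes out slightly below 0.90 because the adjustment penalizes each added predictor.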
You can submit your completed Part A Excel file as the final answer.
Part B
Use the data in the SeasonalData.xlsx file to answer the following questions:
Question 1: What are the base sales in the four quarters in the first year (i.e., 2008)? What are the seasonal influences in the four quarters in the first year (i.e., 2008)?
Question 2: What are the base sales in each quarter from the second year onward (i.e., 2009 to 2013)? What is the seasonal influence in each quarter from the second year onward (i.e., 2009 to 2013)?
Question 3: What are the forecasted sales in each quarter from the second year onward (i.e., 2009 to 2013)?
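One common way to compute base levels, seasonal influences, and forecasts for quarterly data is an additive seasonal smoothing model with no trend. The sketch below assumes that model; the method taught in class may differ (for example, it may use multiplicative seasonality or a trend term). The smoothing weights `alpha` and `beta` and the sales figures are placeholders, not values from SeasonalData.xlsx. In the first year, the base is the average of the four quarters and each seasonal influence is that quarter's sales minus the base. From the second year onward, the forecast is the previous base plus the seasonal influence from the same quarter one year earlier, after which the base and seasonal values are updated.

```python
def additive_seasonal(y, p=4, alpha=0.5, beta=0.5):
    """Additive seasonal smoothing without trend (assumed model).

    y     : list of observed sales, one value per period
    p     : number of periods per season (4 for quarterly data)
    alpha : smoothing weight for the base level (placeholder value)
    beta  : smoothing weight for the seasonal influence (placeholder value)
    Returns three lists aligned with y: base, seasonal, forecast.
    """
    # First year: base is the average, seasonal influence is the deviation.
    base0 = sum(y[:p]) / p
    base = [base0] * p
    seasonal = [y[t] - base0 for t in range(p)]
    forecast = [None] * p  # no forecast is produced for the first year

    for t in range(p, len(y)):
        # Forecast uses only information available before period t.
        forecast.append(base[t - 1] + seasonal[t - p])
        # Update the base from the deseasonalized observation.
        e = alpha * (y[t] - seasonal[t - p]) + (1 - alpha) * base[t - 1]
        # Update the seasonal influence for this quarter.
        s = beta * (y[t] - e) + (1 - beta) * seasonal[t - p]
        base.append(e)
        seasonal.append(s)
    return base, seasonal, forecast


# Made-up quarterly sales for two years, NOT taken from SeasonalData.xlsx.
sales = [10, 20, 30, 40, 14, 24, 34, 44]
base, seasonal, forecast = additive_seasonal(sales)
```

With these placeholder numbers, the first-year base is 25 in every quarter and the first-year seasonal influences are -15, -5, 5, and 15. In a spreadsheet, the weights would typically be tuned (for example with Solver) to minimize forecast error rather than fixed at 0.5.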
You may need the following video to complete this project