OPMT 1197: DATA ANALYSIS (EXCEL) PROJECT Instructions For this course during this semester, you will learn lots about doing statistical analysis. For this data analysis project, you address some questions that interest you with the statistical methodology we are learning in class. You have some questions you want to answer, a set of mandatory statistical questions that you have to answer, decide how to collect data, and analyze about it. You can work in groups of two or three people on the project. You can work by yourself. However, please be aware of the amount of work involved. The most important aspects of any statistical analysis are stating questions and collecting data. Hence, to get the full experience of running your own study, the project requires you to analyze data that you collect. It is not permissible to use data sets that have been put together by others. You are permitted to collect data off the web; however, you must be the one who decides on the analyses and puts the data set together Good projects begin with very clear and well-defined hypotheses. The statistics should not drive the question; the question should drive the statistics. Group Work Rules Each group should spread the work among members so that everyone share in the project. If some group members do not contribute their assigned workload, or are unwilling to take on work, your group may petition to have such group members dropped form the group. The process of this petition proceeds as follows: 1) Send an email to the instructor explaining how the group members have not contributed adequately. All members of the group must be sent this email. This is to ensure that everything is done openly. 2) The instructor will arrange a meeting with the group. Subjects of a petition who fail to attend the arranged meeting will be dropped from the group. 3) At this meeting, the instructor will make a decision on the petition. The petitions can be made until November 13, 2020. After this date, groups will not be split up. Also, any group member who does not contribute after promising to do so will be dropped from the group. Structure of the Report: The length of the report is between 8 and 10 pages (plus embedded charts and requested appendices). The report should include the following sections: Title page: Title, date, authors, instructor, course. Table of content: The report sections. Abstract: This may include a summary of the main sections (case background, problem, objective and findings) Introduction: This may include a brief description of the role of statistics in the business field and the nature of the business (for example: real-estate business). Problem: State briefly the case problem (statement, indications, causes & consequences) Objective: State briefly the report objective (Why you are conducting this research? and What is your accomplishment?) Methodology: Describe the population/sample (Diagram is required), the sampling techniques and the data collection methods. Explain any excluded data. Briefly mention the statistical techniques. Analysis and Results (Q1-Q13): Present the basic information requested. The results should be presented as meaningful statements in context, not generic hypothesis and “reject” statements. Organize in a logical fashion, not necessarily in order of specific analyses requested. It should read as a professional management report about the subject, not a series of separate analyses done for homework. A sample of the thirteen questions is listed below and it includes a project description too. Conclusion: Summarize the results of the “Analysis and Results” section, state your research finding and discuss any difficulties or limitations and suggestions for improvement. Appendices: This may include Excel printouts, extra tables or other relevant information not appropriate to be included in the main text. (legible and clearly labeled) References: State your references (APA style preferred) Sample Project Description Case Background: David currently lives in Ottawa, Ontario. David has received a job offer from a local company in North Vancouver, BC. David has researched the city and is excited about moving to North Vancouver and living near the ocean and the mountains. David has also heard that housing in the Vancouver area is expensive compared to Ottawa. David contacts a realtor in BC and asks for a brief report on the price of houses sold in North Vancouver over the past 10 years. The realtor decided to use a sample of 100 to prepare the report. The Excel file containing the data is posted online. You are asked to prepare a management report regarding the price of houses sold (in North Vancouver) and the factors that affect it. Analysis and Results: Has the mean price varied over the years? Use appropriate charts to describe individually the relationship between the key response variable (price sold) and the following key explanatory variables (neighbourhood, special feature, pool). Comment on the result. Construct a pivot table to group the following variables: For example, Neighbourhood vs. special feature. Use various statistics tools to describe the relationship between variables. Comment on the result. Think of using appropriate charts and tables to describe the population characteristics (OPTIONAL) Which of the variables (year sold, age, building size) is the most and least highly correlated with price? How strongly is age related to price sold? Is there a relationship between the # of bathrooms and price sold? Think of examining the relationship between the key response variable and other explanatory variables (OPTIONAL) Does having a pool matter? Is there a difference in the mean and distribution of price between the two groups (pool versus no pool)? Does having a special feature matter? Is there a difference in the mean and distribution of price between the two groups (special feature versus no special feature)? Is there a difference in price for the different neighbourhoods? Compare the means for the different neighbourhoods. Think of running other hypothesis tests to detect the explanatory variables that have an impact on the key response variable. (OPTIONAL) Create the best multiple regression model you can to predict price. Analysis and Results: Descriptive Statistics Analysis Report: The percentages of total sample/population size, variety of appropriate charts/tables (bar chart, pie chart, histogram and pivot table), the means and standard deviations where appropriate and your conclusion statements. Appendix: NA Scatter Plot & Correlation Analysis Report: The scatterplot, the correlation coefficient, and your conclusion statements. Appendix: NA Inferential Statistics Analysis Report: The values needed for the test (means, standard deviation, proportion, sample size, etc.) and your conclusion statements (?, critical value/test statistics and p-value in brackets). Appendix: Include raw data and EXCEL outputs for each test. Model Building Report: The steps needed to create the regression model (*), the equation for the best model, and your conclusion statement (interpretation of the R2, explanation of what each predictor retained contributes to the regression (interpret its coefficient in context)). Develop the multiple regression model (Justify the selection of suitable variables) Consider the significant variables Watch for collinearity and, if high, consider which variable is most appropriate. Include stepwise procedures and other procedures (if necessary) Appendix: Include raw data and EXCEL outputs (Regression analysis, VIF table, stepwise regression analysis, etc.) Recommended Steps for Completing the Project: Think about what type of data each variable is and what type of analysis would be appropriate. Indicate the test that should be used (State Why? Determine whether the conditions of using the test are met or not. Assumptions may be required). The calculation and technology output will be done in the next stages. Explore online resources for writing statistics report. Set tasks, deadlines and rules for the team member Clean your data/prepare your data for analysis. Use this data for the descriptive/correlation analysis part. Randomly select 2 or 3 samples of 15 values each from the cleaned data to perform the inferential analysis part (e.g. Sort the cleaned data into two groups; smokers and non-smokers and then randomly select two samples of 15 values each). Randomly select a sample of at least 30 employees from the cleaned data. Use this data to develop the multiple regression model. Throughout the term, complete specific analyses requested as we learn the techniques. Develop relevant charts, tables for the key variables using descriptive statistics tools Run all the hypothesis tests: 1, 2, & 3. Run all the correlation tests Run a regression model test Prepare the report Evaluation: 1. Abstract, Introduction, Problem, Objective, Methodology - 5 marks 2. Analysis and Result (complete, correct, accurate explanations) – 15 marks 3. Presentation and writing (formatting, report style, citation) – 3 marks 4. Appendices (clear, appropriate, well labelled) – 2 marks 5. Thoughtful conclusions and discussion – 5 marks TOTAL – 30 marks Submission Please submit the report by one group member to Learning Hub. You can submit your report to the following place: ACTIVITIES=> ASSIGNMENTS=> PROJECT. Please make sure the student’s names and student numbers are on the cover page.