The main focus of this project is for you to find or gather, organize, analyze and present data. You will choose a topic, then design and carry out a statistical study. Your study will include relevant descriptive statistics that you have learned about in this course.
Your final project will consist of three parts: a proposal, a written report, and a video-recorded presentation, which you will upload to YouTube or another remote server. This project is to be completed individually and is worth 20% of your final mark for this course.
Type of Project
First you will need to decide what type of data you will collect for your project:
- Primary Data is information that you collect on your own. For example, you could have students at your day school complete a questionnaire on paper or online (using SurveyMonkey, etc.), or you could canvass respondents in your neighborhood.
- Secondary Data is information that you are taking from another source. It is important to use reliable sources. When choosing your topic, be sure that you will be able to find good data. Some places that students often obtain data from are provided in the "useful data websites" handout.
This type of culminating task is very common for the Data Management course. You will be able to find many project exemplars online. Be sure to see what other students have done to get ideas on what works well and what can be improved upon.
- Draw up a calendar and set due dates for yourself. This is a major project, so do not leave it until the last minute. Give yourself plenty of time.
- Think things through thoroughly before you get started. Make sure that all pieces will work. Good planning saves time in the end.
Proposal Requirements:
- An outline describing the procedure you took in researching and gathering the data
- Thesis – your main thesis question/statement and the sub-problems you are going to answer.
- Population, Sample (if applicable)
- Analyze – Explain each of the following:
What are the main variables in your question?
Can these variables be measured statistically?
Is there enough data to make an interesting analysis?
- Hypothesis – Predict what you expect to find or observe.
- Why is it important for you to investigate this topic? Who is it most relevant to?
- Data – Include either (1) all of the raw data that you are going to use from the Internet, books, etc., with sources cited, or (2) the survey that you are going to use. The survey should not be distributed yet; I want to critique it first.
Note: For large datasets, a one-page sample with a web link (including name/title) to the rest is sufficient. I need to know how the researchers got their data.
Hint: Start your bibliography as soon as you find your first useful website. Trying to go back and find information later is a nightmare.
If you have problems gathering ANY of these components, you have likely chosen a difficult topic and it needs to be changed. Almost EVERY problem that I have seen on final projects was because of an incomplete or poorly done proposal phase. The following flawed projects have been seen in past Data Management courses:
- Projects that were far too large in scope. A research team of 100 working for 25 years would be unable to prove causation in the way that these students wished to do. This happens most often with projects like drunk driving, teenage pregnancy or economic problems. Choose less glamorous and smaller topics that you can find data about.
- Projects which attempted to prove causation instead of correlation.
- Projects whose entire body of evidence was based on unreliable sources from the Internet. The students made no attempt to figure out where their sources' data came from.
- Projects where random sampling involved giving a survey to everyone in their class.
- Projects where the students developed their surveys first and their research questions second. They ended up not asking the correct survey questions and were unable to prove their point.
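To illustrate the sampling pitfall listed above: surveying everyone in one class is a convenience sample, not a random one. A minimal sketch of simple random sampling follows, assuming you have a list identifying every member of your population. The student ID numbers, population size, and sample size here are invented for illustration.

```python
import random

# Hypothetical sampling frame: one ID for every student in the school,
# not just the students in a single class. These IDs are made up.
population = list(range(1, 801))  # 800 invented student IDs

# A fixed seed lets you (and your teacher) reproduce the same draw.
rng = random.Random(42)

# Simple random sample: every ID is equally likely, with no repeats.
sample = rng.sample(population, k=50)

print("Survey these student IDs:", sorted(sample))
```

Record the method and the seed in your proposal so that the selection procedure can be checked.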
Written Report Format:
Do not write the summary until you have finished your project!
In one page, briefly summarize your entire report.
A summary section is something that would be read by a manager who didn’t have enough time to read the entire report, so make sure that you have enough details that it can stand by itself.
At the very least, include the following information:
Problem: A clear statement of what you are trying to learn
Plan: The procedure you will use to carry out the study (How do you choose people? How do you measure? Who does the measuring? What methods are you going to use?)
Data: The data are collected according to the plan (What data did you collect? Where did it come from?)
Analysis: The data are summarized and analyzed to answer the thesis question (numerical, graphical, informative sentences)
Conclusion: Conclusions are drawn about what has been learned (note any biases, suggest further studies)
- Main thesis question. The thesis question is the theme of your report (e.g. What is the relationship between an NBA player’s salary and their success?). Try to use the word “relationship” in your thesis question. Remember, you do not have the tools to try and find any cause and effect.
- Sub-questions: The sub-questions are the smaller questions that you will answer on the way to a conclusion about your main thesis question. They should be specific enough to name the variables that you will compare (e.g. What is the relationship between salary and a player’s points per game? What is the relationship between salary and a player’s rebounds per game? What is the relationship between salary and the number of games that a player has won?).
- The problems may evolve slightly throughout the life of your project.
- Hypothesis – What do you expect to find?
- Define the population and describe the characteristics of the population
- Define the independent variables
- Define the dependent variables
- Put all of your raw data collected in an appendix, not in this section
- Include summaries of your key variables here (frequency tables – but not histograms or graphs)
- Identify all problems you ran into with your data (Did you need to ‘massage’ it to use it in Excel/Fathom? Did you alter the scale?)
For each sub-question identified, use the concepts we learned in class to describe the data or find trends/relationships. Only include those that are relevant.
(a) Numerical Statistics (your report must include at least 3)
- Find means, modes, and medians
- Find the standard deviation, IQR, percentiles
- Use linear regression and find the correlation coefficient, equation of a line of best fit
- Use non-linear regression and find the coefficient of determination, equation of a curve of best fit
- Relate your data to the Normal Distribution, Binomial Distribution or another distribution.
- Use z-scores and z-tables to find some useful information.
- Permutations, Combinations and Probability:
Predict the probability of certain events using your model
Do something else relating to probability
Use a simulation to help you discover a probability
Use the binomial theorem
Create a probability distribution
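Several of the numerical statistics listed above can be sketched in a few lines of Python, as an alternative to Excel or Fathom. The salary and points-per-game figures below are invented for illustration (they are not real NBA data), and the 70% free-throw shooter in the probability part is likewise a made-up scenario. Treat this as a rough sketch of the calculations, not a required method.

```python
import math
import random
import statistics as st

# Invented illustration data (NOT real NBA figures).
salary = [2.0, 4.5, 6.0, 9.0, 12.5, 15.0, 20.0, 25.0]  # salary, $ millions
ppg = [4.1, 6.3, 8.0, 9.5, 14.2, 15.8, 19.9, 24.0]     # points per game

# Measures of central tendency and spread.
mean_x, mean_y = st.mean(salary), st.mean(ppg)
median_y = st.median(ppg)
sd_y = st.stdev(ppg)  # sample standard deviation (divides by n - 1)

# Linear regression (line of best fit y = a + b*x) and correlation r.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salary, ppg))
sxx = sum((x - mean_x) ** 2 for x in salary)
syy = sum((y - mean_y) ** 2 for y in ppg)
b = sxy / sxx            # slope
a = mean_y - b * mean_x  # y-intercept
r = sxy / math.sqrt(sxx * syy)

# z-score of the highest scorer, and the proportion of players expected
# below that value IF points per game followed a normal distribution.
z = (ppg[-1] - mean_y) / sd_y
below = st.NormalDist(mean_y, sd_y).cdf(ppg[-1])

# Binomial probability: a hypothetical 70% shooter makes at least 8 of 10.
p_make, shots, need = 0.7, 10, 8
exact = sum(
    math.comb(shots, k) * p_make**k * (1 - p_make) ** (shots - k)
    for k in range(need, shots + 1)
)

# The same probability estimated by simulation.
rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 20_000
hits = sum(
    sum(rng.random() < p_make for _ in range(shots)) >= need
    for _ in range(trials)
)
simulated = hits / trials

print(f"mean = {mean_y:.2f}, median = {median_y:.2f}, sd = {sd_y:.2f}")
print(f"line of best fit: y = {a:.2f} + {b:.2f}x, r = {r:.3f}")
print(f"z = {z:.2f}, proportion below = {below:.3f}")
print(f"P(at least 8 of 10): exact = {exact:.4f}, simulated = {simulated:.4f}")
```

Whatever tool you use, report each number together with a sentence interpreting it. A value of r close to 1 describes a strong positive linear relationship, not cause and effect.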
(b) Graphical Representations (you must include at least 3)
- Scatter plots (this should be included in every project as you will be finding many relationships)
- Bar graph / histogram / frequency polygon (histogram + curve) / cumulative frequency polygon (each freq. is a cumulative total) / relative frequency polygon (freq. as a %) / line graph / moving average
- Box and whisker
- Stem and leaf plot
- Any other graph relevant to this course
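A box-and-whisker plot is drawn from the five-number summary, so it can help to compute that summary before graphing. The sketch below uses invented points-per-game values and assumes the common 1.5 × IQR rule for flagging outliers. Quartile conventions differ between tools, so these quartiles may not match Excel or Fathom exactly.

```python
import statistics as st

# Invented illustration data (NOT real player statistics).
ppg = [4.1, 6.3, 8.0, 9.5, 14.2, 15.8, 19.9, 24.0, 41.0]

# Quartiles using the "inclusive" method (data treated as the whole
# population of interest); other tools may use a different convention.
q1, q2, q3 = st.quantiles(ppg, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: more than 1.5 * IQR beyond the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in ppg if v < low_fence or v > high_fence]

five_number_summary = (min(ppg), q1, q2, q3, max(ppg))

print("min, Q1, median, Q3, max:", five_number_summary)
print("IQR:", iqr)
print("possible outliers:", outliers)
```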
(c) Information – descriptive sentences. This part is very important and often overlooked by students. Don’t just provide numbers and statistics. Be sure to interpret them for the reader. What do the numbers tell you? Include this with each concept / graph.