Goal: Create a memo-style paper addressed to a key decision-maker that uses simple linear (OLS) regression analysis to support a recommendation to take or not take a course of action.
1)Set up a memo header with To, From, Date, and Re or Subject line.
Your “To:” line: You are writing to a key decision-maker at Perine and Associates Consulting Group. Pretend that your analysis memo will help inform one of my lead consultants who is working with one of the clients listed on the Projects 4 page in ELMS. You can be creative about naming the key decision-maker, or you can address it to me. Do not address it “To Whom It May Concern” or to a similarly vague recipient.
2)Write 1-4 sentences to motivate the memo. This can be used to discuss why the issue is important and, if applicable, whom it may affect.
3)Identify your target population if it is not immediately clear from the introduction.
4)Write a hypothesis statement reflecting what it is you are researching for this memo. The statement does not need to be formally written, but your position should be clear and succinct. You do not need to state the null.
5)Give a spoiler statement of your findings.
6)Identify the dataset, or subset of data, and variables you are using and operationalize (define & identify) the variables in the dataset that will represent the variables in your theory.
a)If the target population is not simply the entire dataset, state that you are using a segment of the larger available data, e.g. the GSS should be described as nationally representative, but a target population of “poor people” within the GSS will need to be defined and properly subset.
b)Your dataset must be one of the approved datasets listed as applicable to the project client list (see the Projects 4 Page in ELMS).
c)You will need two variables—one independent and one dependent.
i)The dependent variable must be a continuous (quantitative) variable, so it must be interval or ratio.
ii)The IV must be a continuous (quantitative) variable, so it must be interval or ratio.
iii)You may not use an ordinal variable for either the independent or the dependent variable. Ordinal variables are interpreted differently from continuous variables.
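The subsetting and variable-type requirements in step 6 can be sketched in R as follows. This is only an illustration under stated assumptions: `mtcars` (a dataset that ships with R) stands in for your approved dataset, and the target population (cars with automatic transmissions) and the variable choices are hypothetical.

```r
# Stand-in data: mtcars ships with R. Replace it with your approved dataset.
dat <- mtcars

# Hypothetical target population: cars with automatic transmissions (am == 0).
target <- subset(dat, am == 0)

# Hypothetical variable choices, both ratio-level measurements.
iv <- target$wt    # independent variable: weight in 1,000 lbs
dv <- target$mpg   # dependent variable: miles per gallon

# Confirm both variables are stored as numeric. This catches factors and
# ordered factors, but not ordinal scales coded as integers, so still check
# the codebook.
stopifnot(is.numeric(iv), is.numeric(dv))

# Inspect the structure and the size of the subset.
str(target[, c("wt", "mpg")])
nrow(target)
```

A numeric storage type is necessary but not sufficient: a Likert item coded 1 to 5 passes `is.numeric()` yet is still ordinal.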
7)Discuss each variable.
a)Summarize the concept of the variable that you will use. Do not write the coded name from the dataset. For example, if your variable measures religion, you might call it religion and one GSS variable would be RELIG16, but you would not type RELIG16 in your memo.
b)Summarize the question or statement from the codebook or other dataset supporting documents that describes the variable.
c)Summarize the response categories. For example, if the variable uses a Likert scale, list the response options.
d)Provide summary statistics for the independent and dependent variables.
i)Report the n, min, max, median, mean and standard deviation for each.
ii)If you have a substantial amount of missing values or extreme outliers and do not take any action on them, mention if/how these values might affect your analysis and potential generalization to the population. You may choose how to report it, e.g. frequency or proportion, but do so in the narrative. Don’t use summary table space for missing values.
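One possible way to produce the summary statistics requested in 7d is sketched below. It assumes `mtcars` as a stand-in dataset; `describe_var` is a hypothetical helper written for this example, not a function from any package. The raw matrix still needs to be turned into a presentation-quality table for the memo.

```r
# Stand-in data: replace with your approved dataset and variables.
dat <- mtcars

# Hypothetical helper: n, min, max, median, mean, and SD for one variable,
# ignoring missing values.
describe_var <- function(x) {
  c(n      = sum(!is.na(x)),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}

# One row per variable, labeled with the concept rather than the coded name.
summary_table <- rbind(
  "Weight (1,000 lbs)" = describe_var(dat$wt),
  "Miles per gallon"   = describe_var(dat$mpg)
)
print(round(summary_table, 2))

# Missing values are counted separately so they can be reported in the
# narrative instead of the table.
missing_counts <- colSums(is.na(dat[, c("wt", "mpg")]))
print(missing_counts)
```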
8)Specify & Perform a Simple Linear Regression.
a)Check the simple linear regression assumptions
i)You do not need to state each assumption in your memo, but there should be evidence in your R script that you checked each assumption. If you did not violate any of the assumptions, then simply acknowledge in the memo that all assumptions are met, but be sure to have the code that confirms it in your R script.
ii)If you violate any of the assumptions, explain what you saw that identified the violation. Optionally, you may suggest what you could do to resolve the violation.
iii)Use your regression diagnostic plots to help identify any violations.
b)Make sure you are using the correct variable types (See 6c above)
c)State the level of alpha you chose for the analysis.
d)Specify your regression model. This version will have your regression coefficient results and includes your variable names.
i)If you include the error term symbol (ε), you do not need a hat on your dependent variable (ŷ). If you do not include the error term, your DV should have a hat.
ii)Place the regression model equation on its own line. Do not put it inside of a paragraph alongside other sentences.
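As a rough sketch of step 8, the model can be fit with `lm()` and its coefficients pulled out to build the equation line for the memo. The dataset (`mtcars`), the variables, and the alpha of 0.05 are assumptions made for illustration only. Because the equation string below omits the error term, it uses "Predicted" in place of a hat on the dependent variable.

```r
# Stand-in data and hypothetical variable choices.
dat <- mtcars

# Alpha chosen for the analysis (an assumption here; state your own in the memo).
alpha <- 0.05

# Simple linear regression: miles per gallon explained by weight.
fit <- lm(mpg ~ wt, data = dat)

# Extract the intercept and slope.
b <- unname(coef(fit))
intercept <- b[1]
slope <- b[2]

# Build a readable equation using variable concepts, not coded names.
sign_symbol <- if (slope < 0) "-" else "+"
equation <- sprintf("Predicted MPG = %.2f %s %.2f * Weight",
                    intercept, sign_symbol, abs(slope))
cat(equation, "\n")
```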
9)Plots
a)Create a scatterplot of your independent variable (x-axis) and dependent (y-axis). Include the regression best fit line.
i)Your scatterplot should have cleaned up title and axis labels—do not use default title and labels that R or other technologies generate. Color and other features are optional.
b)Regression diagnostics – include the R graphics in your paper. You may copy/paste these ‘as-is’. Place them at the end of your paper on a separate (3rd) page.
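The two kinds of plots in step 9 might be produced with base R graphics along these lines. The data (`mtcars`), labels, and color are placeholders chosen for this sketch; any plotting approach that gives clean titles and axis labels would serve equally well.

```r
# Stand-in data and model.
dat <- mtcars
fit <- lm(mpg ~ wt, data = dat)

# Scatterplot: independent variable on the x-axis, dependent on the y-axis,
# with a cleaned-up title and axis labels instead of the defaults.
plot(dat$wt, dat$mpg,
     main = "Fuel Efficiency by Vehicle Weight",
     xlab = "Weight (1,000 lbs)",
     ylab = "Miles per Gallon",
     pch = 19, col = "steelblue")

# Add the regression best-fit line from the fitted model.
abline(fit, lwd = 2)

# Regression diagnostics: the four default lm plots in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout
```

The four diagnostic panels (residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage) are the ones to consult when checking the assumptions in 8a.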
10)Interpret and report your findings.
a)Were your findings statistically significant at your selected alpha level?
i)Was the entire model statistically significant? What about the independent variable?
ii)Are you able to support your hypothesis statement (reject H0)? Are you able to generalize your findings to the target population?
b)Were your findings substantively significant?
i)Interpret your independent variable regression coefficient. You do not need to interpret the y-intercept.
ii)State if you believe the coefficients are meaningful or not, and why.
iii)Interpret your R² (use the Multiple R-squared value, not Adjusted).
c)Identify any potential weaknesses or gaps in your study and how, if possible, they might be remedied in a future analysis.
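The quantities needed for step 10 can all be read from `summary()` of the fitted model. The sketch below again assumes `mtcars` as stand-in data and an alpha of 0.05. It extracts the slope, its p-value, the overall model p-value (from the F statistic), and the Multiple R-squared.

```r
# Stand-in data, alpha, and model.
dat <- mtcars
alpha <- 0.05
fit <- lm(mpg ~ wt, data = dat)
s <- summary(fit)

# Slope estimate and its p-value for the independent variable.
slope   <- s$coefficients["wt", "Estimate"]
slope_p <- s$coefficients["wt", "Pr(>|t|)"]

# Overall model p-value, computed from the F statistic and its degrees of freedom.
f <- s$fstatistic
model_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Multiple R-squared (not the adjusted value).
r_squared <- s$r.squared

cat("Slope:", round(slope, 2), "\n")
cat("Slope significant at alpha?", slope_p < alpha, "\n")
cat("Model significant at alpha?", model_p < alpha, "\n")
cat("Multiple R-squared:", round(r_squared, 3), "\n")
```

With a single independent variable, the model p-value and the slope p-value coincide, so the two significance questions in 10a have the same answer.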
How to write and format the project:
A)The entire paper should be written in a narrative flow. Do not write responses that match specific direction numbers. For example, do not say, “For item 5) my spoiler finding is …”
B)All tables, charts, and other graphics will be presentation quality (PQ). Do not provide raw output from R or other software that has not been improved to PQ.
a)Tables, charts, and graphics should be legible and nicely positioned on the page (e.g. do not put them in the center of the page, do use features like text wrapping).
b)These should be large enough to convey the needed information, but do not make them so large that they dominate the page and minimize the space for your writing.
C)Use APA citation for reporting statistical results. See here for a brief example: http://my.ilstu.edu/~jhkahn/apastats.html
a)Use this when discussing your summary statistics.
b)When discussing regression coefficients in your text, you can simply refer to the number value, e.g. each additional gram of sugar corresponds to an average increase of 4.7 calories in the Starbucks drink (p < .05).
D)The paper should be between 2 and 3 pages of narrative text. Your PQ descriptive table and scatterplot must be within these 3 pages. Any narrative text beyond 3 pages will cause you to lose points for going over the page length.
a)You may use a 3rd or 4th page for regression diagnostic plots and other tables that do not fit the main narrative text.
b)Use either Calibri or Times New Roman in 11- or 12-point font, with single line spacing and two line spaces between paragraphs. Tables should be single spaced.
E)An R script file will be submitted and should contain everything relevant to your project.
a)Include your first and last name, INST 314 Project 4, and the last revised date at the top of the script as comments on separate lines.
b)Use comments and blank lines throughout your script to make your work easier to read and reproduce.
c)Make sure that your code is in order line by line, so that if I were to run it from line 1 to the end, I would not receive errors because your lines of code are out of order.
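A script laid out as item E describes might be organized roughly as below. The name and date in the header are placeholders, `mtcars` is a stand-in for an approved dataset, and the section breakdown is just one reasonable ordering.

```r
# Firstname Lastname          (placeholder: use your first and last name)
# INST 314 Project 4
# Last revised: YYYY-MM-DD    (placeholder: use the actual date)

# ---- 1. Load and subset the data ----
dat <- mtcars  # stand-in for an approved dataset

# ---- 2. Summary statistics ----
summary(dat[, c("wt", "mpg")])

# ---- 3. Fit the model ----
# The model is created before any later section refers to it, so the script
# runs cleanly from the first line to the last.
fit <- lm(mpg ~ wt, data = dat)

# ---- 4. Check assumptions and report results ----
summary(fit)
```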