Statistics Course Assignments

Assignment 1: Boring's Histogram of the Expansion of Psychology Journals

1. Here are the data for Boring's histogram of the expansion of psychology journals between 1890 and 1940 (data = journal volumes):

Use R to make a "barplot" of these data. Be sure to include a main title, a label on the x-axis, and a label on the y-axis. Color the bars steel blue. Include your code with your assignment.
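A minimal sketch of the kind of code this question calls for, assuming the counts have already been typed into an R vector. Boring's actual figures are not reproduced here, so the function takes them as an argument; the name boring_barplot and the default title and axis labels are illustrative placeholders, not part of the assignment. "steelblue" is R's built-in name for steel blue.

```r
# Bar plot of journal-volume counts with a title, axis labels, and steel blue bars.
# Boring's data are NOT included: pass them in as `volumes`, with matching `labels`
# (years or year ranges). boring_barplot is a hypothetical helper name.
boring_barplot <- function(volumes, labels,
                           main = "Expansion of psychology journals, 1890-1940",
                           xlab = "Year",
                           ylab = "Journal volumes") {
  stopifnot(length(volumes) == length(labels))
  # barplot() invisibly returns the bar midpoints; pass them back to the caller
  mids <- barplot(height = volumes,
                  names.arg = labels,
                  main = main,
                  xlab = xlab,
                  ylab = ylab,
                  col = "steelblue")
  invisible(mids)
}
```

Once the data are entered, a call such as boring_barplot(volumes, years) draws the plot.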

2. Using R, enter the following two datasets into two variables:

Use R to create a scatterplot of the two variables.

Add a linear trendline. Add a lowess line.

What does the difference between the two lines tell you about the relationship between the two variables?
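One way to draw both lines, sketched under the assumption that the two datasets are numeric vectors of equal length (called x and y here). abline() draws the straight least-squares line from an lm() fit, while lowess() returns a locally weighted smooth that lines() can add on top. The helper name plot_with_trends and the colors are illustrative choices.

```r
# Scatterplot with a least-squares trendline and a lowess smoother.
# x and y stand in for the two datasets entered above; plot_with_trends is hypothetical.
plot_with_trends <- function(x, y, main = "Scatterplot with trend lines") {
  plot(x, y, main = main, xlab = "x", ylab = "y", pch = 19)
  fit <- lm(y ~ x)                       # ordinary least-squares fit
  abline(fit, col = "red", lwd = 2)      # straight linear trendline
  lo <- lowess(x, y)                     # locally weighted smoother (default f = 2/3)
  lines(lo, col = "blue", lwd = 2)       # curved lowess line
  legend("topleft", legend = c("linear (lm)", "lowess"),
         col = c("red", "blue"), lwd = 2, bty = "n")
  invisible(list(fit = fit, lowess = lo))
}
```

After entering the data, plot_with_trends(x, y) draws the scatterplot with both lines.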

3. a) Go to the Google Ngram Viewer. Set the date range to 1940-2008, and simultaneously search the phrases psychological science, behavioral science, cognitive science, and neuroscience (no "u" in "behavioral"). Copy and paste the resulting graph into your assignment.

b) Describe the general pattern of relations between these phrases (e.g., Which phrases lead over which date ranges? When does each rise markedly from 0? What is the pattern of the successive peaks in the curves? Is there anything else of interest that you see?).

c) When faced with a narrow but compelling dataset, it is tempting to over-interpret one's findings. What issues do you think one should attend to here in order to avoid drawing overly broad conclusions? (Add "science" to the terms you are searching.) Can you add an example or two to your graph to illustrate your points here?

d) Change the corpus to English Fiction, and change the dates to 1790-2008. Search the terms phrenology, hypnotism, and mesmerism. Describe what you see (as in b above). Describe some ways in which you could change the search string to improve the search.

4. Get the NeopiIQ (fake) dataset that I showed you in class. (If the first column numbers the participants from 1 to 100, this can be removed by the R command NeopiIQ <- NeopiIQ[-1].) Column 1 gives each participant's sex. Columns 2 through 6 give scores on the "Big Five" personality factors: neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness. Columns 7 and 8 are two IQ scores. Columns 9 and 10 give each participant's height and weight. Column 11 gives each participant's interest in sports on a scale of 1 to 10.

a) Use the "describe" function in the "psych" package of R to get summary statistics for the entire NeopiIQ dataset. Use the switch in "describe" to include the IQR of each variable (see the help page for "describe"). Include the code and the output in your assignment.
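A sketch for part (a), assuming the psych package is installed and NeopiIQ has been loaded as a data frame. The switch in question is describe()'s IQR argument, which defaults to FALSE; setting it to TRUE adds an IQR column to the usual output. The wrapper name describe_with_iqr is hypothetical.

```r
# Summary statistics for every column, with the interquartile range added.
# Requires the psych package (install.packages("psych") if it is missing).
describe_with_iqr <- function(df) {
  psych::describe(df, IQR = TRUE)  # IQR = TRUE appends an "IQR" column
}

# Usage, once NeopiIQ is loaded:
# describe_with_iqr(NeopiIQ)
```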

b) What noteworthy differences do you notice among the distributions by examining the output closely? Do any stand out from the others?

c) Use the "subset" function in R to separate the men's and women's sports data into two new variables called "msports" and "fsports" (you may have to look up how to use "subset" in R help or on the internet). Include your code in your assignment.
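A sketch for part (c). The column names sex and sports are taken from the boxplot(sports ~ sex, data=NeopiIQ) command in part (f); the codes "M" and "F" for sex are an assumption, so check unique(NeopiIQ$sex) first and adjust the arguments. The helper split_sports is hypothetical.

```r
# Split the sports-interest scores by sex using subset().
# ASSUMPTION: sex is coded "M"/"F"; pass other codes via `male` / `female` if needed.
split_sports <- function(df, male = "M", female = "F") {
  # subset() keeps the matching rows; select = sports keeps only that column,
  # and $sports extracts it as a plain numeric vector
  msports <- subset(df, sex == male, select = sports)$sports
  fsports <- subset(df, sex == female, select = sports)$sports
  list(msports = msports, fsports = fsports)
}

# Usage:
# s <- split_sports(NeopiIQ)
# msports <- s$msports
# fsports <- s$fsports
```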

d) Using these two new variables, create side-by-side notched boxplots of the men's and women's sports data. Color them blueviolet. Include the code and your plots in your assignment (you may have to zoom in on the plots to make them legible).
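A sketch for part (d), assuming msports and fsports are numeric vectors such as those produced in part (c). Passing a named list to boxplot() draws one box per element, side by side, and notch = TRUE adds the notches. The function name, title, and axis label are illustrative.

```r
# Side-by-side notched boxplots of the two sports-interest vectors.
plot_sports_boxes <- function(msports, fsports) {
  # boxplot() returns its summary statistics invisibly; so does this wrapper
  boxplot(list(Men = msports, Women = fsports),
          notch = TRUE,              # draw notches around each median
          col = "blueviolet",
          main = "Interest in sports by sex",
          ylab = "Interest in sports (1-10)")
}

# Usage:
# plot_sports_boxes(msports, fsports)
```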

e) Examine your plots and interpret them (you may have to do a little research to understand what the notches mean and why one of them is slightly "folded"). Describe the shape of each distribution. How do they compare to each other? Does there appear to be a gender difference? What general considerations might make you cautious about drawing this conclusion?

f) There is an easier way to make side-by-side plots of this kind. Use the R command: boxplot(sports ~ sex, data=NeopiIQ). Adapt that method to create side-by-side notched boxplots of the weight data, one for each sex. Color the plots "mediumorchid".
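A sketch for part (f). The formula interface splits weight by sex in a single call; notch = TRUE and col = "mediumorchid" are the main changes from the command given above. The column name weight is an assumption (the assignment only says weight is in column 10), so substitute the dataset's actual name; plot_weight_by_sex is a hypothetical helper.

```r
# Notched boxplots of weight, one box per level of sex, via the formula interface.
# ASSUMPTION: the weight column is named `weight`.
plot_weight_by_sex <- function(df) {
  boxplot(weight ~ sex, data = df,
          notch = TRUE,
          col = "mediumorchid",
          main = "Weight by sex",
          xlab = "Sex",
          ylab = "Weight")
}

# Usage:
# plot_weight_by_sex(NeopiIQ)
```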

g) Describe the shapes of these two distributions and compare them to each other.

5. The following statistical graphic came from USA Today. It displays the results of a public opinion poll on what forms of payment Americans prefer for public roads. Evaluate the quality of the visualization, noting in particular the good (if any) and bad (if any) aspects of it that we discussed in class (from a statistician's perspective).

6. This is a tough one. Be careful. In R, you can obtain the binomial probability of a given number of successes with the function "dbinom". This function takes three arguments: the first gives the number of successes, the second gives the total number of events (successes + failures), and the third gives the probability of success on any single event. So, for instance, dbinom(2, size=10, prob=0.5) gives the probability of getting exactly two "heads" on 10 flips of a fair coin (0.04394531).
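To see what dbinom() computes, its result can be checked against the binomial formula choose(n, k) * p^k * (1 - p)^(n - k). For the coin example above, choose(10, 2) = 45 and 0.5^10 = 1/1024, so both routes give 45/1024.

```r
# Check dbinom() against the binomial probability formula for the coin example.
k <- 2     # number of successes ("heads")
n <- 10    # total number of events (flips)
p <- 0.5   # probability of success on a single flip

by_dbinom  <- dbinom(k, size = n, prob = p)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

print(c(by_dbinom, by_formula))   # [1] 0.04394531 0.04394531
```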

Now, imagine a chess player who is rated better than only one-third of the other players in her division. She plays 15 matches against randomly selected opponents in her division and wins 10 of them. Use dbinom to conduct a one-tailed null hypothesis test (with α = .05). On the basis of your result, do you think that we should adjust her rating upward? Justify your answer. Include your R code and result.
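A sketch of the test, under the assumption that "rated better than only one-third of the other players" translates into a null-hypothesis win probability of 1/3 against a randomly selected opponent, with draws ignored. Because the test is one-tailed in the upward direction, the p-value is the probability of 10 or more wins, not of exactly 10 wins, which is why dbinom() is summed over 10:15.

```r
# One-tailed binomial test built from dbinom().
# ASSUMPTION: H0 win probability is 1/3 per game; H1: it is greater than 1/3.
n_games <- 15
wins    <- 10
p_null  <- 1/3
alpha   <- 0.05

# P(X >= 10 | H0): add up the probabilities of 10, 11, ..., 15 wins
p_value <- sum(dbinom(wins:n_games, size = n_games, prob = p_null))
p_value
p_value < alpha   # TRUE means reject H0 at the chosen alpha
```

The same tail probability can be cross-checked with pbinom(9, 15, 1/3, lower.tail = FALSE) or binom.test(10, 15, p = 1/3, alternative = "greater").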
