1. Here are the data for Boring's histogram of the expansion of psychology journals between 1890 and 1940 (data = journal volumes):
Use R to make a "barplot" of these data. Be sure to include a main title, a label on the x-axis, and a label on the y-axis. Color the bars steel blue. Include your code with your assignment.
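As a rough sketch of the kind of call involved, the example below draws a labeled bar plot. The years and counts are hypothetical placeholders, not Boring's figures, and the decade spacing is an assumption. Note that R spells the color name as one word, "steelblue".

```r
# Hypothetical placeholder values, NOT Boring's actual data;
# substitute the journal-volume figures given in the assignment.
years   <- seq(1890, 1940, by = 10)   # assumed decade spacing
volumes <- c(5, 10, 20, 40, 80, 120)

# names.arg labels each bar; col accepts any name listed by colors()
barplot(volumes,
        names.arg = years,
        main = "Psychology journal volumes (placeholder data)",
        xlab = "Year",
        ylab = "Journal volumes",
        col  = "steelblue")
```

Running colors() in R lists every built-in color name, which is a quick way to check a spelling.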
2. Using R, enter the following two datasets into two variables:
Use R to create a scatterplot of the two variables.

Add a linear trendline. Add a lowess line.

What does the difference between the two lines tell you about the relationship between the two variables?
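The two kinds of line can be sketched on simulated data as follows. The vectors x and y are invented for illustration (they are not the assignment's datasets), and the curved relationship is built in on purpose so that the two lines visibly diverge.

```r
set.seed(1)                           # reproducible simulated data
x <- 1:20
y <- sqrt(x) + rnorm(20, sd = 0.2)    # deliberately curved relationship

plot(x, y, main = "Simulated example", xlab = "x", ylab = "y")

fit <- lm(y ~ x)                      # ordinary least-squares fit
abline(fit, col = "red")              # straight trendline
lines(lowess(x, y), col = "blue")     # locally weighted smoother
```

Here abline() draws the single best-fitting straight line, while lowess() follows local structure in the points.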
3. a) Go to the Google Ngram Viewer. Set the date range to 1940-2008, and simultaneously search the phrases psychological science, behavioral science, cognitive science, neuroscience (no "u" in "behavioral"). Copy and paste the resulting graph into your assignment.

b) Describe the general pattern of relations between these phrases (e.g., Which phrases lead over which date ranges? When does each rise markedly from 0? What is the pattern of the successive peaks in the curves? Is there anything else of interest that you see?).

c) When faced with a narrow but compelling dataset, it is tempting to over-interpret one's findings. To what issues do you think one should be attentive here in order to avoid drawing overly broad conclusions? (add "science" to the terms you are searching) Can you add an example or two to your graph to illustrate your points here?

d) Change the corpus to English Fiction, and change the dates to 1790-2008. Search the terms phrenology, hypnotism, mesmerism. Describe what you see (as in b above). Describe some ways in which you could change the search string to improve the search.
4. Get the NeopiIQ (fake) dataset that I showed you in class. (If the first column numbers the participants from 1 to 100, this can be removed by the R command NeopiIQ <- NeopiIQ[-1].) Column 1 gives each participant's sex. Columns 2 through 6 give scores on the "Big Five" personality factors: neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness. Columns 7 and 8 are two IQ scores. Columns 9 and 10 give each participant's height and weight. Column 11 gives each participant's interest in sports on a scale of 1-10.

a) Use the "describe" function in the "psych" package of R to get summary statistics for the entire NeopiIQ dataset. Use the switch in "describe" to include the IQR of each variable (see the help page for "describe"). Include the code and the output in your assignment.

b) What noteworthy differences do you notice among the distributions by examining the output closely? Do any stand out from the others?

c) Use the "subset" function in R to separate the men's and women's sports data into two new variables called "msports" and "fsports" (you may have to look up how to use "subset" in R help or on the internet). Include your code in your assignment.

d) Using these two new variables, create side-by-side notched boxplots of the men's and women's sports data. Color them blueviolet. Include the code and your plots in your assignment (you may have to zoom the plots to make them legible).

e) Examine your plots and interpret them (you may have to do a little research to understand what the notches mean and why one of them is slightly "folded"). Describe the shape of each distribution. How do they compare to each other? Does there appear to be a gender difference? What general considerations might make you cautious about drawing this conclusion?

f) There is an easier way to make side-by-side plots of this kind. Use the R command: boxplot(sports ~ sex, data=NeopiIQ). Adapt that method to create side-by-side notched boxplots of the weight data, one for each sex. Color the plots "mediumorchid."

g) Describe the shapes of these two distributions and compare them to each other.
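For orientation, the two plotting approaches mentioned in parts c, d, and f can be sketched on a toy data frame. Everything here is hypothetical: the data frame toy, its "F"/"M" sex coding, and the simulated values are stand-ins and may not match how NeopiIQ is actually coded.

```r
set.seed(2)
# Toy stand-in for the real data; column names and coding are assumptions
toy <- data.frame(
  sex    = rep(c("F", "M"), each = 50),
  sports = sample(1:10, 100, replace = TRUE),
  weight = c(rnorm(50, mean = 140, sd = 20), rnorm(50, mean = 175, sd = 25))
)

# subset() keeps the rows matching a condition; select= picks the column(s)
m_toy <- subset(toy, sex == "M", select = sports)
f_toy <- subset(toy, sex == "F", select = sports)

# Approach 1: pass two separate vectors to get side-by-side boxes
boxplot(m_toy$sports, f_toy$sports,
        names = c("Men", "Women"), notch = TRUE, col = "blueviolet")

# Approach 2: the formula interface groups by a factor in a single call
boxplot(weight ~ sex, data = toy, notch = TRUE, col = "mediumorchid")
```

With notch = TRUE, R may warn that some notches went outside the hinges; that warning is tied to the "folded" appearance mentioned in part e.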
5. The following statistical graphic came from USA Today. It displays the results of a public opinion poll on what forms of payment Americans prefer for public roads. Evaluate the quality of the visualization, noting in particular the good (if any) and bad (if any) aspects of it that we discussed in class (from a statistician's perspective).
6. This is a tough one. Be careful. In R, you can obtain the binomial probability of a given number of successes with the function "dbinom". This function takes three arguments: the first gives the number of successes, the second gives the total number of events (successes + failures), and the third gives the probability of success on any single event. So, for instance, dbinom(2, size=10, prob=0.5) gives the probability of getting exactly two "heads" on 10 flips of a fair coin (0.04394531).

Now, imagine a chess player who is rated better than only one-third of the other players in her division. She plays 15 matches against randomly selected opponents in her division and wins 10 of them. Use dbinom to conduct a one-tailed null hypothesis test (with α = .05). On the basis of your result, do you think that we should adjust her rating upward? Justify your answer. Include your R code and result.
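The dbinom call described above can be checked directly, and the same function extends to a tail probability by summing over a range of outcomes. The sketch below sticks with the fair-coin example; the "8 or more heads" tail is an illustrative choice and not part of the chess problem.

```r
# Probability of exactly 2 heads in 10 flips of a fair coin
p_exact <- dbinom(2, size = 10, prob = 0.5)
print(p_exact)                        # 0.04394531

# dbinom is vectorized over its first argument, so an upper-tail
# probability is the sum over every outcome at least that extreme.
# Illustrative tail: 8 or more heads in 10 flips.
p_tail <- sum(dbinom(8:10, size = 10, prob = 0.5))
print(p_tail)                         # 0.0546875

# Cross-check with the cumulative distribution function
p_check <- 1 - pbinom(7, size = 10, prob = 0.5)
```

A one-tailed test compares a tail probability of this kind, not the probability of a single outcome, with the chosen α.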