In this assignment, you will characterize three variables in the ISLR College data set and determine whether a normal approximation is appropriate for each one. You will need to load the ISLR package as well as MASS and stats.
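As a starting point, the setup might look like the sketch below. It assumes the ISLR package is already installed; the name `vars` is only an illustrative choice, not something the assignment prescribes.

```r
# Assumes ISLR has been installed, e.g. with install.packages("ISLR").
library(ISLR)   # provides the College data set
library(MASS)   # provides boxcox(), used in question 4
library(stats)  # normally attached by default; shapiro.test(), qqnorm(), mad(), IQR()

data(College)

# The three variables studied in this assignment ("vars" is an illustrative name).
vars <- c("Accept", "Grad.Rate", "PhD")

# Quick look at the types and five-number summaries before any analysis.
str(College[, vars])
summary(College[, vars])
```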
1. Assess the distribution of the three variables (Accept, Grad.Rate, and PhD) and determine whether a normal approximation is appropriate for analyzing each one.
a. Plot a histogram for each variable. Explain what these show you about the variables.
b. Plot a box-and-whisker plot for each variable. Explain what these show you about the variables.
c. Calculate the measures of central tendency (mean, median, and mode(s)) for each variable. Explain what these show you for each variable.
d. Produce a Q-Q plot for each variable. Explain what these show you about the variables.
e. Calculate the Shapiro-Wilk statistic for each variable. Explain what this value tells you and interpret it for each variable.
f. For each variable, state whether a normal assumption is appropriate, and justify this choice based on your central tendencies, histogram, Q-Q plot, Shapiro-Wilk statistic, and any other values or plots that you find useful.
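The tools named in question 1 could be applied to a single variable roughly as follows. This is a minimal sketch for Grad.Rate only, not a full answer. Base R has no built-in function for the statistical mode, so `sample_modes` is a hypothetical helper written here for illustration. The names `x`, `centers`, and `sw` are likewise assumptions.

```r
library(ISLR)

x <- College$Grad.Rate

# (a) Histogram and (b) box-and-whisker plot.
hist(x, main = "Grad.Rate", xlab = "Graduation rate")
boxplot(x, horizontal = TRUE, main = "Grad.Rate")

# (c) Central tendency. sample_modes() is a hypothetical helper that returns
# every value tied for the highest frequency.
sample_modes <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
centers <- list(mean = mean(x), median = median(x), modes = sample_modes(x))
print(centers)

# (d) Normal Q-Q plot with a reference line through the quartiles.
qqnorm(x)
qqline(x)

# (e) Shapiro-Wilk test: W close to 1 is consistent with normality, and a
# small p-value is evidence against it.
sw <- shapiro.test(x)
print(sw)
```

The same steps can be repeated for Accept and PhD by changing the column assigned to `x`.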
2. Characterize the variable Grad.Rate using a normal approximation.
a. Calculate the mean. Explain what this tells you.
b. Calculate the standard deviation and variance. Explain what these tell you.
c. Calculate the skewness and kurtosis. Explain what these tell you.
d. Calculate the 95% confidence interval for the mean. Explain what this tells you.
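One way to compute the quantities in question 2 is sketched below. Base R does not provide skewness or kurtosis functions, so the sketch uses the plain moment-based definitions. This is an assumption, and add-on packages may use slightly different (bias-corrected) conventions. The confidence interval assumes a t-based interval for the mean. All variable names are illustrative.

```r
library(ISLR)

x <- College$Grad.Rate
n <- length(x)

m <- mean(x)   # (a) sample mean
s <- sd(x)     # (b) sample standard deviation
v <- var(x)    # (b) sample variance, equal to s^2

# (c) Moment-based skewness and excess kurtosis (assumed convention:
# central moments divided by n, kurtosis reported relative to the normal value 3).
z <- x - m
m2 <- mean(z^2)
skew <- mean(z^3) / m2^1.5
excess_kurt <- mean(z^4) / m2^2 - 3

# (d) 95% confidence interval for the mean, using the t distribution.
se <- s / sqrt(n)
ci <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

print(c(mean = m, sd = s, var = v, skew = skew, excess_kurt = excess_kurt))
print(ci)
```

The interval in `ci` should match the one reported by `t.test(x)$conf.int`, which is a convenient cross-check.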
3. Characterize the variable PhD using generalized measures.
a. Calculate the median. Explain what this tells you.
b. Calculate the median absolute deviation. Explain what this tells you.
c. Calculate the interquartile range. Explain what this tells you.
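The measures in question 3 are all available in the stats package. The sketch below is illustrative. Note that `mad()` by default multiplies the raw median absolute deviation by 1.4826 so that it is comparable to a standard deviation under normality. Which version the assignment expects is not stated, so both are shown.

```r
library(ISLR)

x <- College$PhD

med <- median(x)                  # (a) median

mad_scaled <- mad(x)              # (b) MAD with the default constant 1.4826
mad_raw <- mad(x, constant = 1)   # (b) raw MAD: median(|x - median(x)|)

iqr <- IQR(x)                     # (c) interquartile range, Q3 - Q1
quartiles <- quantile(x, c(0.25, 0.75))

print(c(median = med, mad_scaled = mad_scaled, mad_raw = mad_raw, iqr = iqr))
print(quartiles)
```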
4. Perform a normalizing transformation of the variable Accept and, using this transformation, characterize the variable Accept.
a. Create a Box-Cox plot for Accept. Explain what this plot tells you.
b. State the data transformation that you will use and create a new variable called AcceptTrans that applies the transformation.
c. Assess whether a normal approximation is appropriate for the new AcceptTrans variable. Include all plots and calculations used to assess normality and explain your reasoning.
d. Find the mean of the AcceptTrans variable. Explain what this tells you.
e. Use the inverse of the transformation to express the mean of AcceptTrans on the original scale of Accept (the back-transformed mean). Explain what this tells you.
f. Find a 95% confidence interval for the back-transformed mean (you will first need the C.I. for AcceptTrans). Explain what this tells you.
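The workflow in question 4 could be sketched as below. This is a sketch under stated assumptions rather than the expected answer. It uses the lambda that maximizes the Box-Cox profile likelihood directly, whereas in practice you may prefer a convenient nearby value (such as 0 for a log transform) if it lies inside the interval shown on the plot. `bc_transform` and `bc_inverse` are hypothetical helpers defined here, not functions from MASS.

```r
library(ISLR)
library(MASS)

accept <- College$Accept  # Box-Cox requires strictly positive values

# (a) Box-Cox profile likelihood for an intercept-only model. The plot marks
# the maximizing lambda and an approximate 95% interval for it.
bc <- boxcox(Accept ~ 1, data = College, lambda = seq(-2, 2, by = 0.01))
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helpers: the Box-Cox transform and its inverse.
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
bc_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# (b) Transformed variable.
AcceptTrans <- bc_transform(accept, lambda_hat)

# (c) Re-check normality on the transformed scale.
hist(AcceptTrans)
qqnorm(AcceptTrans)
qqline(AcceptTrans)
print(shapiro.test(AcceptTrans))

# (d) Mean and 95% confidence interval on the transformed scale.
tt <- t.test(AcceptTrans)
mean_trans <- unname(tt$estimate)
ci_trans <- as.numeric(tt$conf.int)

# (e), (f) Back-transform the mean and the interval endpoints to the scale of Accept.
mean_back <- bc_inverse(mean_trans, lambda_hat)
ci_back <- bc_inverse(ci_trans, lambda_hat)

print(c(lambda = lambda_hat, mean_back = mean_back))
print(ci_back)
```

Because the transformation is nonlinear, the back-transformed mean is generally not equal to the arithmetic mean of Accept. With a log transform, for example, it is the geometric mean.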
5. In a single paragraph, explain why it was necessary to use different methods to characterize each of the three variables.