Statistics Assignment: SAT Data and Insurance Data

SAT Data Analysis

1. Refer to the Excel “SAT” data posted in eLearning. (10)

a. What are the mean, median, and 80th percentile SAT scores among all students in the sample? (You may refer to our Descriptive Statistics chapter slides on calculating percentiles.) Create a histogram of the students using “bins” 200 units wide, starting with 1001-1200.

b. What are the mean, median, and 80th percentile SAT scores among Ampipe students in the sample? Create a histogram of the students using “bins” 200 units wide, starting with 1001-1200.

c. What are the mean, median, and 80th percentile SAT scores among Walnut Heights students in the sample? Create a histogram of the students using “bins” 200 units wide, starting with 1001-1200.

d. What is the probability that a randomly selected Ampipe student has an SAT score over 1400? What is the probability that a randomly selected Walnut Heights student has an SAT score over 1400? Based on your calculations, comment on whether SAT score is statistically independent of school.
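
As a rough illustration of the calculations in parts a through d, here is a minimal Python sketch. The rows are made up for illustration and are not the eLearning data. The percentile function assumes one common location rule (i = (p/100)n, rounding up when i is not whole and averaging two neighbors when it is); the chapter slides may define the percentile differently, in which case follow the slides. The helper names `percentile`, `bin_label`, and `prob_over` are hypothetical, and scores below 1001 are assumed not to occur.

```python
import statistics
from collections import Counter

# Hypothetical (school, score) rows for illustration only; the real values
# come from the "SAT" workbook.
rows = [
    ("Ampipe", 1010), ("Ampipe", 1150), ("Ampipe", 1230),
    ("Ampipe", 1280), ("Ampipe", 1450),
    ("Walnut Heights", 1340), ("Walnut Heights", 1390),
    ("Walnut Heights", 1520), ("Walnut Heights", 1610),
    ("Walnut Heights", 1780),
]


def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) by the rule i = (p / 100) * n.

    Assumed convention: if i is not a whole number, round it up and take
    that ordered value; if it is whole, average the i-th and (i+1)-th values.
    """
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]  # zero-based index of the rounded-up position
    return (ordered[whole - 1] + ordered[whole]) / 2


def bin_label(score, start=1001, width=200):
    """Label of the 200-wide bin holding score, e.g. '1001-1200'."""
    lower = start + ((score - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def prob_over(rows, threshold, school=None):
    """Share of scores above threshold, optionally within one school."""
    picked = [s for name, s in rows if school is None or name == school]
    return sum(s > threshold for s in picked) / len(picked)


scores = [score for _, score in rows]
summary = (statistics.mean(scores), statistics.median(scores),
           percentile(scores, 80))
bin_counts = Counter(bin_label(s) for s in scores)  # histogram frequencies

p_ampipe = prob_over(rows, 1400, "Ampipe")
p_walnut = prob_over(rows, 1400, "Walnut Heights")
p_overall = prob_over(rows, 1400)
print(summary, dict(bin_counts), p_ampipe, p_walnut, p_overall)
```

If SAT score were independent of school, the two conditional probabilities would both be close to the overall probability; comparing the three numbers is one way to frame the comment asked for in part d.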

2. Refer to the Excel “Insurance” data posted in eLearning. (12)

a. What is the expected claim payment made per customer in a year?

b. What is the probability that a randomly selected customer receives exactly the expected claim payment you calculated above in “a”?

c. What is the (sample) standard deviation of the claim payments made per customer in a year?

d. If the insurance company charges a $2000 premium to each customer, each year, then what amount does the insurance company make/lose per customer, each year? In a sentence or two, explain why the customers might be willing to pay the $2000 premium, given your calculations.

e. The critical value (or “z value”) for statistical significance at the 10% level is 1.645 (remember?!). Thus, I’d like you to create a 90% confidence interval by taking:

Expected Claim Payment ± 1.645 × (standard deviation of claim payment)

Now that I have the interval, I’ll know that a randomly selected insurance customer will have a 90% chance of receiving claim payments within this confidence interval.

Do you agree with the analysis above? In a sentence, explain why or why not.
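
The arithmetic behind parts a through d can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the claim amounts are invented, and the sketch assumes the data holds one yearly payment per customer, so the expected payment is the plain average. If the workbook instead lists payment amounts with probabilities, the expected value would be the probability-weighted sum.

```python
import statistics

# Hypothetical yearly claim payments per customer, for illustration only;
# the real values come from the "Insurance" workbook.
claims = [0, 0, 0, 0, 0, 0, 0, 0, 5000, 10000]
premium = 2000  # annual premium stated in part d

expected_claim = statistics.mean(claims)               # part a
p_exact = claims.count(expected_claim) / len(claims)   # part b
claim_sd = statistics.stdev(claims)                    # part c (n - 1 divisor)
profit_per_customer = premium - expected_claim         # part d

print(expected_claim, p_exact, round(claim_sd, 2), profit_per_customer)
```

In this made-up sample no customer receives exactly the average payment, which shows that an expected value need not be an outcome anyone actually experiences.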

3. Refer to the Excel “Administrator” data posted in eLearning. (22)

Do NOT use the “finite population” corrections for standard deviation calculation in this problem. Just assume we’re dealing with an infinite population.

a. Provide the formal null and alternative hypotheses for a hypothesis test of the question of whether or not the mean administrator salary is really $84,000.

Assume that the underlying population of administrator salaries is normally distributed.

b. Conduct this hypothesis test at the 5% significance level (95% confidence level). What is the p-value? Do you reject or fail to reject your null hypothesis? Why?

c. Create a 90% confidence interval for the mean administrator salary. Explain in a sentence, in layman’s terms, what this confidence interval implies.

d. Compare your hypothesis test in part “b” to your confidence interval in part “c”. Do these results conflict with each other? That is, is it OK to find these two answers simultaneously (and if so, why), or is there a conflict where it should be impossible for these results to occur simultaneously?

e. If I told you that the underlying population of administrator salaries was NOT normally distributed, how would this change your response to parts “b” and “c”?

f. If I told you that the underlying population of administrator salaries was NOT normally distributed, BUT, your final calculations (p-value for “b”, confidence interval for “c”) were based on double the sample size (a sample of 40 administrators, rather than 20), would this change your response to part “e”? If so, why? If not, why not?
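
For parts b and c, the mechanics of a one-sample t test and a t-based confidence interval might look like the sketch below. The five salaries are hypothetical and are not the Administrator data. The critical values 2.776 and 2.132 are the standard t-table entries for 4 degrees of freedom (two-tailed 5% and 10%), so they apply only to this toy sample; a different sample size needs its own table lookup. The exact p-value requires the t distribution, for example Excel's T.DIST.2T, and is not computed here.

```python
import math
import statistics

# Hypothetical salaries for illustration only; the real sample comes from
# the "Administrator" workbook.
salaries = [80000, 82000, 85000, 88000, 90000]
mu_0 = 84000  # hypothesized mean from part a

n = len(salaries)
sample_mean = statistics.mean(salaries)
# Standard error with no finite-population correction, as instructed.
std_error = statistics.stdev(salaries) / math.sqrt(n)
t_stat = (sample_mean - mu_0) / std_error

# Critical values read from a t table for n - 1 = 4 degrees of freedom.
# They change with the sample size, so look them up again for the real data.
T_CRIT_TWO_TAILED_05 = 2.776
T_CRIT_TWO_TAILED_10 = 2.132

reject_null = abs(t_stat) > T_CRIT_TWO_TAILED_05   # part b decision
margin = T_CRIT_TWO_TAILED_10 * std_error
ci_90 = (sample_mean - margin, sample_mean + margin)  # part c interval

print(round(t_stat, 4), reject_null, ci_90)
```

Using the t distribution here rests on the normality assumption stated in the problem, which is the point parts e and f ask you to think about.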

You may ignore the above questions and return to the raw “Administrator” data.

g. The university system boasts that over 60% of its administrators have master’s degrees. Provide the formal null and alternative hypotheses for a test of this question. Is this a one- or two-tailed hypothesis?

h. Do you have a large enough sample to conduct the correct hypothesis test? Explain with a calculation why or why not.

i. If your answer to part “h” is yes, then conduct this hypothesis test at the 5% significance level (95% confidence level). Do you reject or fail to reject your null hypothesis? Why?
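
Parts h and i can be sketched as a sample-size check followed by a one-proportion z test. Several assumptions are baked in and should be checked against the course materials: the rule of thumb that both n·p0 and n·(1 − p0) be at least 5 (some texts use 10), an upper-tailed alternative (p > 0.60), and invented counts of 21 master's degrees among 30 administrators. The function names `large_enough` and `proportion_z_test` are hypothetical.

```python
import math
from statistics import NormalDist


def large_enough(n, p0, minimum=5):
    """Rule-of-thumb check that the normal approximation is reasonable."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


def proportion_z_test(successes, n, p0):
    """Upper-tailed test of H0: p <= p0 against Ha: p > p0."""
    p_hat = successes / n
    std_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat, under H0
    z = (p_hat - p0) / std_error
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return z, p_value


# Hypothetical counts for illustration only, not the Administrator data.
n, with_masters, p0 = 30, 21, 0.60

ok = large_enough(n, p0)
z, p_value = proportion_z_test(with_masters, n, p0)
reject_null = ok and p_value < 0.05

print(ok, round(z, 3), round(p_value, 4), reject_null)
```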
