Stats: Estimation - Hypothesis

Solutions to Multiple Statistical Problems

Question-1:(6 points) Suppose you have a population that follows a normal distribution with population mean μ and population variance σ². Assume μ is unknown and you are interested in estimating it. You decide to randomly draw 5 observations (X1, X2, ..., X5). You haven't observed them yet, so for now we treat them as random variables. Someone proposes the following two estimators for μ:

$$T_1 = \frac{1}{5}(X_1 + X_2 + \cdots + X_5)$$
and
$$T_2 = 0.1X_1 + 0.2X_2 + 0.4X_3 + 0.2X_4 + 0.1X_5$$

a) [2 points] Using separate calculations, check whether T1 and T2 are unbiased estimators of μ. If either one is not, calculate its bias.

b) [2 points] Calculate the variance of each estimator and use it to find the corresponding mean squared error (MSE).

c) [2 points] If you were to calculate two separate 95% confidence intervals (CI) for μ, one based on T1 and the other based on T2, which CI would be wider and why? (No calculation needed.)

(Show detailed calculation in parts (a) and (b).)
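Both estimators are linear combinations T = Σ wᵢXᵢ of independent N(μ, σ²) observations, so E(T) = (Σ wᵢ)μ and Var(T) = (Σ wᵢ²)σ². The minimal sketch below evaluates those two coefficients for T1 (taken as weights of 1/5 each) and T2; the function name `linear_estimator_moments` is purely illustrative. It reports results as multiples of μ and σ², since neither is known.

```python
import math

def linear_estimator_moments(weights):
    """For T = sum(w_i * X_i) with X_i iid N(mu, sigma^2):
    E[T] = (sum w_i) * mu and Var(T) = (sum w_i^2) * sigma^2.
    Returns the coefficient of mu and the coefficient of sigma^2."""
    mean_coef = math.fsum(weights)               # fsum avoids float drift
    var_coef = math.fsum(w * w for w in weights)  # independence: no covariance terms
    return mean_coef, var_coef

estimators = {
    "T1": [1 / 5] * 5,                 # the sample mean
    "T2": [0.1, 0.2, 0.4, 0.2, 0.1],
}

for name, w in estimators.items():
    mean_coef, var_coef = linear_estimator_moments(w)
    bias_coef = mean_coef - 1          # Bias = E[T] - mu = (sum w_i - 1) * mu
    # MSE = Var + Bias^2 = var_coef * sigma^2 + (bias_coef * mu)^2
    print(f"{name}: E[T] = {mean_coef:.2f} mu, Bias = {bias_coef:.2f} mu, "
          f"Var = {var_coef:.2f} sigma^2")
```

Because both weight vectors sum to 1, both bias coefficients are 0 and each MSE equals the variance (0.2σ² for T1, 0.26σ² for T2). A confidence interval's width grows with the estimator's standard error, which is what part (c) turns on.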

Question-2:(7 points) Suppose that in a city of 50 households, 5 households are selected randomly without replacement. The numbers of cars in the sampled households are:

3 4 2 3 2

The mean and sample variance (divisor n − 1) of these five numbers are 2.8 and 0.7, respectively.

a) [3 points] Estimate the average number of cars per household in the city and estimate the variance of the estimator.

b) [2 points] Construct a 95% confidence interval for the average number of cars per household.

c) [2 points] Suppose the population variance σ² is known to be 3. How many households should be sampled if you want the estimated average to be within 1 car of the true value with 95% confidence?

(Show detailed calculation in all parts and round your answers to parts (a) and (b) to 2 decimal places)
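The simple-random-sampling-without-replacement formulas behind all three parts can be checked with the sketch below. It assumes the finite population correction (1 − n/N) in the variance, a t critical value of 2.776 (4 degrees of freedom) for the interval in (b), and z = 1.96 with the correction n = 1/(1/n₀ + 1/N) for the sample size in (c). Courses differ on these choices (a z-interval in (b), or dropping the correction in (c)), and each change shifts the numbers. The constant `T_975_DF4` is an illustrative hard-coded table value, not computed.

```python
import math

N = 50                    # households in the city
sample = [3, 4, 2, 3, 2]  # cars in the sampled households
n = len(sample)

# (a) sample mean and estimated variance of the sample mean
ybar = sum(sample) / n
s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)  # sample variance, divisor n - 1
var_ybar = (1 - n / N) * s2 / n                      # with finite population correction

# (b) 95% CI; assumption: t quantile with n - 1 = 4 df (use 1.96 for a z-interval)
T_975_DF4 = 2.776
half_width = T_975_DF4 * math.sqrt(var_ybar)
ci = (ybar - half_width, ybar + half_width)

# (c) sample size for margin d = 1 when sigma^2 = 3 is known, z = 1.96
sigma2, d, z = 3, 1, 1.96
n0 = z ** 2 * sigma2 / d ** 2              # size ignoring the finite population
n_req = math.ceil(1 / (1 / n0 + 1 / N))    # adjusted for N = 50, rounded up

print(f"ybar = {ybar:.2f}, estimated var(ybar) = {var_ybar:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"n0 = {n0:.2f}, required sample size = {n_req}")
```

Rounding the required sample size up, rather than to the nearest integer, guarantees the stated margin is met.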

Question-3:(7 points) In a study, researchers wanted to determine whether a one-time fluoride treatment prevents cavities. The variable of interest was the DMFS increment (the increase in the number of decayed, missing and filled surfaces). The table below gives the summary data.

Does the data provide evidence that the fluoride treatment is beneficial?

a) [5 points] Formulate appropriate null and alternative hypotheses and conduct a test. Show all your steps, including the test statistic, the distribution used, the parameters involved and the appropriate rejection region.

b) [2 points] You are told that the 95% confidence interval for the mean difference (Fluoride-Control) is (-7.03, 1.36). Interpret the confidence interval in the context of this problem.
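The summary table for this study is not reproduced here, so the sketch below only encodes the formulas. It assumes a pooled two-sample t test (equal variances; Welch's unequal-variance test is a common alternative) of H0: μF − μC = 0 against the one-sided H1: μF − μC < 0, rejecting H0 when t < −t(α, n1 + n2 − 2). It also assumes the reported interval in (b) is a symmetric t-interval, so its midpoint is the point estimate. The function name `pooled_t_statistic` is hypothetical.

```python
import math

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic for H0: mu1 - mu2 = 0, assuming equal variances.
    Returns (t, degrees of freedom)."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Part (b): the reported 95% CI for mean(Fluoride) - mean(Control)
lower, upper = -7.03, 1.36
point_estimate = (lower + upper) / 2  # assumes a symmetric interval
margin = (upper - lower) / 2
contains_zero = lower <= 0 <= upper   # 0 inside => no significant difference at 5% (two-sided)

print(f"estimate = {point_estimate:.3f}, margin = {margin:.3f}, contains 0: {contains_zero}")
```

To use `pooled_t_statistic`, pass the fluoride group's mean, SD and size first and the control group's second, so that a negative t points toward fewer DMFS surfaces in the fluoride group.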

Question-4:(6 points) Suppose we have a population with 5 clusters. We randomly select 2 of them. The following two lines of data correspond to the secondary units in each selected cluster.

5 4 4 6
3 4 4 4 7

a) [3 points] Estimate the population total and the variance of your estimator of the population total.

b) [2 points] Suppose there are in total 50 secondary units in the population. Give an estimate of the mean per secondary unit and its variance.

c) [1 point] Briefly explain how systematic sampling can be seen as an example of cluster sampling.

Calculation aid: the sample variance of 19 and 22 is 4.5.

(Show detailed calculation in parts (a) and (b). Round your answers to 2 decimal places.)
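A minimal sketch of the one-stage cluster sampling arithmetic, assuming the 2 clusters were drawn by simple random sampling without replacement and using the unbiased expansion estimator τ̂ = N·ȳ, where ȳ is the mean of the sampled cluster totals, with v(τ̂) = N(N − n)s²/n. A ratio estimator is an alternative that would give different numbers. The helper `systematic_clusters` is a hypothetical name that illustrates part (c): each random start in a 1-in-k systematic sample picks out one fixed "cluster" of units.

```python
import statistics

N = 5    # clusters in the population
M = 50   # secondary units in the population (part b)
clusters = [[5, 4, 4, 6], [3, 4, 4, 4, 7]]
n = len(clusters)

totals = [sum(c) for c in clusters]     # cluster totals: 19 and 22
s2_u = statistics.variance(totals)      # sample variance of the totals (the calculation aid)

# (a) unbiased estimate of the population total and its estimated variance
tau_hat = N * statistics.mean(totals)
var_tau = N * (N - n) * s2_u / n

# (b) mean per secondary unit: divide the total by M, the variance by M^2
mu_hat = tau_hat / M
var_mu = var_tau / M ** 2

print(f"tau_hat = {tau_hat:.2f}, var(tau_hat) = {var_tau:.2f}")
print(f"mu_hat = {mu_hat:.2f}, var(mu_hat) = {var_mu:.4f}")

def systematic_clusters(n_units, k):
    """(c) A 1-in-k systematic sample with start r takes units r, r+k, r+2k, ...
    The k possible samples partition the population into k clusters, so choosing
    a random start is the same as choosing one cluster at random."""
    return [list(range(r, n_units, k)) for r in range(k)]
```

Dividing by M² in part (b) follows from Var(τ̂/M) = Var(τ̂)/M², since M is a known constant.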

Question-5:(7 points) Suppose a population has 15 primary units, each consisting of 5 secondary units. First, two primary units are selected randomly. Second, two secondary units are randomly selected from each of the selected primary units. The observed values from the first primary unit are 10 and 20, and the observed values from the second primary unit are 15 and 25.

a) [2 points] Estimate the population total.

b) [3 points] Estimate the variance of the estimator you used in part (a).

c) [2 points] Using your answers from parts (a) and (b), estimate the population mean per secondary unit and its variance.

(Show detailed work in all parts).
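The sketch below works through the standard unbiased two-stage estimator, assuming simple random sampling without replacement at both stages. Each selected primary unit's total is estimated as Mᵢ·ȳᵢ, the population total as τ̂ = (N/n)Σ Mᵢȳᵢ, and the variance estimate adds a between-primary-unit term N(N − n)s²ᵤ/n to a within-primary-unit term (N/n)Σ Mᵢ(Mᵢ − mᵢ)s²ᵢ/mᵢ. Variable names are illustrative only.

```python
import statistics

N = 15                          # primary units in the population
M = 5                           # secondary units in every primary unit
samples = [[10, 20], [15, 25]]  # observed secondary units per selected primary unit
n = len(samples)

# (a) estimated primary-unit totals M * ybar_i, then expand by N / n
y_hat = [M * statistics.mean(s) for s in samples]
tau_hat = N / n * sum(y_hat)

# (b) between-primary-unit variance component plus within-primary-unit component
s2_u = statistics.variance(y_hat)
between = N * (N - n) * s2_u / n
within = (N / n) * sum(M * (M - len(s)) * statistics.variance(s) / len(s) for s in samples)
var_tau = between + within

# (c) mean per secondary unit: M0 = N * M secondary units in total
M0 = N * M
mu_hat = tau_hat / M0
var_mu = var_tau / M0 ** 2

print(f"tau_hat = {tau_hat:.2f}")
print(f"var(tau_hat) = {between:.2f} + {within:.2f} = {var_tau:.2f}")
print(f"mu_hat = {mu_hat:.2f}, var(mu_hat) = {var_mu:.4f}")
```

The within-unit term shrinks to zero if every secondary unit in a selected primary unit is observed (mᵢ = Mᵢ), which recovers the one-stage cluster formula.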

Question-6:(7 points) Suppose we have a dataset where the response variable is a student's math score. We want to see whether socioeconomic status [ses] ("low", "middle" and "high") has any effect on math score. The dataset also includes a variable that records the program [prog] a student is enrolled in ("general", "academic" and "vocation").

Here is the summary of the regression of math score on ses and prog.

a) [2 points] Interpret the estimated coefficients of the model and comment on their significance.

b) [1 point] If you were given the chance to fit another model, what other terms would you include in the model, and why?

c) [3 points] Complete the following ANOVA table (write the complete table in your answer sheet) and comment on the effect of ses on math score.

d) [1 point] What type of study (observational or experimental) produced this data? Briefly justify your answer.

(Show detailed calculation and reasoning in part (c). You do not need to round any of your answers.)
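The regression summary and ANOVA table for this question are not reproduced here, so the sketch below only shows the arithmetic that links the columns of a sequential ANOVA table of the kind R's `anova()` prints: Mean Sq = Sum Sq / Df and F = Mean Sq / Mean Sq(Residuals). A 3-level factor enters the model as 2 dummy variables, so ses and prog each carry 2 Df, and the residual Df is n minus the 5 estimated coefficients. Every number below, including the sample size, is a hypothetical placeholder, not the assignment's data, and p-values are omitted because they need an F-distribution routine.

```python
def complete_anova(terms, residual_ss, residual_df):
    """Fill in Mean Sq = Sum Sq / Df and F = Mean Sq / Mean Sq(Residuals).

    terms: dict mapping term name -> (Df, Sum Sq).
    """
    ms_res = residual_ss / residual_df
    table = {}
    for term, (df, ss) in terms.items():
        ms = ss / df
        table[term] = {"Df": df, "Sum Sq": ss, "Mean Sq": ms, "F": ms / ms_res}
    table["Residuals"] = {"Df": residual_df, "Sum Sq": residual_ss,
                          "Mean Sq": ms_res, "F": None}
    return table

# Hypothetical inputs for illustration only.
n_obs = 100                 # hypothetical number of students
# 1 intercept + 2 ses dummies + 2 prog dummies = 5 estimated coefficients
res_df = n_obs - 5
table = complete_anova({"ses": (2, 1000.0), "prog": (2, 3000.0)},
                       residual_ss=19000.0, residual_df=res_df)
for term, row in table.items():
    print(term, row)
```

In a sequential table the ses row is fitted first, so its Sum Sq can change if the order of ses and prog in the model formula is swapped.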
