Statistical Methods: Sampling & Experiment Questions


Question 1. This question relates to stratified sampling.

You should be familiar with the following dataset from Assignment 2.

Copy the following three lines into R. Make sure you are connected to the internet while running them. They will load the “crabs” dataset into your working environment.

if (!require("glmbb")) {install.packages("glmbb")}  # install glmbb only if it is missing

library(glmbb)

data(crabs)  # loads the "crabs" data frame

A short description of the dataset is available at https://rdrr.io/cran/glmbb/man/crabs.html

Treat this dataset as our population. For this question we will use “satell” as the response variable (Y), “weight” as an auxiliary variable (X), and “color” as the variable that defines the strata.

You will find four different colors: dark, darker, light and medium. Treat these as four different strata for your analysis.

We are again interested in the average of the variable “satell”, which is the number of satellites around a female crab.

a. Write a function in R that

(i) randomly selects 20 observations from this dataset using stratified sampling, allocating the sample size to each stratum in proportion to the number of items in that stratum;
(ii) calculates the sample mean using equation 11.1 (page 144) of the textbook. Let’s call this “y_bar_st” in R;
(iii) calculates the sample mean a second way. In part (ii) you calculated a sample mean for each of the four strata ($\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$, $\bar{y}_4$), which you then plugged into equation 11.1. Instead of using these sample means, estimate the mean of each stratum with the ratio estimator (equation 7.3, page 94 of the textbook). This gives four stratum-specific means. Plug them into equation 11.1 and calculate the sample mean again. Let’s call it “y_bar_st_ratio”.
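A minimal sketch of part (a) in R follows. The textbook is not reproduced here, so the sketch makes two assumptions:

- Equation 11.1 is the usual stratified mean $\bar{y}_{st} = \sum_h W_h \bar{y}_h$, with $W_h = N_h/N$.
- Equation 7.3 is the ratio estimator of a mean, $\bar{y}_R = (\bar{y}/\bar{x})\bar{X}$. It is applied separately within each stratum, using that stratum's population mean weight $\bar{X}_h$.

Please check both against the book. The function names `prop_alloc` and `strat_estimates` are hypothetical. Rounding the proportional allocation by largest remainder is just one reasonable choice.

```r
# Proportional allocation of n units across strata of sizes N_h: floor each
# n * N_h / N (giving every stratum at least one unit, so this assumes n is at
# least the number of strata), then hand any leftover units to the strata with
# the largest rounding remainders.
prop_alloc <- function(N_h, n) {
  raw <- n * N_h / sum(N_h)
  n_h <- pmax(1, floor(raw))
  while (sum(n_h) < n) {
    i <- which.max(raw - n_h)
    n_h[i] <- n_h[i] + 1
  }
  n_h
}

# Draws one proportionally allocated stratified sample from `pop` (a data
# frame with columns satell, weight and color) and returns both estimates of
# the population mean of satell.
strat_estimates <- function(pop, n = 20) {
  strata <- as.character(pop$color)
  tab <- table(strata)
  labs <- names(tab)
  N_h <- as.numeric(tab)
  W_h <- N_h / sum(N_h)              # stratum weights W_h = N_h / N
  n_h <- prop_alloc(N_h, n)

  ybar_h <- numeric(length(labs))    # plain sample mean per stratum
  ybar_R_h <- numeric(length(labs))  # ratio-estimated mean per stratum
  for (h in seq_along(labs)) {
    rows <- which(strata == labs[h])
    s <- rows[sample.int(length(rows), n_h[h])]  # SRS without replacement
    y <- pop$satell[s]
    x <- pop$weight[s]
    Xbar_h <- mean(pop$weight[rows])  # known population mean of x in stratum h
    ybar_h[h] <- mean(y)
    ybar_R_h[h] <- mean(y) / mean(x) * Xbar_h
  }
  c(y_bar_st = sum(W_h * ybar_h),
    y_bar_st_ratio = sum(W_h * ybar_R_h))
}
```

Assuming the data have been loaded as above, `strat_estimates(crabs)` returns one draw of both estimates. You can compare them with the population value `mean(crabs$satell)`.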

b. Replicate this function 1000 (or more) times and compare the performance of the two estimators: “y_bar_st” and “y_bar_st_ratio”.
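For part (b), one possible approach is a small Monte Carlo harness. It calls a sampling function many times and summarizes each estimator's bias, variance and mean squared error (MSE) against the known population mean. The name `compare_estimators` is hypothetical. It assumes the part (a) function returns a named vector such as `c(y_bar_st = ..., y_bar_st_ratio = ...)`.

```r
# Monte Carlo comparison of estimators of one population quantity.
# `sampler`: a function with no arguments that draws one sample and returns a
#            named numeric vector of estimates (one element per estimator).
# `truth`:   the population value being estimated.
compare_estimators <- function(sampler, truth, R = 1000) {
  # Matrix with one row per estimator and one column per replicate;
  # cbind takes the row names from the names of the returned vectors.
  est <- do.call(cbind, lapply(seq_len(R), function(r) sampler()))
  data.frame(
    estimator = rownames(est),
    mean      = rowMeans(est),
    bias      = rowMeans(est) - truth,
    variance  = apply(est, 1, var),
    mse       = rowMeans((est - truth)^2),
    row.names = NULL
  )
}
```

For example: `compare_estimators(function() my_part_a(crabs), truth = mean(crabs$satell), R = 1000)`. Here `my_part_a` is a placeholder for your part (a) function. Both estimators target the same quantity, so the `bias` and `mse` columns are a natural basis for the comparison.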

c. What would you recommend, based on your findings?

Question 2. This question relates to cluster and multistage sampling.

We will continue to use the “crabs” dataset from Question 1 as our population.

a. By calculating the means and variances of the variable “satell” for each category of the “color” variable, comment on whether observations from different colors can be treated as different clusters, and why.
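The per-color summaries for part (a) can be computed with `tapply`. The helper below is a hypothetical convenience wrapper.

Some background for the comment: cluster sampling works best when the clusters resemble each other (similar means) and each cluster is internally about as variable as the whole population. This is the opposite of what makes a good stratum.

```r
# Count, mean and variance of y within each level of the grouping variable g.
group_summary <- function(y, g) {
  g <- as.factor(g)
  data.frame(
    group = levels(g),
    n     = as.vector(tapply(y, g, length)),
    mean  = as.vector(tapply(y, g, mean)),
    var   = as.vector(tapply(y, g, var))
  )
}
```

With the data loaded, `group_summary(crabs$satell, crabs$color)` gives one row per color.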
b. Use “color” to define the clusters, irrespective of your answer to part (a). By selecting two clusters randomly,

(i) estimate the population total;
(ii) construct a 95% confidence interval for the population total.
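Below is a sketch for part (b). It assumes one-stage cluster sampling: clusters are drawn by simple random sampling without replacement, and every crab in a selected cluster is observed.

- The estimate of the total is the unbiased estimator $\hat{\tau} = \frac{N}{n}\sum_{i=1}^{n} t_i$, where the $t_i$ are the selected cluster totals.
- Its estimated variance is $\hat{V}(\hat{\tau}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}$.
- The interval uses a $t$ quantile with $n - 1$ degrees of freedom.

Your textbook may prefer a different estimator (for example, a ratio estimator using the number of crabs) or a normal quantile. Note that with $n = 2$ the $t_1$ quantile is about 12.71, so the interval will be very wide. The function name `cluster_total` is hypothetical.

```r
# One-stage cluster sampling: draw n_clusters clusters by SRS without
# replacement, observe every element in them, estimate the population total
# of y and give a confidence interval.
cluster_total <- function(y, cluster, n_clusters = 2, conf = 0.95) {
  tt <- tapply(y, as.character(cluster), sum)
  totals <- setNames(as.vector(tt), names(tt))   # cluster totals t_i
  N <- length(totals)                            # clusters in the population
  n <- n_clusters
  picked <- sample(names(totals), n)             # SRSWOR of clusters
  t_s <- totals[picked]
  tau_hat <- N / n * sum(t_s)                    # unbiased estimate of the total
  var_hat <- N^2 * (1 - n / N) * var(t_s) / n    # estimated variance, with fpc
  q <- qt(1 - (1 - conf) / 2, df = n - 1)
  list(clusters = picked, total = tau_hat,
       ci = tau_hat + c(-1, 1) * q * sqrt(var_hat))
}
```

`cluster_total(crabs$satell, crabs$color)` returns the selected colors, the estimated total and the interval.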

c. [Not related to part (b)] Estimate the population mean of “satell” by mimicking the idea of multistage sampling: first, select two clusters randomly; second, select 10 observations randomly from each of the selected clusters.
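For part (c), here is a sketch of two-stage sampling. It assumes simple random sampling without replacement at both stages, with known cluster sizes $M_i$ and population size $M$.

It returns two common estimators of the population mean:

- the unbiased $\hat{\bar{Y}} = \frac{N}{nM}\sum_{i=1}^{n} M_i \bar{y}_i$;
- the ratio-type $\sum M_i \bar{y}_i / \sum M_i$.

You should check which one your textbook uses. `two_stage_mean` is a hypothetical name.

```r
# Two-stage sampling: SRS (without replacement) of n_clusters clusters, then an
# SRS of m elements within each selected cluster (all of them if it has fewer).
two_stage_mean <- function(y, cluster, n_clusters = 2, m = 10) {
  cluster <- as.character(cluster)
  labs <- unique(cluster)
  N <- length(labs)               # clusters in the population
  M <- length(y)                  # elements in the population
  picked <- sample(labs, n_clusters)
  M_i <- numeric(n_clusters)      # sizes of the selected clusters
  ybar_i <- numeric(n_clusters)   # second-stage sample means
  for (k in seq_along(picked)) {
    rows <- which(cluster == picked[k])
    M_i[k] <- length(rows)
    take <- rows[sample.int(length(rows), min(m, length(rows)))]
    ybar_i[k] <- mean(y[take])
  }
  c(unbiased = N * sum(M_i * ybar_i) / (n_clusters * M),
    ratio    = sum(M_i * ybar_i) / sum(M_i))
}
```

`two_stage_mean(crabs$satell, crabs$color)` performs one draw. If a selected color has fewer than 10 crabs, all of them are taken.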

Question 3. This question relates to a single-factor experiment.

Suppose an experiment investigates the effect of the amount of baking powder in biscuit dough on the rise heights of the biscuits. Four levels of baking powder were tested, and four replicate biscuits were made at each level, in random order. The data are given below.

0.25 tsp	0.5 tsp	0.75 tsp	1 tsp
11.4	27.8	47.6	61.6
11.0	29.2	47.0	62.4
11.3	26.8	47.3	63.0
9.5	26.0	45.5	63.9

a. Perform an analysis to test the hypothesis of no treatment effect. Clearly state your null and alternative hypotheses. What do you conclude?
b. By formulating a contrast, test the equality of the means for the 0.5 tsp and 1 tsp levels. What do you conclude?
c. Using residual plots, comment on whether the assumptions of the linear model are justified.
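The three parts can be sketched in R as below, with the data entered from the table. The sketch assumes a one-way fixed-effects model $y_{ij} = \mu + \tau_i + \epsilon_{ij}$ with independent $N(0, \sigma^2)$ errors. It then:

- tests $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ against $H_1$: at least one mean differs;
- tests the contrast $L = \mu_{0.5} - \mu_{1}$, with coefficients $(0, 1, 0, -1)$, using $t = \hat{L}/\sqrt{\mathrm{MSE}\sum_i c_i^2/n_i}$ on the error degrees of freedom.

This is one standard way to do it, not necessarily the method your course prescribes.

```r
# Biscuit rise heights, entered from the table above
rise <- c(11.4, 11.0, 11.3,  9.5,   # 0.25 tsp
          27.8, 29.2, 26.8, 26.0,   # 0.5 tsp
          47.6, 47.0, 47.3, 45.5,   # 0.75 tsp
          61.6, 62.4, 63.0, 63.9)   # 1 tsp
powder <- factor(rep(c("0.25", "0.5", "0.75", "1"), each = 4),
                 levels = c("0.25", "0.5", "0.75", "1"))

# (a) One-way ANOVA. H0: all four level means are equal;
#     H1: at least one level mean differs.
fit <- aov(rise ~ powder)
print(summary(fit))

# (b) Contrast 0.5 tsp vs 1 tsp; the coefficients sum to zero.
cvec <- c(0, 1, 0, -1)
means <- tapply(rise, powder, mean)
n_i <- as.vector(table(powder))
mse <- sum(residuals(fit)^2) / df.residual(fit)   # error mean square
L_hat <- sum(cvec * means)
se_L <- sqrt(mse * sum(cvec^2 / n_i))
t_stat <- L_hat / se_L
p_value <- 2 * pt(-abs(t_stat), df = df.residual(fit))
cat("contrast estimate:", L_hat, " t:", t_stat, " p-value:", p_value, "\n")

# (c) Residual diagnostics: residuals vs fitted values and a normal Q-Q plot
op <- par(mfrow = c(1, 2))
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(residuals(fit), main = "Normal Q-Q plot of residuals")
qqline(residuals(fit))
par(op)
```

How to read the residual plots:

- A band of points around zero with no funnel shape supports constant variance.
- Points close to the Q-Q line support normality.

With only four biscuits per level, these are rough checks.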
