A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot in time for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they actually received a flu shot. A client who received a flu shot was coded Y = 1, and a client who did not receive a flu shot was coded Y = 0. In addition, data were collected on their age (X1) and their health awareness. The latter data were combined into a health awareness index (X2), for which higher values indicate greater awareness. Also included in the data was client gender, where males were coded X3 = 1 and females were coded X3 = 0. Read the data into R. The dataset is a text file posted on Nexus under the “Assignments” tab. You can use the following R code to read in the data; be sure to change the working directory! Then answer the questions that follow.
#Change working directory to the one containing dataset.
setwd("C:/Users/melody/Dropbox/Google Drive/STAT-3701/Assignments")
flu.dat= read.delim("flu-shots.txt", sep="\t", header=FALSE)
#None of the variables are labelled in the original file.
head(flu.dat) #This displays the first six observations of the dataset
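#Convert dataframe into a matrix so I can give column titles.
flu <- as.matrix(flu.dat)
#Assumption: the columns of the file are, in order, Y, X1, X2, X3; check against head(flu.dat).
dimnames(flu) <- list(NULL, c("Y", "X1", "X2", "X3"))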
head(flu)
(a) Write down the statistical model for fitting a multiple logistic regression model with the three predictor variables, X1, X2, X3.
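As a reminder of the general form such a model takes, here is a sketch in LaTeX. The notation (pi_i for the probability that client i received a flu shot, X_{i1}, X_{i2}, X_{i3} for that client's predictor values) is an assumption of this sketch and may differ from the notation used in class.

```latex
Y_i \sim \text{Bernoulli}(\pi_i) \ \text{independently}, \quad i = 1, \dots, 159,
\qquad
\pi_i = \frac{\exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3})}
             {1 + \exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3})},
\qquad
\log\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3}.
```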
(b) Write down the estimated statistical model in part (a). Note: Use the glm function in R; see Chapter 12 R example(s).
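The glm call for this kind of fit can be sketched as follows. The data frame sim is synthetic stand-in data generated inside the block (it is NOT the flu-shot dataset), and the coefficients used to simulate it are arbitrary, so the printed estimates say nothing about the real answer.

```r
# Synthetic stand-in data (NOT the real flu-shot file); simulation coefficients are arbitrary.
set.seed(1)
n   <- 159
sim <- data.frame(X1 = round(runif(n, 45, 85)),   # age
                  X2 = round(runif(n, 20, 80)),   # health awareness index
                  X3 = rbinom(n, 1, 0.5))         # gender: 1 = male, 0 = female
sim$Y <- rbinom(n, 1, plogis(-1 + 0.07 * sim$X1 - 0.10 * sim$X2 + 0.4 * sim$X3))

# Multiple logistic regression with the logit link
fit <- glm(Y ~ X1 + X2 + X3, family = binomial(link = "logit"), data = sim)
coef(summary(fit))   # estimates, standard errors, z values, P-values
```

With the real data, the same call would be applied to a data frame holding the four assignment variables.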
(c) Exponentiate each of the slope parameter estimates given in the estimated model of part (b). That is, find exp(β̂1), exp(β̂2), exp(β̂3). Interpret these numbers.
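The arithmetic behind this step can be sketched with made-up numbers. The slope values in b.hat below are hypothetical placeholders, not estimates from the flu-shot data; the point is only how an exponentiated slope turns into a multiplicative change in the odds.

```r
# Hypothetical slope estimates, for illustration only (not from the flu-shot data)
b.hat <- c(X1 = 0.07, X2 = -0.10, X3 = 0.40)

odds.ratio <- exp(b.hat)               # multiplicative change in the odds per one-unit increase
pct.change <- 100 * (odds.ratio - 1)   # the same change expressed as a percentage
round(cbind(b.hat, odds.ratio, pct.change), 3)
```

For a fitted glm object, say fit, the corresponding quantities would come from exp(coef(fit))[-1], which drops the intercept.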
(d) What is the estimated probability that male clients aged 55 with a health awareness index of 60 will receive a flu shot? You can do this part by hand calculation based on the fitted model in part (b).
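The hand calculation can be checked in R along these lines. The coefficient values in b.hat are hypothetical placeholders and should be replaced by the estimates from the fitted model in part (b).

```r
# Hypothetical estimates (intercept, X1, X2, X3); replace with the fitted values from part (b)
b.hat <- c(-1.0, 0.07, -0.10, 0.40)
x.new <- c(1, 55, 60, 1)     # intercept term, age 55, health awareness 60, male (X3 = 1)

eta    <- sum(b.hat * x.new)          # estimated linear predictor (the logit)
pi.hat <- exp(eta) / (1 + exp(eta))   # inverse logit gives the estimated probability
pi.hat
```

The built-in plogis(eta) computes the same inverse logit and can serve as a cross-check.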
(e) Use the Wald test to determine whether client gender can be dropped from the regression model; use α = 0.05. State the alternatives, decision rule, and conclusion. What is the approximate P-value of the test?
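The pieces of a Wald test for a single coefficient can be pulled from a glm fit roughly as below. The data frame sim is synthetic stand-in data (NOT the flu-shot dataset), so the numbers it produces do not answer the question.

```r
# Synthetic stand-in data (NOT the real flu-shot file); simulation coefficients are arbitrary.
set.seed(1)
n   <- 159
sim <- data.frame(X1 = round(runif(n, 45, 85)),
                  X2 = round(runif(n, 20, 80)),
                  X3 = rbinom(n, 1, 0.5))
sim$Y <- rbinom(n, 1, plogis(-1 + 0.07 * sim$X1 - 0.10 * sim$X2 + 0.4 * sim$X3))
fit <- glm(Y ~ X1 + X2 + X3, family = binomial, data = sim)

est <- coef(summary(fit))["X3", "Estimate"]
se  <- coef(summary(fit))["X3", "Std. Error"]
z.star  <- est / se                   # Wald statistic for H0: beta3 = 0
z.crit  <- qnorm(1 - 0.05 / 2)        # two-sided critical value at alpha = 0.05
p.value <- 2 * pnorm(-abs(z.star))    # approximate two-sided P-value
c(z.star = z.star, z.crit = z.crit, p.value = p.value)
abs(z.star) > z.crit                  # TRUE would lead to concluding Ha: beta3 != 0
```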
(f) Use the likelihood ratio test to determine whether client gender can be dropped from the regression model; use α = 0.05. State the alternatives, decision rule, and conclusion. What is the approximate P-value of the test? How does the result here compare to that obtained for the Wald test in part (e)?
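A likelihood ratio test compares the deviances of a full and a reduced model. A minimal sketch on synthetic stand-in data (NOT the flu-shot dataset) might look like this; the object names full and reduced are choices made for this sketch.

```r
# Synthetic stand-in data (NOT the real flu-shot file); simulation coefficients are arbitrary.
set.seed(1)
n   <- 159
sim <- data.frame(X1 = round(runif(n, 45, 85)),
                  X2 = round(runif(n, 20, 80)),
                  X3 = rbinom(n, 1, 0.5))
sim$Y <- rbinom(n, 1, plogis(-1 + 0.07 * sim$X1 - 0.10 * sim$X2 + 0.4 * sim$X3))

full    <- glm(Y ~ X1 + X2 + X3, family = binomial, data = sim)
reduced <- glm(Y ~ X1 + X2,      family = binomial, data = sim)

G2       <- deviance(reduced) - deviance(full)         # likelihood ratio statistic
chi.crit <- qchisq(1 - 0.05, df = 1)                   # one parameter dropped
p.value  <- pchisq(G2, df = 1, lower.tail = FALSE)     # approximate P-value
c(G2 = G2, chi.crit = chi.crit, p.value = p.value)

anova(reduced, full, test = "Chisq")   # the same comparison via R's built-in table
```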
(g) In R, create a new variable X4 = X1×X2 and fit a multiple logistic regression model with variables X1, X2, X3, X4. Write down the fitted model. Use the following R code after the R code above to create the new variables and a new dataset.
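#create the age by health interaction term
#Assumption: the columns of flu.dat are, in order, Y, X1, X2, X3.
age.health <- flu.dat[, 2] * flu.dat[, 3]  #X4 = X1*X2
age2 <- flu.dat[, 2]^2  #square of age
health2 <- flu.dat[, 3]^2  #square of health awareness index
flu2 <- cbind(flu.dat, age.health, age2, health2)
head(flu2)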
(h) Is there any evidence to suggest there is a multiplicative interaction between age and health awareness index? Test at α = 0.05. Be sure to state the null and alternative hypotheses. What is the value of the test statistic, and what is its approximate P-value?
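One point that may help here: a product column such as X4 = X1×X2 and the formula term X1:X2 give the same fit, so the interaction row of the coefficient table can be read from either. The sketch below checks this on synthetic stand-in data (NOT the flu-shot dataset).

```r
# Synthetic stand-in data (NOT the real flu-shot file); simulation coefficients are arbitrary.
set.seed(1)
n   <- 159
sim <- data.frame(X1 = round(runif(n, 45, 85)),
                  X2 = round(runif(n, 20, 80)),
                  X3 = rbinom(n, 1, 0.5))
sim$Y <- rbinom(n, 1, plogis(-1 + 0.07 * sim$X1 - 0.10 * sim$X2 + 0.4 * sim$X3))

sim$X4 <- sim$X1 * sim$X2                      # explicit product column
fit.col     <- glm(Y ~ X1 + X2 + X3 + X4,    family = binomial, data = sim)
fit.formula <- glm(Y ~ X1 + X2 + X3 + X1:X2, family = binomial, data = sim)

coef(summary(fit.col))["X4", ]   # estimate, standard error, Wald z, and P-value
```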
(i) Now fit a multiple logistic regression model with variables X1, X2, X3, X4, the square of age (call it X5), and the square of the health awareness index (call it X6). Use the likelihood ratio test to determine whether X4, X5, and X6 can be dropped from the model simultaneously; use α = 0.05. State the null and alternative hypotheses, full and reduced models, decision rule, and conclusion. What is the approximate P-value of the test?
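When several terms are dropped at once, the likelihood ratio statistic is referred to a chi-square distribution whose degrees of freedom equal the number of dropped parameters. A sketch on synthetic stand-in data (NOT the flu-shot dataset) follows; the I() terms stand in for X4, X5, and X6.

```r
# Synthetic stand-in data (NOT the real flu-shot file); simulation coefficients are arbitrary.
set.seed(1)
n   <- 159
sim <- data.frame(X1 = round(runif(n, 45, 85)),
                  X2 = round(runif(n, 20, 80)),
                  X3 = rbinom(n, 1, 0.5))
sim$Y <- rbinom(n, 1, plogis(-1 + 0.07 * sim$X1 - 0.10 * sim$X2 + 0.4 * sim$X3))

reduced <- glm(Y ~ X1 + X2 + X3, family = binomial, data = sim)
full    <- glm(Y ~ X1 + X2 + X3 + I(X1 * X2) + I(X1^2) + I(X2^2),
               family = binomial, data = sim)

G2      <- deviance(reduced) - deviance(full)           # likelihood ratio statistic
df      <- df.residual(reduced) - df.residual(full)     # number of dropped terms
p.value <- pchisq(G2, df = df, lower.tail = FALSE)      # approximate P-value
c(G2 = G2, df = df, chi.crit = qchisq(1 - 0.05, df), p.value = p.value)
```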