Regression Analysis and Polynomial Fit in SPSS

Moon Cycles and Mental Health

1. Tradition states that some people (so-called “lunatics”) are more likely to become mentally unstable at the time of the full moon. To examine whether there was any truth to this tradition, a researcher examined the admission rates to the emergency room of a Virginia mental health clinic before, during and after the 12 full moons from August 1971 to July 1972. The data are given in the file Lunatics.sav, with Time measured in months from the start.

Regress Admissions on the two variables Time and During.

Use the option to compute the Durbin-Watson statistic. Save the unstandardized residuals.
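When requested, SPSS reports the Durbin-Watson statistic in the Model Summary table. It is computed from the residuals in time order as d = Σ(e_t - e_{t-1})² / Σ e_t². Values near 2 suggest no first-order autocorrelation. Values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The minimal Python sketch below shows the calculation. The residual lists are made up for illustration and do not come from Lunatics.sav.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between successive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Denominator: residual sum of squares
    den = sum(e ** 2 for e in residuals)
    return num / den


if __name__ == "__main__":
    # Hypothetical residuals, for illustration only
    smooth = [2.0, 1.5, 1.0, 0.0, -1.0, -1.5, -2.0]
    alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    print(round(durbin_watson(smooth), 3))       # well below 2: positive autocorrelation
    print(round(durbin_watson(alternating), 3))  # well above 2: negative autocorrelation
```

The statistic alone does not settle the test. It still has to be compared with the lower and upper bounds (d_L, d_U) from the Durbin-Watson tables for the given n and number of predictors.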

(a) Show the regression output including the Durbin-Watson result.

(b) What are the null and alternative hypotheses for this test?

(c) What, if anything, can you conclude from the Durbin-Watson test? Quote a relevant number from the Durbin-Watson tables in support of your answer.

(d) Check your conclusion about autocorrelation with a runs test for randomness of the residuals. What do you conclude from the test?

(Hint: Use Analyze > Nonparametric Tests > One Sample. For Objective, choose Test sequence for randomness, and on the Fields tab include only the unstandardized residuals in Test Fields.)
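The logic of a runs test can be sketched in a few lines of Python. Each residual is classified as below the cut point or at/above it, and the test counts runs, which are maximal stretches of consecutive cases in the same group. The observed count is then compared with its expected value under randomness. Too few runs point to positive autocorrelation, and too many point to negative autocorrelation. This sketch assumes the sample median as the cut point and uses the large-sample normal approximation without a continuity correction. SPSS's choice of cut point and any small-sample corrections may differ, so treat it as an illustration of the method rather than a replica of SPSS output.

```python
import math
import statistics


def runs_test(values):
    """Wald-Wolfowitz runs test about the sample median (normal approximation).

    Values below the median form one group; values at or above it form the other.
    Returns (number of runs, expected runs, z, two-sided p-value).
    """
    cut = statistics.median(values)
    signs = [v >= cut for v in values]
    n1 = sum(signs)            # cases at or above the cut point
    n2 = len(signs) - n1       # cases below the cut point
    if n1 == 0 or n2 == 0:
        raise ValueError("all values fall on one side of the cut point")
    # A new run starts whenever the group changes from one case to the next
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("too few cases for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return runs, expected, z, p


if __name__ == "__main__":
    # Hypothetical residuals that alternate in sign (many runs)
    print(runs_test([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```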

2. This question concerns dataset GirlsGrowth.sav. We consider data on the heights (in inches) for a sample of Boston girls aged between 3 and 19 years old.

(a) Fit a sixth-degree polynomial regression model of Height on Age, Age2, Age3, Age4, Age5 and Age6.

(You first need to compute the variables Age2 = Age², i.e. age squared, Age3 = Age³ (age cubed), and so on.) Save the unstandardized predicted values.

(i) Show the regression output.

(ii) The SPSS regression analysis has omitted some of the powers of Age. What is the name of the problem that has caused this to happen?

(iii) Using Graphs > Legacy Dialogs > Scatter/Dot > Overlay Scatter (or some other approach), plot Height vs Age and the unstandardized predicted values vs Age, overlaid on one graph.

(iv) The graph suggests the fitted model is not biologically plausible. What is the problem?

(b) Create centered variables cAge = Age − 10, cAge2 = (Age − 10)², …, cAge6 = (Age − 10)⁶.
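The effect of centering can be checked numerically. The Python sketch below prints the correlation between consecutive powers, first for raw Age and then for Age − 10. It assumes a hypothetical grid of whole-year ages from 3 to 19, so the actual GirlsGrowth.sav ages will give somewhat different numbers. The raw powers are almost perfectly correlated with one another, which makes the uncentered sixth-degree fit numerically fragile. Centering lowers these correlations, sharply for the first and second powers, although the highest powers remain fairly strongly correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def power_correlations(ages, center=0.0, max_power=6):
    """Correlation between consecutive powers (k, k+1) of (age - center)."""
    shifted = [a - center for a in ages]
    powers = {k: [s ** k for s in shifted] for k in range(1, max_power + 1)}
    return {k: pearson(powers[k], powers[k + 1]) for k in range(1, max_power)}


if __name__ == "__main__":
    ages = list(range(3, 20))  # hypothetical grid of whole-year ages 3..19
    raw = power_correlations(ages)
    centered = power_correlations(ages, center=10)
    for k in raw:
        print(f"powers {k},{k + 1}: raw r = {raw[k]:.4f}, centered r = {centered[k]:.4f}")
```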

Fit a sequence of regression models using blocks (the Next button in the Linear Regression dialog): block 1 with cAge, block 2 with cAge2, block 3 with cAge3, …, block 6 with cAge6. Show the output. Which polynomial regression model (or models) is best to use? Quote evidence in support of your answer.
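When blocks are entered one at a time and R squared change is ticked under Statistics, SPSS reports the increase in R² for each block together with an F test for that increase. The Python sketch below computes the same statistic, F = ((R²_new − R²_old)/q) / ((1 − R²_new)/(n − k_new − 1)). Here q is the number of terms added and k_new is the number of predictors in the larger model. The R² values and sample size are hypothetical. The sketch returns only F and its degrees of freedom, because the p-value needs an F-distribution tail probability (for example scipy.stats.f.sf), which the standard library does not provide.

```python
def r2_change_f(r2_old, r2_new, n, k_new, q=1):
    """F statistic for the R-squared increase when q terms are added.

    r2_old: R-squared of the smaller (nested) model
    r2_new: R-squared of the larger model with k_new predictors
    n:      number of cases
    Returns (F, numerator df, denominator df).
    """
    df_den = n - k_new - 1
    if df_den <= 0:
        raise ValueError("not enough cases for this many predictors")
    f = ((r2_new - r2_old) / q) / ((1 - r2_new) / df_den)
    return f, q, df_den


if __name__ == "__main__":
    # Hypothetical values: block 2 adds one term (cAge2) to block 1
    f, df1, df2 = r2_change_f(r2_old=0.90, r2_new=0.95, n=50, k_new=2)
    print(f"F change = {f:.2f} on ({df1}, {df2}) df")
```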

(i) Use Nonlinear regression to fit this model. (Use starting values D=50, b0=3, b1=0.5 and b2=0.1)

Does this model fit better than the polynomial ones in part (b)? Quote evidence.

(ii) Is the term b₂·cAge² significant, or should it be omitted?

(iii) Save the predicted values and residuals from the regression model. Plot residuals vs predicted values. Is there evidence of heteroscedasticity?

(iv) If one were to use weighted regression to improve the regression model, what do you think the weights should be? (Give a formula for the weights: you do not need to try to carry out the weighted regression, as SPSS’s nonlinear regression routine does not allow fitting weighted models).
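For context, weighted least squares minimises Σ wᵢ(yᵢ − ŷᵢ)², so cases with larger weights pull the fit harder. The weights are usually chosen inversely proportional to each case's (estimated) error variance. The Python sketch below shows the mechanics for a straight line with made-up data. It is a generic illustration, not a weighted version of the nonlinear growth model above, and it does not prescribe which weights this question is after.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x.

    Minimises sum(w_i * (y_i - a - b*x_i)**2); all weights must be positive.
    Returns (a, b).
    """
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0]
    y = [0.0, 1.0, 0.0]
    print(weighted_line_fit(x, y, [1.0, 1.0, 1.0]))   # equal weights: ordinary least squares
    print(weighted_line_fit(x, y, [1.0, 1.0, 10.0]))  # last point weighted 10 times as heavily
```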

3. Consider the data HouseSales2017.sav. Investigate the relationship between the response variable SaleMethod and the predictor DaysonMarket. SaleMethod has two categories: A = Auction and P = Private Treaty (Neg.), that is, sale by negotiation.

(a) Fit a binary logistic regression of SaleMethod on DaysonMarket. Show the output.

(b) What is the overall percentage of houses correctly classified using DaysonMarket?

(c) How many days on the market correspond to a probability of 0.5 that the sale method is P (Private Treaty)?
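With a single predictor, SPSS's binary logistic regression models log(p/(1 − p)) = b₀ + b₁·DaysonMarket, where p is the probability of the category coded 1 in the Dependent Variable Encoding table. Check that table to see whether the modelled category is A or P. Setting p = 0.5 makes the log-odds zero, so the corresponding number of days is −b₀/b₁. The same point applies if P is coded 0, because 1 − p = 0.5 exactly when p = 0.5. The classification table uses a 0.5 cut value by default. The Python sketch below shows both calculations. The coefficients and cases are hypothetical placeholders, not results from HouseSales2017.sav, and classifying a probability exactly equal to the cut as 1 is an assumption about tie handling.

```python
import math


def prob_coded_1(b0, b1, x):
    """Fitted probability of the category coded 1: 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def x_at_half(b0, b1):
    """Predictor value where the fitted probability equals 0.5 (log-odds = 0)."""
    return -b0 / b1


def percent_correct(y, p, cut=0.5):
    """Percentage of cases whose predicted class matches y (coded 0/1).

    A case is predicted as 1 when p >= cut (tie handling is an assumption).
    """
    hits = sum(1 for yi, pi in zip(y, p) if (pi >= cut) == (yi == 1))
    return 100.0 * hits / len(y)


if __name__ == "__main__":
    b0, b1 = -2.0, 0.1  # hypothetical coefficients, for illustration only
    x50 = x_at_half(b0, b1)
    print(x50, prob_coded_1(b0, b1, x50))
    print(percent_correct([1, 0, 1, 0], [0.7, 0.2, 0.4, 0.6]))
```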

4. This question continues to use the data PunishmentandCrime.sav.

(a) Regress Crime on all the predictor variables together. Include the Collinearity Statistics and save the Deleted Residuals. What evidence is there of multicollinearity? State the criterion you use for deciding that it is a problem.
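Requesting Collinearity diagnostics adds Tolerance and VIF columns to SPSS's Coefficients table. The variance inflation factor is VIF_j = 1/(1 − R_j²), and Tolerance = 1/VIF, where R_j² comes from regressing predictor j on the other predictors. A commonly quoted rule of thumb flags VIF > 10 (Tolerance < 0.1), although some texts use lower cut-offs such as 5. Equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The Python sketch below uses that identity with a hypothetical correlation matrix, not one computed from PunishmentandCrime.sav.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]


if __name__ == "__main__":
    # Hypothetical correlations among three predictors
    corr = [[1.0, 0.95, 0.30],
            [0.95, 1.0, 0.25],
            [0.30, 0.25, 1.0]]
    for j, v in enumerate(vifs(corr), start=1):
        print(f"X{j}: VIF = {v:.2f}, tolerance = {1 / v:.3f}, VIF > 10: {v > 10}")
```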

(b) Perform backward elimination on the variables in (a).

(i) Which model do you think is best, based on adjusted R² and S?

(ii) Which model is best based on the desire to have all coefficients statistically significant?
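SPSS's backward elimination starts from the full model. At each step it removes the predictor with the largest p-value for its F-to-remove, provided that p-value exceeds the removal criterion (0.10 by default). To compare the resulting models, use adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and S = √(SSE/(n − k − 1)), which SPSS labels Std. Error of the Estimate. Because adjusted R² = 1 − S²/(SST/(n − 1)) and SST is the same for every model, the model with the highest adjusted R² is also the one with the smallest S. The Python sketch below uses hypothetical numbers, not output from PunishmentandCrime.sav.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def std_error_of_estimate(sse, n, k):
    """S = sqrt(SSE / (n - k - 1)), the residual standard error."""
    return math.sqrt(sse / (n - k - 1))


if __name__ == "__main__":
    n, sst = 40, 1000.0  # hypothetical sample size and total sum of squares
    # Hypothetical (number of predictors, SSE) for successive backward steps
    steps = [(5, 300.0), (4, 305.0), (3, 330.0)]
    for k, sse in steps:
        r2 = 1 - sse / sst
        print(f"k={k}: R2={r2:.3f}, adj R2={adjusted_r2(r2, n, k):.4f}, "
              f"S={std_error_of_estimate(sse, n, k):.3f}")
```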
