Overview
This essay presents the analysis of the “Body (2).csv” dataset and the resulting findings. The file consists of body measurements of 252 men and women. The key variable of interest is body fat percentage, estimated by the Brozek method. All calculations were carried out in RStudio.
Task 1
The aim is to determine whether the mean body fat percentage of men is the same as that of women. To test the conjecture that body fat, as measured by the Brozek method, differs between men and women, the competing statistical hypotheses can be written as follows:
H0: The average body fat of men equals the average body fat of women (Null hypothesis)
H1: The average body fat of men does not equal the average body fat of women (Alternative hypothesis)
The variable of interest is “BFP_Brozek”, and the mean body fat was computed separately for men and women. Because the body fat of men does not depend on the body fat of women, a t-test for independent samples was applied with a two-sided alternative, and the p-value was found to be 0.46. This is greater than the chosen level of significance, 0.05, so the test fails to reject the null hypothesis at the 5% level. The data therefore provide no statistically significant evidence of a difference in mean Brozek body fat between men and women.
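The test statistic behind this comparison can be sketched as follows. The original analysis was run in R; this Python sketch uses only the standard library. It assumes the Welch (unequal-variance) form of the test, which the essay does not confirm, and it approximates the p-value with the standard normal distribution instead of the t-distribution. The sample values are made up for illustration and are not taken from the dataset.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (sample variance / n).
    se_sq_a = statistics.variance(sample_a) / n_a
    se_sq_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


def two_sided_p_normal_approx(t):
    """Two-sided p-value, using the standard normal as a large-sample
    stand-in for the t-distribution (an approximation)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


# Hypothetical body fat percentages, NOT values from "Body (2).csv".
men = [12.3, 18.1, 25.4, 20.2, 15.7, 22.9]
women = [14.0, 19.5, 23.8, 17.6, 21.1, 16.4]

t_stat, df = welch_t(men, women)
p_approx = two_sided_p_normal_approx(t_stat)
print(f"t = {t_stat:.3f}, df = {df:.1f}, approximate p = {p_approx:.3f}")
```

With groups as small as these the normal approximation is rough; with a few hundred observations it is close to the exact t-based p-value.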
Task 2
The 99% confidence interval estimate of the population average body fat percentage is to be computed. The variable of interest is thus “BFP_Brozek”. The primary assumption in computing the confidence interval is that the variable of interest follows a normal probability distribution. To check this assumption, the Shapiro-Wilk test of normality was employed. The p-value was found to be greater than 0.05, so the test failed to reject its null hypothesis that the variable follows a normal distribution. The body fat percentage data are therefore consistent with the normality assumption.
Subsequently, the confidence interval for body fat percentage was computed in the form (mean – error margin, mean + error margin), where the error margin is the product of the standard error of body fat and the 0.99 quantile of the t-distribution with 251 degrees of freedom. The interval was hence found to be (17.79534, 20.08165). Since a two-sided 99% interval calls for the 0.995 quantile rather than the 0.99 quantile, an interval built this way has a nominal coverage of 98%, and the exact 99% interval would be slightly wider.
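The construction of a two-sided interval can be sketched as follows. This is a minimal Python illustration using only the standard library, not the R code used for the essay. It assumes a normal critical value as a large-sample stand-in for the t quantile, and the function name and data are hypothetical. The key step is that a two-sided interval at a given level splits the remaining probability equally between the two tails.

```python
import math
import statistics


def mean_confidence_interval(values, level=0.99):
    """Two-sided confidence interval for the mean.

    Uses a standard normal critical value as an approximation to the
    t quantile, which is reasonable only for large samples.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # Two-sided: split (1 - level) across both tails, so a 99% interval
    # uses the 0.995 quantile, not the 0.99 quantile.
    quantile = 1 - (1 - level) / 2
    critical = statistics.NormalDist().inv_cdf(quantile)
    margin = critical * std_err
    return mean - margin, mean + margin


# Hypothetical body fat percentages, NOT values from the dataset.
sample = [14.2, 19.8, 23.1, 17.5, 21.0, 16.3, 25.6, 18.9]
lower, upper = mean_confidence_interval(sample, level=0.99)
print(f"99% interval: ({lower:.3f}, {upper:.3f})")
```

For a sample of 252 observations the t quantile and the normal quantile differ only slightly, so the approximation changes the limits very little.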
Task 3
It is to be verified whether the average body fat percentage of the men and women taken together is less than 12.5. The statistic of interest is the average body fat percentage computed using Brozek’s equation, represented by the variable “BFP_Brozek”. To test the conjecture that the body fat percentage, as measured by the Brozek method, is less than 12.5, the competing statistical hypotheses can be written as follows:
H0: The average body fat percentage is equal to 12.5 (Null Hypothesis)
H1: The average body fat percentage is less than 12.5 (Alternative hypothesis)
A one-sample t-test was then applied to body fat percentage, and the p-value was found to be greater than the chosen level of significance, 0.05. The test thus fails to reject the null hypothesis at the 5% level of significance. There is therefore no evidence that the average body fat percentage is less than 12.5.
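The one-sided test above can be sketched as follows. This is a Python illustration using only the standard library, with hypothetical data and a normal approximation to the t-distribution for the p-value. It shows why the lower-tail p-value is large whenever the sample mean lies above the hypothesised value: the t statistic is then positive, so the probability below it exceeds 0.5.

```python
import math
import statistics


def one_sample_lower_tail(values, mu0):
    """Return (t statistic, approximate lower-tail p-value) for
    H0: mean = mu0 against H1: mean < mu0.

    The p-value uses the standard normal as a large-sample
    approximation to the t-distribution.
    """
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - mu0) / std_err
    p_lower = statistics.NormalDist().cdf(t)
    return t, p_lower


# Hypothetical body fat percentages, NOT values from the dataset.
sample = [14.2, 19.8, 23.1, 17.5, 21.0, 16.3, 25.6, 18.9]
t_stat, p_value = one_sample_lower_tail(sample, 12.5)
print(f"t = {t_stat:.3f}, approximate lower-tail p = {p_value:.4f}")
```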
Task 4
The best model for predicting body fat percentage from the body circumference measurements is required. A subset of the dataset was created, containing the response variable “BFP_Brozek” and the body circumference measurements as candidate predictor variables: “Neck”, “Chest”, “Abdomen”, “Hip”, “Thigh”, “Knee”, “Ankle”, “Biceps”, “Forearm” and “Wrist”. Since body fat was found to be consistent with a normal distribution, a linear model was fitted to the data. Not all the predictors were found to be significant, so the best model was determined by comparing AIC values. Stepwise regression with both forward and backward selection was used, and the model with the lowest AIC was retained as the final model. The predictors retained were Neck, Abdomen, Hip, Forearm and Wrist, that is, the neck, abdomen, hip, forearm and wrist circumferences. The model fitted using these predictors was found to be significant at the 0.05 level, since its p-value was less than 0.05, and all of its coefficients were individually significant. The model is specified as follows:
Body fat = 3.4768 – 0.55623 Neck circumference + 0.90155 Abdomen circumference – 0.307 Hip circumference + 0.38761 Forearm circumference – 1.496 Wrist circumference
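The fitted equation can be evaluated for a given set of measurements as in the following Python sketch. The coefficients are those reported above; the 0.90155 coefficient is assumed to belong to abdomen circumference, since that is the only retained predictor not otherwise named in the equation. The measurements passed in are hypothetical and must be in the same units as the dataset.

```python
# Coefficients as reported for the final model. Assigning 0.90155 to
# abdomen circumference is an assumption based on the predictor list.
COEFFICIENTS = {
    "intercept": 3.4768,
    "neck": -0.55623,
    "abdomen": 0.90155,
    "hip": -0.307,
    "forearm": 0.38761,
    "wrist": -1.496,
}


def predict_body_fat(neck, abdomen, hip, forearm, wrist):
    """Predicted Brozek body fat percentage from five circumferences."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["neck"] * neck
        + COEFFICIENTS["abdomen"] * abdomen
        + COEFFICIENTS["hip"] * hip
        + COEFFICIENTS["forearm"] * forearm
        + COEFFICIENTS["wrist"] * wrist
    )


# Hypothetical measurements, not a row from the dataset.
prediction = predict_body_fat(neck=38, abdomen=92, hip=100, forearm=29, wrist=18)
print(f"Predicted body fat: {prediction:.2f}")
```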
The adjusted R-squared of the model was found to be 0.7258, which means that the predictors in the model explain about 73% of the variation in the response variable, body fat as computed using Brozek’s equation.
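The AIC-based stepwise selection described in Task 4 can be sketched as follows. This is a simplified Python illustration using only the standard library, not the R routine used for the essay. It assumes the search starts from the full model and scores each model as n log(RSS/n) + 2k, where k is the number of estimated coefficients; this equals the AIC of a linear model up to an additive constant. All function names and data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(data, names, y):
    """AIC of a linear model, up to an additive constant."""
    n = len(y)
    return n * math.log(rss([data[v] for v in names], y) / n) + 2 * (len(names) + 1)


def stepwise(data, y):
    """Bidirectional stepwise search starting from the full model."""
    selected = sorted(data)
    best = aic(data, selected, y)
    while True:
        trials = []
        for name in data:
            if name in selected:  # backward move: drop a predictor
                trial = [s for s in selected if s != name]
            else:  # forward move: add a predictor back
                trial = selected + [name]
            trials.append((aic(data, trial, y), trial))
        score, trial = min(trials, key=lambda item: item[0])
        if score >= best:  # no single move lowers the AIC
            return selected, best
        selected, best = trial, score


# Hypothetical data: y depends on x1 only; x2 carries no information.
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
data = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": [1, 0, 0, 1, 0, 1, 1, 0]}
y = [2 + 3 * x + e for x, e in zip(data["x1"], noise)]
chosen, chosen_aic = stepwise(data, y)
print(chosen)
```

In this toy example the search drops x2, because removing it leaves the residual sum of squares essentially unchanged while saving one parameter.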