The distribution of the income variable is not normal, as is apparent from the non-zero skewness coupled with a kurtosis value that differs markedly from three (the value for a normal distribution). In addition, the measures of central tendency are not equal, which further indicates non-normality. The measures of dispersion show that the spread of the data is relatively low.
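As a rough illustration of the shape measures used above, the sketch below computes the mean, median, moment skewness and kurtosis in Python. The function name `shape_summary` and the income figures are hypothetical. The simple moment formulas differ slightly from Excel's bias-adjusted SKEW and KURT; note also that Excel's KURT reports excess kurtosis, for which a normal distribution gives 0 rather than 3.

```python
import statistics


def shape_summary(values):
    """Return mean, median, moment skewness and (non-excess) kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments about the mean (population form, divisor n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,  # 0 for a normal distribution
        "kurtosis": m4 / m2 ** 2,    # 3 for a normal distribution
    }


# Hypothetical income figures ($000s), for illustration only
incomes = [40, 42, 45, 47, 50, 52, 55, 60, 75, 110]
summary = shape_summary(incomes)
for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

A long right tail pulls the mean above the median and makes the skewness positive, which is the kind of mismatch between central tendency measures described above.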
[Figure: regression of annual charge ($) on income]
The R² value of this model is 0.3979, which means that only 39.79% of the variation in the dependent variable (annual charge) is explained by variation in the independent variable (income).
[Figure: regression of annual charge ($) on household size]
The R² value of this model is 0.5668, which means that 56.68% of the variation in the dependent variable (annual charge) is explained by variation in the independent variable (household size).
Since the coefficient of determination is higher when household size is the independent variable, household size is the better of the two single predictors.
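The comparison of the two single-predictor models can be sketched as follows. For a simple linear regression, R² equals the squared correlation between the predictor and the response, so each candidate predictor can be scored directly. The `r_squared` helper and the data are hypothetical, not the assignment's figures.

```python
def r_squared(x, y):
    """R² of the least-squares line y = a + b*x, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data (not the assignment's figures)
income = [30, 35, 40, 45, 50, 55, 60, 65]                   # income, $000s
household = [2, 4, 1, 3, 5, 2, 4, 3]                        # household size
charge = [2500, 3600, 2400, 3300, 4500, 3200, 4300, 3900]   # annual charge, $

scores = {
    "income": r_squared(income, charge),
    "household size": r_squared(household, charge),
}
# The predictor with the higher R² explains more of the variation in charge
best = max(scores, key=scores.get)
for name, r2 in scores.items():
    print(f"R² with {name}: {r2:.4f}")
print("Better single predictor:", best)
```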
Computer-generated regression output
[Table: regression output for annual charge ($) on income and household size]
Based on the output above, both independent variables are significant, as is apparent from their p-values. In addition, R² is 0.8254, which means that 82.54% of the variation in the dependent variable (annual charge) is explained by the independent variables together. The F test for overall significance also indicates that the regression model is significant.
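A minimal sketch of how the coefficients, R² and the overall F statistic arise for a two-predictor model, assuming ordinary least squares solved through the centred normal equations. The function names and data are hypothetical. The sketch does not compute p-values, which require the t and F distributions; Excel's Regression tool or a package such as statsmodels (OLS) reports those.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns (b0, b1, b2), R² and the overall F statistic.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centred sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    sst = sum((c - my) ** 2 for c in y)
    r2 = 1 - sse / sst
    k = 2  # number of predictors
    # F = (explained variance per predictor) / (residual variance per df)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1)) if r2 < 1 else float("inf")
    return (b0, b1, b2), r2, f_stat


def predict(coefs, x1, x2):
    """Evaluate the fitted equation at given predictor values."""
    b0, b1, b2 = coefs
    return b0 + b1 * x1 + b2 * x2


# Hypothetical data (not the assignment's figures)
income = [30, 35, 40, 45, 50, 55, 60, 65]                   # $000s
household = [2, 4, 1, 3, 5, 2, 4, 3]
charge = [2500, 3600, 2400, 3300, 4500, 3200, 4300, 3900]   # $

coefs, r2, f_stat = fit_two_predictor_ols(income, household, charge)
print("b0, b1, b2 =", [round(c, 2) for c in coefs])
print(f"R² = {r2:.4f}, F = {f_stat:.2f}")
print("Predicted charge (income 40, household 3):", round(predict(coefs, 40, 3), 2))
```

The `predict` helper mirrors the substitution step in this section: each estimated coefficient is multiplied by the given input value and the results are summed with the intercept.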
The relevant regression equation is indicated below.
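[Equation: Annual charge ($) = b0 + b1 × Income + b2 × Household size, with b0, b1 and b2 taken from the regression output]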
The inputs specified in the question are summarised below.
[Table: input values for income and household size]
Substituting the above values, we get
[Result: predicted annual charge ($)]
According to the regression output and the value of R², the independent variables (income and household size) together explain about 83% of the variation in the dependent variable (annual charge). To raise R² further, additional independent variables could be included, such as:
- Amount of spending by the consumer
- Age of the consumer
- Gender of the consumer
- Underlying economic climate
- Credit terms offered
Activity 1
The data are highlighted in the Excel sheet.
Activity 2
Histogram
The first part of the study focused on determining the depression index scores of healthy individuals across three cities, with descriptive statistics computed on a city-by-city basis. From these, it is clear that New York has the highest average depression score among healthy individuals, Florida the lowest, and North Carolina lies in between.
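The city-by-city descriptive statistics can be reproduced along these lines. The scores below are made up for illustration, and `describe_by_group` is a hypothetical helper built on Python's `statistics` module.

```python
import statistics


def describe_by_group(groups):
    """Return {group: (n, mean, sample standard deviation)}."""
    return {
        name: (len(scores), statistics.mean(scores), statistics.stdev(scores))
        for name, scores in groups.items()
    }


# Hypothetical depression scores for healthy individuals (illustration only)
healthy = {
    "Florida": [3, 7, 7, 3, 8],
    "New York": [8, 11, 9, 7, 10],
    "North Carolina": [10, 7, 3, 5, 11],
}

summary = describe_by_group(healthy)
# Print cities from highest to lowest mean score
for city, (n, mean, sd) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{city:<15} n={n}  mean={mean:.2f}  sd={sd:.2f}")
```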
The second part of the study covers people aged over 65 who also have a chronic illness. Depression scores were computed for these participants, and the corresponding descriptive statistics are presented in the output.
The output above shows that mean depression scores are higher than those of healthy individuals, which supports the view that age and health are important determinants of depression in all three cities.
As the means of more than two groups need to be compared simultaneously, one-way ANOVA is the appropriate test.
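For reference, with k = 3 cities, N participants in total, n_j participants with mean x̄_j in city j, and grand mean x̄, the one-way ANOVA statistic compares between-city variation with within-city variation:

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\sum_{j=1}^{k} n_j \,(\bar{x}_j - \bar{x})^2 \,/\, (k - 1)}
         {\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \,/\, (N - k)}
```

H0 is rejected at significance level α when F exceeds the critical value F(α; k - 1, N - k). Assuming α = 0.05, the critical value of 3.159 in the Excel outputs corresponds to 2 and 57 degrees of freedom, i.e. 60 participants across the three cities in each part of the study.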
Part 1- Healthy individuals
H0: µFlorida = µNew York = µNorth Carolina
H1: The mean depression scores across the three cities are not all equal; that is, at least one of the means is different.
Excel Output - ANOVA
The critical value of the F statistic from the Excel output above is 3.159. The computed F statistic exceeds this critical value, so the null hypothesis is rejected in favour of the alternative. It is therefore fair to conclude that there is a statistically significant difference between the mean depression scores of the three cities.
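The ANOVA decision rule can be sketched in plain Python. The helper names and scores are hypothetical. Because each test here compares three cities, the numerator degrees of freedom are 2, and in that case the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2), which the sketch inverts to obtain the critical value; for other numbers of groups, Excel's F.INV.RT or `scipy.stats.f.ppf` would be used instead.

```python
def one_way_anova(*groups):
    """One-way ANOVA: return (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_critical_three_groups(alpha, df_within):
    """Upper-tail F critical value for df_between = 2 (exactly three groups).

    Inverts the closed-form tail P(F > f) = (1 + 2f/d2) ** (-d2/2).
    """
    return df_within / 2 * (alpha ** (-2 / df_within) - 1)


# Hypothetical depression scores for three cities (illustration only)
florida = [3, 7, 7, 3, 8]
new_york = [8, 11, 9, 7, 10]
north_carolina = [10, 7, 3, 5, 11]

f_stat, df_b, df_w = one_way_anova(florida, new_york, north_carolina)
f_crit = f_critical_three_groups(0.05, df_w)
decision = "reject H0" if f_stat > f_crit else "fail to reject H0"
print(f"F = {f_stat:.3f}, critical F = {f_crit:.3f}: {decision}")
```

With 57 within-group degrees of freedom, `f_critical_three_groups(0.05, 57)` gives approximately 3.159, consistent with the critical value in the Excel output. The same comparison of F against the critical value drives both the rejection in Part 1 and the non-rejection in Part 2.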
Part 2- Aged & Unhealthy individuals
H0: µFlorida = µNew York = µNorth Carolina
H1: The mean depression scores across the three cities are not all equal; that is, at least one of the means is different.
Excel Output - ANOVA
The critical value of the F statistic from the Excel output above is 3.159. This time, the computed F statistic is lower than the critical value, so the null hypothesis cannot be rejected. It is therefore fair to conclude that there is no statistically significant difference between the mean depression scores of the three cities.
The hypothesis tests above indicate that for healthy individuals the city matters, as mean depression scores differ significantly across cities. This is not the case for aged participants with chronic illnesses: their depression scores are elevated in every city, as the descriptive statistics show, and do not differ significantly between cities, as the ANOVA shows.