In this assignment, we use consumer characteristics to predict the amount that credit card users charge annually. The consumer data are given below:
Income ($1000s)    Household Size    Amount Charged ($)    Income ($1000s)    Household Size    Amount Charged ($)
54                 3                 4016                  54                 6                 5573
30                 2                 3159                  30                 1                 2583
32                 4                 5100                  48                 2                 3866
50                 5                 4742                  34                 5                 3586
31                 2                 1864                  67                 4                 5037
55                 2                 4070                  50                 2                 3605
37                 1                 2731                  67                 5                 5345
40                 2                 3348                  55                 6                 5370
66                 4                 4764                  52                 2                 3890
51                 3                 4110                  62                 3                 4705
25                 3                 4208                  64                 2                 4157
48                 4                 4219                  22                 3                 3579
27                 1                 2477                  29                 4                 3890
33                 2                 2514                  39                 2                 2972
65                 3                 4214                  35                 1                 3121
63                 4                 4965                  39                 4                 4183
42                 6                 4412                  54                 3                 3720
21                 2                 2448                  23                 6                 4127
44                 1                 2995                  27                 2                 2921
37                 5                 4171                  26                 7                 4603
62                 6                 5678                  61                 2                 4273
21                 3                 3623                  30                 2                 3067
55                 7                 5301                  22                 4                 3074
42                 2                 3020                  46                 5                 4820
41                 7                 4828                  66                 4                 5149
The data comprise household size, annual income and annual credit card charges for a sample of 50 consumers. We now move on to the analysis:
Descriptive statistics
                            Income ($1000s)    Household Size    Amount Charged ($)
Mean                        43.48              3.42              3963.86
Standard Error              2.057785614        0.245930138       132.023387
Median                      42                 3                 4090
Mode                        54                 2                 3890
Standard Deviation          14.55074162        1.738988681       933.5463219
Sample Variance             211.7240816        3.024081633       871508.7351
Kurtosis                    1.247719422        0.722808552       0.742482171
Skewness                    0.095855639        0.527895977       0.128860064
Range                       46                 6                 3814
Minimum                     21                 1                 1864
Maximum                     67                 7                 5678
Sum                         2174               171               198193
Count                       50                 50                50
Largest(1)                  67                 7                 5678
Smallest(1)                 21                 1                 1864
Confidence Level (95.0%)    4.135274935        0.494215106       265.3109241
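As a rough cross-check, several rows of the descriptive statistics table can be recomputed from the raw data. The sketch below uses only the Python standard library; the names `DATA` and `describe` are illustrative and not part of the original analysis, and the data tuples are transcribed from the consumer table above.

```python
import math
import statistics

# (income in $1000s, household size, amount charged in $) for the 50 consumers,
# transcribed from the consumer data table (left-hand block, then right-hand block).
DATA = [
    (54, 3, 4016), (30, 2, 3159), (32, 4, 5100), (50, 5, 4742), (31, 2, 1864),
    (55, 2, 4070), (37, 1, 2731), (40, 2, 3348), (66, 4, 4764), (51, 3, 4110),
    (25, 3, 4208), (48, 4, 4219), (27, 1, 2477), (33, 2, 2514), (65, 3, 4214),
    (63, 4, 4965), (42, 6, 4412), (21, 2, 2448), (44, 1, 2995), (37, 5, 4171),
    (62, 6, 5678), (21, 3, 3623), (55, 7, 5301), (42, 2, 3020), (41, 7, 4828),
    (54, 6, 5573), (30, 1, 2583), (48, 2, 3866), (34, 5, 3586), (67, 4, 5037),
    (50, 2, 3605), (67, 5, 5345), (55, 6, 5370), (52, 2, 3890), (62, 3, 4705),
    (64, 2, 4157), (22, 3, 3579), (29, 4, 3890), (39, 2, 2972), (35, 1, 3121),
    (39, 4, 4183), (54, 3, 3720), (23, 6, 4127), (27, 2, 2921), (26, 7, 4603),
    (61, 2, 4273), (30, 2, 3067), (22, 4, 3074), (46, 5, 4820), (66, 4, 5149),
]


def describe(values):
    """Return a subset of the summary measures shown in the table."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": statistics.mean(values),
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(values),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "range": max(values) - min(values),
        "sum": sum(values),
        "count": n,
    }


# Split the rows into the three columns and summarise each one.
income, household, charged = (list(column) for column in zip(*DATA))
for label, column in [
    ("Income ($1000s)", income),
    ("Household Size", household),
    ("Amount Charged ($)", charged),
]:
    print(label, describe(column))
```

The "Confidence Level (95.0%)" row is the standard error multiplied by a Student's t critical value; it is left out here because the standard library has no t quantile function.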
The equation for credit card charges can be given as:
Y_{t} = α + βX_{t} + u_{t} ......Eq(1)
Here Y_{t} is our dependent variable, the annual charges on the credit card ($); X_{t} is our independent variable, annual income ($1000s); α is the intercept and u_{t} is the error term. The regression results are given below:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.630781
R Square             0.397884
Adjusted R Square    0.38534
Standard Error       731.9025
Observations         50

ANOVA
              df    SS          MS          F           Significance F
Regression    1     16991229    16991229    31.71892    9.1E-07
Residual      48    25712699    535681.2
Total         49    42703928

                   Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept          2204.241        329.134           6.697091    0.00       1542.472     2866.009     1542.472207    2866.0088
Income ($1000s)    40.46963        7.185716          5.631955    0.00       26.02178     54.91748     26.02177931    54.917479
From the regression results, annual income explains 39.8% of the variation in annual credit card charges (R^{2} = 0.398; adjusted R^{2} = 0.385). The slope coefficient implies that a $1000 increase in annual income (1 unit of the income variable) is associated with an increase of about $40.47 in annual credit card charges. The relationship is statistically significant (F = 31.72, p < 0.001).
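The goodness-of-fit figures in the output follow directly from the ANOVA sums of squares. The sketch below is a minimal illustration, assuming the regression and residual sums of squares reported for Eq(1); the function name `fit_summary` is hypothetical.

```python
import math


def fit_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Derive R^2, adjusted R^2, F and the standard error from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r_square = ss_regression / ss_total
    # Adjusted R^2 penalises R^2 for the number of predictors used.
    adjusted = 1 - (1 - r_square) * (n_obs - 1) / df_residual
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / df_residual
    return {
        "r_square": r_square,
        "adjusted_r_square": adjusted,
        "f": ms_regression / ms_residual,
        "standard_error": math.sqrt(ms_residual),
    }


# Sums of squares taken from the ANOVA block of the income-only regression.
income_model = fit_summary(16991229, 25712699, n_obs=50, n_predictors=1)
print(income_model)
```

Because adjusted R^{2} is always a little below R^{2}, the share of variation explained is read from R^{2} itself, while adjusted R^{2} is the fairer figure when comparing models with different numbers of predictors.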
A second equation for credit card charges can be given as:
Y_{t} = α + βZ_{t} + u_{t} ......Eq(2)
Here Y_{t} is again our dependent variable, the annual charges on the credit card, and Z_{t} is our independent variable, household size. The regression results are given below:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.752854
R Square             0.566789
Adjusted R Square    0.557764
Standard Error       620.8163
Observations         50

ANOVA
              df    SS          MS          F           Significance F
Regression    1     24204112    24204112    62.80048    2.86E-10
Residual      48    18499816    385412.8
Total         49    42703928

                  Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept         2581.644        195.2699          13.2209     0.00       2189.028     2974.26      2189.027669    2974.2605
Household Size    404.1567        50.99978          7.924676    0.00       301.6148     506.6986     301.6147764    506.69863
From the regression results, household size explains 56.7% of the variation in annual credit card charges (R^{2} = 0.567; adjusted R^{2} = 0.558). The slope coefficient implies that one additional household member is associated with an increase of about $404.16 in annual credit card charges.
Comparing the two single-variable models, household size is the better predictor of annual credit card charges: it has the higher adjusted R^{2} (0.558 versus 0.385) and the lower standard error of the estimate (620.82 versus 731.90).
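The t Stat and 95% interval columns in the two coefficient tables can be recovered from each coefficient and its standard error. The sketch below assumes a tabulated two-sided 5% critical value of Student's t with 48 degrees of freedom (about 2.0106); the constant `T_CRIT_48` and the function name are illustrative, not part of the original output.

```python
# Assumed value from standard t-tables: two-sided 5% critical value, 48 degrees of freedom.
T_CRIT_48 = 2.0106


def t_stat_and_interval(coefficient, standard_error, t_crit=T_CRIT_48):
    """Return (t statistic, lower bound, upper bound) for one regression coefficient."""
    t_stat = coefficient / standard_error
    margin = t_crit * standard_error
    return t_stat, coefficient - margin, coefficient + margin


# Slope estimates and standard errors copied from the two single-variable outputs.
income_slope = t_stat_and_interval(40.46963, 7.185716)
household_slope = t_stat_and_interval(404.1567, 50.99978)
print("Income slope:", income_slope)
print("Household size slope:", household_slope)
```

Neither interval contains zero, which agrees with the very small P-values reported for both slopes.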
Using both variables together, the equation for credit card charges becomes:
Y_{t} = α + β_{1}X_{t} + β_{2}Z_{t} + u_{t} .....Eq(3)
The regression results are given below:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.908502
R Square             0.825376
Adjusted R Square    0.817945
Standard Error       398.3249
Observations         50

ANOVA
              df    SS          MS          F           Significance F
Regression    2     35246779    17623389    111.0745    1.55E-18
Residual      47    7457149     158662.8
Total         49    42703928

                   Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept          1305.034        197.771           6.598712    0.00       907.17       1702.898     907.17         1702.898
Income ($1000s)    33.12196        3.970237          8.342563    0.00       25.13487     41.10904     25.13487       41.10904
Household Size     356.3402        33.2204           10.72655    0.00       289.5094     423.171      289.5094       423.171
From the regression results, household size and annual income together explain 82.5% of the variation in annual credit card charges (R^{2} = 0.825; adjusted R^{2} = 0.818). Holding income constant, one additional household member is associated with an increase of about $356.34 in annual credit card charges; holding household size constant, a $1000 increase in annual income (1 unit of the income variable) is associated with an increase of about $33.12.
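Hence the fitted regression equation, including the estimated intercept, can be given as:
Y_{t} = 1305.03 + 33.12X_{t} + 356.34Z_{t}
For a three-person household with an annual income of $40,000 (X_{t} = 40, Z_{t} = 3), the predicted annual charge is:
Y_{t} = 1305.03 + 33.12*40 + 356.34*3 = $3698.85.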
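As a small sketch, the Eq(3) estimates from the regression output, including the intercept of 1305.034, can be wrapped in a function to produce point predictions. The function name `predict_charge` is hypothetical; the inputs of 40 (income in $1000s) and 3 (household size) are the values used in the worked arithmetic.

```python
# Coefficients copied from the two-variable regression output for Eq(3).
INTERCEPT = 1305.034
B_INCOME = 33.12196       # dollars charged per $1000 of annual income
B_HOUSEHOLD = 356.3402    # dollars charged per additional household member


def predict_charge(income_thousands, household_size):
    """Point prediction of annual credit card charges ($) from the fitted equation."""
    return INTERCEPT + B_INCOME * income_thousands + B_HOUSEHOLD * household_size


prediction = predict_charge(40, 3)
print(f"Predicted annual charge: ${prediction:.2f}")
```

With the unrounded coefficients the prediction is about $3,698.93; rounding the coefficients to two decimals first gives $3,698.85. Leaving out the intercept would understate the prediction by roughly $1,305.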
The descriptive statistics of the final exam and assignment scores for HI001, HI003 and HI002 are given below:
Descriptive Statistics
                           HI001 FINAL EXAM    HI001 ASSIGNMENT 01    HI001 ASSIGNMENT 02
Mean                       31.90909091         17.34343434            15.50505051
Standard Error             0.700162085         0.237298066            0.23564704
Median                     32                  17                     16
Mode                       29                  18                     17
Standard Deviation         6.966524782         2.361085949            2.344658442
Sample Variance            48.53246753         5.57472686             5.497423212
Kurtosis                   7.67534849          10.3018632             0.698973651
Skewness                   1.753036803         0.803185137            0.464616962
Range                      50                  22                     13
Minimum                    0                   8                      8
Maximum                    50                  30                     21
Sum                        3159                1717                   1535
Count                      99                  99                     99
Largest(1)                 50                  30                     21
Smallest(1)                0                   8                      8
Confidence Level(95.0%)    1.389448835         0.470910278            0.467633869
Descriptive Statistics
                           HI003 FINAL EXAM    HI003 ASSIGNMENT 01    HI003 ASSIGNMENT 02
Mean                       26.23232323         18.31313131            13.60606061
Standard Error             0.861918907         0.408537639            0.187651228
Median                     25                  19                     13
Mode                       25                  20                     13
Standard Deviation         8.57598484          4.064898183            1.867106141
Sample Variance            73.54751598         16.52339724            3.486085343
Kurtosis                   0.474751131         1.51303057             3.505251459
Skewness                   0.027305979         0.236180187            1.121313851
Range                      46                  20                     12
Minimum                    4                   10                     8
Maximum                    50                  30                     20
Sum                        2597                1813                   1347
Count                      99                  99                     99
Largest(1)                 50                  30                     20
Smallest(1)                4                   10                     8
Confidence Level(95.0%)    1.710449975         0.810729628            0.372387745
Descriptive Statistics
                           HI002 FINAL EXAM    HI002 ASSIGNMENT 01    HI002 ASSIGNMENT 02
Mean                       26.73737374         17.93939394            12.49494949
Standard Error             0.636870612         0.365435664            0.213139666
Median                     27                  19                     13
Mode                       27                  20                     14
Standard Deviation         6.336782578         3.636038947            2.120712902
Sample Variance            40.15481344         13.22077922            4.497423212
Kurtosis                   3.924830269         3.372356549            5.049593179
Skewness                   0.312442386         1.183845155            1.204878419
Range                      50                  26                     16
Minimum                    0                   4                      4
Maximum                    50                  30                     20
Sum                        2647                1776                   1237
Count                      99                  99                     99
Largest(1)                 50                  30                     20
Smallest(1)                0                   4                      4
Confidence Level(95.0%)    1.26384897          0.725195163            0.42296872
The 10 different correlations between the pairs of variables are given below: 
The variables HI003 FINAL EXAM and HI002 FINAL EXAM are positively correlated, with a correlation coefficient of 0.207867. The p-value is 0.039, so the correlation is statistically significant at the 5% level. It is a weak correlation.
The variables HI001 FINAL EXAM and HI002 FINAL EXAM are positively correlated, with a correlation coefficient of 0.142303. The p-value is 0.1600, so the correlation is statistically insignificant. It is a weak correlation.
The variables HI001 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are positively correlated, with a correlation coefficient of 0.155602. The p-value is 0.1241, so the correlation is statistically insignificant. It is a weak correlation.
The variables HI003 ASSIGNMENT 01 and HI003 ASSIGNMENT 02 are positively correlated, with a correlation coefficient of 0.567657. The p-value is below 0.001, so the correlation is statistically significant. It is a moderately strong correlation.
The variables HI001 FINAL EXAM and HI003 FINAL EXAM are positively correlated, with a correlation coefficient of 0.187035. The p-value is 0.0638, which exceeds 0.05, so the correlation is statistically insignificant at the 5% level (it would be significant only at the 10% level). It is a weak correlation.
The variables HI001 ASSIGNMENT 01 and HI001 ASSIGNMENT 02 are positively correlated, with a correlation coefficient of 0.648505. The p-value is below 0.001, so the correlation is statistically significant. It is a moderately strong correlation.
The variables HI001 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated, with a correlation coefficient of 0.035405. The p-value is 0.7279, so the correlation is statistically insignificant. It is a weak correlation.
The variables HI002 ASSIGNMENT 01 and HI002 ASSIGNMENT 02 are positively correlated, with a correlation coefficient of 0.603392. The p-value is below 0.001, so the correlation is statistically significant. It is a moderately strong correlation.
The variables HI002 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are negatively correlated, with a correlation coefficient of -0.11055. The p-value is 0.2760, so the correlation is statistically insignificant. It is a weak correlation.
The variables HI003 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated, with a correlation coefficient of 0.031706. The p-value is 0.7554, so the correlation is statistically insignificant. It is a weak correlation.
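The p-values quoted for the correlations are consistent with the usual t-test for a Pearson correlation, t = r*sqrt(n - 2)/sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below is one way to reproduce them with the standard library only, assuming n = 99 (the Count in the descriptive statistics) and integrating the Student's t density numerically with Simpson's rule; the function names are illustrative.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central_area)


def correlation_p_value(r, n):
    """p-value for H0: population correlation = 0, given sample r and size n."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return two_sided_p(t_stat, n - 2)


N_STUDENTS = 99  # Count reported in the descriptive statistics tables
for r in (0.207867, 0.187035, 0.142303):
    p = correlation_p_value(r, N_STUDENTS)
    verdict = "significant" if p < 0.05 else "insignificant"
    print(f"r = {r:.6f}: p = {p:.4f} ({verdict} at the 5% level)")
```

A correlation is called significant at the 5% level only when its p-value falls below 0.05, so r = 0.187035 with p of about 0.064 does not qualify.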
The descriptive statistics of the first group (Med 1, the healthy individuals) are given below:
Descriptive Statistics
                           Florida     New York       North Carolina
Mean                       5.55        8              7.05
Standard Error             0.478347    0.492041932    0.634428877
Median                     6           8              7.5
Mode                       7           8              8
Standard Deviation         2.139233    2.200478417    2.837252192
Sample Variance            4.576316    4.842105263    8.05
Kurtosis                   1.06219     0.626431669    0.904925496
Skewness                   0.27356     0.625687389    0.056188269
Range                      7           9              9
Minimum                    2           4              3
Maximum                    9           13             12
Sum                        111         160            141
Count                      20          20             20
Largest(1)                 9           13             12
Smallest(1)                2           4              3
Confidence Level(95.0%)    1.001192    1.029855598    1.327874898
The descriptive statistics of the second group (Med 2, the individuals with a chronic health condition) are given below:
Descriptive Statistics
                            Florida        New York    North Carolina
Mean                        14.5           15.25       13.95
Standard Error              0.708965146    0.923024    0.65884668
Median                      14.5           14.5        14
Mode                        17             14          12
Standard Deviation          3.170588522    4.12789     2.946451925
Sample Variance             10.05263158    17.03947    8.681578947
Kurtosis                    0.340799481    0.03014     0.592052134
Skewness                    0.280721497    0.525352    0.041733773
Range                       12             15          11
Minimum                     9              9           8
Maximum                     21             24          19
Sum                         290            305         279
Count                       20             20          20
Largest(1)                  21             24          19
Smallest(1)                 9              9           8
Confidence Level (95.0%)    1.483881102    1.931912    1.378981946
Comparing the descriptive statistics, the depression scores of the healthy group (location means between 5.55 and 8.00) are far lower than those of the group suffering from a chronic health condition such as arthritis, hypertension and/or heart ailment (location means between 13.95 and 15.25). Within the healthy sample, individuals from Florida report the lowest mean depression score (5.55, against 8.00 in New York and 7.05 in North Carolina); within the chronic-condition sample the three location means lie much closer together.
For Med 1: 
Anova: Single Factor

SUMMARY
Groups            Count    Sum    Average    Variance
Florida           20       111    5.55       4.576316
New York          20       160    8          4.842105
North Carolina    20       141    7.05       8.05

ANOVA
Source of Variation    SS          df    MS          F           P-value    F crit
Between Groups         61.03333    2     30.51667    5.240886    0.00814    3.158843
Within Groups          331.9       57    5.822807
Total                  392.9333    59
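In this case the F value (5.24) is higher than the F critical value (3.16), and the p-value (0.008) is below 0.05. So we reject the null hypothesis of equal means and conclude that there is a significant difference in mean depression scores across the three locations among the healthy individuals.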
For Med 2: 
Anova: Single Factor

SUMMARY
Groups            Count    Sum    Average    Variance
Florida           20       290    14.5       10.05263
New York          20       305    15.25      17.03947
North Carolina    20       279    13.95      8.681579

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         17.03333    2     8.516667    0.714212    0.493906    3.158843
Within Groups          679.7       57    11.92456
Total                  696.7333    59
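In this case the F value (0.71) is lower than the F critical value (3.16), and the p-value (0.494) is above 0.05. So we fail to reject the null hypothesis and conclude that there is no significant difference in mean depression scores across the three locations among the individuals with chronic health conditions.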
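Both ANOVA tables can be rebuilt from the group counts, averages and variances in their SUMMARY blocks. The sketch below is a minimal illustration of that arithmetic; the function name `one_way_anova` is hypothetical, and the critical value 3.158843 is copied from the F crit column rather than computed.

```python
F_CRIT = 3.158843  # F critical value (2 and 57 df, 5% level) as reported in the tables


def one_way_anova(groups):
    """One-way ANOVA from summaries; groups is a list of (count, mean, sample variance)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within, "f": f_value}


# (count, average, variance) for Florida, New York and North Carolina.
med1 = one_way_anova([(20, 5.55, 4.576316), (20, 8.0, 4.842105), (20, 7.05, 8.05)])
med2 = one_way_anova([(20, 14.5, 10.05263), (20, 15.25, 17.03947), (20, 13.95, 8.681579)])

for label, result in [("Med 1", med1), ("Med 2", med2)]:
    decision = "reject H0" if result["f"] > F_CRIT else "fail to reject H0"
    print(f"{label}: F = {result['f']:.4f} -> {decision}")
```

Comparing F with the critical value gives the same decision as comparing the p-value with 0.05: the location means differ significantly for Med 1 but not for Med 2.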