In this assignment, we use consumer characteristics to predict the amount charged by credit card users. The consumer data are given below:
Income ($1000s) | Household Size | Amount Charged ($) | Income ($1000s) | Household Size | Amount Charged ($)
54 | 3 | 4016 | 54 | 6 | 5573
30 | 2 | 3159 | 30 | 1 | 2583
32 | 4 | 5100 | 48 | 2 | 3866
50 | 5 | 4742 | 34 | 5 | 3586
31 | 2 | 1864 | 67 | 4 | 5037
55 | 2 | 4070 | 50 | 2 | 3605
37 | 1 | 2731 | 67 | 5 | 5345
40 | 2 | 3348 | 55 | 6 | 5370
66 | 4 | 4764 | 52 | 2 | 3890
51 | 3 | 4110 | 62 | 3 | 4705
25 | 3 | 4208 | 64 | 2 | 4157
48 | 4 | 4219 | 22 | 3 | 3579
27 | 1 | 2477 | 29 | 4 | 3890
33 | 2 | 2514 | 39 | 2 | 2972
65 | 3 | 4214 | 35 | 1 | 3121
63 | 4 | 4965 | 39 | 4 | 4183
42 | 6 | 4412 | 54 | 3 | 3720
21 | 2 | 2448 | 23 | 6 | 4127
44 | 1 | 2995 | 27 | 2 | 2921
37 | 5 | 4171 | 26 | 7 | 4603
62 | 6 | 5678 | 61 | 2 | 4273
21 | 3 | 3623 | 30 | 2 | 3067
55 | 7 | 5301 | 22 | 4 | 3074
42 | 2 | 3020 | 46 | 5 | 4820
41 | 7 | 4828 | 66 | 4 | 5149
The data comprise household size, annual income and annual credit card charges for a sample of 50 consumers. We now move on to the analysis:
Descriptive statistics | Income ($1000s) | Household Size | Amount Charged ($)
Mean | 43.48 | 3.42 | 3963.86
Standard Error | 2.057785614 | 0.245930138 | 132.023387
Median | 42 | 3 | 4090
Mode | 54 | 2 | 3890
Standard Deviation | 14.55074162 | 1.738988681 | 933.5463219
Sample Variance | 211.7240816 | 3.024081633 | 871508.7351
Kurtosis | -1.247719422 | -0.722808552 | -0.742482171
Skewness | 0.095855639 | 0.527895977 | -0.128860064
Range | 46 | 6 | 3814
Minimum | 21 | 1 | 1864
Maximum | 67 | 7 | 5678
Sum | 2174 | 171 | 198193
Count | 50 | 50 | 50
Largest(1) | 67 | 7 | 5678
Smallest(1) | 21 | 1 | 1864
Confidence Level (95.0%) | 4.135274935 | 0.494215106 | 265.3109241
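The income column of the descriptive statistics can be cross-checked with a few lines of Python. This is only a minimal sketch using the standard library: the list `income` is transcribed from the data table, and the variable names are illustrative assumptions rather than part of the original analysis.

```python
import math
import statistics

# Annual income ($1000s) for the 50 consumers, transcribed from the data table.
income = [
    54, 30, 32, 50, 31, 55, 37, 40, 66, 51, 25, 48, 27, 33, 65, 63, 42,
    21, 44, 37, 62, 21, 55, 42, 41,   # left-hand panel of the table
    54, 30, 48, 34, 67, 50, 67, 55, 52, 62, 64, 22, 29, 39, 35, 39, 54,
    23, 27, 26, 61, 30, 22, 46, 66,   # right-hand panel of the table
]

n = len(income)
mean = statistics.mean(income)
median = statistics.median(income)
variance = statistics.variance(income)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(income)      # sample standard deviation
std_error = std_dev / math.sqrt(n)      # standard error of the mean

print(f"n={n} mean={mean:.2f} median={median} "
      f"var={variance:.4f} sd={std_dev:.4f} se={std_error:.4f}")
```

The same pattern should apply to the household size and amount charged columns; the standard error is simply the sample standard deviation divided by the square root of the sample size.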
The equation for credit card charges can be written as:
\[ Y_t = \alpha + \beta X_t + u_t \quad \text{(Eq. 1)} \]
Here \(Y_t\) is the dependent variable, annual credit card charges, \(X_t\) is the independent variable, annual income (in $1000s), \(\alpha\) is the intercept and \(u_t\) is the error term. The regression results are given below:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.630781
R Square | 0.397884
Adjusted R Square | 0.38534
Standard Error | 731.9025
Observations | 50
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 16991229 | 16991229 | 31.71892 | 9.1E-07
Residual | 48 | 25712699 | 535681.2
Total | 49 | 42703928
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 2204.241 | 329.134 | 6.697091 | 0.00 | 1542.472 | 2866.009 | 1542.472207 | 2866.0088
Income ($1000s) | 40.46963 | 7.185716 | 5.631955 | 0.00 | 26.02178 | 54.91748 | 26.02177931 | 54.917479
From the regression results, annual income explains 39.8% of the variation in annual credit card charges (R² = 0.398; adjusted R² = 0.385). The slope coefficient implies that each additional $1000 of annual income (one unit of the income variable) is associated with an increase of about $40.47 in annual credit card charges. The coefficient is statistically significant (t = 5.63, reported p-value 0.00).
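The t statistic and the 95% confidence interval for the income slope follow directly from the coefficient and its standard error. The sketch below is illustrative only: the coefficient and standard error are copied from the regression output for Eq. (1), while the critical value `T_CRIT_48` is an assumption taken from a Student's t table (two-sided 5%, 48 degrees of freedom) rather than something reported in the output.

```python
# Values copied from the coefficient table for Eq. (1).
coef = 40.46963        # slope on Income ($1000s)
std_err = 7.185716     # standard error of the slope

# Assumption: two-sided 5% critical value of Student's t with
# n - 2 = 48 degrees of freedom, approximately 2.0106 from a t table.
T_CRIT_48 = 2.0106

t_stat = coef / std_err                 # test of H0: slope = 0
lower = coef - T_CRIT_48 * std_err      # lower 95% confidence bound
upper = coef + T_CRIT_48 * std_err      # upper 95% confidence bound

print(f"t = {t_stat:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval lies entirely above zero, it agrees with the very small p-value reported for the income coefficient.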
A second equation for credit card charges can be written as:
\[ Y_t = \alpha + \beta Z_t + u_t \quad \text{(Eq. 2)} \]
Here \(Y_t\) is the dependent variable, annual credit card charges, and \(Z_t\) is the independent variable, household size. The regression results are given below:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.752854
R Square | 0.566789
Adjusted R Square | 0.557764
Standard Error | 620.8163
Observations | 50
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 24204112 | 24204112 | 62.80048 | 2.86E-10
Residual | 48 | 18499816 | 385412.8
Total | 49 | 42703928
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 2581.644 | 195.2699 | 13.2209 | 0.00 | 2189.028 | 2974.26 | 2189.027669 | 2974.2605
Household Size | 404.1567 | 50.99978 | 7.924676 | 0.00 | 301.6148 | 506.6986 | 301.6147764 | 506.69863
From the regression results, household size explains 56.7% of the variation in annual credit card charges (R² = 0.567; adjusted R² = 0.558). The slope coefficient implies that each additional household member is associated with an increase of about $404.16 in annual credit card charges.
Comparing the two simple regressions, household size is a better single predictor of annual credit card charges than annual income: it has the higher adjusted R² (0.558 against 0.385) and the lower standard error of the regression (620.82 against 731.90).
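The fit statistics behind this comparison can be reproduced from the ANOVA sums of squares alone. The following is a minimal sketch, assuming the standard definitions of R², adjusted R² and the F statistic; the function name `fit_stats` is hypothetical, and the inputs are the regression and residual sums of squares from the two outputs above.

```python
def fit_stats(ss_regression, ss_residual, n_obs, n_predictors):
    """Return (R^2, adjusted R^2, F) computed from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    df_resid = n_obs - n_predictors - 1
    # Adjusted R^2 penalises R^2 for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
    # F is the ratio of the regression mean square to the residual mean square.
    f_stat = (ss_regression / n_predictors) / (ss_residual / df_resid)
    return r2, adj_r2, f_stat


# Sums of squares copied from the two single-predictor regression outputs.
income_model = fit_stats(16991229, 25712699, n_obs=50, n_predictors=1)
household_model = fit_stats(24204112, 18499816, n_obs=50, n_predictors=1)

for name, (r2, adj_r2, f_stat) in [("income", income_model),
                                   ("household size", household_model)]:
    print(f"{name}: R2={r2:.4f} adjusted R2={adj_r2:.4f} F={f_stat:.2f}")
```

Both models share the same total sum of squares, so the one with the larger regression sum of squares necessarily has the higher R².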
Using both predictors together, the equation becomes:
\[ Y_t = \alpha + \beta_1 X_t + \beta_2 Z_t + u_t \quad \text{(Eq. 3)} \]
The regression results are given below:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.908502
R Square | 0.825376
Adjusted R Square | 0.817945
Standard Error | 398.3249
Observations | 50
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 2 | 35246779 | 17623389 | 111.0745 | 1.55E-18
Residual | 47 | 7457149 | 158662.8
Total | 49 | 42703928
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 1305.034 | 197.771 | 6.598712 | 0.00 | 907.17 | 1702.898 | 907.17 | 1702.898
Income ($1000s) | 33.12196 | 3.970237 | 8.342563 | 0.00 | 25.13487 | 41.10904 | 25.13487 | 41.10904
Household Size | 356.3402 | 33.2204 | 10.72655 | 0.00 | 289.5094 | 423.171 | 289.5094 | 423.171
From the regression results, household size and annual income together explain 82.5% of the variation in annual credit card charges (R² = 0.825; adjusted R² = 0.818). Holding annual income constant, each additional household member is associated with an increase of about $356.34 in annual credit card charges; holding household size constant, each additional $1000 of annual income (one unit of the income variable) is associated with an increase of about $33.12.
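Hence the fitted regression equation, including the estimated intercept, is:
\[ \hat{Y}_t = 1305.03 + 33.12 X_t + 356.34 Z_t \]
For an annual income of $40,000 (\(X_t = 40\)) and a household of three (\(Z_t = 3\)), the predicted annual charge is
\[ \hat{Y}_t = 1305.03 + 33.12 \times 40 + 356.34 \times 3 = 3698.85, \]
that is, about $3,699.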
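A prediction from the estimates in the Eq. (3) output can be sketched as a small Python function. The function name `predict_charge` and the example inputs (an income of $40,000 and a household of three) are illustrative assumptions; the coefficients, including the intercept, are copied from the coefficient table of the two-predictor regression.

```python
# Coefficients copied from the regression output for Eq. (3).
INTERCEPT = 1305.034
B_INCOME = 33.12196       # dollars per additional $1000 of annual income
B_HOUSEHOLD = 356.3402    # dollars per additional household member


def predict_charge(income_thousands, household_size):
    """Predicted annual credit card charge ($) from the fitted Eq. (3)."""
    return (INTERCEPT
            + B_INCOME * income_thousands
            + B_HOUSEHOLD * household_size)


predicted = predict_charge(income_thousands=40, household_size=3)
print(f"Predicted annual charge: ${predicted:,.2f}")
```

With these inputs the sketch returns roughly $3,699. The intercept contributes about $1,305 of that total, so a calculation that leaves it out understates the prediction by that amount.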
The descriptive statistics of the variables are given below:
Descriptive Statistics | HI001 FINAL EXAM | HI001 ASSIGNMENT 01 | HI001 ASSIGNMENT 02
Mean | 31.90909091 | 17.34343434 | 15.50505051
Standard Error | 0.700162085 | 0.237298066 | 0.23564704
Median | 32 | 17 | 16
Mode | 29 | 18 | 17
Standard Deviation | 6.966524782 | 2.361085949 | 2.344658442
Sample Variance | 48.53246753 | 5.57472686 | 5.497423212
Kurtosis | 7.67534849 | 10.3018632 | 0.698973651
Skewness | -1.753036803 | 0.803185137 | -0.464616962
Range | 50 | 22 | 13
Minimum | 0 | 8 | 8
Maximum | 50 | 30 | 21
Sum | 3159 | 1717 | 1535
Count | 99 | 99 | 99
Largest(1) | 50 | 30 | 21
Smallest(1) | 0 | 8 | 8
Confidence Level(95.0%) | 1.389448835 | 0.470910278 | 0.467633869
Descriptive Statistics | HI003 FINAL EXAM | HI003 ASSIGNMENT 01 | HI003 ASSIGNMENT 02
Mean | 26.23232323 | 18.31313131 | 13.60606061
Standard Error | 0.861918907 | 0.408537639 | 0.187651228
Median | 25 | 19 | 13
Mode | 25 | 20 | 13
Standard Deviation | 8.57598484 | 4.064898183 | 1.867106141
Sample Variance | 73.54751598 | 16.52339724 | 3.486085343
Kurtosis | 0.474751131 | 1.51303057 | 3.505251459
Skewness | -0.027305979 | -0.236180187 | 1.121313851
Range | 46 | 20 | 12
Minimum | 4 | 10 | 8
Maximum | 50 | 30 | 20
Sum | 2597 | 1813 | 1347
Count | 99 | 99 | 99
Largest(1) | 50 | 30 | 20
Smallest(1) | 4 | 10 | 8
Confidence Level(95.0%) | 1.710449975 | 0.810729628 | 0.372387745
Descriptive Statistics | HI002 FINAL EXAM | HI002 ASSIGNMENT 01 | HI002 ASSIGNMENT 02
Mean | 26.73737374 | 17.93939394 | 12.49494949
Standard Error | 0.636870612 | 0.365435664 | 0.213139666
Median | 27 | 19 | 13
Mode | 27 | 20 | 14
Standard Deviation | 6.336782578 | 3.636038947 | 2.120712902
Sample Variance | 40.15481344 | 13.22077922 | 4.497423212
Kurtosis | 3.924830269 | 3.372356549 | 5.049593179
Skewness | -0.312442386 | -1.183845155 | -1.204878419
Range | 50 | 26 | 16
Minimum | 0 | 4 | 4
Maximum | 50 | 30 | 20
Sum | 2647 | 1776 | 1237
Count | 99 | 99 | 99
Largest(1) | 50 | 30 | 20
Smallest(1) | 0 | 4 | 4
Confidence Level(95.0%) | 1.26384897 | 0.725195163 | 0.42296872
The 10 pairwise correlations between the variables are given below:
The variables HI003 FINAL EXAM and HI002 FINAL EXAM are positively correlated with a correlation coefficient of 0.207867. The p-value is 0.039 and hence the correlation coefficient is statistically significant. It is a weak correlation.
The variables HI001 FINAL EXAM and HI002 FINAL EXAM are positively correlated with a correlation coefficient of 0.142303. The p-value is 0.1600 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
The variables HI001 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are positively correlated with a correlation coefficient of 0.155602. The p-value is 0.1241 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
The variables HI003 ASSIGNMENT 01 and HI003 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.567657. The p-value is below 0.001 and hence the correlation coefficient is statistically significant. It is a moderately strong correlation.
The variables HI001 FINAL EXAM and HI003 FINAL EXAM are positively correlated with a correlation coefficient of 0.187035. The p-value is 0.0638, which is above the 5% significance level, and hence the correlation coefficient is statistically insignificant at that level (it would be significant only at the 10% level). It is a weak correlation.
The variables HI001 ASSIGNMENT 01 and HI001 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.648505. The p-value is below 0.001 and hence the correlation coefficient is statistically significant. It is a moderately strong correlation.
The variables HI001 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.035405. The p-value is 0.7279 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
The variables HI002 ASSIGNMENT 01 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.603392. The p-value is below 0.001 and hence the correlation coefficient is statistically significant. It is a moderately strong correlation.
The variables HI002 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are negatively correlated with a correlation coefficient of -0.11055. The p-value is 0.2760 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
The variables HI003 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.031706. The p-value is 0.7554 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
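The significance of each correlation can be cross-checked from the coefficient and the sample size alone, using the usual t test for a Pearson correlation. This is a hedged sketch: the function names are hypothetical, the sample size of 99 comes from the Count rows of the descriptive statistics, and the critical value `T_CRIT_97` is an assumption read from a Student's t table (two-sided 5%, 97 degrees of freedom).

```python
import math

N_STUDENTS = 99   # Count reported in the descriptive statistics

# Assumption: two-sided 5% critical value of Student's t with
# n - 2 = 97 degrees of freedom, approximately 1.985 from a t table.
T_CRIT_97 = 1.985


def correlation_t_stat(r, n=N_STUDENTS):
    """t statistic for H0: the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def is_significant(r, n=N_STUDENTS, t_crit=T_CRIT_97):
    """True when |t| exceeds the assumed 5% two-sided critical value."""
    return abs(correlation_t_stat(r, n)) > t_crit


for r in (0.207867, 0.187035, 0.648505, -0.11055):
    print(f"r={r:+.6f} t={correlation_t_stat(r):+.3f} "
          f"significant at 5%: {is_significant(r)}")
```

With 99 observations, a coefficient of 0.207867 gives a t statistic of about 2.09, just above the critical value, while 0.187035 gives about 1.88, just below it. This matches the reported p-values of 0.039 and 0.0638.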
The descriptive statistics of the first group (Med 1) are given below:
Descriptive Statistics | Florida | New York | North Carolina
Mean | 5.55 | 8 | 7.05
Standard Error | 0.478347 | 0.492041932 | 0.634428877
Median | 6 | 8 | 7.5
Mode | 7 | 8 | 8
Standard Deviation | 2.139233 | 2.200478417 | 2.837252192
Sample Variance | 4.576316 | 4.842105263 | 8.05
Kurtosis | -1.06219 | 0.626431669 | -0.904925496
Skewness | -0.27356 | 0.625687389 | -0.056188269
Range | 7 | 9 | 9
Minimum | 2 | 4 | 3
Maximum | 9 | 13 | 12
Sum | 111 | 160 | 141
Count | 20 | 20 | 20
Largest(1) | 9 | 13 | 12
Smallest(1) | 2 | 4 | 3
Confidence Level(95.0%) | 1.001192 | 1.029855598 | 1.327874898
The descriptive statistics of the second group (Med 2) are given below:
Descriptive Statistics | Florida | New York | North Carolina
Mean | 14.5 | 15.25 | 13.95
Standard Error | 0.708965146 | 0.923024 | 0.65884668
Median | 14.5 | 14.5 | 14
Mode | 17 | 14 | 12
Standard Deviation | 3.170588522 | 4.12789 | 2.946451925
Sample Variance | 10.05263158 | 17.03947 | 8.681578947
Kurtosis | -0.340799481 | -0.03014 | -0.592052134
Skewness | 0.280721497 | 0.525352 | -0.041733773
Range | 12 | 15 | 11
Minimum | 9 | 9 | 8
Maximum | 21 | 24 | 19
Sum | 290 | 305 | 279
Count | 20 | 20 | 20
Largest(1) | 21 | 24 | 19
Smallest(1) | 9 | 9 | 8
Confidence Level (95.0%) | 1.483881102 | 1.931912 | 1.378981946
By viewing the descriptive statistics, we can say that the depression scores of the healthy group (Med 1) are far lower than those of the group suffering from chronic health conditions such as arthritis, hypertension and/or heart ailment (Med 2): the location means range from 5.55 to 8.00 in the first group and from 13.95 to 15.25 in the second. Within the healthy group, individuals from Florida report the lowest mean depression score (5.55, against 8.00 in New York and 7.05 in North Carolina); in the chronic-condition group the three location means are close to one another.
For Med 1:
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Florida | 20 | 111 | 5.55 | 4.576316
New York | 20 | 160 | 8 | 4.842105
North Carolina | 20 | 141 | 7.05 | 8.05
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 61.03333 | 2 | 30.51667 | 5.240886 | 0.00814 | 3.158843
Within Groups | 331.9 | 57 | 5.822807
Total | 392.9333 | 59
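In this case the F value (5.24) is higher than the F critical value (3.16), and the p-value (0.008) is below 0.05. So we reject the null hypothesis of equal mean depression scores and conclude that there is a significant difference in depression scores across the three locations among the healthy individuals.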
For Med 2:
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Florida | 20 | 290 | 14.5 | 10.05263
New York | 20 | 305 | 15.25 | 17.03947
North Carolina | 20 | 279 | 13.95 | 8.681579
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 17.03333 | 2 | 8.516667 | 0.714212 | 0.493906 | 3.158843
Within Groups | 679.7 | 57 | 11.92456
Total | 696.7333 | 59
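In this case the F value (0.71) is lower than the F critical value (3.16), and the p-value (0.494) is above 0.05. So we fail to reject the null hypothesis and conclude that there is no significant difference in depression scores across the three locations among the individuals with chronic health conditions.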
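Both F statistics can be reproduced from the group counts, means and variances in the SUMMARY tables, without the raw scores. The sketch below assumes the standard one-way ANOVA decomposition; the function name `one_way_anova` is hypothetical, and `F_CRIT` is the critical value reported in the ANOVA tables.

```python
def one_way_anova(groups):
    """groups: list of (count, mean, sample_variance) tuples.

    Returns (ss_between, ss_within, f_stat) for a one-way ANOVA.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group variation: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group variation: pooled from each group's sample variance.
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


F_CRIT = 3.158843  # critical value reported in both ANOVA tables

# (count, mean, variance) for Florida, New York, North Carolina.
med1 = one_way_anova([(20, 5.55, 4.576316), (20, 8.0, 4.842105),
                      (20, 7.05, 8.05)])
med2 = one_way_anova([(20, 14.5, 10.05263), (20, 15.25, 17.03947),
                      (20, 13.95, 8.681579)])

for label, (_, _, f_stat) in (("Med 1", med1), ("Med 2", med2)):
    decision = "reject H0" if f_stat > F_CRIT else "fail to reject H0"
    print(f"{label}: F = {f_stat:.3f} -> {decision}")
```

The Med 1 statistic exceeds the critical value and the Med 2 statistic does not, which is the same pair of decisions reached from the ANOVA tables.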