Predicting Credit Card Charges

Answer:

In this assignment, we use consumer characteristics to predict the amount charged by credit card users. The consumer data are given below:

Income ($1000s)	Household Size	Amount Charged ($)	Income ($1000s)	Household Size	Amount Charged ($)
54	3	4016	54	6	5573
30	2	3159	30	1	2583
32	4	5100	48	2	3866
50	5	4742	34	5	3586
31	2	1864	67	4	5037
55	2	4070	50	2	3605
37	1	2731	67	5	5345
40	2	3348	55	6	5370
66	4	4764	52	2	3890
51	3	4110	62	3	4705
25	3	4208	64	2	4157
48	4	4219	22	3	3579
27	1	2477	29	4	3890
33	2	2514	39	2	2972
65	3	4214	35	1	3121
63	4	4965	39	4	4183
42	6	4412	54	3	3720
21	2	2448	23	6	4127
44	1	2995	27	2	2921
37	5	4171	26	7	4603
62	6	5678	61	2	4273
21	3	3623	30	2	3067
55	7	5301	22	4	3074
42	2	3020	46	5	4820
41	7	4828	66	4	5149

The data comprise household size, annual income, and annual credit card charges for a sample of 50 consumers. We now move on to the analysis.

The descriptive statistics of the data are given below:

Descriptive statistics	Income ($1000s)	Household Size	Amount Charged ($)
Mean	43.48	3.42	3963.86
Standard Error	2.057785614	0.245930138	132.023387
Median	42	3	4090
Mode	54	2	3890
Standard Deviation	14.55074162	1.738988681	933.5463219
Sample Variance	211.7240816	3.024081633	871508.7351
Kurtosis	-1.247719422	-0.722808552	-0.742482171
Skewness	0.095855639	0.527895977	-0.128860064
Range	46	6	3814
Minimum	21	1	1864
Maximum	67	7	5678
Sum	2174	171	198193
Count	50	50	50
Largest(1)	67	7	5678
Smallest(1)	21	1	1864
Confidence Level (95.0%)	4.135274935	0.494215106	265.3109241
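
As a quick check on the table above, the standard error and the 95% confidence half-width for income can be reproduced from the standard deviation and the sample size. This is only a sketch: the function names are hypothetical, and the critical value of Student's t for 49 degrees of freedom is hardcoded as an assumption (about 2.0096) and is not computed.

```python
import math

# Two-sided 95% critical value of Student's t with 49 degrees of freedom.
# Hardcoded assumption (about 2.0096); a statistics library could compute it.
T_CRIT_49 = 2.0096


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


def ci_half_width(std_dev: float, n: int, t_crit: float = T_CRIT_49) -> float:
    """Half-width of the confidence interval for the mean: t * s / sqrt(n)."""
    return t_crit * standard_error(std_dev, n)


# Income ($1000s): standard deviation and count taken from the table.
income_se = standard_error(14.55074162, 50)
income_hw = ci_half_width(14.55074162, 50)
print(f"standard error = {income_se:.4f}, 95% half-width = {income_hw:.4f}")
```

The same two functions can be applied to the household size and amount charged columns by passing their standard deviations.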

The equation for credit card charges can be given as:

Y_t = α + βX_t + u_t ......Eq(1)

Here Y_t is the dependent variable, annual credit card charges ($), X_t is the independent variable, annual income ($1000s), α is the intercept, and u_t is the error term. The regression results are given below:

SUMMARY OUTPUT
Regression Statistics
Multiple R	0.630781
R Square	0.397884
Adjusted R Square	0.38534
Standard Error	731.9025
Observations	50
ANOVA
	df	SS	MS	F	Significance F
Regression	1	16991229	16991229	31.71892	9.1E-07
Residual	48	25712699	535681.2
Total	49	42703928
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	2204.241	329.134	6.697091	0.00	1542.472	2866.009	1542.472207	2866.0088
Income ($1000s)	40.46963	7.185716	5.631955	0.00	26.02178	54.91748	26.02177931	54.917479

From the regression results, annual income explains about 39.8% of the variation in annual credit card charges (R² = 0.398; adjusted R² = 0.385). The slope coefficient implies that a $1,000 increase in annual income (1 unit of the income variable) is associated with an increase of about $40.47 in annual credit card charges. The coefficient is statistically significant (t = 5.63, p < 0.001).
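
The fit statistics in the summary output follow directly from the ANOVA sums of squares. A minimal sketch of that arithmetic is given below; the function name and the dictionary keys are hypothetical, and the inputs are copied from the ANOVA table for the income-only model.

```python
import math


def fit_summary(ss_regression: float, ss_residual: float,
                n_obs: int, n_predictors: int) -> dict:
    """Derive R^2, adjusted R^2, F and the regression standard error."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    ms_residual = ss_residual / df_residual
    f_stat = (ss_regression / n_predictors) / ms_residual
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f_stat": f_stat,
        "std_error": math.sqrt(ms_residual),
    }


# Income-only model: SS values from the ANOVA table, 50 observations, 1 predictor.
income_model = fit_summary(16991229, 25712699, n_obs=50, n_predictors=1)
for name, value in income_model.items():
    print(f"{name}: {value:.4f}")
```

Passing the sums of squares of the other regressions in this answer (with `n_predictors=2` for the two-variable model) gives their statistics in the same way.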

Another equation for credit card charges can be given as:

Y_t = α + βZ_t + u_t ......Eq(2)

Here Y_t is the dependent variable, annual credit card charges ($), Z_t is the independent variable, household size, α is the intercept, and u_t is the error term. The regression results are given below:

SUMMARY OUTPUT
Regression Statistics
Multiple R	0.752854
R Square	0.566789
Adjusted R Square	0.557764
Standard Error	620.8163
Observations	50

ANOVA
	df	SS	MS	F	Significance F
Regression	1	24204112	24204112	62.80048	2.86E-10
Residual	48	18499816	385412.8
Total	49	42703928
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	2581.644	195.2699	13.2209	0.00	2189.028	2974.26	2189.027669	2974.2605
Household Size	404.1567	50.99978	7.924676	0.00	301.6148	506.6986	301.6147764	506.69863

From the regression results, household size explains about 56.7% of the variation in annual credit card charges (R² = 0.567; adjusted R² = 0.558). The slope coefficient implies that each additional household member is associated with an increase of about $404.16 in annual credit card charges. The coefficient is statistically significant (t = 7.92, p < 0.001).

Comparing the two single-variable models, household size is a better predictor of annual credit card charges than annual income: it explains more of the variation (R² of 0.567 versus 0.398) and leaves a smaller standard error (620.82 versus 731.90).

The equation for credit card charges using both household size and annual income can be given as:

Y_t = α + β₁X_t + β₂Z_t + u_t .....Eq(3)

Here X_t is annual income ($1000s), Z_t is household size, α is the intercept, and u_t is the error term. The regression results are given below:

SUMMARY OUTPUT
Regression Statistics
Multiple R	0.908502
R Square	0.825376
Adjusted R Square	0.817945
Standard Error	398.3249
Observations	50
ANOVA
	df	SS	MS	F	Significance F
Regression	2	35246779	17623389	111.0745	1.55E-18
Residual	47	7457149	158662.8
Total	49	42703928
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	1305.034	197.771	6.598712	0.00	907.17	1702.898	907.17	1702.898
Income ($1000s)	33.12196	3.970237	8.342563	0.00	25.13487	41.10904	25.13487	41.10904
Household Size	356.3402	33.2204	10.72655	0.00	289.5094	423.171	289.5094	423.171

From the regression results, household size and annual income together explain about 82.5% of the variation in annual credit card charges (R² = 0.825; adjusted R² = 0.818). Holding income constant, each additional household member is associated with an increase of about $356.34 in annual credit card charges. Holding household size constant, a $1,000 increase in annual income (1 unit of the income variable) is associated with an increase of about $33.12 in annual charges. Both coefficients are statistically significant (p < 0.001).
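
The two-variable fit can be reproduced from the 50 observations in the data table. The sketch below is one possible way to do it in plain Python: it builds the normal equations (X'X)b = X'y and solves them by Gaussian elimination. The names `DATA`, `solve_linear_system` and `fit_ols` are hypothetical, and the spreadsheet tool that produced the summary output may use a different numerical method, so small differences in the last digits are possible.

```python
# (income in $1000s, household size, amount charged in $), copied from the data table.
DATA = [
    (54, 3, 4016), (30, 2, 3159), (32, 4, 5100), (50, 5, 4742), (31, 2, 1864),
    (55, 2, 4070), (37, 1, 2731), (40, 2, 3348), (66, 4, 4764), (51, 3, 4110),
    (25, 3, 4208), (48, 4, 4219), (27, 1, 2477), (33, 2, 2514), (65, 3, 4214),
    (63, 4, 4965), (42, 6, 4412), (21, 2, 2448), (44, 1, 2995), (37, 5, 4171),
    (62, 6, 5678), (21, 3, 3623), (55, 7, 5301), (42, 2, 3020), (41, 7, 4828),
    (54, 6, 5573), (30, 1, 2583), (48, 2, 3866), (34, 5, 3586), (67, 4, 5037),
    (50, 2, 3605), (67, 5, 5345), (55, 6, 5370), (52, 2, 3890), (62, 3, 4705),
    (64, 2, 4157), (22, 3, 3579), (29, 4, 3890), (39, 2, 2972), (35, 1, 3121),
    (39, 4, 4183), (54, 3, 3720), (23, 6, 4127), (27, 2, 2921), (26, 7, 4603),
    (61, 2, 4273), (30, 2, 3067), (22, 4, 3074), (46, 5, 4820), (66, 4, 5149),
]


def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Return [intercept, income coefficient, household coefficient]."""
    xs = [[1.0, income, household] for income, household, _ in rows]
    ys = [amount for _, _, amount in rows]
    k = 3
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


intercept, b_income, b_household = fit_ols(DATA)
print(f"intercept={intercept:.3f}, income={b_income:.4f}, household={b_household:.4f}")
```

The printed coefficients can be compared with the Coefficients column of the summary output above.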

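Hence the fitted regression equation, including the intercept from the regression output, can be given as:

Y_t = 1305.03 + 33.12X_t + 356.34Z_t

The predicted annual credit card charge for a three-person household with an annual income of $40,000 (X_t = 40, Z_t = 3) would be:

Y_t = 1305.03 + 33.12*40 + 356.34*3 = 1305.03 + 1324.80 + 1069.02 = $3698.85.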
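
The prediction can also be computed with the unrounded coefficients from the two-variable regression output, intercept included. The sketch below does this; the function and constant names are hypothetical. Because it uses the unrounded coefficients, the result can differ by a few cents from a hand calculation with rounded values.

```python
# Coefficients copied from the two-variable regression output.
INTERCEPT = 1305.034
COEF_INCOME = 33.12196       # per $1000 of annual income
COEF_HOUSEHOLD = 356.3402    # per household member


def predict_charge(income_thousands: float, household_size: float) -> float:
    """Predicted annual credit card charge in dollars."""
    return (INTERCEPT
            + COEF_INCOME * income_thousands
            + COEF_HOUSEHOLD * household_size)


# Three-person household with an annual income of $40,000.
prediction = predict_charge(40, 3)
print(f"Predicted annual charge: ${prediction:.2f}")
```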

Other factors that may affect annual credit card charges include the interest rate, the individual's level of education, and the individual's past credit history. For example, a higher interest rate raises the cost of carrying a balance, which would tend to reduce how often the individual uses the card.

The descriptive statistics of the variables (final exam and two assignments for each of HI001, HI003 and HI002, with 99 observations each) are given below:

Descriptive Statistics	HI001 FINAL EXAM	HI001 ASSIGNMENT 01	HI001 ASSIGNMENT 02
Mean	31.90909091	17.34343434	15.50505051
Standard Error	0.700162085	0.237298066	0.23564704
Median	32	17	16
Mode	29	18	17
Standard Deviation	6.966524782	2.361085949	2.344658442
Sample Variance	48.53246753	5.57472686	5.497423212
Kurtosis	7.67534849	10.3018632	0.698973651
Skewness	-1.753036803	0.803185137	-0.464616962
Range	50	22	13
Minimum	0	8	8
Maximum	50	30	21
Sum	3159	1717	1535
Count	99	99	99
Largest(1)	50	30	21
Smallest(1)	0	8	8
Confidence Level(95.0%)	1.389448835	0.470910278	0.467633869

Descriptive Statistics	HI003 FINAL EXAM	HI003 ASSIGNMENT 01	HI003 ASSIGNMENT 02
Mean	26.23232323	18.31313131	13.60606061
Standard Error	0.861918907	0.408537639	0.187651228
Median	25	19	13
Mode	25	20	13
Standard Deviation	8.57598484	4.064898183	1.867106141
Sample Variance	73.54751598	16.52339724	3.486085343
Kurtosis	0.474751131	1.51303057	3.505251459
Skewness	-0.027305979	-0.236180187	1.121313851
Range	46	20	12
Minimum	4	10	8
Maximum	50	30	20
Sum	2597	1813	1347
Count	99	99	99
Largest(1)	50	30	20
Smallest(1)	4	10	8
Confidence Level(95.0%)	1.710449975	0.810729628	0.372387745

Descriptive Statistics	HI002 FINAL EXAM	HI002 ASSIGNMENT 01	HI002 ASSIGNMENT 02
Mean	26.73737374	17.93939394	12.49494949
Standard Error	0.636870612	0.365435664	0.213139666
Median	27	19	13
Mode	27	20	14
Standard Deviation	6.336782578	3.636038947	2.120712902
Sample Variance	40.15481344	13.22077922	4.497423212
Kurtosis	3.924830269	3.372356549	5.049593179
Skewness	-0.312442386	-1.183845155	-1.204878419
Range	50	26	16
Minimum	0	4	4
Maximum	50	30	20
Sum	2647	1776	1237
Count	99	99	99
Largest(1)	50	30	20
Smallest(1)	0	4	4
Confidence Level(95.0%)	1.26384897	0.725195163	0.42296872

Ten correlations between pairs of variables are given below, with significance judged at the 5% level:

The variables HI003 FINAL EXAM and HI002 FINAL EXAM are positively correlated with a correlation coefficient of 0.207867. The p-value is 0.039 and hence the correlation coefficient is statistically significant. It is a weak correlation.

The variables HI001 FINAL EXAM and HI002 FINAL EXAM are positively correlated with a correlation coefficient of 0.142303. The p-value is 0.1600 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.

The variables HI001 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are positively correlated with a correlation coefficient of 0.155602. The p-value is 0.1241 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.

The variables HI003 ASSIGNMENT 01 and HI003 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.567657. The p-value is 0.000 and hence the correlation coefficient is statistically significant. It is a strong correlation.

The variables HI001 FINAL EXAM and HI003 FINAL EXAM are positively correlated with a correlation coefficient of 0.187035. The p-value is 0.0638, which is above 0.05, and hence the correlation coefficient is statistically insignificant at the 5% level. It is a weak correlation.

The variables HI001 ASSIGNMENT 01 and HI001 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.648505. The p-value is 0.000 and hence the correlation coefficient is statistically significant. It is a strong correlation.

The variables HI001 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.035405. The p-value is 0.7279 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.

The variables HI002 ASSIGNMENT 01 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.603392. The p-value is 0.000 and hence the correlation coefficient is statistically significant. It is a strong correlation.

The variables HI002 ASSIGNMENT 01 and HI003 ASSIGNMENT 01 are negatively correlated with a correlation coefficient of -0.11055. The p-value is 0.2760 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.

The variables HI003 ASSIGNMENT 02 and HI002 ASSIGNMENT 02 are positively correlated with a correlation coefficient of 0.031706. The p-value is 0.7554 and hence the correlation coefficient is statistically insignificant. It is a weak correlation.
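
The significance calls above can be checked with the usual t test for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below assumes n = 99 for every pair (the count reported in the descriptive statistics) and hardcodes the two-sided 5% critical value for 97 degrees of freedom as an assumption (about 1.9847); the function names are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 97 degrees of freedom.
# Hardcoded assumption (about 1.9847); a statistics library could compute it.
T_CRIT_97 = 1.9847


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def is_significant(r: float, n: int, t_crit: float = T_CRIT_97) -> bool:
    """True when |t| exceeds the critical value (5% level, two-sided)."""
    return abs(correlation_t_stat(r, n)) > t_crit


pairs = {
    "HI003 FINAL EXAM vs HI002 FINAL EXAM": 0.207867,
    "HI001 FINAL EXAM vs HI003 FINAL EXAM": 0.187035,
    "HI002 ASSIGNMENT 01 vs HI003 ASSIGNMENT 01": -0.11055,
}
for label, r in pairs.items():
    t = correlation_t_stat(r, 99)
    print(f"{label}: t = {t:.3f}, significant = {is_significant(r, 99)}")
```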

The descriptive statistics of the first group (Med 1) are given below:

Descriptive Statistics	Florida	New York	North Carolina
Mean	5.55	8	7.05
Standard Error	0.478347	0.492041932	0.634428877
Median	6	8	7.5
Mode	7	8	8
Standard Deviation	2.139233	2.200478417	2.837252192
Sample Variance	4.576316	4.842105263	8.05
Kurtosis	-1.06219	0.626431669	-0.904925496
Skewness	-0.27356	0.625687389	-0.056188269
Range	7	9	9
Minimum	2	4	3
Maximum	9	13	12
Sum	111	160	141
Count	20	20	20
Largest(1)	9	13	12
Smallest(1)	2	4	3
Confidence Level(95.0%)	1.001192	1.029855598	1.327874898

The descriptive statistics of the second group (Med 2) are given below:

Descriptive Statistics	Florida	New York	North Carolina
Mean	14.5	15.25	13.95
Standard Error	0.708965146	0.923024	0.65884668
Median	14.5	14.5	14
Mode	17	14	12
Standard Deviation	3.170588522	4.12789	2.946451925
Sample Variance	10.05263158	17.03947	8.681578947
Kurtosis	-0.340799481	-0.03014	-0.592052134
Skewness	0.280721497	0.525352	-0.041733773
Range	12	15	11
Minimum	9	9	8
Maximum	21	24	19
Sum	290	305	279
Count	20	20	20
Largest(1)	21	24	19
Smallest(1)	9	9	8
Confidence Level (95.0%)	1.483881102	1.931912	1.378981946

The descriptive statistics show that mean depression scores in the healthy group (Med 1) are far lower than in the group suffering from a chronic health condition such as arthritis, hypertension, and/or heart ailment (Med 2): roughly 5.6 to 8.0 versus 14.0 to 15.3. Among healthy individuals, the Florida sample has the lowest mean depression score (5.55, against 8.00 for New York and 7.05 for North Carolina). In the chronic-condition group the three locations are much closer together.

The hypothesis to be tested is whether depression scores differ significantly across the three locations. The null hypothesis is that the mean depression score is the same in Florida, New York, and North Carolina. The single-factor ANOVA results for both groups are given below:

For Med 1: -

Anova: Single Factor
SUMMARY
Groups	Count	Sum	Average	Variance
Florida	20	111	5.55	4.576316
New York	20	160	8	4.842105
North Carolina	20	141	7.05	8.05
ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	61.03333	2	30.51667	5.240886	0.00814	3.158843
Within Groups	331.9	57	5.822807
Total	392.9333	59

In this case the F value (5.24) is higher than the F critical value (3.16), and the p-value (0.008) is below 0.05. So we reject the null hypothesis and conclude that there is a significant difference in mean depression scores across the three locations among the healthy individuals.

For Med 2: -

Anova: Single Factor
SUMMARY
Groups	Count	Sum	Average	Variance
Florida	20	290	14.5	10.05263
New York	20	305	15.25	17.03947
North Carolina	20	279	13.95	8.681579
ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	17.03333	2	8.516667	0.714212	0.493906	3.158843
Within Groups	679.7	57	11.92456
Total	696.7333	59

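In this case the F value (0.71) is lower than the F critical value (3.16), and the p-value (0.494) is above 0.05. So we fail to reject the null hypothesis: there is no evidence of a significant difference in mean depression scores across the three locations among the individuals with chronic health conditions.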
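
Both F values can be reproduced from the group counts, averages and variances in the SUMMARY tables, without the raw scores. The sketch below shows the arithmetic; the function name is hypothetical, and the critical value is the one reported in the ANOVA tables.

```python
def one_way_anova(groups):
    """One-way ANOVA from summaries.

    groups: list of (count, mean, sample variance) tuples.
    Returns (SS between, SS within, F statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# (count, average, variance) for Florida, New York, North Carolina.
med1 = [(20, 5.55, 4.576316), (20, 8.0, 4.842105), (20, 7.05, 8.05)]
med2 = [(20, 14.5, 10.05263), (20, 15.25, 17.03947), (20, 13.95, 8.681579)]
F_CRIT = 3.158843  # critical value as reported in the ANOVA tables

for label, groups in (("Med 1", med1), ("Med 2", med2)):
    ssb, ssw, f_stat = one_way_anova(groups)
    decision = "reject H0" if f_stat > F_CRIT else "fail to reject H0"
    print(f"{label}: SSB={ssb:.4f}, SSW={ssw:.4f}, F={f_stat:.4f} -> {decision}")
```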

In my view, the best way to treat depression is to provide good counselling support. Depression cannot be cured in the short run, so treatment needs time to take effect. The individuals' physical health also needs attention, since the group with chronic health conditions shows much higher depression scores.

