Your objective is to minimize the risk and maximize the profit of the bank. 1000 customer profiles are given in “dataset.csv”. Variable descriptions are given in “descripton.txt”.
1. Merge the levels of the categorical variables meaningfully so that each level holds a considerable proportion of the observations. Categorize continuous variables where necessary (e.g. age into age groups).
2. Split the dataset randomly into two datasets; training (80%) and testing (20%).
3. Using the training dataset, carry out a thorough descriptive analysis to summarize the dataset and identify potential associations of the explanatory variables with the response variable (creditability).
4. Consider the entire dataset (i.e. all 1000 customer profiles). Is there any difference between the creditability of female customers and male customers?
5. Fit a suitable regression model to predict the creditability of a customer using the training dataset.
6. Assess the validity of your model using the testing dataset.
7. Conduct a cost profit analysis considering a 35% profit and 100% loss for correct and wrong decisions respectively.
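Task 2 above asks for a random 80/20 split. The following is a minimal sketch of one way to do it, using only the Python standard library; the function name `train_test_split`, the seed and the stand-in row identifiers are assumptions made for illustration and do not come from the report, which used R.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and cut it into (training, testing) lists."""
    shuffled = list(rows)
    # A fixed seed (an arbitrary choice) makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 1000 customer profiles: row identifiers only.
profiles = list(range(1000))
training, testing = train_test_split(profiles)
print(len(training), len(testing))  # 800 200
```

With 1000 profiles this gives 800 training rows, which agrees with the 799 degrees of freedom reported for the null deviance of the fitted model.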
This report concerns the banking sector. When a client applies for a loan, the bank decides whether to approve it on the basis of the applicant’s profile. Two types of risk are attached to this decision. First, if the bank rejects an applicant with a good credit risk, who is likely to repay the loan, it loses business. Second, if the bank approves an applicant with a bad credit risk, who is not likely to repay the loan, it suffers a financial loss.
The analysis highlights the profit scenario of the bank. It also identifies the factors that are associated with the creditability of a customer.
The data analysis software “R”, together with “R-studio”, is used here for the financial analysis.
The continuous variable “age” is categorized and stored as a new categorical variable, “Age groups”, with six age groups. “Credit Amount” is also converted into a categorical variable: customers whose credit amount is greater than 5000 are treated as “rich” (label=1) and the remaining customers are treated as “poor” (label=0). “Duration of Credit” is likewise divided into 6 groups, denoted 1, 2, 3, 4, 5 and 6.
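The categorization described above can be sketched as follows. The report gives only the number of groups, not the cut points, so the boundaries in `AGE_UPPER_BOUNDS` and `DURATION_UPPER_BOUNDS` are hypothetical placeholders; only the 5000 threshold for credit amount comes from the text.

```python
from bisect import bisect_left

# Hypothetical upper bounds for groups 1 to 5; anything above the last
# bound falls into group 6. The report does not state its own cut points.
AGE_UPPER_BOUNDS = [25, 35, 45, 55, 65]        # years
DURATION_UPPER_BOUNDS = [12, 24, 36, 48, 60]   # months


def to_group(value, upper_bounds):
    """Return the 1-based group index; a value equal to a bound stays in the lower group."""
    return bisect_left(upper_bounds, value) + 1


def credit_amount_label(amount, threshold=5000):
    """Label 1 when the credit amount exceeds the threshold, otherwise 0."""
    return 1 if amount > threshold else 0


print(to_group(30, AGE_UPPER_BOUNDS))       # 2
print(to_group(18, DURATION_UPPER_BOUNDS))  # 2
print(credit_amount_label(7500))            # 1
```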
The summary statistics present the exploratory data analysis of all the categorised variables.
- The average of Creditability is 0.7025.
- The average level of account balance is 2.581.
- The average level of duration of credit (month) is 2.534.
- The average level of payment status of previous credit is 2.513.
- The average level of purpose is 2.776.
- The average level of credit amount is 0.1875.
- The average level of value of savings/stocks is 2.11.
- The average level of length of current employment is 3.361.
- The average level of instalment per cent is 2.965.
- The average level of sex and marital status is 2.675.
- The average level of Guarantors is 1.144.
- The average level of Duration in Current Address is 2.819.
- The average level of most valuable available asset is 2.355.
- The average level of age in years is 2.502.
- The average level of concurrent credits is 2.679.
- The average level of type of apartment is 1.929.
- The average level of number of credits at the bank is 1.384.
- The average level of occupation is 2.896.
- The average level of number of dependents is 1.163.
- The average level of telephone is 1.387.
- The average level of foreign worker is 1.034.
The box plots of the categorised variables indicate that most of the variables have outliers. The box plots also show the location measures of each variable in the dataset.
Pearson’s correlation coefficient measures the direction (positive or negative) and the strength (strong, moderate or weak) of the linear association between two variables.
The correlation coefficient matrix indicates that:
- Creditability is weakly and positively associated with account balance (r = 0.36) (Altman and Krzywinski 2015).
- Creditability is weakly and negatively associated with duration of credit in months (r = -0.223).
- Creditability is weakly and positively associated with payment status of previous credit (r = 0.221).
- Creditability is uncorrelated with level of purpose (r = -0.0289).
- Creditability is weakly and negatively associated with Credit Amount (r = -0.1077).
- Creditability is weakly and positively associated with value savings stocks (r = 0.1603).
- Creditability is weakly and positively correlated with length of current employment (r = 0.1154).
- Creditability is uncorrelated with instalment per cent (r = -0.084).
- Sex and Marital status is uncorrelated with Creditability (r = 0.0825).
- Guarantors is uncorrelated with Creditability (r = 0.0184).
- Creditability is uncorrelated with Duration in Current Address (r = 0.012).
- Creditability is weakly and negatively associated with most valuable available asset (r = -0.1485).
- Creditability is uncorrelated with types of apartment (r = 0.0409).
- Creditability is weakly and positively associated with Concurrent credits (r = 0.1031).
- Levels of Age in years is weakly and positively correlated with Creditability (r = 0.1332).
- The levels of number of credits at this bank is uncorrelated with Creditability (r = 0.0508).
- The levels of Occupation are uncorrelated with Creditability (r = -0.019).
- Creditability is uncorrelated with number of dependents (r = 0.03465).
- Creditability is uncorrelated with levels of telephone (r = 0.0461).
- The levels of Foreign Worker are uncorrelated with Creditability (r = 0.0762).
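Each coefficient in the list above is a Pearson correlation between Creditability and one coded predictor. The sketch below shows the calculation using only the standard library; the two short lists are made-up toy values, not rows from “dataset.csv”.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Toy data (hypothetical): creditability (0/1) and account balance level (1-4).
creditability = [1, 1, 0, 1, 0, 1, 1, 0]
account_balance = [4, 3, 1, 4, 2, 2, 4, 1]
print(round(pearson_r(creditability, account_balance), 3))
```

Because Creditability is coded 0/1, this coefficient is the point-biserial correlation, which is the same formula applied to a binary variable.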
Hypotheses:
Null hypothesis (H0): The difference between the average creditability of males and females is 0.
Alternative hypothesis (HA): The average creditability of males and females differs.
Independent sample t-test assuming unequal variances:
Test applied: Two-sample t-test assuming unequal variances
Level of significance: 5%
Calculated degrees of freedom: 111.4
Calculated t-statistic: (0.63433)
Two-tailed p-value: 0.5272
Interpretation: 0.5272 > 0.05. Therefore, the null hypothesis is not rejected at the 5% level of significance. There is no significant evidence that the average creditability of males and females differs.
Inferential Analysis
Decision Making: The data do not show a difference between the average creditability of males and females.
Two-sample t-test assuming equal variances:
Test applied: Independent sample t-test assuming equal variances
Level of significance: 5%
Calculated degrees of freedom: 998
Calculated t-statistic: (0.62026)
Two-tailed p-value: 0.5352
Interpretation: 0.5352 > 0.05. Therefore, the null hypothesis is not rejected at the 5% level of significance. There is no significant evidence that the average creditability of males and females differs.
Decision Making: The data do not show a difference between the average creditability of males and females.
Conclusion: According to both types of t-test, there is no significant difference between the average creditability of males and the average creditability of females.
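The two tests above differ only in how the standard error and the degrees of freedom are computed. The sketch below shows both calculations with the standard library; the sample values are invented for illustration, and the p-value step is left out because it needs the t-distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pooled_t(a, b):
    """t statistic and degrees of freedom with a pooled variance (equal variances)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical 0/1 creditability values for two small groups.
group_a = [1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
group_b = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
print(welch_t(group_a, group_b))
print(pooled_t(group_a, group_b))
```

For the pooled test the degrees of freedom are the total sample size minus 2, which matches the 998 reported above for the 1000 customer profiles.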
Here, Creditability is the dependent variable, which is dichotomous in nature. It has two levels: “1” for creditworthy (success) and “0” for non-creditworthy (failure). In this case, a logistic regression model is a suitable model to fit.
Hypotheses:
Null hypothesis (H0): The predictors are not significantly associated with the single response variable “Creditability”.
Alternative hypothesis (HA): At least one predictor is significantly associated with the single response variable “Creditability”.
In the fitted logistic regression model, Creditability is the single response variable and all the other variables are treated as predictors (independent variables).
Each logistic regression coefficient gives the change in the log odds of the outcome for a one-unit increase in the corresponding independent variable.
- For a one-unit increase in Account balance, the log odds of Creditability increase by 0.58 units.
- For a one-unit increase in Duration in Credit Month, the log odds of Creditability decrease by 0.35 units.
- For a one-unit increase in Payment Status of Previous Credit, the log odds of Creditability increase by 0.33 units.
- For a one-unit increase in Purpose level, the log odds of Creditability increase by 0.009 units.
- For a one-unit increase in Credit Amount, the log odds of Creditability decrease by 0.247 units.
- For a one-unit increase in Value Saving Stock, the log odds of Creditability increase by 0.202 units.
- For a one-unit increase in Length of current employment, the log odds of Creditability increase by 0.165 units.
- For a one-unit increase in instalment per cent, the log odds of Creditability decrease by 0.2664 units.
- For a one-unit increase in Sex & Marital status, the log odds of Creditability increase by 0.22 units.
- For a one-unit increase in Guarantors, the log odds of Creditability increase by 0.307 units.
- For a one-unit increase in duration in current address, the log odds of Creditability decrease by 0.007 units.
- For a one-unit increase in most valuable available asset, the log odds of Creditability decrease by 0.2285 units.
- For a one-unit increase in age in years, the log odds of Creditability increase by 0.14 units.
- For a one-unit increase in Concurrent credits, the log odds of Creditability increase by 0.2277 units.
- For a one-unit increase in type of apartment, the log odds of Creditability increase by 0.3634 units.
- For a one-unit increase in number of credits at the bank, the log odds of Creditability decrease by 0.2427 units.
- For a one-unit increase in Occupation, the log odds of Creditability increase by 0.0478 units.
- For a one-unit increase in number of dependents, the log odds of Creditability increase by 0.0235 units.
- For a one-unit increase in levels of telephone, the log odds of Creditability increase by 0.2489 units.
- For a one-unit increase in levels of foreign workers, the log odds of Creditability increase by 1.03 units.
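Log odds are hard to read directly, so it can help to exponentiate a coefficient to obtain an odds ratio. The short sketch below does this for three of the coefficients listed above; the dictionary name and the choice of three predictors are for illustration only.

```python
from math import exp

# Coefficients (changes in log odds) taken from the list above.
log_odds = {
    "Account balance": 0.58,
    "Duration in Credit Month": -0.35,
    "Foreign worker": 1.03,
}

# exp(coefficient) is the multiplicative change in the odds of
# Creditability = 1 for a one-unit increase in the predictor.
odds_ratios = {name: exp(beta) for name, beta in log_odds.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio above 1 means the odds of Creditability rise with the predictor (about 1.79 for Account balance), while a ratio below 1 means they fall (about 0.70 for Duration in Credit Month).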
The null deviance of the model is 973.97 on 799 degrees of freedom, whereas the residual deviance is 762.53 on 779 degrees of freedom. The predictors therefore reduce the deviance by 211.44, so the fit of the logistic regression model is reasonable (Madan et al. 2015).
According to the calculated p-values, the most significant predictors of the response variable “Creditability” in the logistic regression model are Account Balance, Payment Status of Previous Credit and Value Saving Stocks. The predictors significant at the 0.1% level of significance are Duration of Credit Month and Instalment per cent. The predictors significant at the 5% level of significance are Credit Amount, Guarantors, Most valuable available asset, Concurrent credits, Type of Apartment and Telephone (Tranmer and Elliot 2008).
Most of the predictors contribute significantly to the model. The AIC (Akaike’s Information Criterion) value of 804.53 is not unduly high, although AIC is mainly useful for comparing alternative models. Therefore, the logistic regression model appears to predict Creditability well.
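The fit figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes the model was fitted to ungrouped 0/1 responses, in which case the residual deviance equals minus twice the log-likelihood; the McFadden pseudo-R2 at the end is an extra summary that the report does not quote for the training data.

```python
null_deviance, null_df = 973.97, 799
residual_deviance, residual_df = 762.53, 779

# Drop in deviance: the likelihood-ratio chi-square for all predictors jointly.
deviance_drop = null_deviance - residual_deviance
# Degrees of freedom used up by the predictors.
predictor_df = null_df - residual_df
# Estimated parameters: the predictors plus the intercept.
n_parameters = predictor_df + 1

# AIC = -2 log-likelihood + 2k; assumes deviance = -2 log-likelihood.
aic = residual_deviance + 2 * n_parameters
# McFadden pseudo-R2 under the same assumption.
mcfadden_r2 = 1 - residual_deviance / null_deviance

print(round(deviance_drop, 2), predictor_df)  # 211.44 20
print(round(aic, 2))                          # 804.53
print(round(mcfadden_r2, 3))                  # 0.217
```

The recomputed AIC agrees with the reported value of 804.53, which suggests the quoted deviances and degrees of freedom are consistent with one another.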
Therefore, at the 5% level of significance, the model provides evidence of a significant association between the dependent variable and several of the independent variables.
Logistic Regression Model
The regression model fitted on the training data is applied to the testing data, and a new logistic model is then run on the testing data.
Hypotheses:
Null hypothesis (H0): The logistic regression model fitted on the training data also fits the testing data well.
Alternative hypothesis (HA): The logistic regression model fitted on the training data does not fit the testing data well.
The pseudo-R2 of the regression model on the testing data is 0.415. Although pseudo-R2 is not a definitive measure of the validity of a logistic model, it suggests that the model fitted on the training data does not fit the testing data well. The Wald statistics of “Account Balance”, “Payment status of previous credit” and “Value saving stocks” are 0.0017, 0.0023 and 0.0037 respectively. Hence, these variables could be removed from the logistic regression model as predictors.
The Wald test statistic is 0.005. The Wald test, also known as the Wald chi-square test, helps to determine whether the explanatory variables in a model are significant. The value of 0.005 (0.5%) indicates that the logistic model fitted on the training data does not hold up well on the testing data (Www3.amherst.edu 2018).
Therefore, the null hypothesis is rejected at the 5% level of significance: the logistic model fitted on the training data does not fit the testing data well.
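For reference, the Wald test for a single coefficient compares the estimate with its standard error. The sketch below shows that calculation under the usual large-sample normal approximation; the function name `wald_test` and the standard error of 0.2 are hypothetical, since the report does not list standard errors.

```python
from statistics import NormalDist


def wald_test(coefficient, standard_error):
    """Return the Wald chi-square statistic (1 df) and its two-sided p-value."""
    z = coefficient / standard_error
    chi_square = z ** 2
    # For 1 degree of freedom the chi-square p-value equals the
    # two-sided p-value of z under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return chi_square, p_value


# Coefficient 0.58 is the Account balance estimate from the training model;
# the standard error of 0.2 is a made-up value for illustration.
chi_square, p_value = wald_test(0.58, 0.2)
print(round(chi_square, 2), round(p_value, 4))
```

A small p-value from this test indicates that the coefficient differs significantly from zero, so the predictor contributes to the model.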
In the cost-profit analysis, a 35% profit is assumed for a correct decision and a 100% loss for a wrong decision.
Table 7: Table showing overall cost and revenue
Creditability has two levels, where “1” denotes a creditworthy customer (success) and “0” denotes a non-creditworthy customer (failure). As per the instruction, a creditworthy customer earns the bank a 35% profit in revenue, while a non-creditworthy customer causes a 100% loss in revenue. Hence, the customers whose creditability is “0” provide no revenue. The customers whose creditability is “1” provide a total of $2821244 in revenue. Conversely, the total cost, accounted as credit amount, is $3271248.
It can be concluded that the bank faces a significant loss due to non-creditworthy customers (Boardman et al. 2017).
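The 35% profit and 100% loss rule can be written as a small calculation per loan. The sketch below is one reading of that rule, assuming every loan in the list is approved; the function names and the three example loans are hypothetical and are not taken from “dataset.csv”.

```python
PROFIT_RATE = 0.35  # gain on a loan that is repaid
LOSS_RATE = 1.00    # loss on a loan that is not repaid


def loan_outcome(credit_amount, creditability):
    """Profit (positive) or loss (negative) on one approved loan."""
    if creditability == 1:
        return PROFIT_RATE * credit_amount
    return -LOSS_RATE * credit_amount


def portfolio_profit(loans):
    """Sum the outcomes over (credit_amount, creditability) pairs."""
    return sum(loan_outcome(amount, flag) for amount, flag in loans)


# Hypothetical approved loans: two repaid, one defaulted.
loans = [(1000, 1), (2000, 1), (1500, 0)]
print(portfolio_profit(loans))  # -450.0
```

Taking the reported totals at face value, revenue minus cost is 2821244 - 3271248 = -450004, which is in line with the conclusion that the bank makes a loss.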
Conclusion:
The objective is to minimize the risk and maximize the profit of the bank. The influence of the significant factors that pull creditability towards 0 should therefore be reduced (Kruikemeier et al. 2015). The positive or negative association between the response variable and each predictor should also be taken into account.
References:
Altman, N. and Krzywinski, M., 2015. Points of Significance: Association, correlation and causation.
Boardman, A.E., Greenberg, D.H., Vining, A.R. and Weimer, D.L., 2017. Cost-benefit analysis: concepts and practice. Cambridge University Press.
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. and Lin, C.J., 2008. LIBLINEAR: A library for large linear classification. Journal of machine learning research, 9(Aug), pp.1871-1874.
Kruikemeier, S., Aparaschivei, A.P., Boomgaarden, H.G., Van Noort, G. and Vliegenthart, R., 2015. Party and candidate websites: A comparative explanatory analysis. Mass Communication and Society, 18(6), pp.821-850.
Madan, J., Lönnroth, K., Laokri, S. and Squire, S.B., 2015. What can dissaving tell us about catastrophic costs? Linear and logistic regression analysis of the relationship between patient costs and financial coping strategies adopted by tuberculosis patients in Bangladesh, Tanzania and Bangalore, India. BMC health services research, 15(1), p.476.
Ross, A. and Willson, V.L., 2017. Independent Samples T-Test. In Basic and Advanced Statistical Tests (pp. 13-16). SensePublishers, Rotterdam.
Sidel, J.L., Bleibaum, R.N. and Tao, K.W., 2018. Quantitative Descriptive Analysis. Descriptive Analysis in Sensory Evaluation, pp.287-318.
Tranmer, M. and Elliot, M., 2008. Multiple linear regression. The Cathie Marsh Centre for Census and Survey Research (CCSR), 5, pp.30-35.
Www3.amherst.edu. (2018). Hypothesis Testing and the Wald Test. [online] Available at: https://www3.amherst.edu/~fwesthoff/webpost/Old/Econ_360/Econ_360-10-17-Chap.pdf.