Data analytics plays a significant role for organizational managers and strategic thinkers because it provides insights that support profitable, well-informed decisions. It also helps other stakeholders of an organization, such as shareholders and investors, decide where to invest their money. In this case study, Triss Merigold is an investor who is interested in Peer-to-Peer lending loans but is unsure whether investing in them is the best decision. The study therefore goes through the Loan Club Investment Report and draws insights to help Triss make well-informed decisions.
- Deciding which loan listing is most profitable and investing in it, for instance loans with a high fully paid percentage.
- Deciding which loans are more or less likely to default before settling on an investment strategy, for instance investing in loans with the lowest default rate.
- Deciding on the most effective and efficient data to use for analysis.
- Calculating the default and fully paid percentages of the loans and deciding which loan to invest in based on those calculations.
- The objective of identifying and investing in a highly profitable loan is to ensure that Triss’s investment is profitable, or at least has a higher chance of being profitable.
- The objective of basing decisions on the default rate is to enable Triss to invest in a loan with the lowest default rate, which increases the chances of earning returns and profits.
- Choosing the data set matters because a data set that yields reliable and valid results will help Triss make a well-informed investment decision.
- Determining the default and fully paid status of loans is meant to help Triss identify which loan has a high fully paid rate and a low default rate, and invest in it.
- In all of the above instances, the primary objective is to ensure that the loan Triss selects has a higher probability of being profitable and successful.
- A good decision is one where the chosen loan has a low default rate and a high fully paid rate; a bad decision is one where the chosen loan has a high default rate and a low fully paid rate.
- Past data is important in this case study because historical data will enable Triss to determine the trends and patterns of each loan listing at Loan Club Inc. (Coleman et al., 2016, pp. 2151-2164). The historical data will show her which loan has a positive trend and favourable past features so that she can invest in that particular loan.
- Additionally, the historical data can be used to predict which loan is the most promising for Triss to invest in.
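The fully paid and default percentages referred to above can be computed with a simple tally per loan grade. The following is a minimal sketch, not the report's own method: the sample records and the status labels "Fully Paid", "Charged Off", "Default" and "Current" are made-up illustrative values, and counting a charged-off loan as defaulted is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; real data would come from the Loan Club data set.
loans = [
    {"grade": "A", "loan_status": "Fully Paid"},
    {"grade": "A", "loan_status": "Fully Paid"},
    {"grade": "A", "loan_status": "Charged Off"},
    {"grade": "A", "loan_status": "Fully Paid"},
    {"grade": "B", "loan_status": "Fully Paid"},
    {"grade": "B", "loan_status": "Default"},
    {"grade": "B", "loan_status": "Charged Off"},
    {"grade": "B", "loan_status": "Current"},
]

# Assumption: charged-off loans are counted as defaulted.
DEFAULT_STATUSES = {"Default", "Charged Off"}


def status_rates(records):
    """Return {grade: (fully_paid_pct, default_pct)}."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["grade"]][rec["loan_status"]] += 1
    rates = {}
    for grade, tally in counts.items():
        total = sum(tally.values())
        paid = 100.0 * tally["Fully Paid"] / total
        bad = 100.0 * sum(tally[s] for s in DEFAULT_STATUSES) / total
        rates[grade] = (paid, bad)
    return rates


rates = status_rates(loans)
for grade in sorted(rates):
    paid, bad = rates[grade]
    print(f"Grade {grade}: fully paid {paid:.1f}%, default {bad:.1f}%")
```

On this toy sample, grade A would be preferred because it combines the higher fully paid percentage with the lower default percentage.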
How Triss will use past data to make decisions
- Performing descriptive analytics of loan features such as default rate, fully paid rate, expired, charged-off, current and late status to get a summary understanding of each loan listing and decide which one has favourable features (Cao et al., 2017, pp. 4546-4554).
- Performing exploratory analysis to visualize the data graphically, for instance with line graphs and scatter plots, to determine how the various loan listings trend in terms of profitability.
- Performing correlation analysis to determine how the loan features relate to one another and to check for multicollinearity. This helps determine whether the loan features are really significant in explaining the various loan listings.
- Performing a regression analysis to determine which loan features are significant in predicting a loan’s outcome and which loan is best to invest in. The regression analysis will also give Triss a model for predicting the best loan listing to invest in.
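The descriptive step in the list above can be illustrated with Python's standard `statistics` module. This is only a sketch under stated assumptions: the interest rates are invented sample values, and the `skewness` helper is a hypothetical function written for this example (it computes the population skewness as the mean of cubed z-scores).

```python
import statistics

# Hypothetical interest rates (percent) for a handful of loans.
int_rates = [7.5, 9.0, 11.5, 11.5, 13.0, 18.5]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


summary = {
    "mean": statistics.fmean(int_rates),
    "median": statistics.median(int_rates),
    "mode": statistics.mode(int_rates),
    "stdev": statistics.stdev(int_rates),  # sample standard deviation
    "skewness": skewness(int_rates),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness here would indicate a few loans with unusually high interest rates pulling the mean above the median.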
| Attribute | Description |
| --- | --- |
| ID | A unique identifier assigned to each loan listing |
| Loan_amnt | Loan amount applied for by the borrower |
| Funded_amnt | Amount committed to the loan |
| Term | Number of payments on the loan; either 36 or 60 months |
| Int_rate | Interest rate on the loan |
| Grade | Assigned loan grade |
| Emp_length | Borrower’s length of employment in years, between 0 and 10 |
| Home_ownership | Borrower’s home ownership status, for instance mortgage or rent |
| Annual_Inc | Borrower’s annual income |
| Verification_status | Shows whether or not Loan Club verified the borrower’s income |
| Issue_d | Month in which the loan was funded |
| Loan_status | Current status of the loan |
| Purpose | Reason the loan was requested |
| Zip_code | Borrower’s zip code provided during the loan application |
| Addr_state | Borrower’s state provided during the loan application |
| dti | Debt-to-income ratio: the borrower’s monthly debt payments on total debt obligations relative to monthly income |
| Delinq_2yrs | Number of 30+ days past-due delinquency incidents in the borrower’s credit file in the past 2 years |
| Earliest_cr_line | Month in which the borrower’s earliest reported credit line was opened |
| Open_acc | Total number of open credit lines in the borrower’s credit file |
| Pub_rec | Number of derogatory public records |
| Revol_bal | Total revolving credit balance |
| Revol_util | Amount of credit the borrower is using relative to all available revolving credit |
| Total_pymnt | Payments received to date for the total amount funded |
| Recoveries | Post charge-off gross recovery |
| Last_fico_range_high | Upper boundary of the range of the borrower’s last FICO score |
| Last_fico_range_low | Lower boundary of the range of the borrower’s last FICO score |
Categorical variables: Term, Grade, Emp_length, Home_ownership, Verification_status, Issue_d, loan_status, purpose, zip_code, addr_state, earliest_cr_line.
Numerical variables: ID, Loan_amnt, funded_amnt, Int_rate, Annual_Inc, dti, delinq_2yrs, open_acc, pub_rec, revol_bal, revol_util, total_pymnt, recoveries, last_fico_range_high, last_fico_range_low.
- The following variables are important to Triss as an investor: loan_status, Emp_length, Home_ownership, purpose.
- The variables total_pymnt and loan_status are related. total_pymnt shows how much of the loan the borrower has paid, which reflects whether the loan was fully paid, defaulted or expired, while loan_status states directly whether a loan is fully paid, defaulted or expired. The two variables therefore report the same feature of a loan and are highly related.
- This affects the usability of the variables: in statistical analysis or modelling, two strongly related explanatory variables should not both be included. Only one of the two should be kept in the model to avoid the effect of multicollinearity.
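How strongly total_pymnt and loan_status move together can be checked with a correlation coefficient once the status is encoded as a number. The sketch below is illustrative only: the six loans are invented, the `pearson` helper is written for this example, and encoding "Fully Paid" as 1 and every other status as 0 is an assumption.

```python
import math

# Hypothetical loans: funded amount, total payment received, and final status.
loans = [
    {"funded_amnt": 10000, "total_pymnt": 11200, "loan_status": "Fully Paid"},
    {"funded_amnt": 8000, "total_pymnt": 8900, "loan_status": "Fully Paid"},
    {"funded_amnt": 12000, "total_pymnt": 3100, "loan_status": "Charged Off"},
    {"funded_amnt": 5000, "total_pymnt": 5600, "loan_status": "Fully Paid"},
    {"funded_amnt": 15000, "total_pymnt": 6000, "loan_status": "Charged Off"},
    {"funded_amnt": 7000, "total_pymnt": 2100, "loan_status": "Charged Off"},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Share of the funded amount that has been paid back.
repaid_ratio = [loan["total_pymnt"] / loan["funded_amnt"] for loan in loans]
# Assumed encoding: 1 for a fully paid loan, 0 for any other status.
fully_paid = [1 if loan["loan_status"] == "Fully Paid" else 0 for loan in loans]

r = pearson(repaid_ratio, fully_paid)
print(f"correlation between repaid ratio and fully-paid flag: {r:.2f}")
```

A coefficient close to 1 on real data would support the point that the two variables carry nearly the same information.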
The following are data analytics tasks that will help Triss
- Data preparation: This entails cleaning and processing the data so that the raw data set is ready and suitable for the statistical analyses that follow.
- Descriptive analytics: This involves descriptive summary analysis and exploratory analysis, and gives Triss an overview of the data set. Descriptive summary analysis involves measures such as mean, median, mode, standard deviation and skewness. Exploratory analysis involves visualizing the data with graphs and charts such as line graphs, scatter plots and histograms.
- Correlation analysis: This analysis will help Triss determine the strength of association between variables (Gogtay and Thatte, 2017, pp. 78-81). It will also help her detect multicollinearity between explanatory variables, for instance between total_pymnt and loan_status. Multicollinearity occurs when two strongly related variables are both included in a model such as a regression. In such cases only one of the two variables should be included.
- Regression analysis: This task is required to give Triss a model that predicts or explains loan status, the target variable (Sarstedt and Mooi, 2019, pp. 209-256). A logistic regression analysis will therefore have to be performed to determine which variables significantly explain and predict the loan status of a particular loan listing.
- Target variable is loan_status.
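The logistic regression task described above can be sketched in plain Python. This is an illustration under stated assumptions, not the report's model: the eight training rows are invented, the predictors int_rate and dti are chosen only for the example, and the model is fitted with a hand-written batch gradient descent where a statistics library would normally be used.

```python
import math

# Hypothetical training data: (int_rate in %, dti) -> 1 if fully paid, else 0.
X = [(6.0, 8.0), (7.5, 12.0), (9.0, 10.0), (11.0, 15.0),
     (14.0, 22.0), (16.5, 25.0), (18.0, 30.0), (21.0, 28.0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    scaled = [tuple((v - m) / s for v, m, s in zip(row, means, sds))
              for row in rows]
    return scaled, means, sds


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - label
            grad_b += err
            for j, xi in enumerate(row):
                grad_w[j] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X_scaled, means, sds = standardize(X)
weights, bias = fit_logistic(X_scaled, y)


def predict_proba(row):
    """Estimated probability that a loan with these features is fully paid."""
    scaled = [(v - m) / s for v, m, s in zip(row, means, sds)]
    return sigmoid(bias + sum(w * x for w, x in zip(weights, scaled)))


print(f"P(fully paid | int_rate=6.5, dti=9): {predict_proba((6.5, 9.0)):.2f}")
print(f"P(fully paid | int_rate=20, dti=29): {predict_proba((20.0, 29.0)):.2f}")
```

The predictors are standardized before fitting so that a single learning rate works for both columns; the same scaling is reused when scoring a new loan.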
The data mining process consists of data preparation stages followed by the data mining stages themselves (Martin-Garcia et al., 2019, pp. 2484-2500).
- Purification of data: This stage involves cleaning the data set, for instance filling in missing values, maintaining consistency and making the data complete overall.
- Integration of data: Involves preparing the data so that it is free from repetitions while maintaining the reliability of the data set.
- Selection of data: This entails picking only the attributes of interest to Triss, the investor, for analysis.
- Transformation of data: Entails transforming the data into forms suitable for statistical analysis and data mining.
- Pattern evaluation: Determining patterns in the data set, for instance through exploratory analysis (data visualization). It also entails determining insights and relationships.
- Knowledge representation: This will involve making a report to a client from the data outputs.
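The four preparation stages listed above can be sketched on a tiny in-memory data set. Everything in this example is an assumption made for illustration: the raw records are invented, missing incomes are filled with the median (one of several possible choices), duplicates are identified by `id`, and the set of attributes kept is arbitrary.

```python
import statistics

# Hypothetical raw records with one repeated row and a missing income (None).
raw = [
    {"id": 1, "loan_amnt": 10000, "annual_inc": 52000, "term": "36 months", "purpose": "car"},
    {"id": 2, "loan_amnt": 8000, "annual_inc": None, "term": "60 months", "purpose": "house"},
    {"id": 2, "loan_amnt": 8000, "annual_inc": None, "term": "60 months", "purpose": "house"},
    {"id": 3, "loan_amnt": 15000, "annual_inc": 78000, "term": "36 months", "purpose": "car"},
]

# Purification: fill missing incomes with the median of the known incomes.
known = [rec["annual_inc"] for rec in raw if rec["annual_inc"] is not None]
median_inc = statistics.median(known)
filled = [
    {**rec, "annual_inc": median_inc if rec["annual_inc"] is None else rec["annual_inc"]}
    for rec in raw
]

# Integration: drop repeated records, keeping the first occurrence of each id.
seen = set()
unique = []
for rec in filled:
    if rec["id"] not in seen:
        seen.add(rec["id"])
        unique.append(rec)

# Selection: keep only the attributes of interest (an arbitrary choice here).
KEEP = ("id", "loan_amnt", "annual_inc", "term")
selected = [{key: rec[key] for key in KEEP} for rec in unique]

# Transformation: turn the term text into a number of months.
prepared = [{**rec, "term": int(rec["term"].split()[0])} for rec in selected]

for rec in prepared:
    print(rec)
```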
Triss can choose between models through use of various statistics depending on the task.
- Regression analysis: For a simple or multiple linear regression task, Triss can use the R-squared value to pick the best model; the higher the R-squared value, the better the fit. When candidate models differ in the number of predictors, adjusted R-squared is the fairer comparison. For logistic regression the ordinary R-squared does not apply directly, so a pseudo R-squared or the classification accuracy plays the same role.
- ARIMA/ARMA/GARCH modelling: For this type of task, the best model can be picked using the Akaike Information Criterion (AIC); the best model is the one with the lowest AIC value.
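For reference, the selection criteria named above have standard textbook definitions, given here as a sketch. The symbols are introduced for this example and do not come from the report: $n$ is the number of observations, $p$ the number of predictors, $k$ the number of estimated parameters, $\hat{L}$ the maximized likelihood, $y_i$ the observed values, $\hat{y}_i$ the fitted values and $\bar{y}$ the mean of the observed values. The adjusted R-squared is included as a common refinement that penalizes extra predictors.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

A higher $R^2$ or $R^2_{\mathrm{adj}}$ indicates a better fit, while a lower AIC indicates a better trade-off between fit and model complexity.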
To investigate the stability of the models, Triss can perform the following (Gao et al., 2018, pp. 1501-1510):
- Divide the data set into a training data set and a test data set, for instance a training data set covering 2009 to 2016 and a test data set covering 2016 to 2017.
- Use the training data set to fit a model.
- Use the fitted model to predict the response values for 2016 to 2017.
- Compare the predicted values with the values in the test data set.
- If there is no significant deviation between the two, the model is stable; if the deviation is significant, the model is not stable.
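The stability check outlined in the steps above can be sketched as a chronological split followed by an error comparison. The yearly figures below are invented, the cut-off year is an assumption (the test period is taken as 2016 onwards), a simple least-squares line stands in for whatever model Triss actually fits, and the rule "test error at most twice the training error" is an assumed threshold for what counts as a significant deviation.

```python
# Hypothetical yearly observations: (issue year, int_rate in %, default rate in %).
observations = [
    (2009, 8.0, 5.1), (2010, 9.0, 5.9), (2011, 10.0, 7.2), (2012, 11.0, 7.8),
    (2013, 12.0, 9.1), (2014, 13.0, 9.9), (2015, 14.0, 11.0),
    (2016, 15.0, 12.2), (2017, 16.0, 12.9),
]

# Chronological split; the cut-off year is an assumption for this sketch.
train = [(x, y) for year, x, y in observations if year < 2016]
test = [(x, y) for year, x, y in observations if year >= 2016]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    return my - slope * mx, slope


def mean_abs_error(pairs, intercept, slope):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(y - (intercept + slope * x)) for x, y in pairs) / len(pairs)


intercept, slope = fit_line(train)
train_mae = mean_abs_error(train, intercept, slope)
test_mae = mean_abs_error(test, intercept, slope)

# Assumed rule of thumb: the model is called stable if the test error
# is at most twice the training error.
stable = test_mae <= 2 * train_mae
print(f"train MAE {train_mae:.3f}, test MAE {test_mae:.3f}, stable: {stable}")
```

Splitting by time rather than at random mirrors how the model would be used in practice, where it is fitted on past loans and applied to later ones.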
Conclusion
Analyzing the Loan Club Investment Report and data set will significantly help Triss make well-informed investment decisions. For instance, knowing the rates of fully paid and defaulted loans will be of great help. The statistical analytics tasks described here, namely data preparation, descriptive analytics, correlation analysis and regression analysis, will help Triss develop an effective model to guide her in making an appropriate decision.
References
Cao, H., Wachowicz, M. and Cha, S., 2017, December. Developing an edge computing platform for real-time descriptive analytics. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 4546-4554). IEEE.
Coleman, S., Göb, R., Manco, G., Pievatolo, A., Tort-Martorell, X. and Reis, M.S., 2016. How can SMEs benefit from big data? Challenges and a path forward. Quality and Reliability Engineering International, 32(6), pp.2151-2164. https://doi.org/10.1002/qre.2008.
Gao, M., Wang, K. and He, L., 2018. Probabilistic model checking and scheduling implementation of an energy router system in energy Internet for green cities. IEEE Transactions on Industrial Informatics, 14(4), pp.1501-1510. https://ieeexplore.ieee.org/abstract/document/8252726.
Gogtay, N.J. and Thatte, U.M., 2017. Principles of correlation analysis. Journal of the Association of Physicians of India, 65(3), pp.78-81. https://www.kem.edu/wp-content/uploads/2012/06/9-Principles_of_correlation-1.pdf.
Martín-García, A.V., Martínez-Abad, F. and Reyes-González, D., 2019. TAM and stages of adoption of blended learning in higher education by application of data mining techniques. British Journal of Educational Technology, 50(5), pp.2484-2500. https://doi.org/10.1111/bjet.12831.
Sarstedt, M. and Mooi, E., 2019. Regression analysis. In A Concise Guide to Market Research (pp. 209-256). Springer, Berlin, Heidelberg.