Load the data in the file CreditDefault.RData. The data frame dfdefault contains fictitious data on variable rate mortgage loans made by a bank to borrowers. The dataset contains the following variables:
InDefault: an indicator variable whose value is 1 if the borrower defaulted on their loan (i.e., did not make the payments due) last year, or zero otherwise.
MaritalStatus: an indicator variable whose value is 1 if the borrower is married, or zero otherwise.
LTV: the loan-to-value ratio, which is the ratio of the amount lent by the bank at loan initiation to the total value of the property that was purchased.
DiffRate: the difference between the current interest rate charged on the loan and the interest rate that was charged at the initiation of the loan.

Part (a):
Perform a logistic regression of the response variable InDefault using the predictors MaritalStatus, LTV and DiffRate. In other words, the goal is to predict whether a borrower will default on their payments based on the characteristics of the loan and of the borrower. Do not use the built-in glm function to estimate your logistic regression parameters; you should write your own code. To optimize the coefficients, you can use, for instance, the function nlm or optim.
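As a rough illustration of the "write your own code" requirement, the sketch below minimizes the negative log-likelihood of a logistic model with optim. It runs on simulated stand-in data, not on dfdefault: the design matrix X, the response y, the values in beta_true and the helper names neg_loglik and neg_grad are all assumptions made for this example. With the real data, X would presumably be built from the MaritalStatus, LTV and DiffRate columns plus a column of ones.

```r
# Simulated stand-in data (hypothetical); with the real file one would
# load("CreditDefault.RData") and build X and y from dfdefault instead.
set.seed(1)
n <- 500
X <- cbind(Intercept     = 1,
           MaritalStatus = rbinom(n, 1, 0.5),
           LTV           = runif(n, 0.4, 1.0),
           DiffRate      = rnorm(n, 0, 1))
beta_true <- c(-3, -0.5, 2, 0.8)   # arbitrary values, used only to simulate y
y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

# Negative log-likelihood of the logistic model, with eta = X %*% beta:
#   -sum( y * eta - log(1 + exp(eta)) )
neg_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}

# Analytic gradient of the negative log-likelihood: -t(X) %*% (y - p)
neg_grad <- function(beta, X, y) {
  p <- plogis(drop(X %*% beta))
  -drop(crossprod(X, y - p))
}

# Minimize with BFGS, starting from all-zero coefficients.
fit <- optim(par = rep(0, ncol(X)), fn = neg_loglik, gr = neg_grad,
             X = X, y = y, method = "BFGS")

beta_hat <- setNames(fit$par, colnames(X))   # estimated coefficients
p_hat    <- plogis(drop(X %*% beta_hat))     # fitted default probabilities
```

Supplying the analytic gradient is optional; optim falls back to finite differences when gr is omitted. The fitted probabilities in p_hat are what a call such as hist(p_hat) would display.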
What are the coefficients β you obtain?
Qualitatively interpret the impact of each parameter (except the intercept): are their signs consistent with intuition?
Plot a histogram of the sample distribution of the probability of default you obtain for all borrowers using your logistic regression model.

Part (b), 10 points:
Calculate the training error rate of the estimated Bayes classifier (the classifier minimizing the error rate using estimated probabilities of default with the logistic regression).
Is the error rate large? Does this mean the classifier is a good one? Note: To answer this question, you might calculate the number of false positives (borrowers who are classified as defaulting but did not default) and the number of false negatives (borrowers who are classified as non-defaulting but did default).
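The quantities mentioned in Part (b) can be computed along the following lines. This is only a sketch on a toy example: the vectors y and p_hat below are made-up values standing in for the observed InDefault indicator and the fitted probabilities, and the variable names are assumptions. Under 0-1 loss, the estimated Bayes classifier predicts a default whenever the estimated probability exceeds 0.5.

```r
# Toy inputs (hypothetical): y is the observed 0/1 default indicator,
# p_hat the fitted default probabilities from the logistic regression.
y     <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 1)
p_hat <- c(0.05, 0.10, 0.20, 0.60, 0.30, 0.15, 0.70, 0.40, 0.08, 0.55)

# Estimated Bayes classifier under 0-1 loss: predict default when p_hat > 0.5.
y_pred <- as.integer(p_hat > 0.5)

# Training error rate: share of borrowers whose predicted class is wrong.
error_rate <- mean(y_pred != y)

# False positives: classified as default but did not default.
false_positives <- sum(y_pred == 1 & y == 0)

# False negatives: classified as non-default but did default.
false_negatives <- sum(y_pred == 0 & y == 1)

# Share of actual defaults that the classifier misses.
miss_rate <- false_negatives / sum(y == 1)
```

When defaults are rare, the overall error rate can be small even if a large share of the actual defaults is missed, which is why the two counts are worth reporting separately.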
Recall that the two classes are G = 1 for defaults and G = 0 for non-defaults. Use your logistic regression model to obtain a classifier which minimizes the estimated expected loss function L summarized in Table 1:

Table 1: Loss function L(i, j) for different values of i and j

i\j    0    1
0      0    1
1      ℓ    0

Recall that L(i, j) denotes the cost of classifying an observation in class j when its true class is i. Do this experiment for the following two values: ℓ = 30 and ℓ = 60. For each of these two values for ℓ, provide