Principal component analysis (PCA) of the given US utilities data has been carried out with the aid of XLMiner, and the matrix of the corresponding eigenvalues is presented in the output below.
Result Analysis and Component Analysis
A key step in analysing the above output is to decide how much of the total variation the researcher aims to explain, since the significant principal components are selected on this basis. For instance, if a threshold of roughly 80% of the variance is adopted, the critical principal components are the first four, which cumulatively account for 79.91% of the total variance. The key features of each principal component, along with their interpretation, are highlighted in the table below.
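The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `components_for_variance`, its `tolerance` parameter and the eigenvalues are hypothetical and are not taken from the XLMiner output. The tolerance is included because a cumulative share of 79.91% falls just short of a strict 80% cut-off.

```python
def components_for_variance(eigenvalues, target=0.80, tolerance=0.0):
    """Return (k, share): the smallest number of leading components whose
    cumulative share of total variance reaches target - tolerance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target - tolerance:
            return k, running / total
    return len(ordered), 1.0

# Hypothetical eigenvalues for eight variables (NOT the report's values).
example_eigenvalues = [2.4, 1.8, 1.3, 1.0, 0.7, 0.4, 0.3, 0.1]
k, share = components_for_variance(example_eigenvalues, target=0.80)
print(k, round(share, 4))
```

With a stricter target the same data would need one more component, unless a small tolerance is allowed.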
From the above analysis, it may be inferred that the key features which are of significance are and x8. This list can be shortened further to retain only the most significant features for comparing the utilities.
With regard to normalization, the first step is to understand why it is needed. Normalization rescales each variable using its mean and standard deviation so that variables measured on larger scales do not distort the output of the analysis. In the given case the post-normalisation output is not different from, or better than, the output for the non-normalised data, as highlighted below.
Given the above output, normalisation is not needed.
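For reference, the normalization step described above (subtract the mean, divide by the standard deviation) can be sketched as follows. The column values are hypothetical, and the use of the population standard deviation is an assumption; XLMiner's exact convention is not stated in the report.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant column carries no variation
    return [(x - mu) / sigma for x in column]

# Hypothetical columns measured on very different scales.
sales = [9077.0, 5088.0, 9212.0, 6423.0]
ratio = [1.06, 0.89, 1.43, 1.02]
z_sales = standardize(sales)
z_ratio = standardize(ratio)
```

After this step both columns have the same spread, so neither dominates the principal components purely because of its units.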
Advantages of principal component analysis
- It reduces the dimensionality of a complex dataset, which is why PCA is regarded as an automated dimension-reduction technique.
- The new variables are called principal components and are uncorrelated with one another. Therefore, the correlation coefficient between any two distinct principal components is zero.
- PCA uses an orthogonal transformation, which makes the dataset simpler to analyse.
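For two variables, the orthogonal mapping used in PCA reduces to a rotation of the centred data, which makes its properties easy to check. The sketch below uses hypothetical data and the helper names `covariance` and `pca_2d` (both assumptions for illustration) to show that the resulting component scores are uncorrelated with each other and that the total variance is preserved.

```python
import math
from statistics import mean

def covariance(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def pca_2d(x, y):
    """Rotate centred (x, y) data onto its two principal axes."""
    mx, my = mean(x), mean(y)
    sxx, syy, sxy = covariance(x, x), covariance(y, y), covariance(x, y)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [(a - mx) * c + (b - my) * s for a, b in zip(x, y)]
    pc2 = [-(a - mx) * s + (b - my) * c for a, b in zip(x, y)]
    return pc1, pc2

# Hypothetical, positively correlated measurements.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
pc1, pc2 = pca_2d(x, y)
print(round(covariance(pc1, pc2), 10))  # covariance between the two components
```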
Disadvantages of principal component analysis
- It is hard to determine the exact direction of a principal component vector
- Not applicable when the variables in the data set are categorical
- Not applicable when the variables have complex nonlinear relations
- The principal components lie in an m-dimensional space in which each axis is at 90 degrees to every other axis; hence, the real direction of a principal vector is difficult to interpret.
Total data set = 5000 customers
Number of variables = 14
Number of predictors = 2 (CC and Online)
XLMiner is used to partition the data into a 60% training data set and a 40% validation data set. The output of XLMiner after the standard partition is highlighted below:
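A 60/40 partition of this kind can be sketched in Python as below. This is not XLMiner's procedure, only a random shuffle-and-split under stated assumptions: the function name `partition`, the seed and the stand-in customer IDs are all hypothetical.

```python
import random

def partition(records, train_fraction=0.6, seed=12345):
    """Shuffle a copy of records and split it into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed (arbitrary) for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

customer_ids = list(range(1, 5001))  # stand-in for the 5000 customer rows
training, validation = partition(customer_ids)
print(len(training), len(validation))
```

For 5000 customers this gives 3000 training rows and 2000 validation rows, with every customer in exactly one of the two sets.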
Here, CC = 0 means the customer does not hold a credit card issued by Universal Bank. The same coding is used for the variables Online and Loan.
- Probability that a customer who has a credit card and uses online services accepts the loan
Favorable events = 51 and total events = 522
Probability = 51/522 ≈ 0.0977
There is a probability of about 0.098 (9.8%) that a customer who has a credit card and uses online services accepts the loan.
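The direct-counting calculation above can be reproduced with a short sketch. The record layout (dictionaries with `CC`, `Online` and `Loan` keys) and the function name are assumptions; the records are rebuilt only from the two counts quoted in the text (522 matching customers, 51 of whom accepted), not from the actual data file.

```python
def exact_loan_probability(records, cc=1, online=1):
    """Estimate P(Loan = 1 | CC = cc, Online = online) by direct counting."""
    matching = [r for r in records if r["CC"] == cc and r["Online"] == online]
    if not matching:
        raise ValueError("no records match the condition")
    accepted = sum(1 for r in matching if r["Loan"] == 1)
    return accepted / len(matching)

# Rebuilt from the quoted counts only: 51 acceptors among 522 matching customers.
records = ([{"CC": 1, "Online": 1, "Loan": 1}] * 51
           + [{"CC": 1, "Online": 1, "Loan": 0}] * 471)
p_exact = exact_loan_probability(records)
print(round(p_exact, 4))
```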
- Design of pivot table and computation of the given quantities P (A|B)
Pivot table (Online column variable, loan row variable)
In the above pivot table, zero indicates "No" and one indicates "Yes". For example, Online = 1 means the customer uses the online banking service of Universal Bank. The same coding is used for the variables CC and Loan.
The requisite quantities are determined from the values highlighted in red in the above pivot table.
- P(CC = 1 | Loan = 1): the proportion of credit card holders among customers who took the personal loan offer = 93/304 ≈ 0.306
- P(Online = 1 | Loan = 1): the proportion of active online banking users among customers who took the personal loan offer
- P(Loan = 1): the proportion of all customers who took the personal loan offer from Universal Bank
- P(CC = 1 | Loan = 0): the proportion of credit card holders among customers who did not take the personal loan offer
- P(Online = 1 | Loan = 0): the proportion of online banking users among customers who did not take the personal loan offer
- P(Loan = 0): the proportion of all customers who did not take the personal loan offer
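The six quantities listed above are all simple ratios of pivot-table counts. A minimal sketch of that counting is given below. The rows are a tiny hypothetical data set, not the Universal Bank data, and the function name and tuple layout `(loan, cc, online)` are assumptions.

```python
from collections import Counter

def naive_bayes_inputs(rows):
    """rows: iterable of (loan, cc, online) tuples coded 0/1.
    Assumes both loan classes occur at least once."""
    rows = list(rows)
    n = len(rows)
    loan = Counter(loan_flag for loan_flag, _, _ in rows)
    cc1 = Counter(loan_flag for loan_flag, cc, _ in rows if cc == 1)
    on1 = Counter(loan_flag for loan_flag, _, on in rows if on == 1)
    return {
        "P(CC=1|Loan=1)": cc1[1] / loan[1],
        "P(Online=1|Loan=1)": on1[1] / loan[1],
        "P(Loan=1)": loan[1] / n,
        "P(CC=1|Loan=0)": cc1[0] / loan[0],
        "P(Online=1|Loan=0)": on1[0] / loan[0],
        "P(Loan=0)": loan[0] / n,
    }

# Hypothetical toy rows: (loan, cc, online)
toy_rows = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 1),
    (0, 1, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 0, 0), (0, 1, 0),
]
quantities = naive_bayes_inputs(toy_rows)
```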
Naïve Bayes Probability
The numerator is the product of the first three quantities in the above calculation, which correspond to the customer taking the personal loan offer (Loan = 1), i.e. P(CC = 1 | Loan = 1) × P(Online = 1 | Loan = 1) × P(Loan = 1).
In addition to this, the denominator also contains the corresponding term for customers who do not take the loan offer (Loan = 0), which is the product of the last three quantities in the above calculation, i.e. P(CC = 1 | Loan = 0) × P(Online = 1 | Loan = 0) × P(Loan = 0).
The favorable event for the Naïve Bayes probability is the first product, and the total possible event is the sum of these two products. Thus, the Naïve Bayes probability is 10.6%.
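The calculation described above can be written as a small function. This is a sketch of the standard two-class Naïve Bayes formula. The function and argument names are hypothetical, and the example inputs are made-up round numbers, not the values that produce the 10.6% reported here.

```python
def naive_bayes_loan_probability(p_cc_l1, p_on_l1, p_l1, p_cc_l0, p_on_l0, p_l0):
    """Naive Bayes estimate of P(Loan = 1 | CC = 1, Online = 1).

    accept = P(CC=1|Loan=1) * P(Online=1|Loan=1) * P(Loan=1)
    reject = P(CC=1|Loan=0) * P(Online=1|Loan=0) * P(Loan=0)
    result = accept / (accept + reject)
    """
    accept = p_cc_l1 * p_on_l1 * p_l1
    reject = p_cc_l0 * p_on_l0 * p_l0
    return accept / (accept + reject)

# Hypothetical inputs chosen only to make the arithmetic easy to follow.
p_nb = naive_bayes_loan_probability(0.3, 0.6, 0.1, 0.3, 0.6, 0.9)
print(round(p_nb, 3))
```

When the conditional proportions are identical in both loan classes, as in this made-up example, the estimate reduces to the prior P(Loan = 1).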
- The customer can take the following measures to improve the chances of getting a loan.
- Getting a credit card issued from the bank
- Actively using the online bank services on offer by the bank
Taking these two steps would improve the customer's chances of getting a loan. This is evident from the conditional probability of a loan among customers who hold a credit card being higher than the corresponding conditional probability among customers who do not. A similar pattern is observed for online services.