Question:
This task assesses your progress towards meeting Learning Outcomes 1, 2 and 3:
1. Be able to identify and analyse business requirements for the identification of patterns and trends in data sets
2. Be able to appraise the different approaches and categories of data mining problems.
3. Be able to compare and evaluate output patterns
Answer:
From the PCA component table, it can be seen that the first six principal components together capture about 95% of the total variance. Hence, the principal component table can be reduced to a matrix containing only those six principal components.
Reduced principal component matrix
The reduced PCA matrix can then be analysed further to narrow these components down and refine the results.
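As a rough illustration of the 95% cut-off applied above, the sketch below picks the smallest number of leading components whose cumulative explained-variance ratio reaches a target. The function name `components_for_variance` and the ratios in `example_ratios` are hypothetical placeholders, not the values from the PCA component table.

```python
import numpy as np

def components_for_variance(explained_ratio, target=0.95):
    """Return the smallest k such that the first k components reach `target` cumulative variance."""
    ratios = np.asarray(explained_ratio, dtype=float)
    cumulative = np.cumsum(ratios)
    # Small tolerance guards against floating-point shortfall exactly at the target
    reached = np.nonzero(cumulative >= target - 1e-12)[0]
    if reached.size == 0:
        raise ValueError("target variance is never reached")
    return int(reached[0]) + 1

# Hypothetical ratios, already sorted in descending order (illustration only)
example_ratios = [0.40, 0.20, 0.15, 0.10, 0.06, 0.04, 0.03, 0.02]
k = components_for_variance(example_ratios)
```

With the actual explained-variance ratios from the component table in place of `example_ratios`, the returned value would be the number of components to keep in the reduced matrix.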
With regard to normalization of the data, the total variance is not dominated by the contribution of any single variable; the other variables contribute as well. Hence, normalization is not recommended in the given case. It makes sense when the variance of one particular variable is much higher than the rest, because that variable would otherwise dominate the total variance and distort the result.
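One way to make the "dominated by a single variable" check concrete is to compute each variable's share of the total variance. The sketch below is only an illustration: the helper names, the `max_share=0.5` threshold and the small arrays are assumptions chosen for the example, not part of the assignment data.

```python
import numpy as np

def variance_shares(data):
    """Return each column's share of the total variance (shares sum to 1)."""
    variances = np.var(np.asarray(data, dtype=float), axis=0, ddof=1)
    return variances / variances.sum()

def is_dominated(data, max_share=0.5):
    """True if a single variable contributes more than `max_share` of the total variance.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    return bool(variance_shares(data).max() > max_share)

# Hypothetical data: three variables on comparable scales
balanced = np.array([[1.0, 2.0, 3.0],
                     [2.0, 1.0, 5.0],
                     [3.0, 4.0, 4.0],
                     [4.0, 3.0, 6.0]])
# Same data with the first column rescaled, e.g. by a change of units
rescaled = balanced * np.array([100.0, 1.0, 1.0])
```

Here `is_dominated(balanced)` is false while `is_dominated(rescaled)` is true, which mirrors the argument above: normalization matters mainly when one variable's variance swamps the others.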
Major Advantages
- Reduces a large number of variables to a smaller number of dimensions
- Easy visualization of the point cloud
- Easy to interpret the spread of the data about the mean
- The orthogonal form of the component matrix gives freedom in analysing the result
- The directions of the principal vectors are easy to recognize because each axis is at a right angle to every other axis
Major disadvantages
- Cannot be used for non-linear and complex relationships between variables
- Difficult to examine the actual direction of a principal component vector
- Cannot be used when the data come from unknown, non-Gaussian distributions
- Cannot be used for categorical variables
Credit card (CC) = 0 (Customer does not have the bank’s credit card)
Credit card (CC) = 1 (Customer has the bank’s credit card)
Loan = 0 (Customer would not take the loan offer)
Loan = 1 (Customer would take the loan offer)
The objective is to determine the probability that a customer who uses the online banking service and also holds the bank’s credit card would take the loan offer.
Favorable cases = 44
Total cases = 441
Probability = Favorable cases / Total cases = 44/441 = 0.0998
Therefore, the probability that a customer who uses the online banking service and also holds the bank’s credit card would take the loan offer is 9.98%.
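The same ratio of counts can be checked with a few lines of code. This is a minimal sketch; the function name `conditional_probability` is a hypothetical helper, and the counts are the ones quoted above.

```python
def conditional_probability(favorable, total):
    """Estimate P(event | condition) as a ratio of counts within the conditioned subset."""
    if total <= 0:
        raise ValueError("total must be positive")
    return favorable / total

# Counts quoted in the text: 44 loan acceptors among the 441 customers
# who use online banking and hold the bank's credit card
p_loan = conditional_probability(44, 441)
print(f"{p_loan:.4f}")
```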
Computation of the following quantities P(A|B):
S. no. | Given probability function | Favorable cases | Total cases | Favorable cases / Total cases | Probability or proportion
(i) |  | 75 | 256 | 75/256 | 0.292
(ii) |  | 158 | 256 | 158/256 | 0.617
(iii) |  | 256 | 3000 | 256/3000 | 0.0853
(iv) |  | 652 | 2244 | 652/2244 | 0.2905
(v) |  | 1336 | 2244 | 1336/2244 | 0.5953
(vi) |  | 2244 | 2500 | 2244/2500 | 0.8976
On the basis of the quantities determined in part (c), the Naïve Bayes probability can be determined as shown below:
Favorable cases = (0.292)*(0.617)*(0.0853) = 0.01536
Total cases = (0.292)*(0.617)*(0.0853) + (0.2905)*(0.5953)*(0.8976) = 0.17059
Probability = Favorable cases / Total cases = 0.01536 / 0.17059 = 0.09004
Hence, the value of the Naïve Bayes probability has been determined as 9.004%.
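The calculation above can be sketched in code as a two-class naive Bayes posterior: each class prior is multiplied by its conditional probabilities and the result is normalised. The function name `naive_bayes_posterior` is a hypothetical helper, and the inputs are the rounded values (i) to (vi) from the table, so the result (roughly 0.090) may differ from the figure above in the last digits because of rounding in the intermediate steps.

```python
def naive_bayes_posterior(likelihoods_pos, prior_pos, likelihoods_neg, prior_neg):
    """Two-class naive Bayes posterior for the positive class.

    Multiplies each class prior by its conditional probabilities
    (treated as independent given the class) and normalises.
    """
    score_pos = prior_pos
    for p in likelihoods_pos:
        score_pos *= p
    score_neg = prior_neg
    for p in likelihoods_neg:
        score_neg *= p
    return score_pos / (score_pos + score_neg)

# Rounded values (i)-(vi) as listed in the table above
posterior = naive_bayes_posterior(
    likelihoods_pos=[0.292, 0.617], prior_pos=0.0853,
    likelihoods_neg=[0.2905, 0.5953], prior_neg=0.8976,
)
print(f"{posterior:.4f}")
```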
The chances of a customer accepting the personal loan offer can be improved by ensuring the following two conditions:
- Active usage of online services
- Possession of credit card issued by the bank