In this task, you need to use Principal Component Analysis (PCA) to understand the characteristics of the datasets.
When using the C-SVC SVM with the Gaussian radial basis kernel, there are two tunable parameters, C (cost) and γ (gamma). To achieve the highest classification rate possible, it is very important to search for an optimal pair of these values.
You have been given the following combinations: [C=50, γ=0.01], [C=50, γ=10], [C=5, γ=1], [C=100, γ=0.01], [C=100, γ=10] and [C=100, γ=1].
You should train an SVM model for each of the 6 given combinations and then test it on the normalised validation set. Report the accuracy rate on the validation set for each combination. Finally, select the best combination of parameters and report your result.
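The selection procedure above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a prescribed solution: `fit_predict` is a hypothetical callable that you would supply yourself, wrapping whichever SVM library you use so that it trains a C-SVC model with the RBF kernel on the normalised training data for a given C and γ and returns the predicted labels for the normalised validation set. The names `COMBINATIONS`, `accuracy` and `select_best` are likewise illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# The six (C, gamma) pairs listed in the task.
COMBINATIONS: List[Tuple[float, float]] = [
    (50, 0.01), (50, 10), (5, 1), (100, 0.01), (100, 10), (100, 1),
]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best(
    combinations: Sequence[Tuple[float, float]],
    fit_predict: Callable[[float, float], Sequence],
    y_val: Sequence,
) -> Tuple[Tuple[float, float], Dict[Tuple[float, float], float]]:
    """Score every (C, gamma) pair on the validation set.

    fit_predict(C, gamma) is an assumed, user-supplied callable that
    trains a model and returns predictions for the validation set.
    Returns the best pair and a dict of validation accuracies.
    """
    results: Dict[Tuple[float, float], float] = {}
    for C, gamma in combinations:
        results[(C, gamma)] = accuracy(y_val, fit_predict(C, gamma))
    # On a tie, max() keeps the pair that appears first in the list.
    best = max(results, key=results.get)
    return best, results
```

Printing every entry of the returned dictionary gives the per-combination accuracy that has to be reported, and the returned best pair is the one to carry forward.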
You should now be in a position to further test your model with the selected parameters by classifying the test data. With the normalised whole training set as the input file, you will need to train an SVM model with the suitable parameter values discovered for C and γ during Task 3. When the classification model is built, you will then need to use it to classify the normalised testing set and report the accuracy rate.
Write a Python function that can locate false positives, that is, those patterns originally labeled as non-defective which are incorrectly predicted as defective, and report the results on the test set (3 marks).
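One possible shape for such a function is sketched below. It is only an illustration: the function name `find_false_positives` is hypothetical, and the label encoding (1 for defective, 0 for non-defective) is an assumption that should be changed to match how the dataset actually encodes its classes.

```python
from typing import List, Sequence


def find_false_positives(
    y_true: Sequence,
    y_pred: Sequence,
    defective=1,       # assumed label for the defective class
    non_defective=0,   # assumed label for the non-defective class
) -> List[int]:
    """Return the indices of patterns whose true label is non-defective
    but whose predicted label is defective."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [
        i
        for i, (t, p) in enumerate(zip(y_true, y_pred))
        if t == non_defective and p == defective
    ]


# Small illustrative run with made-up labels.
example_true = [0, 0, 1, 1, 0]
example_pred = [1, 0, 1, 0, 1]
example_fp = find_false_positives(example_true, example_pred)
print(example_fp)  # [0, 4]
```

Returning indices rather than a count makes it possible to look up the corresponding rows of the test set and inspect their feature values when discussing why they were misclassified.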
Summarize your findings and write your conclusions, applying critical thinking. For example, can you find any reason or reasons why you think those instances are misclassified? (3 marks)