Red Wine Dataset Analysis

Task 1: Data pre-processing and data exploration

a. Use Pandas to load the data.

b. Merge all rows with “quality” labels from 6 to 10 into Class 1, and likewise merge all rows with “quality” labels from 1 to 5 into Class 2.

c. Report the number of features and the number of rows in each class.

d. Choose an attribute and generate a boxplot for the two pre-defined classes.

e. Show one scatter plot of one feature against another. You may choose which two features to plot.
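
As a rough illustration of steps a to e, the sketch below runs on a handful of made-up rows so that it works without the real file. The column names (`pH`, `alcohol`, `quality`), the semicolon separator, the file name in the comment, and the helper names are all assumptions for illustration; adjust them to match the dataset you were given.

```python
import io

import pandas as pd

# Made-up rows for illustration only; they are NOT taken from the real dataset.
# With the real file you would pass its path instead, e.g.
#   load_wine("winequality-red.csv")   # file name and separator are assumptions
TOY_CSV = """pH;alcohol;quality
3.51;9.4;5
3.20;9.8;5
3.65;10.5;6
3.30;11.2;7
3.70;9.0;4
3.62;12.0;8
"""


def load_wine(source, sep=";"):
    """Step a: read the CSV (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep=sep)


def add_class_column(df):
    """Step b: quality 6-10 becomes Class 1, quality 1-5 becomes Class 2."""
    out = df.copy()
    out["wine_class"] = (out["quality"] >= 6).map({True: 1, False: 2})
    return out


def summarize(df):
    """Step c: list the feature columns and count the rows in each class."""
    feature_cols = [c for c in df.columns if c not in ("quality", "wine_class")]
    rows_per_class = df["wine_class"].value_counts().sort_index()
    return feature_cols, rows_per_class


def plot_box_and_scatter(df, box_col="pH", x_col="alcohol", y_col="pH"):
    """Steps d and e: a per-class boxplot and a feature-vs-feature scatter plot."""
    # Imported here so the data steps above run even without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    df.boxplot(column=box_col, by="wine_class", ax=ax_box)
    df.plot.scatter(x=x_col, y=y_col, ax=ax_scatter)
    ax_scatter.set_title(f"{y_col} vs {x_col}")
    return fig


wine = add_class_column(load_wine(io.StringIO(TOY_CSV)))
features, rows_per_class = summarize(wine)
print("Number of features:", len(features))
print(rows_per_class)
```

Calling `plot_box_and_scatter(wine)` returns a figure that can be displayed with `plt.show()` or saved with `fig.savefig(...)`. Whether “quality” itself counts as a feature is a judgment call; this sketch treats it as the label and excludes it.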

Task 2: Computing probabilities using Python code for the given red wine dataset

f. Prior probability:

i. What is the probability of a wine classified as Class 1 (P(Class 1))?

ii. What is the probability of a wine classified as Class 2 (P(Class 2))?

g. Conditional probability:

i. What is the probability of a wine having a pH value greater than 3.6 given it is classified as Class 1 (P(pH>3.6|Class 1))?

h. Posterior probability:

i. What is the probability of a wine being classified as Class 1 given that it has a pH value greater than 3.6 (P(Class 1|pH>3.6))?
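
One possible way to estimate the quantities in f to h is with relative frequencies, combined through Bayes' rule: P(Class 1|pH>3.6) = P(pH>3.6|Class 1) P(Class 1) / P(pH>3.6). The sketch below uses a few made-up rows (not real data), and the column names and the function `wine_probabilities` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; replace with the loaded dataset.
wine = pd.DataFrame(
    {
        "pH": [3.51, 3.20, 3.65, 3.30, 3.70, 3.62],
        "quality": [5, 5, 6, 7, 4, 8],
    }
)


def wine_probabilities(df, threshold=3.6):
    """Estimate prior, conditional and posterior probabilities by counting."""
    is_class1 = df["quality"] >= 6   # Class 1: quality 6-10
    high_ph = df["pH"] > threshold   # event: pH > threshold

    # f. Priors: fraction of rows in each class.
    p_class1 = float(is_class1.mean())
    p_class2 = float((~is_class1).mean())

    # g. Conditional: fraction of Class 1 rows with pH above the threshold.
    p_high_given_c1 = float(high_ph[is_class1].mean())

    # h. Posterior via Bayes' rule (undefined if no row exceeds the threshold).
    p_high = float(high_ph.mean())
    posterior = p_high_given_c1 * p_class1 / p_high

    # Cross-check: count Class 1 rows directly among the high-pH rows.
    direct = float((is_class1 & high_ph).sum() / high_ph.sum())

    return {
        "p_class1": p_class1,
        "p_class2": p_class2,
        "p_high_given_c1": p_high_given_c1,
        "posterior": posterior,
        "posterior_direct": direct,
    }


probs = wine_probabilities(wine)
for name, value in probs.items():
    print(f"{name}: {value:.4f}")
```

The Bayes' rule result and the direct count should agree up to floating-point rounding, which gives a simple sanity check on the calculation.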

Task 3: Writing a report to summarize what you have done. Clearly explain the figures you include in your report, and report your findings and conclusions. The maximum number of pages is two and it should include less . (This report counts for 0 marks: it is a chance for you to practice writing a report and to obtain feedback from a tutor.) Please use a single-column format. The font size should be set to 11 or 12 points.