Apply matrix algebra to summarize and process multivariate data.
Construct linear and logistic regression models to solve business prediction problems.
In answering questions of the assignment, show clearly the steps you take in arriving at your solutions. Keep at least four decimal places in the final answer for statistical computations or otherwise specified.
Except Question 5 which require you to utilize R, the rest of the questions must be answered manually.
The soft copy of handwritten or typed answers of Questions 1 to 4 and the analysis report for Question 5 (in Word) must be uploaded to OLE by the due date. The R program of Question 5 must also be uploaded to “Assignment 1 – R program”. (Note: The assignment and R program will be checked by Turnitin and zero mark will be given to plagiarized works.)
Question 1
Question 2
Solve the following system of equations by Matrix method:
(Note: zero mark will be given for non-matrix method.)
Question 3
Question 4
Question 5. A researcher wishes to predict human wine taste preferences that is based on easily available analytical tests at the certification step. A dataset “winequality-red.csv” is considered with 1599 white wine samples (from Portugal). The independent variables include the values of the following 11 physicochemical tests:
1. Fixed acidity (g(tartaric acid)/dm3)
2. Volatile acidity (g(acetic acid)/dm3)
3. Citric acid (g/dm3)
4. Residual sugar (g/dm3)
5. Chlorides (g(sodium chloride)/dm3)
6. Free sulfur dioxide (mg/dm3)
7. Total sulfur dioxide (mg/dm3)
8. Density (g/cm3)
9. pH
10. Sulphates (g(potassium sulphate)/dm3)
11. Alcohol (vol.%)
The dependent variable ‘Quality’ is sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). (Note: The dependent variable is ordinal but an “ordinal approximation of a continuous variable” can be made for ordinal variables with five or more categories. The “winequality-red.csv” dataset can be downloaded from the OLE.)
(a) Utilize R to determine the multiple linear regression model to predict the quality of red wine by considering which independent variable(s) be included in the model among the other given variables using stepwise regression (forward). You are expected to perform relevant model checking including relevant graphs plotting after the desired model is formulated. All R programs must be included in the answer and marks will be deducted if failing to do so.
(40 marks)
(b) Perform relevant hypothesis testing to assess the validity of the multiple linear regression model obtained as well as the validity of individual regression coefficients. (5 marks)
(c) Interpret the regression coefficients of the model. (5 marks)
(d) Write a reflective journal of not more than 200 words that summarizes your learning experience in applying knowledge and skills acquired in the course to build the regression model for the given problem, and that explain how this experience could enrich your ability to apply course knowledge to real life applications. (10 marks)