Late Submission Penalties & Data Science Essay: Analysis

Penalties for Late Submissions and Module Learning Outcomes

Late submission penalties for coursework

• Coursework relating to modules at Levels 0, 4, 5 and 6 that is submitted late (including deferred coursework, but with the exception of referred coursework) will have its numeric grade reduced by 10 grade points for each day or part thereof (for hard copy submission only, each working day or part thereof), for up to five days after the published deadline, until or unless the numeric grade reaches or is 40. Where the numeric grade awarded for the assessment is less than 40, no lateness penalty will be applied.

• Late submission of referred coursework will automatically be awarded a grade of zero (0).

• Coursework (including deferred coursework) submitted later than five days (five working days in the case of hard copy submission) after the published deadline will be awarded a grade of zero (0).

• Where genuine serious adverse circumstances apply, you may apply for an extension to the hand-in date, provided the extension is requested a reasonable period in advance of the deadline.
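The penalty rules above can be sketched as a small function. This is only one reading of the policy, not an official calculator: it assumes the 10-point reduction accumulates per day (or part day) and stops at a floor of 40. The function name `apply_late_penalty` and its parameters are hypothetical.

```python
import math


def apply_late_penalty(grade, days_late, referred=False):
    """Return the grade after lateness rules (an illustrative reading only)."""
    if days_late <= 0:
        return grade  # submitted on time: no change
    if referred or days_late > 5:
        return 0  # late referred work, or more than five days late
    if grade < 40:
        return grade  # grades below 40 receive no lateness penalty
    days_charged = math.ceil(days_late)  # "each day or part thereof"
    # Assumption: 10 points per charged day, never dropping below 40.
    return max(40, grade - 10 * days_charged)


print(apply_late_penalty(65, 1.5))  # two charged days
print(apply_late_penalty(65, 3))    # would fall below 40, so held at 40
print(apply_late_penalty(70, 6))    # more than five days late
```

Working days for hard copy submissions are not modelled here; `days_late` would need to be counted in working days before calling the function.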

This Assignment assesses the following module Learning Outcomes (Take these from the module DMD):

1. Have knowledge and understanding of the fundamental mathematical ideas behind data science;

2. Have knowledge and understanding of relevant computational algorithms and the fundamentals of probability, information and statistical methods;

3. Have knowledge and understanding of producing and appreciating algorithmic definitions to provide useful data science analysis;

4. Be able to apply basic mathematical skills to simple data science problems;

5. Be able to implement algorithms and programs to analyze a given dataset;

6. Be able to make sensible recommendations of the nature of the data analyzed.

1. For undergraduate modules, a score above 40% represents a pass performance at honours level.

2. For postgraduate modules, a score of 50% or above represents a pass mark.

3. Modules may have several components of assessment and may require a pass in all elements. For further details, please consult the relevant Module Guide or ask the Module Leader.

Preparation: To do tasks set in this piece of work, you need to load the data using Pandas and change the labels: change ‘positive’ to the value of 1 and ‘negative’ to the value of 0.
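A minimal sketch of the label change, assuming the labels sit in a column called `label` and are stored as the exact strings `positive` and `negative`. Both the column name and the tiny stand-in DataFrame are hypothetical; in practice the data would be loaded from the assignment file, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the assignment data.
df = pd.DataFrame({
    "feature_1": [0.2, 1.5, 3.1, 0.7],
    "label": ["positive", "negative", "negative", "positive"],
})

# Map the string labels to integers: positive -> 1, negative -> 0.
df["label"] = df["label"].map({"positive": 1, "negative": 0})

print(df["label"].tolist())
```

Any label value not present in the mapping becomes `NaN`, so checking `df["label"].isna().sum()` afterwards is a cheap way to catch unexpected spellings.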

Task 1: Divide the data set into a training set (from now on we shall refer to this as training set (I)) and a test set: write Python code to make the first 500 rows of the original data the training set (I); the remaining rows of the original dataset will form the test set. Use Python code to check and report how many data points are labelled as 0’s (negative) in the training set and the test set, respectively, and how many data points are labelled as 1’s (positive) in the training set and the test set, respectively.
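One possible way to make the split and count the labels is sketched below. The 600-row random DataFrame and the column name `label` are placeholders for the real data, which is assumed to have already had its labels changed to 0 and 1.

```python
import numpy as np
import pandas as pd

# Placeholder data: 600 rows with an integer 0/1 label column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_1": rng.normal(size=600),
    "label": rng.integers(0, 2, size=600),
})

train = df.iloc[:500]   # first 500 rows: training set (I)
test = df.iloc[500:]    # all remaining rows: test set

for name, part in [("training set (I)", train), ("test set", test)]:
    counts = part["label"].value_counts()
    print(name, "0s:", counts.get(0, 0), "1s:", counts.get(1, 0))
```

Positional slicing with `iloc` keeps the original row order, which matters here because the task defines the split by row position rather than by random sampling.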

Task 2: PCA Analysis on the training set

a) Normalise the training set and the test set using StandardScaler() (Hint: the scaling parameters should come from the training set only).
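The hint can be illustrated as follows: the scaler is fitted on the training features only, and the same fitted parameters are then applied to the test features. The random arrays `X_train` and `X_test` are placeholders for the real feature matrices.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrices standing in for the real training and test sets.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(500, 4))
X_test = rng.normal(loc=5.0, scale=2.0, size=(100, 4))

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_std = scaler.transform(X_test)        # reuse the training mean and std

print(X_train_std.mean(axis=0).round(6))
```

After this step the training columns have zero mean and unit variance, while the test columns are only approximately standardised, since they were scaled with the training statistics.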

b) Perform a PCA analysis on the training data set (I) and plot a scree plot to report the variance captured by each principal component.
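A scree plot could be produced along the following lines. The array `X_train_std` is a random placeholder for the standardised training set (I), and plotting the explained variance ratio (rather than the raw variance) is a choice made for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder for the standardised training set (I).
rng = np.random.default_rng(0)
X_train_std = rng.normal(size=(500, 6))

pca = PCA()  # keep all components so the full spectrum is visible
pca.fit(X_train_std)
ratios = pca.explained_variance_ratio_

fig, ax = plt.subplots()
components = np.arange(1, len(ratios) + 1)
ax.plot(components, ratios, marker="o")
ax.set_xlabel("Principal component")
ax.set_ylabel("Proportion of variance explained")
ax.set_title("Scree plot, training set (I)")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the figure.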

c) Plot two subplots in one figure: in one subplot, project the training set in the first two principal components’ projection space and label the training data using different colours according to its class; in the other subplot, project the training set in the third and fourth principal components’ projection space and also label the test data using different colours according to its class.
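The two projections might be drawn as below. This sketch colours the training points by class in both panels, which is one reading of the task. `X_train_std` and `y_train` are random placeholders, and the colour choices are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholders for the standardised training features and their 0/1 labels.
rng = np.random.default_rng(0)
X_train_std = rng.normal(size=(500, 6))
y_train = rng.integers(0, 2, size=500)

# Scores of the training points on the first four principal components.
Z = PCA(n_components=4).fit_transform(X_train_std)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
pairs = [(0, 1), (2, 3)]  # (PC1, PC2) on the left, (PC3, PC4) on the right
for ax, (i, j) in zip(axes, pairs):
    for cls, colour in [(0, "tab:blue"), (1, "tab:red")]:
        mask = y_train == cls
        ax.scatter(Z[mask, i], Z[mask, j], c=colour, s=10, label=f"class {cls}")
    ax.set_xlabel(f"PC{i + 1}")
    ax.set_ylabel(f"PC{j + 1}")
    ax.legend()
```

To show test points in the same space, the PCA object fitted on the training set would be kept and its `transform` method applied to the standardised test features.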

Task 3: Do a classification using the logistic regression model with a regularisation term

a) In your report, describe the model you have used, including:

What is the cost function? You need to give a mathematical expression describing it.

Which optimization algorithm has been used in your code?

Which regularisation term have you used?
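As an illustration of the three questions above, the sketch below writes out one common form of the L2-regularised logistic regression cost and fits scikit-learn's `LogisticRegression`. It assumes a recent scikit-learn version, where the default penalty is L2, `C` is the inverse of the regularisation strength, and `solver="lbfgs"` selects the L-BFGS optimiser. The helper `l2_regularised_cost` and the synthetic data are hypothetical, and scikit-learn's internal objective is scaled differently from the textbook form shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def l2_regularised_cost(w, b, X, y, lam):
    """J(w, b) = -(1/n) * sum[y*log(p) + (1-y)*log(1-p)] + (lam/2) * ||w||^2,
    where p = sigmoid(X w + b)."""
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12  # guards against log(0)
    cross_entropy = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return cross_entropy + 0.5 * lam * np.sum(w ** 2)


# Synthetic placeholder data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X_train_std = rng.normal(size=(500, 4))
y_train = (X_train_std[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000)
model.fit(X_train_std, y_train)

# Unregularised part of the cost before and after fitting.
cost_zero = l2_regularised_cost(np.zeros(4), 0.0, X_train_std, y_train, lam=0.0)
cost_fit = l2_regularised_cost(model.coef_.ravel(), model.intercept_[0],
                               X_train_std, y_train, lam=0.0)
print(cost_zero, cost_fit)
```

With all weights at zero every prediction is 0.5, so the cross-entropy equals log 2; the fitted model should give a lower value.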

Define your own function ([num1, index1, num2, index2] = misPatterns(predictions, labels)) using Python. The inputs of this function should be the predictions and labels in the test set. The outputs should be the number (num1) of misclassified patterns whose label is 1 but which were given a prediction of 0, together with their indices (index1) in the test set, and the number (num2) of misclassified patterns whose label is 0 but which were given a prediction of 1, together with their indices (index2) in the test set.
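One way such a function might look is sketched here using NumPy. It assumes the predictions and labels are equal-length sequences of 0s and 1s, and it reports indices as 0-based positions within the test set, which is an interpretation rather than something the brief specifies.

```python
import numpy as np


def misPatterns(predictions, labels):
    """Count and locate the two kinds of misclassification in the test set."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    # Label 1 but predicted 0 (false negatives).
    index1 = np.flatnonzero((labels == 1) & (predictions == 0))
    # Label 0 but predicted 1 (false positives).
    index2 = np.flatnonzero((labels == 0) & (predictions == 1))
    return [len(index1), index1, len(index2), index2]


# Small hand-made example.
predictions = [1, 0, 0, 1, 1]
labels = [1, 1, 0, 0, 1]
num1, index1, num2, index2 = misPatterns(predictions, labels)
print(num1, index1, num2, index2)
```

In the example, position 1 has label 1 but prediction 0, and position 3 has label 0 but prediction 1, so each count is 1.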
