MS4S08 Applied Statistics for Data Science

This assessment is worth 50% of your overall mark for this module. It must be submitted via Turnitin, in PDF format, before the deadline. You must use SAS to perform your statistical analyses and produce a formal report summarising your output and results, in which you interpret your findings and draw valid conclusions.

Choose one or more topics of interest to you and find three datasets, from different sources, relevant to these topics. The datasets need not be merged, but each must provide a different source of information and yield meaningful insights.
1. Modelling (30%)
  • Using at least one of your selected datasets, apply at least one modelling method (linear or logistic regression).
  • State clearly the method you use and the assumptions you make.
  • Report the final model and your results, including the residual analysis (linear regression) or the sensitivity-specificity analysis (logistic regression), together with your conclusions.
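As an illustration of how the modelling task might look in SAS, here is a minimal sketch rather than a prescribed solution. The datasets WORK.HOUSING and WORK.PATIENTS and every variable name are hypothetical placeholders, and the predictors are assumed to be numeric (0/1 for SMOKER). PROC REG with PLOTS=DIAGNOSTICS produces the residual plots used to check the linear-regression assumptions, while PROC LOGISTIC with CTABLE prints sensitivity and specificity at the chosen cutoff and PLOTS=ROC draws the ROC curve.

```sas
/* Minimal sketch: WORK.HOUSING, WORK.PATIENTS and all variable
   names below are hypothetical placeholders for your own data. */
ods graphics on;

/* Linear regression. PLOTS=DIAGNOSTICS draws residual-vs-fitted,
   Q-Q and Cook's distance plots; VIF flags multicollinearity. */
proc reg data=work.housing plots=diagnostics;
    model price = area rooms age / vif;
    /* Save fitted values and residuals for further checks. */
    output out=work.reg_out p=predicted r=residual student=stud_resid;
run;
quit;

/* Formal normality tests (e.g. Shapiro-Wilk) on the saved residuals. */
proc univariate data=work.reg_out normal;
    var residual;
    qqplot residual / normal(mu=est sigma=est);
run;

/* Logistic regression for a 0/1 outcome. CTABLE prints sensitivity
   and specificity at the PPROB= cutoff; PLOTS=ROC draws the ROC curve;
   LACKFIT adds the Hosmer-Lemeshow goodness-of-fit test. */
proc logistic data=work.patients plots(only)=roc;
    model disease(event='1') = age bmi smoker / ctable pprob=0.5 lackfit;
run;
```

Only one of the two models is needed, depending on whether your response is continuous or binary. A character predictor would need to be listed in a CLASS statement in PROC LOGISTIC.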
2. Analysis (50%)
  • Using at least two of your selected datasets, apply at least one further multivariate technique (PCA or Factor Analysis).
  • Using at least two of your selected datasets, perform a Cluster Analysis and analyse the resulting clusters using Descriptive Statistics.
  • State clearly the research question you want to answer, the methods you use, the results you produce and your conclusions. 
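The analysis steps could be sketched in SAS along the following lines. This is an illustration under assumptions, not a required approach: WORK.COUNTRIES, the identifier COUNTRY, the numeric variables X1-X6 and the choice of three clusters are all hypothetical. PROC PRINCOMP (or PROC FACTOR) covers the multivariate technique. The variables are copied and standardised before clustering so that no single variable dominates the distance measure, and PROC MEANS then profiles each cluster on the original scale.

```sas
/* Minimal sketch: WORK.COUNTRIES, COUNTRY and X1-X6 are hypothetical.
   Repeat the steps for each dataset you analyse. */
ods graphics on;

/* PCA on the correlation matrix (the default), keeping 3 component
   scores in WORK.PCA_SCORES and drawing a scree plot. */
proc princomp data=work.countries out=work.pca_scores n=3 plots=scree;
    var x1-x6;
run;

/* Alternative: factor analysis with ML extraction and varimax rotation. */
proc factor data=work.countries method=ml nfactors=2 rotate=varimax scree;
    var x1-x6;
run;

/* Copy X1-X6 into Z1-Z6 so the originals survive standardisation. */
data work.prep;
    set work.countries;
    array src{6} x1-x6;
    array cpy{6} z1-z6;
    do i = 1 to 6;
        cpy{i} = src{i};
    end;
    drop i;
run;

/* Standardise Z1-Z6 to mean 0 and standard deviation 1. */
proc stdize data=work.prep out=work.std method=std;
    var z1-z6;
run;

/* Hierarchical clustering (Ward's method); the pseudo-F, pseudo-t2
   and CCC statistics help choose the number of clusters. */
proc cluster data=work.std method=ward outtree=work.tree pseudo ccc print=15;
    var z1-z6;
    id country;
run;

/* k-means with the chosen k (3 is a placeholder); OUT= adds CLUSTER. */
proc fastclus data=work.std out=work.clusters maxclusters=3 maxiter=100;
    var z1-z6;
run;

/* Descriptive statistics for each cluster on the original variables. */
proc means data=work.clusters n mean std min max maxdec=2;
    class cluster;
    var x1-x6;
run;
```

Using PROC CLUSTER to choose k and PROC FASTCLUS to assign observations is one common design; cutting the dendrogram with PROC TREE is an equally valid alternative.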

