Please note this is not the assessment task. The task to be completed is detailed on the next page.
This CA will assess student attainment of the following minimum intended learning outcomes:
3. Critically appraise aggregation methods to process and manipulate data from multiple data structures. (Linked to PLO 3).
4. Formulate and evaluate a test and optimisation strategy for programmatic solutions. (Linked to PLO
2. Determine whether a given data analysis problem requires the use of supervised, semi-supervised or unsupervised learning methods. Develop and implement the chosen learning method. (Linked to PLO 1, PLO 2, PLO 4)
3. Implement a range of classification and regression techniques and detail / document their suitability for
a variety of problem domains. (Linked to PLO 5)
4. Critically evaluate the performance of Machine Learning models, propose strategies to optimise performance. (Linked to PLO 3)
1. Discuss the concepts, techniques and processes underlying data visualisation to critically evaluate visualisation approaches with respect to their suitability for different problem areas. (linked to PLO 1)
3. Engineer new features selection in data with the goal of improving the performance of machine learning models. (linked to PLO 2, PLO 4)
Attainment of the learning outcomes is the minimum requirement to achieve a Pass mark (40%). Higher
marks are awarded where there is evidence of achievement beyond this, in accordance with QQI
Assessment and Standards, Revised 2013, and summarised in the following table:
Students are advised to review and adhere to the submission requirements documented after the assessment task.
A large amount of data has been collected by Dublin City Council (DCC) regarding Transport and Infrastructure in the Greater Dublin Area. This data is available at: https://data.gov.ie/organization/dublin-city-council?tags=Transport+and+Infrastructure
You are required to choose a particular area of interest and formulate the appropriate questions for modeling and analysis. For example (but not limited to):
• Clamping Appeals
• Multistorey Car Parking Space Availability
• Telecoms Underground Infrastructure DCC
• etc.
You are required to collect, process, analyse and interpret the data in order to identify possible issues/problems at present and make predictions/classifications with regard to the future. This analysis will rely on the available data from DCC and any additional data you deem necessary (with supporting evidence) to support your hypothesis for this scenario.
This will require you to employ critical analysis not only of the domain of choice but also of the regression and/or classification that you undertake.
Statistics: (Graded out of 100) You need to analyse the data using statistical logic and statistical techniques. You are required to:
1. Summarize your data using Descriptive Statistics: Central Tendency, Measures of variability and graphs. You are required to plot at least two graphs. [0-40]
2. Use at least one discrete distribution (Binomial/Poisson) to explain/identify some information about your data. [0-30]
3. Use at least one Normal Distribution to explain/identify some information about your data. You must justify the use of the measures you calculated and the techniques you used. You are allowed to work with Python, but your mathematical reasoning must be documented in your report. [0-30]
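For illustration only, the three statistics requirements above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a required or model approach: the sample values, the Poisson rate of 3 appeals per day and the variable names are all hypothetical and are not taken from any DCC dataset. It uses only the Python standard library.

```python
import math
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample (NOT real DCC data): free spaces seen at ten readings.
free_spaces = [112, 98, 105, 120, 87, 101, 95, 110, 99, 103]

# 1. Descriptive statistics: central tendency and variability.
summary = {
    "mean": mean(free_spaces),
    "median": median(free_spaces),
    "stdev": stdev(free_spaces),  # sample standard deviation
    "range": max(free_spaces) - min(free_spaces),
}


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson variable with mean rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# 2. Discrete distribution: assume (hypothetically) 3 appeals per day on average.
assumed_rate = 3.0
p_exactly_five = poisson_pmf(5, assumed_rate)
p_at_most_two = sum(poisson_pmf(k, assumed_rate) for k in range(3))

# 3. Normal distribution fitted to the sample mean and standard deviation.
fitted = NormalDist(mu=summary["mean"], sigma=summary["stdev"])
p_below_90 = fitted.cdf(90)  # modelled probability of fewer than 90 free spaces

print(summary)
print(f"P(exactly 5 appeals) = {p_exactly_five:.4f}")
print(f"P(at most 2 appeals) = {p_at_most_two:.4f}")
print(f"P(free spaces < 90)  = {p_below_90:.4f}")
```

Whichever tools are used, the report would still need to justify why each distribution is a reasonable model for the chosen data, since the code alone does not document the mathematical reasoning.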
1. You must perform appropriate EDA on your dataset, rationalizing and detailing why you chose the specific methods and what insight you gained. [0-30]
2. You must also rationalise and detail all the methods used to prepare the data for ML. [0-20]
3. Appropriate visualizations must be used to engender insight into the dataset and to illustrate your final insights gained in your analysis. [0-30]
4. All design and implementation of your visualizations must be justified and detailed in full. [0-20]
Machine learning for Data Analytics: (Graded out of 100)
1. Explain the reasoning for selecting one of the following machine learning approaches for the chosen dataset (supervised/ unsupervised/ semi-supervised). Discuss and explain the rationale for choosing the appropriate project management framework/ activities (CRISP-DM, KDD or SEMMA). [0 - 20]
2. There is a wide range of applications of Machine Learning models, including Prediction, Classification, and Clustering. It is recommended that you evaluate multiple approaches (at least two) with proper parameter selection based on hyperparameters and portray an analysis of the selected approaches. [0 - 30]
3. Perform the training and testing of the machine learning models, with cross-validation/GridSearchCV, to demonstrate the authenticity of the modelling outcomes. Show a comparison of two or more ML modelling outcomes using a table or graph visualization. Critically evaluate and examine the performance of the Machine Learning models. [0 - 20]
4. Demonstrate the similarities and differences between your Machine Learning modelling results using tables or visualizations. Provide a report along with an explanation and interpretation to convince DCC of the relevance and effectiveness of your findings. [0 - 30]
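As an illustration of the idea behind comparing models with cross-validation, the sketch below hand-rolls a k-fold comparison of two simple regressors using only the Python standard library. It is a simplified stand-in, not the required method: in practice a library routine such as scikit-learn's `cross_val_score` or `GridSearchCV` would normally be used, and this sketch does no hyperparameter tuning. The data values and all function names are hypothetical and are not taken from any DCC dataset.

```python
from statistics import mean

# Hypothetical data (NOT real DCC figures): hour of day vs. occupied spaces.
hours = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
occupied = [40, 62, 85, 101, 118, 142, 160, 178, 205, 219, 241, 262]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Simple least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def k_fold_mae(fit, xs, ys, k=4):
    """Mean absolute error, averaged over k held-out folds."""
    pairs = list(zip(xs, ys))
    fold_errors = []
    for fold in range(k):
        # Every k-th point, starting at index `fold`, is held out for testing.
        test_idx = set(range(fold, len(pairs), k))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        model = fit([x for x, _ in train], [y for _, y in train])
        fold_errors.append(mean([abs(model(x) - y) for x, y in test]))
    return mean(fold_errors)


# Comparison of the two models, suitable for presenting as a small table.
results = {
    "mean baseline": k_fold_mae(fit_mean, hours, occupied),
    "linear regression": k_fold_mae(fit_line, hours, occupied),
}

for name, mae in results.items():
    print(f"{name:<20} MAE = {mae:.2f}")
```

Each model is scored only on points it did not see during fitting, which is what makes the comparison between models a fair one.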
The project must be explored programmatically; this means that you must implement suitable Python tools (code and/or libraries) to complete the analysis required. All of this is to be implemented in a Jupyter Notebook. Your codebook should be properly annotated. The project documentation must include sound justifications and explanations of your code choices (code quality standards should also be applied). [0-10
All assessment submissions must meet the minimum requirements listed below. Failure to do so may have implications for the mark awarded.
All assessment submissions must:
• Contain 5000 (+/- 10%) words in the report (not including code, code comments, titles, references or citations)
• Report submission MUST be a Word document; code in a Jupyter Notebook file only, but it may be referenced in the Word document
• Be submitted by the deadline date specified or be subject to late submission penalties
• Be submitted via Moodle upload
• Use Harvard Referencing when citing third party material
• Be the student's own work
• Include the CCT assessment cover page