Academic Integrity Statement: You must adhere to the university regulations on academic conduct. Formal inquiry proceedings will be instigated if there is any suspicion of plagiarism or any other form of misconduct in your work. Refer to the University’s Assessment Regulations for Northumbria Awards if you are unclear as to the meaning of these terms. The latest copy is available on the University website.
· Do NOT submit code from other people or web sources as your own; this is plagiarism.
· Do NOT buy your assignments on the Internet or submit work written for you by others. This is ghosting.
· For the individual element, do NOT work with other students and submit identical code; this is collusion.
· Plagiarism, ghosting, and collusion are all forms of academic misconduct and are not allowed.
Failure to submit: The University requires all students to submit assessed coursework by the deadline stated in the assessment brief. Where coursework is submitted without approval after the published hand-in deadline, penalties will be applied as defined in the University Policy on the Late Submission of Work.
The aim of this assignment is to introduce a practical application of Big Data and Cloud Computing using a realistic big data problem. Students will implement a solution using an industry-leading Cloud computing provider together with the distributed processing environment Apache Spark. This will involve selecting Machine Learning algorithms and methods appropriate to the problem.
LO 1. Apply big data analytic algorithms, including those for visualization and cloud computing techniques to multi-terabyte datasets.
LO 2. Critically assess data analytic and machine learning algorithms to identify those that satisfy given big data problem requirements.
LO 3. Critically evaluate and select appropriate big data analytic algorithms to solve a given problem, considering the processing time available and other aspects of the problem.
LO 4. Design and develop advanced big data applications that integrate with third-party cloud computing services.
LO 5. Critically assess the relationship between knowledge and the ethical and social interpretation of primary research using big data.
Portfolio Assignment: A collection of pieces of work
Individual Work: Work carried out by one person only
Group Work: Work carried out collaboratively seeking to improve each other’s elements
Peer Review: Critical analysis and subsequent grading of a social equal’s work
Semi-Formative: Training tasks assigned course credit to reward and ensure engagement.
Assignment Overview
The portfolio assignment is divided into components as follows:
Training Tasks (30%) |
Semi-formative elements of the portfolio constitute 30% of the assessment for this module and include group, individual, and peer-assessed work. |
Combined Big Data Product and Report (70%) |
Individual work – Combined Big Data Product and Report: This practical element is the final module assessment. |
Training Tasks
Training Task 1: Peer Reviewed Task (24%)
The objective of this task is to ensure that students have mastered the following skills, which are required for the final module assessment:
1. Process a data set using the recommended software environment for the module.
2. Explain the logical reasoning behind your code.
This work will be peer assessed, as recommended by the British Computer Society. That is, you will critically assess the work of fellow students (your peers) and THEY will assess yours.
In detail:
1. You will create a Jupyter notebook based on the scenario below (which is derived from weekly worksheets 1-4), explaining your code using notebook-embedded Markdown (i.e. formatted text, not just comments).
2. You will post your notebook to the module discussion board on Blackboard.
3. You will then mark (i.e. peer review) the submission preceding yours on the discussion board, and the one following it, using the marking scheme below, and post these mark sheets.
4. Your mark for this task will be the average of your peer marks.
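As a small illustration of step 4, the averaging can be sketched in Python. The function name `task_mark` and the example marks are hypothetical; the marking scale itself is defined by the marking scheme, not by this sketch.

```python
from statistics import mean


def task_mark(peer_marks):
    """Return the task mark as the arithmetic mean of the peer marks received."""
    if not peer_marks:
        raise ValueError("at least one peer mark is required")
    return mean(peer_marks)


# Hypothetical marks awarded by the two peer reviewers (illustrative values only).
example_marks = [62, 70]
print(task_mark(example_marks))
```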
Scenario:
Suppose you are a police department with a limited budget. You plan to reduce road-traffic accidents through a one-month targeted advertising campaign.
Using the given dataset, which gender, age group, and month would be the largest target group as indicated by positive breath tests?
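The shape of this question (filter to positive tests, group, count, take the largest group) can be sketched in plain Python as below. This is only a minimal sketch under stated assumptions: the field names `gender`, `age_band`, `month`, and `result`, the value `"positive"`, and the rows themselves are all made up for illustration and do not reflect the actual dataset or its answer. It also assumes the three attributes are counted as one combined group. In the module's Spark environment the same idea would typically be expressed as a filter followed by a group-by and count.

```python
from collections import Counter

# Made-up rows for illustration only; real field names and values may differ.
rows = [
    {"gender": "F", "age_band": "A", "month": 3, "result": "positive"},
    {"gender": "M", "age_band": "B", "month": 7, "result": "negative"},
    {"gender": "F", "age_band": "A", "month": 3, "result": "positive"},
    {"gender": "M", "age_band": "B", "month": 7, "result": "positive"},
    {"gender": "F", "age_band": "B", "month": 3, "result": "negative"},
]


def largest_target_group(records):
    """Return ((gender, age_band, month), count) for the group with the most
    positive tests, or None if there are no positive tests."""
    counts = Counter(
        (r["gender"], r["age_band"], r["month"])
        for r in records
        if r["result"] == "positive"  # keep positive breath tests only
    )
    top = counts.most_common(1)
    return top[0] if top else None


print(largest_target_group(rows))
```

Counting each attribute separately instead (the most common gender, then age group, then month) can give a different answer from counting the combined group, so the notebook's Markdown should state which interpretation it uses.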