7COM1018 Data Mining
This assignment assesses the following module Learning Outcomes (from the Definitive Module Document).
Successful students will typically:
• be able to appreciate the strengths and limitations of various data mining models;
• be able to critically evaluate, articulate and utilise a range of techniques for designing data mining systems;
• be able to understand and reflect on the underlying ethical and legal issues and constraints on the holding and the use of data;
• be able to critically evaluate different algorithms and models of data mining.
In the workplace, you have been assigned to a new project. For this assignment you can choose what this project is. For example, it could be “identifying poisonous and edible mushrooms”, “recognising different groups of customers”, “recognising purchasing habits in a supermarket” or “tracking different types of land use around rainforests”. You may use one of these examples or come up with your own.
At your next meeting with management, you have been asked to explain how ONE of the following algorithms works:
• ID3 (Information Gain Decision Tree) (classification)
• Naïve Bayes (Multinomial Naïve Bayes) (classification)
• K-means (clustering)
• DBSCAN (clustering)
• Apriori (Association Mining)
Your response must include:
1. A technical explanation articulating how the algorithm works, showing how to work through the algorithm by hand using your own small example (14 marks)
2. Comments on the strengths and limitations of the algorithm (6 marks)
3. A critical evaluation of the algorithm for your chosen use case, comparing it with other similar algorithms and use cases in research. The papers should be referenced; how you do this is your choice.
4. A description of, and reflection on, the ethical considerations of using this algorithm. For example, could the algorithm produce biased results, and how would this happen?
Note that the assignment is not to complete a data science project. Your task is to create a piece of work explaining an algorithm (for example a video) while considering a hypothetical data science project (which can be of your choosing). The flexibility in the selection of algorithm, topic and response allows you to perform at your best.
In summary, your task is to explain how one of the data mining algorithms (listed above) works and comment on its fitness for a particular problem (which you may choose).
You may respond to this assignment in any one of the following formats:
• Video featuring a whiteboard / drawing app / pen and paper /
• Voiced over PowerPoint
• Large Poster with an Audio Recording
• Technical Document
School of Physics, Engineering and Computer Science
All length limits are flexible (+/- 10%) and do not include figures, captions, or references. There are no marks for production quality, although we kindly ask that you make sure the video and audio quality is fit for purpose (standard built-in webcams and microphones should be suitable). For advice, please speak to the module leader. The videos or documents are intended for a professional environment.
Accepted formats:
• Videos: mp4, webm, flv, mkv, avi, mov and wmv
• Voiced-over PowerPoints: pptx
• Voice-over, if separate from the PowerPoint: mp3, wav, ogg, aac, wma and m4a
• Posters and documents: pdf, docx, odt, png and svg
The referencing format is flexible. When using a video, references can appear on screen or be spoken; either will be accepted (please identify the title, author, and year).
Marks awarded for:
This assignment is worth 40% of the overall assessment for this module.
Marks will be awarded out of 40 in the proportion:
See marking scheme below.
A reminder that all work should be your own.
Videos/reports exceeding the maximum length may not be marked beyond the length limit.