7COM1018 Data Mining
This Assignment assesses the following module Learning Outcomes (from Definitive Module Document):
Successful students will typically:
2. be able to appreciate the strengths and limitations of various data mining models.
3. be able to critically evaluate, articulate and utilise a range of techniques for designing data mining systems.
4. be able to understand and reflect on the underlying ethical and legal issues and constraints on the holding and the use of data.
5. be able to critically evaluate different algorithms and models of data mining.
In the workplace, you have been assigned to a new project. For this assignment you can choose what this project is. For example, it could be “identifying poisonous and edible mushrooms”, “recognising different groups of customers”, “recognising purchasing habits in a supermarket” or “tracking different types of land use around rainforests”. You may use one of these examples or come up with your own.
At your next meeting with management, you have been asked to explain how ONE of the following algorithms works:
• ID3 (Information Gain Decision Tree) (classification)
• Naïve Bayes (Multinomial Naïve Bayes) (classification)
• K-means (clustering)
• DBSCAN (clustering)
• Apriori (Association Mining)
Your response must include:
1. A technical explanation articulating how the algorithm works, showing how to work through the algorithm by hand using your own small example.
2. Comments on the strengths and limitations of the algorithm (6 marks)
3. A critical evaluation of the algorithm for your given use case, comparing it with other similar algorithms and use cases in research. The papers should be referenced; how you do this is your choice.
4. A description of, and reflection on, the ethical considerations of using this algorithm. For example, could the algorithm produce biased results, and how would this happen? (5 marks)
In summary, the assignment is not to complete a data science project. Your task is to create a piece of work (for example, a video) explaining an algorithm while considering a hypothetical data science project, which can be of your choosing. The flexibility in the selection of algorithm, topic and response allows you to perform at your best.
In short, your task is to explain how one of the data mining algorithms (listed above) works and comment on its fitness for a particular problem (which you may choose).
You may choose one of the following formats for your response to this assignment:
• Video featuring a whiteboard / drawing app / pen and paper / PowerPoint (max. 16 minutes)
• Voiced over PowerPoint (max. 16 minutes)
• Large Poster with an Audio Recording (max. 16 minutes)