In the course assignments you were instructed to access a dataset compatible with supervised machine learning based classification or regression. You are free to explore unsupervised learning as applied to your supervised learning dataset, should you choose to do so. You may either reuse your dataset from the earlier assignments or, if you want something more interesting or challenging, or simply want more experience, access a new dataset (or use both your old dataset and a new dataset). In this project, students are expected to demonstrate creativity in the application of the pattern recognition techniques taught in this course, as well as techniques that build on the concepts taught in this course. Many example course projects have been discussed in class lectures. You will be graded based on the quantity and quality (correctness, challenge, etc.) of the techniques you implement in your project analyzing the dataset(s).
Question 1: Provide a point-by-point summary of very brief (one-sentence) statements that outline what you’ve completed as part of your course project. Examples (you can choose any task under the sun; it doesn’t have to be one of these, and there are bonus points for being creative and performing challenging coding tasks):
a) Obtained astronomical (or other) dataset for supervised machine learning (SL)
b) Performed validation of AdaBoost with varying base learner models
c) Implemented deep learning on my dataset with varying degrees of depth, and included a comparative analysis between them
d) Detailed analysis of random forest parameter variability effect on performance
e) Application of K-Means unsupervised learning with comparison to SL
f) Implemented (from scratch) the code for an existing learning algorithm and validated its performance on this dataset
g) Accessed a new dataset for regression, and implemented a deep learning network targeting my application’s regression variable
h) Accessed a new dataset compatible with Recurrent Neural Networks (natural language processing, time series analysis, etc.), and implemented a Long Short Term Memory RNN architecture for that application
i) Accessed a dataset where we want to localize something of interest within an image and implemented a UNet deep learner for that application
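As one illustration of the kind of task listed in example (f), the following is a minimal sketch of a from-scratch implementation of an existing algorithm together with a simple validation. The choice of k-nearest neighbours, the leave-one-out validation, the function names, and the toy data are all assumptions made for illustration; none of them is prescribed by this project.

```python
from collections import Counter
from math import dist
from typing import List


def knn_predict(train_x: List[List[float]], train_y: List[int],
                query: List[float], k: int = 3) -> int:
    """Predict a label by majority vote among the k nearest training points."""
    # Indices of the training samples, ordered by Euclidean distance to the query.
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x: List[List[float]], y: List[int], k: int = 3) -> float:
    """Validate by holding out each sample in turn and predicting it from the rest."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical toy data: two well-separated clusters (not a real dataset).
toy_x = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
print("leave-one-out accuracy:", leave_one_out_accuracy(toy_x, toy_y, k=3))
```

In a real project the validation would be run on your own dataset, and comparing the result against an established library implementation of the same algorithm is one way to check correctness.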
Question 2: This is identical to Assignment 1/2, Question 2. If you are using the same dataset, just reuse your previous answer (paste it here); if you are using a new dataset, describe it here. Describe the dataset you have collected: total number of samples, total number of measurements, a brief description of the measurements included, the nature of the group of interest and what differentiates it from the other samples, the sample count for your group of interest, and the sample count for the group(s) not of interest. Write a program that analyzes each measurement individually. For each measurement, compute the area under the receiver operating characteristic curve (AUC). Provide an output of the 10 leading measurements (highest AUC – furthest from 0.5), making it clear what those measurements represent in your dataset (these are the measurements with the most obvious potential to inform prediction in any given machine learning algorithm) and what the corresponding AUC values are. Provide this code. Note: if you use an advanced dataset, such as an imaging dataset or a natural language processing dataset, you might not be able to provide a listing of the 10 leading AUC values, as there may be no specific numerical feature measurements to report on with an AUC analysis. In such a situation, please do your best to describe the dataset clearly in words, reporting on as many of the above dataset parameters as your dataset's constraints allow.
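The per-measurement AUC analysis described in Question 2 could be sketched roughly as follows. This is a minimal illustration, not a required implementation: it assumes binary labels (1 for the group of interest, 0 otherwise), computes the AUC through the rank-based Mann-Whitney U statistic, and interprets "leading" as the largest distance of the AUC from 0.5. The function names, the measurement names, and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc_single_feature(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one measurement used directly as a score (Mann-Whitney U form)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each member the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both groups must be present to compute an AUC")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def rank_features_by_auc(columns: Dict[str, Sequence[float]],
                         labels: Sequence[int],
                         top_k: int = 10) -> List[Tuple[str, float]]:
    """Return the top_k measurements whose AUC lies furthest from 0.5."""
    scored = [(name, auc_single_feature(col, labels)) for name, col in columns.items()]
    scored.sort(key=lambda item: abs(item[1] - 0.5), reverse=True)
    return scored[:top_k]


# Hypothetical toy data: three measurements for six samples.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = {
    "measurement_a": [1, 2, 3, 4, 5, 6],
    "measurement_b": [3, 1, 5, 2, 6, 4],
    "measurement_c": [1, 1, 1, 1, 1, 1],
}
for name, auc in rank_features_by_auc(toy_columns, toy_labels):
    print(f"{name}: AUC = {auc:.3f}")
```

An AUC well below 0.5 is as informative as one well above it (the measurement is simply lower in the group of interest), which is why the ranking in this sketch uses the distance from 0.5 rather than the raw value.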
Question 3: Provide a detailed description of what you’ve done for each point from Question 1 (keep them labelled clearly so they can be matched to the list in Question 1). Provide code and sensibly organized results (output) so that we can assess what you’ve done for each sentence / bullet point from Question 1. Providing insights into why your machine learning models and your experiments in Question 3 behave the way they do is required and will be highly beneficial to your project grade. This is the equivalent of providing insightful answers to the verbal questions from the assignments, except that we cannot pose the questions to you, since we don’t yet know what you will choose to pursue. For example, describe in words why you think one thing you tried outperformed another thing that you tried.
As mentioned repeatedly in class, a good strategy for this course is to work consistently on your course project throughout the term. Since students are typically only capable of the easier tasks early in the term, it is recommended that you get started as soon as possible with self-directed extensions to what was asked of you in Assignments 1, 2 and 3, which you will be capable of working on immediately after completing those assignments. In pursuit of a very strong grade, I recommend transitioning to a more challenging project goal later in the term, once deep learning topics have been introduced.