To participate in the competition, you must provide a list of predicted outputs for the instances on the
Kaggle website. To solve the problem, we expect you to try the following methods:
• A baseline SVM or logistic regression model, using either your own implementation or a library.
• Any other ML method of your choice. Be creative! Some suggestions are neural networks trained by back-propagation, k-NN, random forests, kernelized SVMs, CNNs, etc.
For the Kaggle competition, you can submit results from your best-performing system.
Note: We suggest that you start early, allowing yourself enough time to submit multiple times and get a sense of how well you are doing.
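As one possible starting point for the "own implementation" option of the baseline, here is a minimal sketch of logistic regression trained by batch gradient descent, using only the Python standard library. It assumes a binary 0/1 label and dense numeric features, which may not match the actual competition data. The function names, the toy data, and the default learning rate and epoch count are illustrative assumptions, not values prescribed by this assignment.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=200, l2=0.0):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one instance: p(y=1|x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged gradient step, with optional L2 shrinkage on the weights
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return hard 0/1 labels using a 0.5 probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Toy, made-up data (NOT the competition data): two Gaussian blobs in 2-D.
rng = random.Random(0)
X_toy = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(50)]
X_toy += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)]
y_toy = [0] * 50 + [1] * 50

w, b = train_logistic_regression(X_toy, y_toy)
accuracy = sum(p == t for p, t in zip(predict(X_toy, w, b), y_toy)) / len(y_toy)
```

A library implementation is equally acceptable for the baseline. The point of a sketch like this is only to have a simple reference score before trying more elaborate methods.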
In addition to implementing your methods, you must write a report that details your pre-processing, validation, algorithmic, and optimization techniques, and that provides your Kaggle results for comparison. The report should contain the following sections and elements:
• Feature design: Describe and justify your pre-processing methods, and how you designed
and selected your features.
• Algorithms: Give an overview of the learning algorithms used, without repeating details already covered in the class notes (e.g. the SVM derivation) unless you judge it necessary.
• Methodology: Include any decisions about training/validation split, distribution choice for Naive Bayes, regularization strategy, any optimization tricks, setting hyper-parameters, etc.
• Results: Present a detailed analysis of your results, including graphs and tables as appropriate.
This analysis should be broader than just the Kaggle result: include a short comparison of the most important hyper-parameters and all the methods you implemented.
• Discussion: Discuss the pros and cons of your approach and methodology, and suggest areas of future work.
• References (optional).
• Appendix (optional). Here you can include additional results, more detail of the methods, etc.
The main text of the report should not exceed 6 pages. References and appendix can be in excess of
the 6 pages.
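For the Methodology and Results items above, one common way to choose a training/validation split and compare hyper-parameter values is k-fold cross-validation. The sketch below is a standard-library illustration under stated assumptions: `k_fold_indices`, `cross_validate`, and `placeholder_score` are hypothetical names, and the placeholder scorer stands in for training a real model on the training indices and evaluating it on the validation indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, then yield (train_idx, val_idx) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(fit_and_score, n_samples, candidates, k=5, seed=0):
    """Return {candidate: mean validation score over the k folds}."""
    results = {}
    for value in candidates:
        scores = [
            fit_and_score(value, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, k, seed)
        ]
        results[value] = sum(scores) / len(scores)
    return results


def placeholder_score(c_value, train_idx, val_idx):
    """Hypothetical scorer: replace with real training and validation."""
    return -abs(c_value - 1.0)


results = cross_validate(placeholder_score, n_samples=20,
                         candidates=[0.01, 0.1, 1.0, 10.0])
best = max(results, key=results.get)
```

A dictionary like `results` maps directly onto the hyper-parameter comparison table or plot requested in the Results section. Using the same seed for every candidate keeps the folds identical, so the comparison is not confounded by different splits.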
We expect you to follow these rules:
• You must submit the code developed during the project. The code must be well documented.
The code should include a README file containing instructions on how to run the code.
• Fix the random seeds so that the predictions your code generates exactly match your submitted prediction file.
• You should submit your results in .csv format. More information about the correct structure and format can be found on the Kaggle website (go to: Overview → Evaluation).
• You must submit a written report according to the general layout described above.
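The seed and .csv rules above can be checked together before submitting. The following is a minimal sketch, assuming a two-column submission file. The header names `Id` and `Category`, the seed value, and the `make_predictions` stand-in are hypothetical; the authoritative file format is the one given on the Kaggle Evaluation page. If your pipeline uses other sources of randomness (for example NumPy or PyTorch), their generators need to be seeded as well, e.g. with `numpy.random.seed` or `torch.manual_seed`.

```python
import csv
import io
import random


def make_predictions(n_instances, seed):
    """Stand-in for a model with randomness; replace with real inference."""
    rng = random.Random(seed)  # local generator, so the seed is explicit
    return [rng.randint(0, 9) for _ in range(n_instances)]


def write_submission(predictions, handle, header=("Id", "Category")):
    """Write one row per instance; the header names are hypothetical."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    for instance_id, label in enumerate(predictions):
        writer.writerow([instance_id, label])


SEED = 1234  # arbitrary value; record it in the README

# Generate the submission twice and confirm the two files are identical.
first, second = io.StringIO(), io.StringIO()
write_submission(make_predictions(5, SEED), first)
write_submission(make_predictions(5, SEED), second)
reproducible = first.getvalue() == second.getvalue()
```

In a real project the same check can be run by regenerating the prediction file from a clean environment and comparing it with the file that was uploaded.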
Marks are split evenly: 50% for performance on the private test set in the competition and 50% for the written report. For the competition, the performance grade is calculated as follows. The top team, according to the score on the private test set, receives 100%. A team that does not beat the basic baseline entered by the instructor receives 0%. All other grades are calculated by interpolating the private test set scores between those two extremes.
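To make the performance grade concrete, here is a small sketch of the interpolation. It assumes the interpolation is linear and that a higher score is better; neither is stated explicitly above, so treat this as an illustration rather than the official formula. The function name and the example scores are made up.

```python
def performance_grade(team_score, baseline_score, top_score):
    """Map a private-test score to a 0-100 grade by linear interpolation.

    Assumes higher scores are better and the top score beats the baseline.
    """
    if top_score <= baseline_score:
        raise ValueError("top_score must exceed baseline_score")
    if team_score <= baseline_score:
        return 0.0  # did not beat the baseline
    fraction = (team_score - baseline_score) / (top_score - baseline_score)
    return min(100.0, 100.0 * fraction)


# Made-up scores: baseline 0.5, top team 1.0, your team 0.75.
example = performance_grade(0.75, 0.5, 1.0)  # halfway between the extremes: 50.0
```

Under these assumptions, every point gained above the baseline is worth the same amount, so small improvements near the top of the leaderboard matter as much as those near the baseline.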
For the written report, the evaluation criteria include:
• Technical correctness of the description of the algorithms (may be validated with the submitted code).
• Meaningful analysis of final and intermediate results.
• Clarity of descriptions, plots, figures, tables.
• Organization and writing. Please use a spell-checker and don’t underestimate the power of a well-written report.
Note that the grading of the report will emphasize the rationale behind your pre-processing and optimization techniques. The code should be clear enough to reflect the logic articulated in the report.
We are looking for a combination of insight and clarity when grading the reports.