Introduction
Introduce the background of the selected project problem: the current challenges, why data mining can help solve the problem, and why it is meaningful to perform data mining on this data or topic. Include a short literature review and cite the references related to the problem. At the end of the introduction, clearly summarize the objective of the study.
Data Exploration
Detailed description of the data set used in this study
Data preprocessing performed, such as data cleaning, missing data processing, outlier detection/removal, data or variable transformation, categorical variable processing, PCA, signal processing, etc.
Data visualization: any useful graphical/visualization analysis of the data to help understand the data
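As an illustration of the preprocessing steps listed above, here is a minimal sketch using pandas. The toy data, the column names, the median imputation, and the 1.5 * IQR outlier rule are all assumptions chosen for the example; your own dataset may call for different choices.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; the column names are hypothetical.
raw = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38, 29, 300],   # one missing value, one implausible value
    "income": [40.0, 52.5, 61.0, np.nan, 58.0, 45.5, 50.0],
    "segment": ["a", "b", "a", "c", "b", "a", "c"],
})


def preprocess(df, numeric_cols, categorical_cols):
    """Impute missing values, drop outlier rows, and one-hot encode categories."""
    out = df.copy()
    # 1. Missing data: fill each numeric column with its own median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # 2. Outliers: keep rows that lie within 1.5 * IQR of the quartiles
    #    in every numeric column.
    q1 = out[numeric_cols].quantile(0.25)
    q3 = out[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    within = ((out[numeric_cols] >= q1 - 1.5 * iqr)
              & (out[numeric_cols] <= q3 + 1.5 * iqr)).all(axis=1)
    out = out[within]
    # 3. Categorical variables: one-hot encode.
    out = pd.get_dummies(out, columns=categorical_cols)
    return out.reset_index(drop=True)


clean = preprocess(raw, ["age", "income"], ["segment"])
print(clean)
```

Keeping the steps in one function makes it easy to describe each decision in the report and to rerun the same preprocessing when the data changes.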
Prediction Methods
Describe the methods you have tried for tackling the data mining problem. You can describe each method in a separate subsection with full details of how you explored the model, for example parameter tuning and optimization.
You may present in more detail any novel ideas or data analysis steps you explored to solve the data mining problem. This will bring your report to a higher level.
Describe your model evaluation process (the training/testing procedure you employed).
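One possible training/testing procedure is sketched below with scikit-learn: hold out a test set, tune a parameter by cross-validation on the training set only, then report the held-out score. The synthetic data, the choice of KNN, the grid of k values, and the 75/25 split are assumptions for illustration, not requirements of the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real project dataset.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Hold out a test set that is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling lives inside the pipeline so it is refit on each training fold only.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}

# 5-fold cross-validation on the training set selects the number of neighbors.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
cv_accuracy = search.best_score_
test_accuracy = search.score(X_test, y_test)  # refit best model, scored once
print(f"best k={best_k}, CV accuracy={cv_accuracy:.3f}, "
      f"test accuracy={test_accuracy:.3f}")
```

Reporting both the cross-validated score and the held-out score lets readers see whether the tuning overfit the training data.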
Results
Summarize your experimental results in detail for each method
Rank the performance of the explored data mining methods
Discuss the results and method comparisons
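A ranking like the one asked for above could be produced as follows. This is only a sketch: the data is synthetic, the four models and their settings are example choices (Gaussian naive Bayes is used here as a simple stand-in for a Bayesian classifier), and accuracy may not be the right metric for your problem.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real project dataset.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=1)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

# Use the same folds for every model so the comparison is like for like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results.append((name, scores.mean(), scores.std()))

# Rank by mean cross-validated accuracy, best first.
ranking = sorted(results, key=lambda r: r[1], reverse=True)
for rank, (name, mean, std) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {mean:.3f} +/- {std:.3f}")
```

Including the standard deviation across folds helps when discussing whether the gap between two methods is meaningful.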
Conclusion
Summarize your project work with emphasis on your findings. List any potential problems as future work.
Acknowledgement
This project fulfills a requirement of the IE 6318 Data Mining and Analytics course at the University of Texas at Arlington. We would like to thank
References
1. List the relevant articles if cited above
2. List the websites of online materials
3. List the resource link of dataset
4. Any other references related to the data mining problem and machine learning methods
This document provides guidelines for your data mining project. The project is a significant portion of your grade (50% of the total), so you are expected to devote a reasonable amount of time to it in the second half of the semester and to complete a clear report on your data mining work. Data mining takes substantial practice to accumulate hands-on experience and improve your implementation skills.
You will implement the data mining methods and models learned in class to perform data mining tasks, using the programming language you are most familiar with (Matlab, Python, or R). The project is a mandatory "key assignment", and you must complete it in order to receive a grade in the class. Not submitting the project will result in an "incomplete" grade for the course.
Graduate students (MS and PhD) must each complete an individual project.
Undergraduate students enrolled in IE 4314 can do an individual project or form a team of 2 or 3 members for the data mining project.
The DataCamp Projects can be good templates for your data mining project:
One-Page Project Proposal
Your project proposal must be typed and should be approximately one page long. The purpose of the proposal is to help you sort and summarize your project ideas and select the data mining topic that interests you most. We have selected a list of projects with datasets for you to choose from. If you have particular research or study interests, you can instead select a topic in your own area and use a public dataset on that topic. Possible data sources include Mendeley Data, Google Dataset Search, the UCI Machine Learning Repository, Kaggle Datasets, and other public datasets in areas that interest you, such as healthcare, energy, or manufacturing.
We will review your project proposal to make sure you are on the right track. After submitting the proposal, you will need to meet with me during office hours to confirm and finalize your project topic and direction. We will give you feedback so that you can complete a high-quality data mining project.
In your proposal you should cover the following items:
Tentative title of the project.
Abstract for your project topic. It should be one paragraph long, provide a high-level summary of your project, and outline your main goals. What is the major data mining problem, and why is it meaningful to perform data mining on this data or topic?
Brief description of the project plan:
1. Describe the data briefly and provide the information of the data sources. We do not require significant effort on data collection in this project.
2. If you need to do significant work to process the raw data and convert it into a proper format for data mining, describe the expected data processing steps.
3. What programming languages do you plan to use (Matlab/Python)? What other machine learning tools do you plan to use (e.g., WEKA, Tableau, SAS)? This is optional.
4. How do you formulate the data mining problem? For example, is it a classification task for discrete class labels, or a regression/prediction task for continuous response variables? You can also do both classification and regression on one dataset: discretize a continuous response variable into multiple categories (such as low, medium, high) to convert the problem into a classification problem, then implement classification models.
5. Describe exactly what you are trying to predict or classify. It is critical that your problem is well defined.
6. Which data mining methods do you tentatively plan to implement for the project (e.g., decision trees, KNN, Bayesian decision rules, LDA, neural networks, SVM)? We would like you to practice different classification/prediction models on your project and compare their performance. This is just a draft plan, and you can add more models later as your project progresses.
7. Indicate which type of project you are going to do: a research project or an application-based project.
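Item 4 above mentions turning a continuous response into low/medium/high classes. A minimal sketch of one way to do this, using only the Python standard library, is shown below. Equal-frequency (tertile) cut points are an assumption made for the example; domain-defined thresholds may be more appropriate for your data, and the response values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

LABELS = ["low", "medium", "high"]


def discretize(values, labels=LABELS):
    """Map each continuous value to a label using equal-frequency cut points."""
    # quantiles(..., n=3) returns the two cut points that split the data
    # into three groups of roughly equal size.
    cuts = quantiles(values, n=len(labels))
    # bisect_right counts how many cut points are <= v, which is the bin index.
    return [labels[bisect_right(cuts, v)] for v in values]


# Hypothetical continuous response values.
response = [3.1, 7.4, 1.2, 9.8, 5.5, 4.0, 8.3, 2.6, 6.7]
classes = discretize(response)
print(list(zip(response, classes)))
```

If you discretize this way, compute the cut points on the training data only and reuse them for the test data, so that no information leaks from the test set.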
Types of Projects
There are two main types of projects.
Research Project: you can choose to do a research project, in which you examine a research issue. This could be original research, but it could also be something straightforward, such as an empirical evaluation of data mining methods or of strategies for improving performance (e.g., studying strategies for removing missing values, evaluating different feature selection algorithms on simulated and real-world datasets, or exploring recent machine learning and deep learning methods on research data). If you would like to do a research project, we can provide research datasets and some new data mining ideas for you to explore. This option mainly applies to PhD students and senior MS students with good programming skills.
Application-Based Project: this is the most common project format, and many of you will choose it to explore real-world datasets using the data mining models and methods learned in class. Select something interesting and practice the essential data mining steps, including data preprocessing, data visualization, variable selection (optional), classification/prediction modeling, model parameter tuning, and model performance evaluation. Make sure that your analysis is not trivial and that it explores meaningful data mining tasks. For example, running a dataset through WEKA, spending an hour on the analysis, and then doing a quick write-up would be considered trivial. You should study the dataset, determine the issues, address any preprocessing issues, try multiple modeling techniques, and perhaps take some creative steps to improve the classification or predictive performance.
Project Report
Each team will complete a data mining report at the end of the semester. It is very important for everyone to learn scientific writing for technical reports; this is an important skill for your future work. The project report needs to be well organized and clearly written. The following sections can serve as a reasonable template for your report.
Abstract: summarizes the project and the goals of the data mining work (required).
Introduction: introduces the project and what you are trying to do. Also include relevant information on the data mining problem, why it is a meaningful topic, and what motivates data mining on this topic.
Background: you may want a separate background section to provide domain information for the topic you are studying, with citations to relevant papers, documents, or web resources. For public datasets on an interesting topic, you can always find a lot of related work. Assume you are writing a technical paper for a general audience, and introduce the domain knowledge and problem background clearly enough to help readers understand the problem and the field. You can also combine background and introduction into one section (with subsections).
Dataset Description: describes the experiments and the experimental setup used for data collection, based on the documentation from the data sources. Describe the explored datasets in detail.
Data Mining Experiments: describes the data mining experiments you have done, such as data processing, feature extraction, feature selection, data mining models and tools, the data mining strategies you explored, the evaluation metrics, and any other work related to the experiments.
Experimental Results: summarizes the experimental results of the different models, methods, and ideas. A discussion of the results may be included.
Conclusion: provides your conclusion. For example, comment on the quality of your results. You may also want to include some material on future work, whether or not you intend to do such work. A high-quality data mining project may lead to a conference or journal paper after the class.
References: you may cite papers, documents, and websites in the sections above. Make a clearly indexed reference list.
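The evaluation metrics mentioned under Data Mining Experiments should be defined precisely in the report. As a hedged illustration, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier from first principles. The labels are invented for the example, and treating class 1 as the positive class is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For imbalanced datasets, precision, recall, and F1 are usually more informative than accuracy alone, which is worth stating when you justify your choice of metric.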
Project Presentation
Each student is required to present their data mining project at the end of the semester during exam week. The presentation will be 10-15 minutes long. You will introduce your data mining topic to the whole class and answer questions from the audience. Everyone can learn from each other during the project presentation session.
Final Project Submission
You will submit your final project package on Canvas. Final Project Submission Guidelines:
You can compress your project package into a .zip file named "IE6318_Project_Fall2021_XXX_YYY", where XXX is your first name and YYY is your last name. Your submission package should be well organized and include:
1) Project report
2) Presentation slides
3) Raw dataset files you used, in a sub-folder named "Data".
4) Source code of the project, in a sub-folder named "Codes". Include the data files that your code loads.
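If you prefer to script the packaging, the naming rule and folder layout above could be handled as in the sketch below, which uses only the Python standard library. The helper names `package_name` and `build_package` are hypothetical, and the check covers only the two required sub-folders; you still need to add the report and slides yourself.

```python
import shutil
from pathlib import Path


def package_name(first_name, last_name, course="IE6318", term="Fall2021"):
    """Build the archive name (without extension) from the naming rule."""
    return f"{course}_Project_{term}_{first_name}_{last_name}"


def build_package(project_dir, first_name, last_name, out_dir="."):
    """Zip project_dir after checking that the required sub-folders exist."""
    project_dir = Path(project_dir)
    missing = [sub for sub in ("Data", "Codes")
               if not (project_dir / sub).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing sub-folder(s): {', '.join(missing)}")
    base = Path(out_dir) / package_name(first_name, last_name)
    # make_archive appends ".zip" and returns the path of the archive it wrote.
    return shutil.make_archive(str(base), "zip", root_dir=str(project_dir))
```

For example, `build_package("my_project", "Jane", "Doe", out_dir="submissions")` would write `IE6318_Project_Fall2021_Jane_Doe.zip` into an existing `submissions` folder. Keep the output folder outside the project folder so the archive does not try to include itself.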
If your project folder is very large and causes problems with online submission, you can give your whole project package to me or the GTA on a USB drive, a hard drive, or a cloud drive (Dropbox, Google Drive, etc.). We will store all your project files on our server for the class records.