Data Mining: Selecting Data - Applying Descriptive Analytics.

Data Mining Techniques

Selecting Non-Trivial Data Sets

Select a non-trivial data set for mining. You can use one of the web sites where large, real-world data sets are available for download, or use a data set that is available in your company. The data set should have only input attributes; it may also have an important temporal or spatial component, but this is not required. Because there is no output attribute to predict, you will concentrate on solving a descriptive data mining task for Project 3. Check with the TA and get approval for the problem and the data set for this project.

First, become familiar with the data set and specify more precisely the problem you will try to solve (for example: clustering, pattern discovery, association rules, outlier detection, etc.).

Then, go through the detailed process of data preparation. Apply all preprocessing and data reduction techniques you consider necessary and explain why. For every preprocessing technique, explain what the technique is, explain how it is applied, and show the summary results of the preprocessing. Do not include raw code, raw output, or raw screen captures.

When the data set is of sufficient quality, apply several descriptive techniques (at least 3). For every technique you apply, explain what the technique is and how it works, explain what its parameters are and how they were chosen and tuned, and explain and discuss the results and performance of the technique. Do not include raw code, raw output, or raw screen captures.

After you analyze the individual techniques, compare their performance and discuss the pros and cons of each. The comparison of data mining results (and the descriptive models obtained), with additional discussion, will be a very important part of your report.

Use a conclusion section to summarize your findings throughout the report. Give examples of data samples, discuss how you would use your model on those samples, and state what result can be expected from your model.
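As an illustration of the data preparation step described above, the following is a minimal sketch in Python of two common preprocessing techniques, mean imputation of missing values and min-max scaling, followed by the kind of summary statistics a report could present instead of raw output. The attribute name `trip_minutes`, its values, and the helper function names are hypothetical and chosen only for this example; your own data set may call for different techniques.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def summarize(values):
    """Summary statistics suitable for reporting in place of raw output."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical attribute: trip durations in minutes, with two missing readings.
trip_minutes = [12.0, None, 30.0, 18.0, None, 60.0]

imputed = impute_mean(trip_minutes)
scaled = min_max_scale(imputed)

summary_before = summarize(imputed)
summary_after = summarize(scaled)
```

In a report, only the contents of `summary_before` and `summary_after` would be shown, for example as a small table, together with an explanation of why each technique was needed.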
Illustrative Examples of Topics in Previous Years

Bike Sharing Analysis
Dataset: https://www.citibikenyc.com/system-data
Activity patterns reveal imbalances in the distribution of bikes and lead to a better understanding of the system structure. The paper tries to solve the problem of maintaining system balance during peak rush-hour usage, as well as rebalancing overnight to prepare the system for rush-hour usage. It analyzes system data to discover the best placement of bikes to facilitate usage. It also looks for association rules relating station location to flow in and flow out, and time of day to station flow. The second task is to group the data into clusters and explain what each cluster means. The paper tries spatial clustering, temporal clustering, and the combination of both.

Price of Used Cars
Pricing second-hand cars is one of the most interesting problems in the world of data science. Used cars are typically valued based on experts' opinions. Experts usually evaluate some technical features to determine the price of a car. These features typically include technical measures such as mileage and body condition, which a practitioner can easily assess to determine the value of a car. Association rule mining is used to find which features of a car correlate with its price.

Customer Segmentation of Instacart
Dataset: https://www.instacart.com/datasets/grocery-shopping-2017
The project analyzes how the combination of clustering and association rule mining can be used to find customer segments in Instacart's data set. It uses K-Means clustering to group clients that buy similar items, and the Apriori algorithm to find association rules that help discover product buying patterns within each cluster.
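To make the association rule idea in these examples concrete, here is a minimal sketch in Python that counts item pairs in a handful of baskets and reports rules that pass support and confidence thresholds. It is a simplified pairwise counting scheme, not the full Apriori algorithm (it never grows itemsets beyond two items), and the baskets, thresholds, and function name `pair_rules` are assumptions made up for illustration, not taken from the Instacart data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) over pairs of items.

    support    = fraction of transactions containing both items
    confidence = support(lhs and rhs) / support(lhs)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # infrequent pair: no rule is generated from it
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical baskets, e.g. the orders of customers in one cluster.
baskets = [
    ["bananas", "milk", "bread"],
    ["bananas", "milk"],
    ["bread", "eggs"],
    ["bananas", "milk", "eggs"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In a project that combines clustering with rule mining, a function like this could be run separately on the transactions of each cluster so that the resulting rules describe buying patterns within a segment. The thresholds usually need tuning: a lower minimum support yields more rules, many of them weak.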
