Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
In this lab you will learn to:
1. Build a decision tree from the dataset.
Anaconda includes the scikit-learn package.
We will use the scikit-learn package for decision tree analysis and prediction.
If we want to use 80% of the original dataset as the training set and 20% as the test set, which statement performs the correct sampling?
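One statement that performs this sampling is scikit-learn's train_test_split with test_size=0.2. The sketch below is a minimal illustration only: the arrays X and y are small made-up stand-ins, not the lab's dataset, and the random_state value of 42 is an arbitrary choice that fixes the seed so the same split is produced on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 rows, 2 features, and a binary target.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

With 10 rows, 8 go to the training set and 2 to the test set. On a real dataset the same call applies unchanged; only X and y differ.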
Visualize the tree trained on your training set.
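A trained tree can be printed as text with scikit-learn's export_text, or drawn with sklearn.tree.plot_tree when matplotlib is available. The following is a minimal sketch that assumes the bundled iris dataset as a stand-in for the lab's data; the depth limit of 2 and the seed of 0 are arbitrary choices made to keep the output short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed stand-in dataset; replace with the lab's own features and target.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Fit a shallow tree on the training set only.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Render the learned splitting rules as indented text.
tree_text = export_text(clf, feature_names=list(iris.feature_names))
print(tree_text)
```

Each indented line in the printed output is one decision rule, and each "class:" line is a leaf.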
We first set a seed which is a number used by Python's random number generator. The major advantage of setting a seed is that we can get the same sequence of random numbers whenever we use the same seed.
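As a small illustration of this, the sketch below seeds Python's built-in random module twice with the same value and draws five integers each time. The seed 123 and the range 1 to 100 are arbitrary choices for the example.

```python
import random

# Seed the generator, then draw five numbers.
random.seed(123)
first = [random.randint(1, 100) for _ in range(5)]

# Re-seed with the same value and draw again.
random.seed(123)
second = [random.randint(1, 100) for _ in range(5)]

# The two sequences are identical because the seed was the same.
print(first == second)  # True
```

scikit-learn functions that involve randomness take a random_state argument that plays the same role.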
Building a Decision Tree with Scikit-Learn Package
“Bedrock UrbanVega” has been in the furniture business in Australia for a considerable time. However, because of continuous competition in the Australian industry, the company has not been able to keep up with constantly changing customer expectations. Although its processes have been streamlined, and household items data has been collected over a period of time in line with the company’s capability and skills, furniture retailing has remained a challenging market within household items. As stated, customer sentiment is an issue, and weak domestic retail spending has adversely affected not only trading conditions but also the annualised growth of the industry as a whole (Ibisworld.com.au 2017).
Revenue for the furniture retailing industry is projected to fall by 2%, and this primarily affects discretionary income. Industry profitability has posted varied results, as profit margins are hindered by a softer retail economy; for products and sales, the forecast is that trading conditions can improve if the growing volatility is addressed (PRWeb 2018). Retail spending will also be affected by projected volatility in consumer sentiment, by continued competition, and by rising interest rates driven by external players.
From a data scientist’s perspective, eight categories are defined in the product segment across the 1000 records for “Bedrock UrbanVega”. The attributes of the furniture product segment are listed below together with their type.
- product name – Furniture (categorical)
- product price – Ranging from $125 - $800 approximately (numerical)
- shipping type – Free or customer-paid (categorical)
- monthly sales ($) – Ranging from $1500 - $26000 approximately (numerical)
- geographic region – all the regions of Australia (categorical)
- number of customers who bought the product (numerical)
- customer type – New or existing (categorical)
These variables were chosen primarily because they can be studied with the classification techniques used in this study.
Data Source: The data for these variables were gathered from the market across the eight attributes. The entries in the records cover a period of 2-3 months. All attributes were cleaned, and only entries with no missing values in any category were kept. On the whole, the company has maintained its records so that it can also analyse the performance trend of the industry.
Methods Used: RapidMiner Studio, an efficient software package for data mining classification, was chosen so that monthly sales for each geographic region could be predicted. The classifiers used for the comparative analysis are the decision tree and K-NN (K-Nearest Neighbours). These algorithms are suited to simple classification and regression problems. The structure obtained also makes it possible to examine the effect of small changes in the data.
2.1 Sample Data
3.1 K-NN classification
The company “Bedrock UrbanVega” is interested in knowing its profitability and whether it can keep up with changing demand and competition. To that end, the detailed information was used to calculate credit card scores and classify furniture products, so that a decision could be made on whether a discount based on geographic area and on the sales generated by each salesperson yields profitable results in the long run. Accordingly, the K-NN classification was cross-validated, and the information retrieved from the applied model was used to analyse the company’s performance. The process is further illustrated in the images below.
The K-NN classification deals with eight categories across different variables to examine the company’s profitability from the sales data (Chauhan and Gautam 2015). This model was used because it is simple and can converge to the correct decision surface given enough data; 1000 records were used here. The classifier is a 1-Nearest Neighbour model with two class values, “True” and “False”. The model’s reported prediction accuracy is 95%, which suggests that performance can rise on the basis of the given attributes.
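The analysis itself was run in RapidMiner Studio. For readers working in Python, a comparable cross-validated 1-Nearest Neighbour model could be sketched with scikit-learn as follows. The data here is synthetic, generated by make_classification with 1000 rows and eight features as an assumed stand-in for the company's records, and the choice of 10 folds is also an assumption, so the accuracy it prints is not the figure reported for the company's data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 1000 records, 8 numeric features, binary target.
# This is NOT the Bedrock UrbanVega data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# n_neighbors=1 gives a 1-Nearest Neighbour classifier.
knn = KNeighborsClassifier(n_neighbors=1)

# 10-fold cross-validation (assumed fold count); one accuracy per fold.
scores = cross_val_score(knn, X, y, cv=10)

print("mean accuracy:", scores.mean())
```

Categorical attributes such as region or customer type would need to be encoded as numbers before a distance-based model like K-NN can use them.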
The graph below highlights the changes in the company’s performance under this model. The upward line graph indicates that, with sustained performance, Bedrock UrbanVega’s sales will increase in the coming months. Moreover, the performance vector, given as a contingency table of true and false predictions, suggests that with certain improvements to the system the company will be able to respond to changes in customer demand, which will help establish better trading conditions across the states of Australia.
3.2 Decision Tree
The decision tree model, known as a classical model, uses “splitting rules” to organise a tree structure across the eight variables in the data (Rangra and Bansal 2014). The test results are assigned from the prediction values and are based on a geometric illustration. The conversion from binomial to numerical serves to illustrate the outcomes and the chances of occurrence in the sales data, and its predictability.
The results of the decision tree are analysed for the 7 types of furniture included in the analysis.
To start with, the discount option is analysed using the “True or False” classification, as in the K-NN model. Furniture products such as chairs, console tables, secretary desks, tables and writing desks lead to a “False” result, meaning that no discount is applied to these items. Moreover, if the customer is new, no discount is given, on the assumption that new customers cannot be relied on after a first purchase, whereas a discount is considered for existing customers because they are assumed to buy more than one product. New customers are further divided by geographic region: NSW, Queensland, Victoria, South Australia, Tasmania and Western Australia. However, when discounts are assessed on the performance scale, only New South Wales appears to be a profitable region in which to increase trade with new customers. Further, NSW is classified on monthly sales only when sales are greater than $22,372. The decision tree below illustrates the results.
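How a splitting rule is scored depends on the tool and its settings, which the analysis does not state. As one common example, Gini impurity is the default criterion in scikit-learn: a candidate split is preferred when the weighted impurity of the two child nodes is lower than that of the parent. The sketch below computes this for a hypothetical node of four records labelled True or False; the function names gini and weighted_gini are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical parent node: two True and two False records.
parent = [True, True, False, False]
print(gini(parent))  # 0.5

# A split that separates the classes perfectly has zero impurity.
print(weighted_gini([True, True], [False, False]))  # 0.0
```

A tree-building algorithm evaluates many candidate splits in this way and keeps the one with the lowest weighted impurity at each node.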
The following recommendations can be made to enhance the company’s profitability and sales performance:
- Firstly, the results need to favour new customers and expansion into different geographic regions, so a discount plan can be started initially.
- Secondly, the salesperson achieving the maximum product price, in this case Jhon, should be assigned to the geographic regions where sales of furniture products are lower, as he is a salesperson with good communication skills.
- Thirdly, in areas other than NSW, sofas and game tables should be promoted.
- Fourthly, a common discount is needed to build a large customer base as customer demands change.
Implementation should follow the recommendations made to enhance the performance of Bedrock UrbanVega.
Targeted Time Period
Jhon and Maria
Tasmania, Western Australia, South Australia, Queensland, Victoria
More than 1.5%
Less than 2.4%
The table above shows the corresponding areas and discounts that can further enhance the company’s performance. In addition, word of mouth and advertising will further widen the scope for winning new clients and retaining them for longer. Competition with existing competitors in the market will become closer, and this will help enlarge the customer base (Myrodia, Kristjansdottir and Hvam 2017).
Further deployment of salespeople, guided by the previous months’ data, can secure greater revenue from the untouched areas of Australia. In addition, greater diversification of products and styles can attract customers and influence their purchasing decisions.
To conclude, the growth of Bedrock UrbanVega will aid industry growth if the recommendations are initiated at an early stage. The analysis of the 1000 records of previous monthly sales and discounts offered to customers showed better results for existing customers than for new customers across the different areas of Australia. Although the results were 95% accurate, closer examination showed that certain areas had static sales. Market share is bound to improve if the company’s sales operations are carried out to their potential over a certain period of time. In addition, with the data mining tools used, the scenario was simple to interpret and the results can be validated later.
Chauhan, N. and Gautam, N., 2015. Parametric comparison of data mining tools. International Journal of Advanced Technology in Engineering and Science, 3.
Ibisworld.com.au. (2017). Furniture Retailing – Australia Industry Research Reports | IBISWorld. [online] Available at: https://www.ibisworld.com.au/industry-trends/market-research-reports/retail-trade/other-store-based-retailing/furniture-retailing.html [Accessed 21 May 2018].
Myrodia, A., Kristjansdottir, K. and Hvam, L., 2017. Impact of product configuration systems on product profitability and costing accuracy. Computers in Industry, 88, pp.12-18.
PRWeb. (2018). Furniture Retailing in Australia Industry Market Research Report Now Updated by IBISWorld. [online] Available at: https://www.prweb.com/releases/2013/10/prweb11273968.htm [Accessed 21 May 2018].
Rangra, K. and Bansal, K.L., 2014. Comparative study of data mining tools. International Journal of Advanced Research in Computer Science and Software Engineering, 4(6).
My Assignment Help. (2020). Decision Trees Essay For Classification And Regression Analysis With Scikit-Learn.. Retrieved from https://myassignmenthelp.com/free-samples/ict706-data-analytics/decision-tree-analysis-and-prediction.html.