Joe is the owner of a hardware store. Amongst the many items he sells are lawnmowers. He has just started selling a new model of lawnmower, the Green101, which has proved to be popular with his customers, even though it is relatively expensive. A problem for Joe is that there are often too few mowers in stock, which means that customers must wait until he can have new stock delivered. Even worse, some customers may not wish to wait for an order to come through, and instead purchase the mower from one of Joe’s competitors. Also, Joe is severely short of storeroom space, so he cannot afford to have too many of the mowers in stock. He would like to develop an inventory policy for the Green101.
The problem contains a number of probabilistic variables, and thus Joe would like to set up a simulation model to help him explore a number of possibilities.
Daily demand for the Green101 is subject to variability, and is thus a probabilistic variable. Table I shows the daily demand for the Green101 over the past 300 days. From this table, Joe can estimate, for example, that the probability of selling exactly two units of the Green101 on any particular day is 60/300 = 0.20.
Table I: Demand and frequency for Green101
Demand (units per day)    0    1    2    3    4    5
Frequency (days)         15   30   60  120   45   30
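The conversion from Table I frequencies to probabilities can be sketched as follows. This is a minimal illustration, not part of the required spreadsheet; the names `demand_freq`, `to_probabilities` and `cumulative` are chosen here for illustration only.

```python
# Observed frequencies from Table I: demand -> number of days (out of 300).
demand_freq = {0: 15, 1: 30, 2: 60, 3: 120, 4: 45, 5: 30}


def to_probabilities(freq):
    """Convert a frequency table into relative frequencies."""
    total = sum(freq.values())
    return {value: count / total for value, count in freq.items()}


def cumulative(probs):
    """Running totals of the probabilities, in increasing order of value.

    A cumulative table like this is what a spreadsheet lookup would use to
    turn a uniform random number in [0, 1) into a simulated demand.
    """
    running = 0.0
    out = {}
    for value in sorted(probs):
        running += probs[value]
        out[value] = running
    return out


demand_prob = to_probabilities(demand_freq)
demand_cum = cumulative(demand_prob)
```

The same two helper functions apply unchanged to any other frequency table of the same shape.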
When Joe places an order to replenish his inventory of the Green101, it can take anywhere between 1 and 3 days for the stock to be delivered to his store; i.e., there is a 1 to 3 day lead time. Thus, lead time can also be considered a probabilistic variable. If the lead time for the order is 1 day, the order will not arrive the next morning, but at the beginning of the following working day. For example, assuming an order is placed on a Monday, if the lead time is 1 day the stock will arrive on the Wednesday, if the lead time is 2 days then the stock will arrive on the Thursday, and so on. Table II shows the lead time for the last 50 orders that Joe has placed. From this table, Joe can estimate, for example, that the probability that an order has a lead time of exactly two days is 25/50 = 0.50.
Table II: Lead time and frequency for Green101 orders
Lead time (days)      1    2    3
Frequency (orders)   10   25   15
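As a rough sketch, a lead time can be drawn from Table II with a weighted random choice, and the arrival day follows from the Monday/Wednesday convention described above. The function names `sample` and `arrival_day` are illustrative, and days are assumed to be numbered consecutively over working days.

```python
import random

# Observed frequencies from Table II: lead time in days -> number of orders.
lead_time_freq = {1: 10, 2: 25, 3: 15}


def sample(freq, rng):
    """Draw one value with probability proportional to its frequency."""
    values = list(freq)
    weights = list(freq.values())
    return rng.choices(values, weights=weights, k=1)[0]


def arrival_day(order_day, lead_time):
    """Day on which stock arrives (at the start of that day).

    An order placed on day d with a lead time of L days arrives on day
    d + L + 1: a Monday order with a 1-day lead time arrives on Wednesday.
    """
    return order_day + lead_time + 1


rng = random.Random(0)          # fixed seed only so that runs are repeatable
lead = sample(lead_time_freq, rng)
```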
Joe is considering the following inventory policy. Whenever the day’s ending inventory is at or below the re-order point of 5 units and there are no outstanding orders which have not yet arrived, Joe requests an additional 10 units from his supplier (i.e., the re-order quantity is 10). A 6-day snippet of the simulation is shown in Table III.
Table III: Simulation of Green101 inventory for 6 days
Here is an explanation of the simulation in the above table. It is assumed that the beginning inventory is 10 units. Since the demand on day 1 is 3 units, the ending inventory on day 1 is 7. This is above the re-order point of 5 units, so no order is placed on day 1. Since the demand on day 2 is 5 units, the ending inventory will be 2 units, and thus an order for 10 units will be placed. The lead time for the order is 2 days, which means that the 10 ordered units will not be received until the beginning of day 5. There is a lost sale of 1 unit on day 3 because the demand on day 3 is 3 units, but the beginning inventory is only 2 units. Similarly, there are lost sales on day 4. Note that although the ending inventory on days 3 and 4 is below the re-order point, no orders are placed on these days because there is an outstanding order from day 2 which has not yet arrived.
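The day-by-day logic just described can be sketched as a small function that replays the policy for a given list of demands. This is an illustrative sketch, not the required spreadsheet: the function name `simulate` and the row keys are made up here, and an order is assumed to be triggered when ending inventory is at or below the re-order point.

```python
def simulate(demands, lead_times, start_inventory=10,
             reorder_point=5, order_qty=10):
    """Replay the inventory policy for a known sequence of daily demands.

    lead_times gives the lead time of each order, in the order placed.
    Returns one dict per day with the quantities tracked in the text.
    """
    inventory = start_inventory
    arrival = None                  # arrival day of the outstanding order
    lead_iter = iter(lead_times)
    rows = []
    for day, demand in enumerate(demands, start=1):
        # Stock that is due arrives at the beginning of the day.
        if arrival == day:
            inventory += order_qty
            arrival = None
        begin = inventory
        sold = min(demand, inventory)
        lost = demand - sold        # demand that could not be met
        inventory -= sold
        # Order only if stock is low and nothing is already on its way.
        ordered = inventory <= reorder_point and arrival is None
        if ordered:
            arrival = day + next(lead_iter) + 1
        rows.append({"day": day, "begin": begin, "demand": demand,
                     "end": inventory, "lost": lost, "ordered": ordered})
    return rows


# Days 1 to 3 use the demands stated in the text. The day 4 demand of 3 is
# inferred from the stated total of 4 lost sales, not read from Table III.
first_four_days = simulate([3, 5, 3, 3], lead_times=[2])
```

Run on these four days, the sketch gives ending inventories of 7, 2, 0 and 0, with a single order placed on day 2.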
There are various costs associated with the inventory policy. The cost of placing an order is $35 (this is a fixed cost and does not depend on the number of items in the order). The cost of holding a Green101 in stock is $2,000 per mower per year (or $10 per mower per day, over a 200-day year). Joe estimates that the cost of each lost sale is $150. Joe can easily calculate these costs from the spreadsheet above. For example, it can be seen that over the 6 days, 2 orders have been placed (2 x $35 = $70); the daily ending inventories sum to 21 mowers held in stock (21 x $10 = $210); and there have been 4 lost sales (4 x $150 = $600). The cost over the 6 days is thus $70 + $210 + $600 = $880.
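The cost arithmetic above can be written as a short function. This is a sketch only; the function and constant names are illustrative, and holding cost is assumed to be charged on each day's ending inventory, which is consistent with the 6-day total quoted above.

```python
ORDER_COST = 35                     # dollars per order, regardless of size
HOLDING_COST_PER_DAY = 2000 / 200   # $2,000 per year over 200 days = $10
LOST_SALE_COST = 150                # dollars per lost sale


def inventory_cost(orders_placed, units_held, lost_sales):
    """Total cost given the three totals read off the simulation.

    units_held is the sum of the daily ending inventories.
    """
    return (orders_placed * ORDER_COST
            + units_held * HOLDING_COST_PER_DAY
            + lost_sales * LOST_SALE_COST)


six_day_cost = inventory_cost(orders_placed=2, units_held=21, lost_sales=4)
```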
(a) Joe would like to know the yearly cost of this inventory policy.
Implement the policy using a spreadsheet, run a 200-day simulation (Joe’s store is open for 200 days a year), and estimate the yearly inventory cost. Note that you will probably observe considerable variability between different simulation trials. A solution to this is to run several trials (say, 10), and to calculate the average yearly inventory cost.
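As a possible cross-check on the spreadsheet, the 200-day simulation and the averaging over trials can be sketched in a few lines. This is not a substitute for the spreadsheet, and it rests on assumptions: a beginning inventory of 10 units, an order triggered when ending inventory is at or below the re-order point, and holding cost charged on ending inventory. All names are illustrative.

```python
import random
from statistics import mean

DEMAND_FREQ = {0: 15, 1: 30, 2: 60, 3: 120, 4: 45, 5: 30}   # Table I
LEAD_FREQ = {1: 10, 2: 25, 3: 15}                           # Table II
ORDER_COST, HOLD_COST, LOST_COST = 35, 10, 150              # dollars


def draw(freq, rng):
    """Draw one value with probability proportional to its frequency."""
    return rng.choices(list(freq), weights=list(freq.values()), k=1)[0]


def yearly_cost(rng, days=200, start_inventory=10,
                reorder_point=5, order_qty=10):
    """Simulate one year and return its total inventory cost."""
    inventory, arrival, cost = start_inventory, None, 0
    for day in range(1, days + 1):
        if arrival == day:              # delivery at the start of the day
            inventory += order_qty
            arrival = None
        demand = draw(DEMAND_FREQ, rng)
        sold = min(demand, inventory)
        inventory -= sold
        cost += (demand - sold) * LOST_COST + inventory * HOLD_COST
        if inventory <= reorder_point and arrival is None:
            arrival = day + draw(LEAD_FREQ, rng) + 1
            cost += ORDER_COST
    return cost


def average_yearly_cost(trials=10, seed=None, **policy):
    """Average the yearly cost over several independent trials."""
    rng = random.Random(seed)
    return mean(yearly_cost(rng, **policy) for _ in range(trials))
```

Calling `average_yearly_cost(trials=10)` repeatedly gives a feel for how much the average itself still varies from run to run; passing a `seed` makes a run repeatable.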
(b) Joe would like to experiment with some other values for re-order point and re-order quantity.
Complete the table below with the estimated cost corresponding to each combination of values for re-order point and re-order quantity. Once again, each of these should be the average over a sufficient number of trials. (NOTE: In order to avoid having to make many changes to your formulas, it will be much easier if you design your spreadsheet in such a way that re-order point and re-order quantity are accessed from cells containing these values; that is, you should be able to simply change the value in the cells containing these parameter values, and instantly see the results of the new simulation).
5 10 15
What to submit:
A report that includes: (i) a screenshot of your spreadsheet showing the first 15 days of your simulation for (a); and (ii) your calculations for the total inventory cost in (a), and the completed table for (b).
Marking criteria:
• Correct implementation of the simulation model in Excel.
• Averaged yearly inventory costs are correct.
• Table is complete and all values are correct.
Problem 5: Mining a Bank Marketing dataset (10 marks)
You have just started working at a bank, and your boss has recently become interested in data mining, and particularly the opportunities that it might provide for direct marketing of some new investment products that his bank has created. Your boss knows that you have taken a course in decision support systems that included a component on data mining, and he would like you to provide him with some information on data mining and its use in direct marketing. He has referred you to the following paper, which he recently became aware of, but, given his lack of background knowledge in this area, finds difficult to understand: “Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology”, by Moro, Laureano and Cortez (2011). He would like you to access the datasets used in this paper, apply a number of data mining algorithms to this data, and to write a report on your investigation and findings.
(a) Obtaining the datasets
The datasets used in the paper by Moro et al. can be found in the file bank.zip, which you will find at the URL: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Note that the file bank.zip contains a number of files:
• bank-names.txt, which contains, amongst other information, a description of the fields contained in the dataset;
• bank-full.csv, which is the full dataset, containing 45,212 examples; and
• bank.csv, which is the reduced dataset, containing 4,521 examples (10% of the samples in the full dataset).
For this exercise, you are to use the reduced dataset bank.csv. Note, however, that even though this file contains the extension ‘.csv’, it is not, in fact, a comma-separated file. You will need to do some pre-processing before you will be able to open this in WEKA. It is suggested that you open the file in a text editor that has find-and-replace capabilities, and replace the semicolon characters (i.e., ‘;’) with commas (i.e., ‘,’). You will probably also need to remove the quotation marks.
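As an alternative to manual find-and-replace, the same pre-processing can be sketched with Python's standard `csv` module, which reads the semicolon-delimited rows (dropping the quotation marks) and writes them back comma-delimited. The function names and the output filename in the usage note are hypothetical, and the two-row sample is illustrative rather than copied from the dataset.

```python
import csv
import io


def semicolon_to_comma(src, dst):
    """Copy rows from a ';'-delimited stream to a ','-delimited stream.

    csv.reader strips the surrounding quotation marks; csv.writer adds
    quotes back only where a field itself contains a comma or a quote.
    """
    reader = csv.reader(src, delimiter=";", quotechar='"')
    writer = csv.writer(dst, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    for row in reader:
        writer.writerow(row)


def convert_file(in_path, out_path):
    """Convert a file on disk; both paths are supplied by the caller."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        semicolon_to_comma(src, dst)


# Small in-memory demonstration with illustrative rows.
sample = io.StringIO('"age";"job";"y"\n30;"unemployed";"no"\n')
converted = io.StringIO()
semicolon_to_comma(sample, converted)
```

For the real file, something like `convert_file("bank.csv", "bank_comma.csv")` would write a converted copy next to the original, leaving the original untouched.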
(b) Preliminary questions
Answer the following questions:
i. How many features or attributes does the data contain?
ii. How many examples does the data contain?
iii. What is the name of the attribute that describes the class variable?
iv. How many possible values can the class variable take?
v. How many examples are affiliated with each of the classes?
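These counts can be read off in WEKA, but a quick independent check is also possible. The sketch below assumes a comma-delimited file with a header row and assumes that the class variable is the last column; whether the class column is counted among the attributes is left to the reader. The function name `summarise` is illustrative.

```python
import csv
from collections import Counter


def summarise(lines, delimiter=","):
    """Summarise a delimited dataset given as an iterable of text lines.

    Assumes the first line is a header and the last column is the class.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    class_counts = Counter(row[-1] for row in reader if row)
    return {
        "columns": len(header),                 # includes the class column
        "examples": sum(class_counts.values()),
        "class_column": header[-1],
        "class_counts": dict(class_counts),
    }
```

An open file object can be passed directly as `lines`, since iterating over a text file yields its lines.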
After pre-processing the dataset appropriately, use the WEKA data mining toolkit to apply each of the following classifiers to it:
• J48 (this is the WEKA version of Quinlan’s C4.5)
• Logistic Regression
• Naïve Bayes
Remember that we are mainly interested in the capability of the classifier to correctly predict the class of examples which have not been used in model construction, so you will have to choose your test options carefully.
Present the following results for each of the three classifiers: i. The confusion matrix using the format below (note that WEKA may present this differently):
ii. The accuracy measure
iii. The precision measure
iv. The recall measure
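For reference, the three measures follow directly from the four cells of a two-class confusion matrix. The sketch below uses the standard definitions; which class is treated as "positive" is a choice the reader must make, and the counts in the example are invented purely for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a classifier never predicts
    # (or the data never contains) the positive class.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Invented counts, for illustration only.
example = classification_metrics(tp=40, fn=10, fp=20, tn=130)
```

With these invented counts, accuracy is 170/200, precision is 40/60 and recall is 40/50, which shows how the three measures can differ considerably on the same matrix.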
(e) Conclusions
i. Which of accuracy, precision or recall do you think is the more important measure of performance for this problem? Why?
ii. Recommend one of the three classifiers for this problem. Justify your answer.