Purpose
This assessment is designed to allow you to demonstrate your understanding of big data and machine learning. A scenario is provided to help you explain your understanding using examples. You demonstrate your understanding by explaining various aspects of:
- How big data affects machine learning in terms of which algorithms may or may not be suitable
- How these algorithms may be used to provide useful insights into big data
- What other big data techniques may be useful, e.g. parallel processing, and how we make use of these ideas
- The difference between creating and using our machine learning model.
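As an illustration of the parallel processing idea mentioned above, the following is a minimal sketch, not part of the supplied assessment code. It assumes hypothetical sale records of the form (product_id, cost, price), splits them into chunks, totals the profit per product in each chunk independently, and then merges the partial results. The function names and the sample data are assumptions made for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_profit(chunk):
    """Map step: total profit per product within one chunk of sales."""
    totals = Counter()
    for product_id, cost, price in chunk:
        totals[product_id] += price - cost
    return totals


def total_profit_by_product(sales, chunk_size=2, workers=2):
    """Split the sales into chunks, aggregate each one, then merge."""
    chunks = [sales[i:i + chunk_size] for i in range(0, len(sales), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_profit, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the values together
    return dict(merged)


# Hypothetical (product_id, cost, price) records, for illustration only.
sales = [(1, 4, 10), (2, 8, 9), (1, 5, 10), (3, 20, 50), (2, 8, 10)]
profits = total_profit_by_product(sales)
print(profits)
```

Because each chunk is processed without reference to the others, the same pattern could in principle be spread across separate processes or machines. A thread pool is used here only to keep the sketch short; in CPython it does not speed up CPU-bound work.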
You work for a company that specialises in providing data insights for business using machine learning.
You have been approached by a large multi-national business with locations in many countries. Their products are all priced identically worldwide. The company's products are all manufactured locally and, as a result, the product manufacturing costs vary between countries.
Your client has computerised records of every sale. The records are stored using the fields in the table below. The client has heard about machine learning and the benefits it may bring to their business, but has no knowledge of how it works, or how it may benefit them.
Your manager has tasked you with producing a report for the clients that outlines the issues surrounding the volume of data they have, and how different types of machine learning may help them. Your manager has also asked that you outline some of the caveats associated with machine learning.
- They wish to analyse the data to adjust prices by country or region. The aim is to maximise profits by understanding how much customers pay for their goods and services, which items are profitable, and which may be kept as premium, "showcase" items.
- They wish to analyse the data to group the sales, based on sales price, into the quality ranges:
- Premium
- Standard
- Budget
- They wish to change their manufacturing so that:
- Products with a high profit per item are to be manufactured centrally and shipped around the world. This will minimise the impact of shipping costs. These will be considered premium items used to attract high-end customers. Total profits are not important.
- Products with a high total profit and low cost will continue to be manufactured locally. The total profit can be medium or high.
- Products with a low profit per item and low total profit will no longer be manufactured.
- The client carries 2 million different products and makes 750 000 sales per day.
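The grouping of sales into Premium, Standard and Budget ranges described above could be approached with a clustering algorithm. The sketch below uses a simple one-dimensional k-means on sales price. This is an assumption made for illustration: the brief does not state which algorithm the supplied code uses, and the function names and prices are hypothetical. It also separates creating the model (learning the cluster centres) from using it (assigning a new sale to a range).

```python
def fit_kmeans_1d(values, k=3, iterations=20):
    """Create the model: learn k cluster centres from training prices."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sorted(centres)


def predict_tier(price, centres, labels=("Budget", "Standard", "Premium")):
    """Use the model: assign a new price to the nearest learned centre."""
    nearest = min(range(len(centres)), key=lambda j: abs(price - centres[j]))
    return labels[nearest]


# Hypothetical sales prices, for illustration only.
training_prices = [5, 6, 7, 48, 50, 52, 195, 200, 205]
centres = fit_kmeans_1d(training_prices)
print(centres)
print(predict_tier(55, centres))
```

Fitting requires a pass over the training data on every iteration, whereas assigning a new sale only compares it with three stored centres. This is one reason the cost of creating a model and the cost of using it can differ greatly on large data sets.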
The data stored for each item is given in Table 1 (Data entry per record). Items that are not in stock have the same data storage but use null values. This means that, with no stock at all, they would have 2 million entries. This record is kept for each individual item. On average there are 1 500 000 different items in stock, and 250 of each item.
| Record | Bytes |
|---|---|
| Item Purchase Date | 10 |
| Purchase Time | 8 |
| Supplier | 255 |
| Supplier Part No | 255 |
| Our Part Number | 255 |
| Cost | 9 |
| Tax and Duty (%) | 5 |
| Storage Costs | 9 |
| Overheads | 9 |
| Sales Price (RRP) | 9 |
| Min Sales Price (Discounted) | 9 |
| Item Sales Date | 10 |
| Sales Time of Day | 8 |
| Storage location | 8 |
| Sales Location | 255 |
| Total | 1,361 |
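The data volumes implied by the figures above can be estimated with some simple arithmetic. The sketch below is illustrative only and rests on two assumptions: that one record is held per individual item in stock, and that each sale produces one record. Note that the field widths as listed sum to 1,114 bytes rather than the stated total of 1,361, so the record size is kept as a parameter and either value can be used.

```python
# Field widths in bytes, copied from the table above.
FIELD_BYTES = {
    "Item Purchase Date": 10,
    "Purchase Time": 8,
    "Supplier": 255,
    "Supplier Part No": 255,
    "Our Part Number": 255,
    "Cost": 9,
    "Tax and Duty (%)": 5,
    "Storage Costs": 9,
    "Overheads": 9,
    "Sales Price (RRP)": 9,
    "Min Sales Price (Discounted)": 9,
    "Item Sales Date": 10,
    "Sales Time of Day": 8,
    "Storage location": 8,
    "Sales Location": 255,
}
STATED_RECORD_BYTES = 1361  # total as stated in the table


def storage_bytes(records, record_bytes=STATED_RECORD_BYTES):
    """Raw storage needed for a number of fixed-width records."""
    return records * record_bytes


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


listed_sum = sum(FIELD_BYTES.values())
catalogue = storage_bytes(2_000_000)        # one entry per product
in_stock = storage_bytes(1_500_000 * 250)   # assumes one record per stocked item
daily_sales = storage_bytes(750_000)        # assumes one record per sale

print("Sum of listed field widths:", listed_sum)
print("Catalogue entries (GB):", to_gb(catalogue))
print("Items in stock (GB):", to_gb(in_stock))
print("New sales per day (GB):", to_gb(daily_sales))
```

Under these assumptions the stock records come to roughly 510 GB and each day of sales adds about 1 GB, before any indexing, replication or storage overhead is considered.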
You have been provided with some Matlab code to use for this assessment. You are required to run this code to obtain data and plots for your assessment. The code will ask for your student ID number. This ensures that the data generated is unique to you. You must enter your valid student ID number. If your assessment contains data and plots that were not generated from your ID number, this may be used as evidence of academic misconduct.
By default, the code will produce data for 855 000 different sales and 25 different product types. The product types are identified by a number from 1 to 25.
The code produces some machine learning based graphs which you should use for your assessment. After running the code, you should save your plots using the "File -> Save As" option from each plot figure window. Saving them as Matlab figures allows you to reopen and manipulate them. Saving them as PNG will allow you to insert them into your report when finished.
- Shows the cost price vs sales price for every item sold.
- Shows the cost price vs profit for each product number. You will see small groups of data, in 25 different colours.
- Shows each of the 25 items and their profit on each sale. The items have been grouped using machine learning into high, medium and low profit per item sold and coloured accordingly.
- Shows each of the 25 items and the mean profit per item of that type. The sales have been further grouped by the total profit for each item, i.e. if a product sells many items, but at a low profit each, then it may be in the "high total profit" group, but low on the "profit per single item" axis.
- Shows the mean cost price per item type, again coloured by the total profit for each item type.
- Shows the total profit for each item type. The items have been grouped by total profit using machine learning and coloured accordingly.
- Shows the sales prices for every sale of each item type. No machine learning has been applied.
- Shows the cost price vs sales price for every item sold.
- The data has been grouped into budget, standard and premium items using machine learning, according to the rules above (Scenario 2), and is shown as a box plot of the profits.
- The data has been grouped into budget, standard and premium items using machine learning, according to the rules above (Scenario 2), and is shown as a box plot of the cost prices.
These plots are provided to aid you in the assessment. They are not generated in the order in which you might use them. You are NOT required to use all the plots; you should select only those that help you.
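To help relate the plots to the manufacturing decisions in the scenario, the sketch below shows one way the mean profit per item and the total profit for each product could be computed from individual sales and then mapped to a decision. It is a simplification under stated assumptions: the cut-off values are hypothetical fixed thresholds standing in for the machine learning groupings, the "low cost" condition is ignored, and the record layout (product_id, cost, price) and function names are invented for illustration.

```python
from collections import defaultdict


def summarise(sales):
    """Return {product_id: (mean profit per item, total profit)}."""
    totals = defaultdict(lambda: [0, 0])  # [profit sum, number of sales]
    for product_id, cost, price in sales:
        totals[product_id][0] += price - cost
        totals[product_id][1] += 1
    return {p: (s / n, s) for p, (s, n) in totals.items()}


def manufacturing_plan(mean_profit, total_profit, per_item_cut, total_cut):
    """Map the two profit measures to a decision (thresholds are assumptions)."""
    if mean_profit >= per_item_cut:
        return "central"      # high profit per item; total profit not important
    if total_profit >= total_cut:
        return "local"        # high total profit from many lower-margin sales
    return "discontinue"      # low profit per item and low total profit


# Hypothetical (product_id, cost, price) records, for illustration only.
sales = [(1, 10, 110), (1, 10, 90),
         (2, 5, 7), (2, 5, 7), (2, 5, 7), (2, 5, 7),
         (3, 5, 6)]
summary = summarise(sales)
plan = {p: manufacturing_plan(m, t, per_item_cut=50, total_cut=5)
        for p, (m, t) in summary.items()}
print(summary)
print(plan)
```

In your report, the groupings produced by the supplied code take the place of the fixed thresholds used here.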