This assessment is designed to allow you to demonstrate your understanding of big data and machine learning. A scenario is provided to help you explain your understanding using examples. You demonstrate your understanding by explaining various aspects of:
You work for a company that specialises in providing data insights for businesses using machine learning.
You have been approached by a large multi-national business with locations in many countries. Their products are all priced identically worldwide. The company's products are all manufactured locally and, as a result, the product manufacturing costs vary between countries.
Your client has computerised records of every sale. The records are stored using the fields in the table below. The client has heard about machine learning and the benefits it may bring to their business, but has no knowledge of how it works or how it may benefit them.
Your manager has tasked you with producing a report for the client that outlines the issues surrounding the volume of data they have, and how different types of machine learning may help them. Your manager has also asked that you outline some of the caveats associated with machine learning.
The data stored for each item is given in Table 1 (Data entry per record). Items that are not in stock have the same data storage but use null values. This means that, with no stock at all, the client would still have 2 million entries. This record is kept for each individual item. On average there are 1 500 000 different items in stock, and 250 of each item.
Record                       | Bytes
Item Purchase Date           | 10
Purchase Time                | 8
Supplier                     | 255
Supplier Part No             | 255
Our Part Number              | 255
Cost                         | 9
Tax and Duty (%)             | 5
Storage Costs                | 9
Overheads                    | 9
Sales Price (RRP)            | 9
Min Sales Price (Discounted) | 9
Item Sales Date              | 10
Sales Time of Day            | 8
Storage location             | 8
Sales Location               | 255
Total                        | 1,361
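As a rough illustration of the data volume these figures imply, the following Python sketch multiplies the record size by the number of records. It is not part of the supplied Matlab code, and all names in it are hypothetical. It assumes one record per individual item (so 1 500 000 item types times 250 units each) and decimal units (1 GB = 10^9 bytes). Note that the field sizes listed in the table sum to 1,114 bytes, whereas the Total row states 1,361, so the record size is passed as a parameter and either figure can be used.

```python
# Rough storage estimate for the scenario's records (illustrative sketch only).

# Field sizes in bytes, copied from the table of data stored per record.
BYTES_PER_FIELD = {
    "Item Purchase Date": 10,
    "Purchase Time": 8,
    "Supplier": 255,
    "Supplier Part No": 255,
    "Our Part Number": 255,
    "Cost": 9,
    "Tax and Duty (%)": 5,
    "Storage Costs": 9,
    "Overheads": 9,
    "Sales Price (RRP)": 9,
    "Min Sales Price (Discounted)": 9,
    "Item Sales Date": 10,
    "Sales Time of Day": 8,
    "Storage location": 8,
    "Sales Location": 255,
}

STATED_RECORD_BYTES = 1361           # "Total" row as given in the table
DISTINCT_ITEMS_IN_STOCK = 1_500_000  # average number of different items in stock
UNITS_PER_ITEM = 250                 # average number of each item
EMPTY_ENTRIES = 2_000_000            # entries held even with no stock at all


def storage_bytes(n_records, bytes_per_record):
    """Total storage in bytes for n_records fixed-size records."""
    return n_records * bytes_per_record


def human_readable(n_bytes):
    """Format a byte count using decimal units (1 kB = 1000 B)."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


# Assumption: one record is kept for every individual item in stock.
records_in_stock = DISTINCT_ITEMS_IN_STOCK * UNITS_PER_ITEM
summed_record_bytes = sum(BYTES_PER_FIELD.values())

if __name__ == "__main__":
    for label, size in (("stated total", STATED_RECORD_BYTES),
                        ("sum of fields", summed_record_bytes)):
        in_stock = storage_bytes(records_in_stock, size)
        empty = storage_bytes(EMPTY_ENTRIES, size)
        print(f"{label} ({size} B/record): "
              f"in stock {human_readable(in_stock)}, "
              f"empty entries {human_readable(empty)}")
```

Under these assumptions, the stated record size of 1,361 bytes gives 375 million records occupying roughly 510 GB, before any sales history, indexing or backups are considered.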
You have been provided with some Matlab code to use for this assessment. You are required to run this code to obtain data and plots for your assessment. The code will ask for your student ID number. This ensures that the data generated is unique to you. You must enter your valid student ID number. If your assessment contains data or plots that were not generated from your ID number, this may be used as evidence of academic misconduct.
By default, the code will produce data for 855 000 different sales and 25 different product types. The product types are identified by a number from 1 to 25.
The code produces some machine-learning-based graphs which you should use for your assessment. After running the code, you should save your plots using "file -> save as" from each plot figure window. Saving them as Matlab figures allows you to reopen and manipulate them. Saving them as PNG files will allow you to insert them into your report when finished.
These plots are provided to aid you in the assessment. They are not generated in the order in which you might use them. You are NOT required to use all the plots; you should select only those that help you.