

Question

Choice 1: Programming ID3

The first option is to program ID3 using the algorithm described in class. You need to develop software to solve a supervised learning problem (i.e. to build a model against a training set), then run the software against a test dataset and report the accuracy of the model. Your program should do the following things:

1. Read a training dataset and a test dataset. The datasets are in the form of text files. See below.
2. Build a model using the training data as
3. Print out a representation of the model (i.e. the tree or similar).
4. Run the test data against the model, work out the accuracy of the model (i.e. how many samples it classified correctly) and print out a confusion matrix to summarise the results.
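Steps 1 and 4 above can be sketched in Python as follows. This is only an illustration, not the required solution: the file layout is not reproduced here, so the sketch assumes one sample per line, comma-separated, with the class label in the first field. The names `parse_rows`, `confusion_matrix` and `accuracy` are hypothetical helpers.

```python
from collections import Counter

def parse_rows(lines, sep=",", label_index=0):
    """Turn text lines into (features, label) pairs.

    Assumption: one sample per line, fields separated by `sep`,
    class label at position `label_index`.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(sep)
        label = fields[label_index]
        features = fields[:label_index] + fields[label_index + 1:]
        samples.append((features, label))
    return samples

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    if not actual:
        return 0.0
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)
```

A file can be passed directly, for example `parse_rows(open(path))`, because a text file object iterates over its lines. The confusion matrix is a `Counter`, so a pair that never occurred reads as 0.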

The ID3 algorithm

You should build a decision tree using the ID3 algorithm given in the 3rd lecture (it is a fairly simple algorithm, so feel free to learn it yourself if you choose to start this assignment before Week 3). This algorithm uses the information gain measure to choose the splits. You should build the decision tree using the training data supplied, then calculate the error on the supplied test/validation data. Since the mushroom dataset is categorical, you will not need to consider the complexities added by real-valued attributes. There is missing data in the mushroom dataset (flagged by “?” values). Do not treat the missing data specially; just treat “?” as another value for the attribute in question. Also, do not worry about pruning the tree.
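For orientation, one common textbook formulation of ID3 on categorical data looks roughly like the sketch below. It may differ in detail from the version given in the lecture. The tree layout (a dict with `attr`, `branches` and `default` keys), the function names, and the choice to fall back to the majority class for unseen values are all assumptions made for this illustration. A “?” value needs no special handling because it is compared like any other string.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def partition(rows, labels, attr):
    """Group rows and labels by the value found at attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    remainder = sum(len(sub_labels) / len(labels) * entropy(sub_labels)
                    for _, sub_labels in partition(rows, labels, attr).values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node, or no attributes left to test
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {value: id3(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels)
                in partition(rows, labels, best).items()}
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, row):
    """Walk the tree; unseen values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree
```

Here `rows` is a list of attribute-value lists, `labels` is the parallel list of classes, and `attrs` is the list of column indices still available for splitting.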

The program must display a text representation of the decision tree. You are free to display the tree in any way you think makes sense, so long as it shows what attributes are tested at each node in the tree. It is acceptable to utilise diagnosis tools provided by machine learning packages for the display of the tree ** as long as the tree is built by your own program, i.e. it is NOT acceptable to form a 2nd tree using the package, and display the 2nd tree directly **.
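One simple way to produce such a text representation is an indented listing, sketched below. The tree shape (a nested dict with `attr` and `branches` keys, with plain strings as leaves) is an assumption for this illustration, and the example tree uses made-up attribute names and values rather than anything learned from the mushroom data.

```python
def render_tree(tree, indent=0):
    """Return the tree as a list of text lines, indented by depth.

    Assumed shape: an internal node is {"attr": name, "branches": {value: subtree}};
    a leaf is a plain class label.
    """
    pad = "  " * indent
    if not isinstance(tree, dict):
        return [f"{pad}-> {tree}"]
    lines = []
    for value, subtree in sorted(tree["branches"].items()):
        lines.append(f"{pad}{tree['attr']} = {value}")  # the test made at this node
        lines.extend(render_tree(subtree, indent + 1))
    return lines

# Hypothetical hand-written tree, for illustration only.
example = {
    "attr": "attr_A",
    "branches": {
        "x": "class_1",
        "y": {"attr": "attr_B", "branches": {"u": "class_2", "v": "class_1"}},
    },
}
print("\n".join(render_tree(example)))
```

Each printed line names the attribute tested and the branch value, so the output shows what is tested at every node, and it is produced from the tree your own program built.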

Hint #1: The trick with building the decision tree is not really the ID3 algorithm which is fairly straightforward. The tricky bit is managing the dataset. Remember that you need to be able to easily split the dataset based on the value of a specific attribute. That means you need to devise a suitable data structure to easily do this split and to work out class frequencies.
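As a rough illustration of Hint #1, one possible data structure is a list of `(features, label)` pairs, split into a dictionary keyed by attribute value, with a `Counter` for class frequencies. The names `split_by_attribute` and `class_frequencies` are hypothetical, and other layouts (for example column-oriented storage) would work equally well.

```python
from collections import Counter, defaultdict

def split_by_attribute(samples, attr):
    """Map each value of attribute index `attr` to the samples that have it."""
    subsets = defaultdict(list)
    for features, label in samples:
        subsets[features[attr]].append((features, label))
    return dict(subsets)

def class_frequencies(samples):
    """Count how many samples belong to each class."""
    return Counter(label for _, label in samples)
```

For example, `{value: class_frequencies(subset) for value, subset in split_by_attribute(samples, 0).items()}` gives the class counts on each branch of a split on the first attribute, which is the information needed to score that split.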

Hint #2: Think carefully about the entropy function you need to use when calculating information gain. It is not quite as simple as in our theoretical discussion. Specifically, what happens when the subset of the data you are looking at contains only one of the two class values, i.e. all the mushrooms are edible or all are poisonous? How will you deal with this?
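One common way to handle this case, sketched below under the usual convention that 0 · log2(0) is taken to be 0, is to skip classes with a zero count. Without the guard, Python's `math.log2(0)` raises a `ValueError`. The function name and the choice to pass per-class counts are assumptions for this illustration.

```python
import math

def entropy(class_counts):
    """Entropy in bits of a class distribution given as per-class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # an empty subset carries no uncertainty
    result = 0.0
    for count in class_counts:
        if count > 0:  # skip empty classes: 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result
```

With this guard, a pure subset such as `entropy([4, 0])` evaluates to 0.0 and an evenly split one such as `entropy([2, 2])` evaluates to 1.0.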

Hint #3: Carefully follow the online learning materials provided in Week 3.

Choice 1-alternative: Programming an algorithm of your choice

The second option allows you to choose another algorithm to program, so long as you seek approval from me. One potential method is a multilayer perceptron neural network. You may use a supporting mathematical library to help with the details so long as you code the machine learning algorithm part yourself. Note: It is not acceptable to simply write code to call the Java Weka algorithm or the Python scikit-learn code for the algorithm. I expect you to write the main algorithm yourself. The dataset to be used for the classification (or regression) problem will need to be determined in consultation with me, but as a default we would probably use the mushroom dataset from choice 1 if it makes sense.

Choice 2: Doing a data mining project

The third choice is to use an existing package to solve a data mining problem. If you want to do this it will not be enough to just use one classification algorithm and copy the output. You need to explore the data, systematically try several algorithms and parameter settings to find the best (by evaluating the quality of the classifiers) and then provide a recommendation.

Cite This Work


My Assignment Help. 'Advanced Data Analytics' (My Assignment Help, 2021) <https://myassignmenthelp.com/free-samples/31005-advanced-data-analytics/exploration-of-the-dataset.html> accessed 19 April 2021.


