The first option is to program ID3 using the algorithm described in class. You need to develop software to solve a supervised learning problem (i.e., to build a model from a training set), then run the software against a test dataset and report the accuracy of the model. Your program should do the following things:
You should build a decision tree using the ID3 algorithm given in the 3rd lecture (it is a fairly simple algorithm, so feel free to learn it on your own if you choose to start this assignment before Week 3). This algorithm uses the information gain measure to choose the splits. You should build the decision tree using the supplied training data, then calculate the error on the supplied test/validation data. Since the mushroom dataset is categorical, you will not need to consider the complexities added by real-valued attributes. There is missing data in the mushroom dataset (flagged by “?” values). Don’t treat the missing data specially; simply treat “?” as another value of the attribute in question. Also, do not worry about pruning the tree.
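A minimal Python sketch of ID3 with information gain follows. It assumes each example is a dict from attribute name to value, with the label stored under a hypothetical `"class"` key. The tree layout (a leaf is a label string; an internal node is a dict holding the tested attribute, a majority-class `"default"` and one branch per value) and the fallback for attribute values never seen during training are illustrative choices, not part of the brief. Loading the real mushroom file is left out, and the five rows are made up.

```python
import math
from collections import Counter, defaultdict

# Assumptions for illustration only: each example is a dict of attribute ->
# categorical value, the label is stored under the hypothetical key "class",
# and a tree is either a leaf label (str) or a dict describing a test node.
CLASS_KEY = "class"


def entropy(examples):
    n = len(examples)
    counts = Counter(ex[CLASS_KEY] for ex in examples)
    # Only classes that occur are summed, so a pure subset gives 0 bits.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split(examples, attribute):
    parts = defaultdict(list)
    for ex in examples:
        parts[ex[attribute]].append(ex)  # "?" is treated as an ordinary value
    return parts


def information_gain(examples, attribute):
    n = len(examples)
    remainder = sum(len(s) / n * entropy(s) for s in split(examples, attribute).values())
    return entropy(examples) - remainder


def majority_class(examples):
    return Counter(ex[CLASS_KEY] for ex in examples).most_common(1)[0][0]


def id3(examples, attributes):
    """Build a tree from a non-empty list of examples."""
    labels = {ex[CLASS_KEY] for ex in examples}
    if len(labels) == 1:  # pure subset: make a leaf
        return labels.pop()
    if not attributes:  # no attributes left to test: majority leaf
        return majority_class(examples)
    best = max(attributes, key=lambda a: information_gain(examples, a))
    rest = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "default": majority_class(examples),  # fallback for unseen values
        "branches": {v: id3(s, rest) for v, s in split(examples, best).items()},
    }


def classify(tree, example):
    while isinstance(tree, dict):
        tree = tree["branches"].get(example[tree["attribute"]], tree["default"])
    return tree


def error_rate(tree, examples):
    wrong = sum(classify(tree, ex) != ex[CLASS_KEY] for ex in examples)
    return wrong / len(examples)


# Five made-up rows in the style of the mushroom data (not real records).
train = [
    {"odor": "n", "cap-color": "w", "class": "e"},
    {"odor": "f", "cap-color": "w", "class": "p"},
    {"odor": "n", "cap-color": "?", "class": "e"},
    {"odor": "f", "cap-color": "n", "class": "p"},
    {"odor": "a", "cap-color": "n", "class": "e"},
]
tree = id3(train, ["odor", "cap-color"])
print(tree)
print(error_rate(tree, train))
```

On these invented rows `odor` separates the classes perfectly, so it has the highest gain and becomes the root. For the assignment you would call `error_rate(tree, test_rows)` on the supplied test/validation data rather than on the training rows.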
The program must display a text representation of the decision tree. You are free to display the tree in any way you think makes sense, so long as it shows which attribute is tested at each node in the tree. It is acceptable to utilise the diagnostic tools provided by machine learning packages to display the tree **as long as the tree is built by your own program; i.e., it is NOT acceptable to build a second tree using the package and display that second tree directly**.
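One possible text layout is sketched below in Python. It assumes a hypothetical tree format (a leaf is a class-label string; an internal node is a dict with `"attribute"` and `"branches"` keys) and prints one line per branch, indenting each deeper level with `|   ` so the attribute tested at every node is visible. The example tree is hand-written, not learned from the real data.

```python
# Hypothetical tree format: a leaf is a class label (str); an internal node is
# {"attribute": name, "branches": {value: subtree, ...}}.


def tree_lines(tree, depth=0):
    """Yield one line per branch, indenting each level of the tree."""
    pad = "|   " * depth
    if not isinstance(tree, dict):  # the whole tree is a single leaf
        yield f"{pad}class = {tree}"
        return
    for value, subtree in tree["branches"].items():
        test = f"{pad}{tree['attribute']} = {value}"
        if isinstance(subtree, dict):
            yield test + ":"  # the subtree's tests follow, indented
            yield from tree_lines(subtree, depth + 1)
        else:
            yield f"{test}: {subtree}"  # leaf: show the predicted class


def tree_to_text(tree):
    return "\n".join(tree_lines(tree))


example_tree = {
    "attribute": "odor",
    "branches": {
        "f": "p",
        "n": {
            "attribute": "spore-print-color",
            "branches": {"w": "p", "k": "e"},
        },
    },
}
print(tree_to_text(example_tree))
```

For this hand-written tree the output starts with `odor = f: p` and `odor = n:`, with the two `spore-print-color` tests indented beneath the second line.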
Hint #1: The tricky part of building the decision tree is not the ID3 algorithm, which is fairly straightforward; it is managing the dataset. Remember that you need to be able to split the dataset easily based on the value of a specific attribute. That means you need to devise a suitable data structure that makes this split, and the calculation of class frequencies, easy.
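One way to make that split cheap is a dictionary keyed by attribute value, with a `Counter` for class frequencies. The Python sketch below assumes each example is a dict from attribute name to value and that the label is stored under a hypothetical `"class"` key; the rows are invented for illustration.

```python
from collections import Counter, defaultdict

# Assumption: each example is a dict mapping attribute names to categorical
# values, with the label stored under the (hypothetical) key "class".
CLASS_KEY = "class"


def split_by_attribute(examples, attribute):
    """Group examples into subsets keyed by their value for `attribute`.

    "?" (missing) is treated as an ordinary value, so it gets its own subset.
    """
    partitions = defaultdict(list)
    for ex in examples:
        partitions[ex[attribute]].append(ex)
    return dict(partitions)


def class_frequencies(examples):
    """Count how many examples carry each class label."""
    return Counter(ex[CLASS_KEY] for ex in examples)


# Tiny made-up sample in the spirit of the mushroom data (not real records).
sample = [
    {"odor": "n", "cap-color": "w", "class": "e"},
    {"odor": "f", "cap-color": "w", "class": "p"},
    {"odor": "n", "cap-color": "?", "class": "e"},
    {"odor": "f", "cap-color": "n", "class": "p"},
    {"odor": "n", "cap-color": "n", "class": "p"},
]

for value, subset in split_by_attribute(sample, "odor").items():
    print(value, dict(class_frequencies(subset)))
```

Because `"?"` is just another dictionary key, missing values fall into their own subset with no special handling, and the per-subset counts are exactly what the entropy calculation needs.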
Hint #2: Think carefully about the entropy function you use when calculating information gain. It is not quite as simple as in our theoretical discussion. Specifically, what happens when every example in the subset you are looking at has the same class value, i.e., all the mushrooms are edible or all are poisonous? How will you deal with this?
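A sketch of one common answer, in Python: compute entropy only over the class counts that actually occur, so the undefined 0 * log2(0) term never appears (by convention it is taken as 0, its limit). Treating an empty list as having zero entropy is an additional assumption of this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels.

    Only classes that actually occur are summed over, so the 0 * log2(0)
    term that would appear for a pure subset is skipped (its limit is 0).
    """
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty subset carries no uncertainty
    total = 0.0
    for count in Counter(labels).values():
        p = count / n
        total -= p * math.log2(p)
    return total


print(entropy(["e", "e", "e"]))  # 0.0: a pure subset
print(entropy(["e", "p"]))  # 1.0: an even split between two classes
print(entropy(["e", "e", "p", "p", "p"]))
```

In ID3 the same situation also acts as a stopping condition: a subset whose entropy is 0 becomes a leaf labelled with its single class.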
Hint #3: Carefully follow the online learning materials provided in Week 3.
The second option allows you to choose another algorithm to program, provided you seek my approval first. One potential method is a multilayer perceptron neural network. You may use a supporting mathematical library to help with the details, so long as you code the machine learning part of the algorithm yourself. Note: it is not acceptable to simply write code that calls the Weka (Java) or scikit-learn (Python) implementation of the algorithm; I expect you to write the main algorithm yourself. The dataset to be used for the classification (or regression) problem will need to be determined in consultation with me, but by default we would probably use the mushroom dataset from the first option, if it makes sense.
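If a multilayer perceptron is approved, the part you would write yourself is essentially forward propagation plus backpropagation. The pure-Python sketch below is one minimal version under several assumptions not stated in the brief: one hidden layer of sigmoid units, a single sigmoid output for a 0/1 target (for example edible = 0, poisonous = 1), binary cross-entropy loss, plain stochastic gradient descent, and numeric inputs (categorical mushroom attributes would first need encoding, e.g. one-hot). The names `TinyMLP` and `numeric_gradient` are hypothetical. It ends with a finite-difference gradient check, a standard way to confirm that hand-written backpropagation is correct.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output,
    trained with binary cross-entropy and plain stochastic gradient descent."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        y_hat = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, y_hat

    def loss(self, x, y):
        _, y_hat = self.forward(x)
        return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

    def gradients(self, x, y):
        """Backpropagation for one example; returns (dW1, db1, dW2, db2)."""
        h, y_hat = self.forward(x)
        d_out = y_hat - y  # dLoss/dz at the output for sigmoid + cross-entropy
        dW2 = [d_out * hj for hj in h]
        d_hidden = [d_out * w * hj * (1 - hj) for w, hj in zip(self.W2, h)]
        dW1 = [[d * xi for xi in x] for d in d_hidden]
        return dW1, d_hidden[:], dW2, d_out

    def sgd_step(self, x, y, lr=0.5):
        dW1, db1, dW2, db2 = self.gradients(x, y)
        for j, row in enumerate(self.W1):
            for i in range(len(row)):
                row[i] -= lr * dW1[j][i]
            self.b1[j] -= lr * db1[j]
            self.W2[j] -= lr * dW2[j]
        self.b2 -= lr * db2

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def numeric_gradient(net, x, y, params, idx, eps=1e-6):
    """Central-difference estimate of dLoss/dparams[idx], for checking backprop."""
    original = params[idx]
    params[idx] = original + eps
    up = net.loss(x, y)
    params[idx] = original - eps
    down = net.loss(x, y)
    params[idx] = original  # restore the weight exactly
    return (up - down) / (2 * eps)


net = TinyMLP(n_inputs=3, n_hidden=4, seed=1)
x, y = [1.0, 0.0, 1.0], 1
dW1, db1, dW2, db2 = net.gradients(x, y)
print(numeric_gradient(net, x, y, net.W1[0], 2), dW1[0][2])  # should agree closely
```

The brief allows a supporting mathematical library such as NumPy; nested lists simply keep this sketch dependency-free. A training loop would call `sgd_step` over the shuffled training rows for several epochs and then measure accuracy with `predict` on the test set.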
The third option is to use an existing package to solve a data mining problem. If you choose this, it will not be enough to use just one classification algorithm and copy its output. You need to explore the data, systematically try several algorithms and parameter settings to find the best one (by evaluating the quality of the classifiers), and then provide a recommendation.
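The core of "systematically try several algorithms" is an evaluation loop that scores every candidate with the same protocol. In scikit-learn this is typically done with `cross_val_score` or `GridSearchCV`; the stdlib-only Python sketch below shows the same idea without dependencies, using k-fold cross-validation on two toy classifiers (a majority-class baseline and a one-rule classifier). The classes, the `"class"` label key and the synthetic data are illustrative assumptions; with a real package, `make_model` would construct one of its models with a particular parameter setting, wrapped to the same `fit`/`predict` interface.

```python
import random
from collections import Counter, defaultdict

CLASS_KEY = "class"  # assumption: the label is stored under this key


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, rows):
        self.label = Counter(r[CLASS_KEY] for r in rows).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


class OneR:
    """One-rule classifier: keeps the single attribute whose
    value -> majority-class rule makes the fewest training errors."""

    def fit(self, rows):
        attributes = [a for a in rows[0] if a != CLASS_KEY]
        self.default = Counter(r[CLASS_KEY] for r in rows).most_common(1)[0][0]
        best_errors = None
        for a in attributes:
            counts = defaultdict(Counter)
            for r in rows:
                counts[r[a]][r[CLASS_KEY]] += 1
            rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
            errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
            if best_errors is None or errors < best_errors:
                best_errors, self.attribute, self.rule = errors, a, rule
        return self

    def predict(self, row):
        return self.rule.get(row[self.attribute], self.default)


def cross_val_accuracy(make_model, rows, k=5, seed=0):
    """Mean accuracy over k folds; assumes len(rows) >= k."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = make_model().fit(train)  # fresh model for every fold
        hits = sum(model.predict(r) == r[CLASS_KEY] for r in test)
        scores.append(hits / len(test))
    return sum(scores) / k


# Synthetic data (not the mushroom set): the class depends only on "odor".
rng = random.Random(42)
data = []
for _ in range(60):
    odor = rng.choice(["n", "f", "a"])
    data.append({"odor": odor, "cap-color": rng.choice(["w", "n"]),
                 "class": "p" if odor == "f" else "e"})

candidates = {"majority": MajorityClass, "one-rule": OneR}
results = {name: cross_val_accuracy(make, data, k=5) for name, make in candidates.items()}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Including a trivial baseline makes the comparison meaningful: a recommended classifier should clearly beat the majority-class accuracy. Parameter settings can be compared the same way by adding one entry per setting to `candidates`.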