### Question:

You are given a training dataset, “trainDataset.csv”, and a testing dataset, “testDataset.csv”, which will be provided in electronic form. The data are extracted and pre-processed from the original Titanic dataset. The attributes of each object (a passenger in this case) are defined as follows:

• Survived: indicates whether the passenger survived (1) or did not survive (0);
• PC (Passenger Class): the class of the passenger on the ship;
• Sex: indicates the passenger’s sex;
• Age: indicates the passenger’s age group at the time of the ship’s departure;
• SS (Sibling Spouse): indicates the number of siblings/spouses that the passenger has on the ship.

You are required to apply the decision tree classification technique and association rule evaluation to the above case appropriately. Specifically, you are required to:

1. Using the training dataset, apply the basic Hunt’s Algorithm to train a fully-grown decision tree model, where the selection of attributes should follow the sequence: If the attribute has multiple attribute values, please use a multiway split (do not use a binary split). Leaf nodes should be declared as a single class label (do not use a probability/fraction).
2. Using the training dataset, apply the Greedy strategy combined with the Gini impurity measure to rebuild a fully-grown decision tree. If the attribute has multiple attribute values, please use a multiway split (do not use a binary split). Leaf nodes should be declared as a single class label (do not use a probability/fraction). Samples of the calculations and explanations should be provided to demonstrate how the Greedy strategy and the Gini impurity measure are applied.
3. Use the test dataset to test the two fully-grown decision tree models, and discuss the results.
4. Post-prune the two fully-grown decision trees by applying the following rules: (i) prune any sub-tree if all of its leaf nodes have the same class label, and (ii) prune any sub-tree if the number of objects (passengers) at each leaf node is not more than one. After pruning, test the two pruned decision trees using the test dataset. Discuss the results.
5. From the two pruned decision trees, extract the association rule for each leaf node based on the information on the path from the root node to that leaf node. Evaluate the support, confidence, and lift of the identified association rules using the training dataset. Discuss the results.
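As an illustration of the calculation asked for in task 2, the following is a minimal sketch of how the weighted Gini impurity of a multiway split could be computed and used greedily to pick the splitting attribute. The function names (`gini`, `weighted_gini`, `best_attribute`) and the four toy rows are hypothetical and are not taken from “trainDataset.csv”; only the column names follow the attribute list above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, attribute, target="Survived"):
    """Weighted Gini impurity after a multiway split on `attribute`.

    One child node is created per distinct attribute value (no binary split).
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(labels) / n * gini(labels) for labels in groups.values())


def best_attribute(rows, attributes, target="Survived"):
    """Greedy choice: the attribute with the lowest weighted Gini impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))


# Hypothetical toy rows for illustration only (not the assignment data).
toy = [
    {"PC": 1, "Sex": "female", "Survived": 1},
    {"PC": 1, "Sex": "male", "Survived": 0},
    {"PC": 3, "Sex": "female", "Survived": 1},
    {"PC": 3, "Sex": "male", "Survived": 0},
]

print(weighted_gini(toy, "PC"))    # each PC value holds one survivor and one non-survivor
print(weighted_gini(toy, "Sex"))   # each Sex value is pure in this toy data
print(best_attribute(toy, ["PC", "Sex"]))
```

On these toy rows the split on `Sex` gives pure children, so the greedy step would select it. On the real training data the same comparison would be repeated at every node over the attributes not yet used on that path.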

As the majority of the tasks in this assignment are problem-solving based, the word count will be treated as flexible: if all the required tasks have been appropriately addressed, you will not be penalised for a word count below the stated limit, which should be treated as an upper limit.
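For the rule evaluation in task 5, one possible way to compute support, confidence, and lift for a rule read off a root-to-leaf path is sketched below. The helper names (`matches`, `rule_metrics`) and the toy rows are hypothetical and are not drawn from the assignment data; the sketch assumes a rule is written as antecedent conditions (the path) implying a consequent (the leaf label).

```python
def matches(row, conditions):
    """True if the row satisfies every attribute == value condition."""
    return all(row[key] == value for key, value in conditions.items())


def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    """
    n = len(rows)
    n_a = sum(matches(r, antecedent) for r in rows)
    n_c = sum(matches(r, consequent) for r in rows)
    n_ac = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy rows for illustration only (not the assignment data).
toy_rows = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
]

# Rule: {Sex = female} -> {Survived = 1}
support, confidence, lift = rule_metrics(
    toy_rows, {"Sex": "female"}, {"Survived": 1}
)
print(support, confidence, lift)
```

A lift above 1 suggests the path conditions make the leaf's class more likely than its base rate in the training data, while a lift near 1 suggests the rule adds little beyond that base rate.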

#### Assessment Criteria:

The assessment criteria will generally follow the marking guidelines provided in the Management School Student Handbook. Specific assessment criteria are highlighted below:

1. Demonstrate understanding and knowledge of the relevant concepts, theories and techniques in data mining and machine learning;
2. Demonstrate ability to apply relevant techniques and tools of data mining and machine learning to solve the given problem and tasks;
3. Critically and analytically discuss results in a structured and logical manner;
4. Demonstrate ability to support your arguments with evidence and references;
5. Appropriate structure, presentation, use of English and use of the Harvard referencing style, e.g. figures and tables should be displayed legibly at the 100% zoom scale in a full-screen mode.

### EBUS537 Data Mining And Machine Learning

