
Weka: A Machine Learning Workbench



Task 1

In WEKA load the data set DIABETES.arff. Perform rule classification using the following methods

  • JRip
  • Ridor

For each method produce a summary of the rules produced and comment on the accuracy of the method. 

Task 2

In WEKA load the data set supermarket.arff. Perform association rule learning using the following methods

  • Apriori
  • FPGrowth

For each method produce a summary of the rules produced and comment on the accuracy of the method. 

Task 3

In WEKA load the data set breast-cancer.arff. Perform Bayesian classification using the following methods

  • AODE
  • BayesNet

For each method produce a summary of the classification produced and comment on the accuracy of the method.



Task 1

diabetes.arff contains the Pima Indians Diabetes Database, collected by the National Institute of Diabetes and Digestive and Kidney Diseases.

All patients here are females at least 21 years old of Pima Indian heritage. The data set has 768 instances with 9 attributes:

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)
  9. Class variable (0 or 1)

A class value of 1 means the patient tested positive for diabetes; 0 means tested negative.

This is a two-class problem, with class value 1 interpreted as "tested positive for diabetes". There are 500 examples of class 0 and 268 of class 1.
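As a reference point when commenting on accuracy, the two class counts above give the score of a trivial classifier that always predicts the larger class. A minimal sketch using only the counts stated in the text:

```python
# Class counts reported for the diabetes data set.
class_counts = [500, 268]

total = sum(class_counts)

# A classifier that always predicts the larger class achieves this accuracy.
baseline_accuracy = max(class_counts) / total

print(f"instances: {total}")
print(f"majority-class baseline: {baseline_accuracy:.1%}")
```

A rule learner is only informative on this data if it beats this figure under the same evaluation.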

Weka's rules classifier JRip implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. It is based on association rules with reduced error pruning (REP), a very common and effective technique found in decision tree algorithms. In REP for rule algorithms, the training data is split into a growing set and a pruning set. First, an initial rule set is formed that overfits the growing set, using some heuristic method. This overlarge rule set is then repeatedly simplified by applying one of a set of pruning operators; typical pruning operators delete any single condition or any single rule. At each stage of simplification, the pruning operator chosen is the one that yields the greatest reduction of error on the pruning set. Simplification ends when applying any pruning operator would increase the error on the pruning set.
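The pruning loop described above can be sketched as follows. This is a simplified illustration of reduced error pruning on a rule set, not JRip's actual implementation. The feature names, thresholds, and the tiny pruning set are hypothetical, and the rule set is assumed to predict the positive class whenever any one of its rules fires.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, float]            # (feature, threshold): feature >= threshold
Rule = List[Condition]                   # a rule fires when all its conditions hold
Example = Tuple[Dict[str, float], int]   # (feature values, label in {0, 1})


def predicts_positive(rules: List[Rule], x: Dict[str, float]) -> bool:
    return any(all(x[f] >= t for f, t in rule) for rule in rules)


def error(rules: List[Rule], data: List[Example]) -> float:
    wrong = sum(int(predicts_positive(rules, x)) != y for x, y in data)
    return wrong / len(data)


def candidates(rules: List[Rule]) -> List[List[Rule]]:
    """Every rule set reachable by deleting one rule or one condition."""
    out = []
    for i, rule in enumerate(rules):
        out.append(rules[:i] + rules[i + 1:])          # delete the whole rule
        if len(rule) > 1:                              # never leave an empty rule
            for j in range(len(rule)):
                shorter = rule[:j] + rule[j + 1:]      # delete one condition
                out.append(rules[:i] + [shorter] + rules[i + 1:])
    return out


def reduced_error_prune(rules: List[Rule], pruning_set: List[Example]) -> List[Rule]:
    current, current_err = rules, error(rules, pruning_set)
    while True:
        options = candidates(current)
        if not options:
            return current
        best = min(options, key=lambda r: error(r, pruning_set))
        best_err = error(best, pruning_set)
        if best_err > current_err:       # every operator would increase the error
            return current
        current, current_err = best, best_err


# Hypothetical overfitted rule set and pruning set.
rules = [[("glucose", 150), ("age", 60)], [("bmi", 45)]]
pruning_set = [
    ({"glucose": 160, "age": 30, "bmi": 30}, 1),
    ({"glucose": 170, "age": 65, "bmi": 28}, 1),
    ({"glucose": 100, "age": 40, "bmi": 46}, 0),
    ({"glucose": 90, "age": 25, "bmi": 22}, 0),
]
pruned = reduced_error_prune(rules, pruning_set)
print(pruned)
```

On this toy data the loop first drops the age condition, then the bmi rule, and stops once the only remaining deletion would raise the error on the pruning set.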

Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is one of the basic and most popular rule learning algorithms. Classes are examined in increasing size, and an initial set of rules for each class is generated using incremental reduced error pruning. In this study, we evaluated RIPPER through JRip, an implementation of RIPPER in WEKA, with the parameters: folds = 10; minNo = 2; optimizations = 2; seed = 1; usePruning = true.

JRip is a moderate classifier with 96.5% accuracy.


Ridor, Weka's ripple-down rule learner, generates a default rule first and then the exceptions to the default rule with the least (weighted) error rate. It then generates the "best" exceptions for each exception and iterates until pure. In this way it performs a tree-like expansion of exceptions. The exceptions are a set of rules that predict classes other than the default. IREP is used to generate the exceptions. Ripple-down rules produce models that are easier to maintain and update than other alternatives.
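The tree-like structure of a default rule with nested exceptions can be illustrated with a small data structure. This is only a sketch of how such a model is read at prediction time, not Ridor's learning procedure; the class `RippleRule`, the thresholds, and the labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RippleRule:
    label: str                                                    # class predicted by this rule
    applies: Callable[[Dict[str, float]], bool] = lambda x: True  # the default rule always applies
    exceptions: List["RippleRule"] = field(default_factory=list)


def classify(rule: RippleRule, x: Dict[str, float]) -> str:
    # The first exception that applies overrides this rule, recursively.
    for exc in rule.exceptions:
        if exc.applies(x):
            return classify(exc, x)
    return rule.label


# Hypothetical model: default class, one exception, and an exception to that exception.
model = RippleRule(
    label="tested_negative",
    exceptions=[
        RippleRule(
            label="tested_positive",
            applies=lambda x: x["glucose"] >= 150,
            exceptions=[
                RippleRule(label="tested_negative", applies=lambda x: x["age"] < 25),
            ],
        )
    ],
)

print(classify(model, {"glucose": 160, "age": 40}))
print(classify(model, {"glucose": 160, "age": 22}))
```

Updating such a model means attaching a new exception at the point where a prediction went wrong, which is why the text describes it as easy to maintain.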


Task 2

Apriori Association Rules

Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets, as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to derive association rules, which highlight general trends in the database.
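The level-by-level growth of item sets can be sketched in a few lines. This is a minimal illustration of the frequent item set step only (no rule generation), assuming an absolute support threshold; the baskets are hypothetical and the function is not Weka's implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {item set: count} for item sets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}   # level 1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-sets into candidates of size k + 1.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k - 1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "fruit", "milk"},
    {"bread", "fruit"},
    {"fruit", "milk"},
    {"bread", "fruit", "milk"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 3, all single items and all pairs are frequent, while the three-item set appears in only two baskets and is dropped.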

The supermarket.arff data set describes the shopping habits of supermarket customers.

Most of the attributes stand for one particular item group.

The value is 't' if the customer bought an item from that item range, and missing otherwise. There is one instance per customer. The data set contains no class attribute, as this is not needed for learning association rules.

Load the data set "supermarket.arff" and switch to the Associate panel. Select "Apriori" as the associator. After pressing Start, Apriori begins to build its model and writes its output into the output field. The first part of the output ("Run information") describes the options that have been set and the data set used.

Association rules are primarily intended to support exploratory data analysis. Use Apriori to generate rules and use them to say something about the shopping habits of supermarket customers.

The data contains 4,627 instances and 217 attributes. The data is denormalized. Each attribute is binary and either has a value ("t" for true) or no value ("?" for missing). There is a nominal class attribute called "total" that indicates whether the transaction was less than $100 (low) or greater than $100 (high).


Apriori Output

The output of the Apriori association rule learner for the supermarket data is shown below.

The rules discovered were:

biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)

baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)

baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)

biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)

party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)

biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)

baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)

biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)

frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)

frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)

Rules are presented in antecedent => consequent format. The number associated with the antecedent is its absolute coverage in the data set (in this case a number out of a possible total of 4,627). The number next to the consequent is the absolute number of instances that match both the antecedent and the consequent. The number in brackets at the end is the confidence of the rule (the number of instances matching both antecedent and consequent divided by the number matching the antecedent). A cutoff of 91% was used in selecting rules, as reported in the "Associator output" window and reflected in the fact that no rule is listed with a confidence below 0.91.
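The bracketed figure can be recomputed from the two counts printed in each rule. The sketch below parses two of the rules listed above and derives the confidence, plus the support taken here as the fraction of all 4,627 instances that match the whole rule. The regular expression and the helper name `rule_metrics` are illustrative assumptions, not part of Weka.

```python
import re

# Two of the rules from the Apriori output above, copied verbatim.
rules = [
    "biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)",
    "frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)",
]

# antecedent count, then consequent count, then the reported confidence
pattern = re.compile(r"(\d+) ==> .*? (\d+) conf:\(([\d.]+)\)")


def rule_metrics(rule_line, n_instances=4627):
    match = pattern.search(rule_line)
    antecedent_count = int(match.group(1))   # instances matching the antecedent
    both_count = int(match.group(2))         # instances matching antecedent and consequent
    return {
        "confidence": both_count / antecedent_count,
        "support": both_count / n_instances,
        "reported": float(match.group(3)),
    }


for line in rules:
    m = rule_metrics(line)
    print(f"confidence={m['confidence']:.4f} support={m['support']:.4f} reported={m['reported']}")
```

For the first rule this gives 723 / 788, which rounds to the reported 0.92.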


A few key observations:

  • All presented rules have a consequent of "bread and cake".
  • All presented rules indicate a high total transaction amount.
  • "biscuits" and "frozen foods" appear in many of the presented rules.

FP Growth

In simple terms, this algorithm works as follows: first it compresses the input database by building an FP-tree instance to represent the frequent items. After this first step, it divides the compressed database into a set of conditional databases, each one associated with one frequent pattern. Finally, each such database is mined separately. Using this strategy, FP-Growth reduces the search cost.
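The compression step can be illustrated by building the prefix tree itself. This sketch covers only the first step (tree construction), not the conditional-database mining; the `FPNode` class, the baskets, and the threshold are hypothetical, and ties in item frequency are broken alphabetically as an arbitrary choice.

```python
from collections import Counter


class FPNode:
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First pass: count items and keep only the frequent ones.
    freq = Counter(item for t in transactions for item in set(t))
    keep = {item for item, n in freq.items() if n >= min_support}
    root = FPNode()
    # Second pass: insert each transaction with items in a fixed global order,
    # so that transactions sharing frequent items share a path in the tree.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in keep), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical shopping baskets; "candles" is infrequent and gets filtered out.
baskets = [
    {"bread", "milk", "candles"},
    {"bread", "fruit", "milk"},
    {"bread", "fruit"},
    {"fruit", "milk"},
    {"bread", "fruit", "milk"},
]
tree = build_fp_tree(baskets, min_support=3)
print("nodes in tree:", count_nodes(tree))
print("bread prefix count:", tree.children["bread"].count)
```

Here twelve frequent item occurrences are stored in six nodes, because baskets that start with the same frequent items share a path.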

Compared with Apriori: Apriori visits each transaction when generating new candidate sets, whereas FP-Growth does not. Apriori generates candidate sets, while FP-Growth uses more complicated data structures and a more involved mining method.


Task 3

Breast cancer data set

The breast cancer data set has 699 instances with 10 attributes. The class distribution is framed as benign and malignant. There is 1 dependent variable and 9 independent variables. The values of the independent variables range from 1 to 10, and the class variable is 2 for a benign and 4 for a malignant tumor. The lowest likelihood of a person having breast cancer is represented by the value 1 and the highest by the value 10.


Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently produces substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation.

Bayes Net

Results for: Naive Bayes

=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes

Relation: breast

Instances: 683

Attributes: 10

Test mode: 10-fold cross-validation

Time taken to build model: 0.08 seconds

=== Summary ===

Correctly Classified Instances 659 96.4861 %

Incorrectly Classified Instances 24 3.5139 %

Kappa statistic 0.9238

K&B Relative Info Score 62650.9331 %

K&B Information Score 585.4063 bits 0.8571 bits/instance

Class complexity | order 0 637.9242 bits 0.934 bits/instance

Class complexity | scheme 1877.4218 bits 2.7488 bits/instance

Complexity improvement (Sf) -1239.4976 bits -1.8148 bits/instance

Mean absolute error 0.0362

Root mean squared error 0.1869

Relative absolute error 7.9508 %

Root relative squared error 39.192 %

Total Number of Instances 683

The kappa statistic is used to assess the accuracy of a measurement, where it is usual to distinguish between the reliability of the collected data and its validity. The average kappa score from the Bayes Net algorithm is around 0.6-0.7.
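For reference, the accuracy line in the summary follows directly from the instance counts, and the kappa statistic corrects such an accuracy for agreement expected by chance. The sketch below recomputes the accuracy from the counts in the output and shows the usual Cohen's kappa formula on a hypothetical 2x2 confusion matrix, since the actual confusion matrix is not included in the output above.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected if predictions were independent of the true class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Accuracy recomputed from the counts in the summary above.
correct, total_instances = 659, 683
accuracy_percent = 100 * correct / total_instances
print(f"accuracy: {accuracy_percent:.4f} %")

# Hypothetical confusion matrix, for illustration only.
example_matrix = [[40, 10],
                  [5, 45]]
kappa = cohen_kappa(example_matrix)
print(f"kappa (hypothetical matrix): {kappa:.2f}")
```

In the hypothetical matrix the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is 0.7.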



