myassignmenthelp.com
Question:

Download BBC sports dataset from the Cloud. This dataset consists of 737 documents from the BBC Sport website corresponding to sports news articles in five topical areas from 2004-2005. There are 5 class labels: athletics, cricket, football, rugby, tennis. The original dataset and raw text files can be downloaded from here

  1. There are 3 files in the dataset, corresponding to the feature matrix, the class labels and the term dictionary. Read these files in a Python notebook and store them in the variables X, trueLabels, and terms.
  2. Next, perform K-means clustering with 5 clusters using Euclidean distance as the similarity measure. Evaluate the clustering performance using the adjusted Rand index and adjusted mutual information. Report the clustering performance averaged over 50 random initializations of K-means.
  3. Repeat K-means clustering with 5 clusters using a similarity measure other than Euclidean distance. Evaluate the clustering performance over 50 random initializations of K-means using the adjusted Rand index and adjusted mutual information. Report the clustering performance and compare it with the results obtained in step 2.
  4. For both clustering cases (Euclidean distance and the other similarity measure), visualize the cluster centres as a tag cloud using the Python package WordCloud.
Answer:
Clustering

Clustering is the task of dividing data points into groups such that the points in one group are more similar to each other than to the points in other groups. In short, it aims to separate items with similar attributes and assign them to clusters. Broadly, clustering can be divided into two subgroups (Aggarwal & Reddy, 2016):

  • Hard clustering: every data point either belongs to a cluster completely or not at all. For instance, when customers are segmented into 10 groups, each customer is placed in exactly one of them.
  • Soft clustering: instead of placing every data point in a single cluster, a probability or likelihood of that point belonging to each cluster is assigned.
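The difference between the two subgroups can be illustrated with a minimal sketch. The function names are hypothetical, and the soft memberships are computed as a softmax over negative squared distances, which is only one of several possible choices and is an assumption made here for illustration.

```python
import math

def hard_assignment(point, centres):
    """Hard clustering: return the index of the single closest centre."""
    distances = [math.dist(point, c) for c in centres]
    return distances.index(min(distances))

def soft_assignment(point, centres):
    """Soft clustering: return one membership probability per centre.

    Assumption: memberships are a softmax over negative squared distances.
    """
    scores = [-math.dist(point, c) ** 2 for c in centres]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

centres = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard = hard_assignment(point, centres)   # a single cluster index
soft = soft_assignment(point, centres)   # probabilities that sum to 1
print(hard, [round(p, 3) for p in soft])
```

The hard result commits the point to one cluster, while the soft result keeps a small, non-zero membership for the more distant cluster.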
Dataset

This task performs clustering on the provided BBC Sport dataset, downloaded from the cloud. The dataset contains 737 documents from the BBC Sport website, corresponding to sports news articles. It consists of three files: the BBC Sport classes, the BBC Sport matrix and the BBC Sport terms. These files are shown below.
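A minimal sketch of reading the three files into X, trueLabels and terms follows. Several things are assumptions rather than facts from the source: the file names `bbcsport.mtx`, `bbcsport.classes` and `bbcsport.terms`, a Matrix Market coordinate layout with terms as rows and documents as columns, and a classes file with `%` comment lines followed by `doc_index class_index` pairs. The parser names are hypothetical.

```python
def parse_matrix_market(lines):
    """Parse Matrix Market coordinate text into a dense list of rows."""
    rows = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _ = (int(v) for v in rows[0].split())
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for ln in rows[1:]:
        i, j, value = ln.split()
        dense[int(i) - 1][int(j) - 1] = float(value)  # file indices are 1-based
    return dense

def parse_classes(lines):
    """Assumed layout: '%' comment lines, then 'doc_index class_index' per line."""
    return [int(ln.split()[1]) for ln in lines
            if ln.strip() and not ln.startswith("%")]

def parse_terms(lines):
    """One term per non-empty line."""
    return [ln.strip() for ln in lines if ln.strip()]

def load_bbcsport(matrix_path="bbcsport.mtx",
                  classes_path="bbcsport.classes",
                  terms_path="bbcsport.terms"):
    """Return (X, trueLabels, terms); the default paths are assumptions."""
    with open(matrix_path) as fh:
        term_by_doc = parse_matrix_market(fh)
    with open(classes_path) as fh:
        true_labels = parse_classes(fh)
    with open(terms_path) as fh:
        terms = parse_terms(fh)
    # Transpose so that each row of X is one document (assumed orientation).
    X = [list(col) for col in zip(*term_by_doc)]
    return X, true_labels, terms
```

With the files in the working directory, `X, trueLabels, terms = load_bbcsport()` would give one row of X per document. In a notebook with SciPy installed, `scipy.io.mmread` reads the same matrix format into a sparse matrix and is likely the more practical choice.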

K-Means Clustering

K-Means is probably the best-known clustering algorithm. It is taught in many introductory data science and machine learning classes, and it is easy to understand and to implement in code (Kaushik, 2016). The algorithm proceeds as follows:

  1. To begin, we select the number of classes/groups to use and randomly initialize their respective centre points. To decide how many classes to use, it is good to take a quick look at the data and try to identify any distinct groupings. The centre points are vectors of the same length as each data point vector.
  2. Each data point is classified by computing the distance between that point and each group centre, and then assigning the point to the group whose centre is closest to it.
  3. Based on these classified points, we recompute each group centre by taking the mean of all the vectors in the group.
  4. Repeat these steps for a set number of iterations or until the group centres change little between iterations. You can also randomly initialize the group centres several times and then select the run that appears to give the best results (Celebi, 2016).
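The four steps above can be sketched in plain Python. This is a minimal illustration on toy 2-D points, not the implementation used to produce the results in this document; the function name, the toy data and the handling of empty clusters are assumptions.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) with Euclidean distance."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Step 2: assign every point to its closest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        # Step 3: move each centre to the mean of its assigned points.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # leave an empty cluster's centre unchanged
        # Step 4: stop once the centres no longer move.
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centres = kmeans(points, k=2)
print(labels, centres)
```

Changing `seed` changes the initial centres, which is the source of the run-to-run variation discussed in this section.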

K-Means has the advantage that it is fast, as all we are really doing is computing the distances between the points and the group centres; very few computations! Its cost is therefore linear in the number of points, O(n); more precisely, each iteration costs O(n·k·d) for n points, k clusters and d features.

On the other hand, K-Means has a few disadvantages. First, you have to choose how many groups/classes there are. This is not always trivial, and ideally we would want a clustering algorithm to figure that out for us, because the point of clustering is to gain some insight from the data. K-means also starts with a random choice of cluster centres, so it may yield different clustering results on different runs of the algorithm. The results may therefore not be repeatable and may lack consistency. Other clustering methods are more consistent.
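Because single runs can differ, the assignment asks for scores averaged over 50 random initializations. The sketch below shows the idea with a plain-Python adjusted Rand index; `mean_score_over_runs` and `toy_cluster_fn` are hypothetical names, and the toy function only stands in for a real K-means run. In a notebook, `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.adjusted_mutual_info_score` would normally be used instead.

```python
import random
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

def mean_score_over_runs(cluster_fn, true_labels, n_runs=50):
    """Average the score of cluster_fn(seed) over n_runs different seeds."""
    scores = [adjusted_rand_index(true_labels, cluster_fn(seed))
              for seed in range(n_runs)]
    return sum(scores) / n_runs

truth = [0, 0, 1, 1]

def toy_cluster_fn(seed):
    # Hypothetical stand-in for one K-means run: it recovers the right
    # partition but numbers the clusters arbitrarily, as K-means does.
    ids = [0, 1]
    random.Random(seed).shuffle(ids)
    return [ids[t] for t in truth]

mean_ari = mean_score_over_runs(toy_cluster_fn, truth)
print(mean_ari)
```

The index ignores how clusters are numbered, so a run that finds the right groups under different cluster ids still scores 1.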

K-Medians is another clustering algorithm related to K-Means, except that instead of recomputing the group centres using the mean, we use the median vector of the group. This method is less sensitive to outliers (because it uses the median), but it is much slower for larger datasets, as sorting is required on each iteration when computing the median vector.
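The effect of the median update can be seen on a small example. This sketch assumes the component-wise median, the usual choice for K-Medians; the helper names and the toy data are illustrative only.

```python
from statistics import mean, median

def mean_centre(members):
    """K-Means update: component-wise mean of the cluster members."""
    return [mean(col) for col in zip(*members)]

def median_centre(members):
    """K-Medians update: component-wise median of the cluster members."""
    return [median(col) for col in zip(*members)]

# Three nearby points plus one outlier on the first coordinate.
members = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (100.0, 2.0)]

print(mean_centre(members))    # pulled towards the outlier
print(median_centre(members))  # stays with the bulk of the points
```

The outlier drags the mean centre far from the other three points, while the median centre stays among them.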

Use of K-Means Clustering

The k-means method is used for partitioning observations into homogeneous clusters, based on their description by a set of quantitative variables. K-means clustering has the following advantages in particular:

  • An object may be assigned to a class during one iteration and then change class in the following iteration, which is not possible with Agglomerative Hierarchical Clustering, where an assignment cannot be reversed.
  • By multiplying the starting points and the repetitions, several solutions may be explored.

Classification criteria for k-means Clustering

Several classification criteria may be used to reach a solution. XLSTAT offers four criteria to be minimized:

  • Trace(W) / Median
  • Determinant(W)
  • Trace(W)
  • Wilks' lambda
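For reference, the quantities behind these criteria can be written out as follows. This is a sketch using the standard textbook definitions of the within-class, total and between-class scatter matrices; the exact scaling XLSTAT applies to W is not stated in the source and is not assumed here.

```latex
% x_i: observation i; C_k: class k with centroid \mu_k; \mu: overall mean
W = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{\top}, \qquad
T = \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}, \qquad
B = T - W

\operatorname{Trace}(W) = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^{2}, \qquad
\Lambda_{\mathrm{Wilks}} = \frac{\det(W)}{\det(T)}
```

Minimizing Trace(W) is the usual K-means objective with Euclidean distance. A singular W has det(W) = 0, so ln(Determinant(W)) is minus infinity, which is consistent with the -Inf entries in the result tables.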

Results of k-means clustering in XLSTAT

  • Optimization summary: This table shows the evolution of the within-class variance. If several repetitions have been requested, the results for each repetition are displayed.
  • Statistics for each iteration: Activate this option to see the evolution of miscellaneous statistics calculated as the iterations proceed for the repetition that gives the optimum result for the chosen criterion. If the corresponding option is activated in the Charts tab, a chart showing the evolution of the chosen criterion as the iterations proceed is displayed.
  • Variance decomposition for the optimal classification: This table shows the within-class variance, the between-class variance and the total variance.
  • Class centroids: This table shows the class centroids for the various descriptors.
  • Distances between the class centroids: This table shows the Euclidean distances between the class centroids for the various descriptors.
  • Central objects: This table shows the coordinates of the object nearest to the centroid for each class.
  • Distances between the central objects: This table shows the Euclidean distances between the class central objects for the various descriptors.
  • Results by class: The descriptive statistics for the classes (number of objects, sum of weights, within-class variance, minimum distance to the centroid, maximum distance to the centroid, mean distance to the centroid) are displayed in the first part of the table. The second part shows the objects.
  • Results by object: This table shows the assignment class for each object, in the initial object order.
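The variance decomposition described in the list above rests on the identity total = within-class + between-class. The sketch below checks it on toy data using raw sums of squared distances; the function names and data are illustrative, and XLSTAT's normalization is not reproduced. Judging only from the numbers reported in this document, the within-class variance appears to equal Trace(W) divided by 4 (for example 2.333 / 4 ≈ 0.583), which would correspond to n − k = 9 − 5; this is an inference, not something the source states.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance_decomposition(points, labels):
    """Return (within, between, total) sums of squared distances."""
    overall = centroid(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    within = between = 0.0
    for members in groups.values():
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)    # spread inside the class
        between += len(members) * sq_dist(c, overall)    # spread of the class centroids
    total = sum(sq_dist(p, overall) for p in points)
    return within, between, total

points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
within, between, total = variance_decomposition(points, labels)
print(within, between, total)
print(f"within: {within / total:.2%}, between: {between / total:.2%}")
```

A good partition has a small within-class share and a large between-class share of the total.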
Result of BBC Sports Matrix

Statistics’ Summary:

| Variable | Observations | Obs. with missing data | Obs. without missing data | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|---|---|---|
| 7 | 9 | 0 | 9 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | 9 | 0 | 9 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3 | 9 | 0 | 9 | 0.000 | 1.000 | 0.222 | 0.441 |
| 2 | 9 | 0 | 9 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 | 9 | 0 | 9 | 0.000 | 1.000 | 0.333 | 0.500 |
| 2 | 9 | 0 | 9 | 0.000 | 1.000 | 0.333 | 0.500 |

Optimization summary:

| Repetition | Iteration | Initial within-class variance | Final within-class variance | ln(Determinant(W)) |
|---|---|---|---|---|
| 1 | 1 | 0.750 | 0.583 | -Inf |
| 2 | 1 | 0.938 | 0.375 | -Inf |
| 3 | 1 | 0.708 | 0.250 | -Inf |
| 4 | 1 | 1.000 | 0.333 | -Inf |
| 5 | 1 | 0.458 | 0.333 | -Inf |
| 6 | 1 | 0.708 | 0.375 | -Inf |
| 7 | 1 | 0.667 | 0.250 | -Inf |
| 8 | 1 | 0.750 | 0.375 | -Inf |
| 9 | 1 | 1.000 | 0.250 | -Inf |
| 10 | 1 | 0.875 | 0.250 | -Inf |

Statistics for each iteration:

| Iteration | Within-class variance | Trace(W) | ln(Determinant(W)) | Wilks' Lambda |
|---|---|---|---|---|
| 0 | 0.750 | 3.000 | -Inf | 0.000 |
| 1 | 0.583 | 2.333 | -Inf | 0.000 |

Variance decomposition for the optimal classification:

| | Absolute | Percent |
|---|---|---|
| Within-class | 0.583 | 84.00% |
| Between-classes | 0.111 | 16.00% |
| Total | 0.694 | 100.00% |

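Initial class centroids:

| Class | 7 | 1 | 3 | 2 | 4 | 2 |
|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 1.000 | 0.000 | 0.500 | 0.500 |
| 2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.500 | 0.500 |
| 3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |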

Class centroids:

| Class | 7 | 1 | 3 | 2 | 4 | 2 | Sum of weights | Within-class variance |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 1.000 | 0.000 | 0.500 | 0.500 | 2.000 | 1.000 |
| 2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.667 | 0.667 | 3.000 | 0.667 |
| 3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 2.000 | 0.000 |
| 4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
| 5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |

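Distances between the class centroids:

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 1.027 | 1.225 | 1.225 | 1.225 |
| 2 | 1.027 | 0 | 0.943 | 0.943 | 0.943 |
| 3 | 1.225 | 0.943 | 0 | 0.000 | 0.000 |
| 4 | 1.225 | 0.943 | 0.000 | 0 | 0.000 |
| 5 | 1.225 | 0.943 | 0.000 | 0.000 | 0 |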

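Central objects:

| Class | 7 | 1 | 3 | 2 | 4 | 2 |
|---|---|---|---|---|---|---|
| 1 (0) | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 |
| 2 (0) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 |
| 3 (0) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 (0) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 (0) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |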

Distances between the central objects:

| | 1 (0) | 2 (0) | 3 (0) | 4 (0) | 5 (0) |
|---|---|---|---|---|---|
| 1 (0) | 0 | 1.732 | 1.000 | 1.000 | 1.000 |
| 2 (0) | 1.732 | 0 | 1.414 | 1.414 | 1.414 |
| 3 (0) | 1.000 | 1.414 | 0 | 0.000 | 0.000 |
| 4 (0) | 1.000 | 1.414 | 0.000 | 0 | 0.000 |
| 5 (0) | 1.000 | 1.414 | 0.000 | 0.000 | 0 |

Result based on class:

| Class | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Objects | 2 | 3 | 2 | 1 | 1 |
| Sum of weights | 2 | 3 | 2 | 1 | 1 |
| Within-class variance | 1.000 | 0.667 | 0.000 | 0.000 | 0.000 |
| Minimum distance to centroid | 0.707 | 0.471 | 0.000 | 0.000 | 0.000 |
| Average distance to centroid | 0.707 | 0.654 | 0.000 | 0.000 | 0.000 |
| Maximum distance to centroid | 0.707 | 0.745 | 0.000 | 0.000 | 0.000 |
| Objects in class | 0, 1 | 0, 0, 0 | 0, 0 | 0 | 0 |

Results by object:

| Observation | Class | Distance to centroid |
|---|---|---|
| 0 | 1 | 0.707 |
| 0 | 2 | 0.745 |
| 0 | 3 | 0.000 |
| 0 | 4 | 0.000 |
| 0 | 5 | 0.000 |
| 0 | 3 | 0.000 |
| 0 | 2 | 0.745 |
| 1 | 1 | 0.707 |
| 0 | 2 | 0.471 |

 
Result of BBC Sports classes

Statistics’ Summary:

| Variable | Observations | Obs. with missing data | Obs. without missing data | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 0 | 5 | 0.000 | 0.000 | 0.000 | 0.000 |

Optimization summary:

| Repetition | Iteration | Initial within-class variance | Final within-class variance | ln(Determinant(W)) |
|---|---|---|---|---|
| 1 | 1 | 0.000 | 0.000 | -Inf |
| 2 | 1 | 0.000 | 0.000 | -Inf |
| 3 | 1 | 0.000 | 0.000 | -Inf |
| 4 | 1 | 0.000 | 0.000 | -Inf |
| 5 | 1 | 0.000 | 0.000 | -Inf |
| 6 | 1 | 0.000 | 0.000 | -Inf |
| 7 | 1 | 0.000 | 0.000 | -Inf |
| 8 | 1 | 0.000 | 0.000 | -Inf |
| 9 | 1 | 0.000 | 0.000 | -Inf |
| 10 | 1 | 0.000 | 0.000 | -Inf |

Statistics for each iteration:

| Iteration | Within-class variance | Trace(W) | ln(Determinant(W)) | Wilks' Lambda |
|---|---|---|---|---|
| 0 | 0.000 | 0.000 | -Inf | 0.000 |
| 1 | 0.000 | 0.000 | -Inf | 0.000 |

Variance decomposition for the optimal classification:

| | Absolute | Percent |
|---|---|---|
| Within-class | 0.000 | 0.00% |
| Between-classes | 0.000 | 0.00% |
| Total | 0.000 | 100.00% |

Initial class centroids:

| Class | 0 |
|---|---|
| 1 | 0.000 |
| 2 | 0.000 |
| 3 | 0.000 |
| 4 | 0.000 |
| 5 | 0.000 |

Class centroids:

| Class | 0 | Sum of weights | Within-class variance |
|---|---|---|---|
| 1 | 0.000 | 1.000 | 0.000 |
| 2 | 0.000 | 1.000 | 0.000 |
| 3 | 0.000 | 1.000 | 0.000 |
| 4 | 0.000 | 1.000 | 0.000 |
| 5 | 0.000 | 1.000 | 0.000 |

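Distances between the class centroids:

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 | 0.000 | 0 | 0.000 | 0.000 | 0.000 |
| 3 | 0.000 | 0.000 | 0 | 0.000 | 0.000 |
| 4 | 0.000 | 0.000 | 0.000 | 0 | 0.000 |
| 5 | 0.000 | 0.000 | 0.000 | 0.000 | 0 |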

Central objects:

| Class | 0 |
|---|---|
| 1 (0) | 0.000 |
| 2 (0) | 0.000 |
| 3 (0) | 0.000 |
| 4 (0) | 0.000 |
| 5 (0) | 0.000 |

Distances between the central objects:

| | 1 (0) | 2 (0) | 3 (0) | 4 (0) | 5 (0) |
|---|---|---|---|---|---|
| 1 (0) | 0 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 (0) | 0.000 | 0 | 0.000 | 0.000 | 0.000 |
| 3 (0) | 0.000 | 0.000 | 0 | 0.000 | 0.000 |
| 4 (0) | 0.000 | 0.000 | 0.000 | 0 | 0.000 |
| 5 (0) | 0.000 | 0.000 | 0.000 | 0.000 | 0 |

Result based on class:

| Class | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Objects | 1 | 1 | 1 | 1 | 1 |
| Sum of weights | 1 | 1 | 1 | 1 | 1 |
| Within-class variance | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Minimum distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Average distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Maximum distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Objects in class | 0 | 0 | 0 | 0 | 0 |

Results by object:

| Observation | Class | Distance to centroid |
|---|---|---|
| 0 | 1 | 0.000 |
| 0 | 2 | 0.000 |
| 0 | 3 | 0.000 |
| 0 | 4 | 0.000 |
| 0 | 5 | 0.000 |

K-Means clustering

The results of the repeated K-means run are provided below.
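The source does not state which non-Euclidean measure was used for the repeated run. Purely as an illustration, the sketch below assumes cosine similarity, a common choice for text data: scaling every document vector to unit length before Euclidean K-means makes the squared distance equal to 2 − 2·cos, so documents are grouped by direction rather than by length. The helper names and toy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0 else [v / norm for v in vec]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy term-count vectors: doc1 and doc2 use the same term, doc2 is just longer.
doc1, doc2, doc3 = [2.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 2.0, 0.0]
n1, n2, n3 = (l2_normalize(d) for d in (doc1, doc2, doc3))

# Raw Euclidean distance puts doc1 closer to doc3 than to doc2 ...
print(math.dist(doc1, doc2), math.dist(doc1, doc3))
# ... while after normalization doc1 and doc2 coincide.
print(math.dist(n1, n2), math.dist(n1, n3))
```

Running Euclidean K-means on the normalized rows is therefore one simple way to cluster by cosine similarity without changing the algorithm itself.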

For BBC Sports Matrix

Statistics’ Summary:

| Variable | Observations | Obs. with missing data | Obs. without missing data | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|---|---|---|
| 0 | 9 | 0 | 9 | 0.000 | 1.000 | 0.333 | 0.500 |
| 0 | 9 | 0 | 9 | 0.000 | 1.000 | 0.111 | 0.333 |
| 0 | 9 | 0 | 9 | 0.000 | 2.000 | 0.556 | 0.882 |
| 0 | 9 | 0 | 9 | 0.000 | 2.000 | 0.667 | 0.866 |
| 0 | 9 | 0 | 9 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0 | 9 | 0 | 9 | 0.000 | 0.000 | 0.000 | 0.000 |

Optimization summary:

| Repetition | Iteration | Initial within-class variance | Final within-class variance | ln(Determinant(W)) |
|---|---|---|---|---|
| 1 | 1 | 1.958 | 0.375 | -Inf |
| 2 | 1 | 2.875 | 0.300 | -Inf |
| 3 | 1 | 2.583 | 0.125 | -Inf |
| 4 | 1 | 2.500 | 0.833 | -Inf |
| 5 | 1 | 1.688 | 0.125 | -Inf |
| 6 | 1 | 2.438 | 0.500 | -Inf |
| 7 | 1 | 2.792 | 0.750 | -Inf |
| 8 | 1 | 2.833 | 0.125 | -Inf |
| 9 | 1 | 2.500 | 0.500 | -Inf |
| 10 | 1 | 2.875 | 0.125 | -Inf |

Statistics for each iteration:

| Iteration | Within-class variance | Trace(W) | ln(Determinant(W)) | Wilks' Lambda |
|---|---|---|---|---|
| 0 | 1.958 | 7.833 | -Inf | 0.000 |
| 1 | 0.375 | 1.500 | -Inf | 0.000 |

Variance decomposition for the optimal classification:

| | Absolute | Percent |
|---|---|---|
| Within-class | 0.375 | 19.85% |
| Between-classes | 1.514 | 80.15% |
| Total | 1.889 | 100.00% |

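Initial class centroids:

| Class | 0 | 0 | 0 | 0 | 0 | 0 |
|---|---|---|---|---|---|---|
| 1 | 0.333 | 0.000 | 0.000 | 0.667 | 0.000 | 0.000 |
| 2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 | 0.000 | 0.500 | 1.500 | 1.500 | 0.000 | 0.000 |
| 5 | 0.500 | 0.000 | 1.000 | 0.500 | 0.000 | 0.000 |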

Class centroids:

| Class | 0 | 0 | 0 | 0 | 0 | 0 | Sum of weights | Within-class variance |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 3.000 | 0.000 |
| 2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 2.000 | 0.000 |
| 3 | 0.000 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 1.000 | 0.000 |
| 4 | 0.000 | 0.000 | 2.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.000 |
| 5 | 0.000 | 0.500 | 1.500 | 1.500 | 0.000 | 0.000 | 2.000 | 1.500 |

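Distances between the class centroids:

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 1.000 | 2.236 | 2.449 | 2.398 |
| 2 | 1.000 | 0 | 2.000 | 2.236 | 2.179 |
| 3 | 2.236 | 2.000 | 0 | 2.236 | 1.658 |
| 4 | 2.449 | 2.236 | 2.236 | 0 | 0.866 |
| 5 | 2.398 | 2.179 | 1.658 | 0.866 | 0 |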

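Central objects:

| Class | 0 | 0 | 0 | 0 | 0 | 0 |
|---|---|---|---|---|---|---|
| 1 (0) | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 (1) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3 (0) | 0.000 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 |
| 4 (0) | 0.000 | 0.000 | 2.000 | 1.000 | 0.000 | 0.000 |
| 5 (0) | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 |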

Distances between the central objects:

| | 1 (0) | 2 (1) | 3 (0) | 4 (0) | 5 (0) |
|---|---|---|---|---|---|
| 1 (0) | 0 | 1.000 | 2.236 | 2.449 | 1.732 |
| 2 (1) | 1.000 | 0 | 2.000 | 2.236 | 1.414 |
| 3 (0) | 2.236 | 2.000 | 0 | 2.236 | 1.414 |
| 4 (0) | 2.449 | 2.236 | 2.236 | 0 | 1.000 |
| 5 (0) | 1.732 | 1.414 | 1.414 | 1.000 | 0 |

Results by class:

| Class | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Objects | 3 | 2 | 1 | 1 | 2 |
| Sum of weights | 3 | 2 | 1 | 1 | 2 |
| Within-class variance | 0.000 | 0.000 | 0.000 | 0.000 | 1.500 |
| Minimum distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.866 |
| Average distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.866 |
| Maximum distance to centroid | 0.000 | 0.000 | 0.000 | 0.000 | 0.866 |
| Objects in class | 0, 0, 0 | 1, 0 | 0 | 0 | 0, 1 |

Results by object:

| Observation | Class | Distance to centroid |
|---|---|---|
| 0 | 1 | 0.000 |
| 1 | 2 | 0.000 |
| 0 | 2 | 0.000 |
| 0 | 1 | 0.000 |
| 0 | 1 | 0.000 |
| 0 | 3 | 0.000 |
| 0 | 4 | 0.000 |
| 0 | 5 | 0.866 |
| 1 | 5 | 0.866 |

References

Aggarwal, C. and Reddy, C. (2016). Data clustering.

Celebi, M. (2016). Partitional clustering algorithms. [S.l.]: Springer International Pu.

Kaushik, S. (2016). An Introduction to Clustering & different methods of clustering. [online] Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/ [Accessed 24 Aug. 2018].

Cite This Work

My Assignment Help. (2021). Clustering And K-Means Algorithm With BBC Sports Dataset. Retrieved from https://myassignmenthelp.com/free-samples/sit720-machine-learning/partitional-clustering-algorithms.html.
