CS3753 Data Science in Python
Submission:
1. Submit a single Python script through Blackboard Learn. All results must be output by your Python code.
2. Include instructions for running your code at the beginning of the script. It should run successfully either in a basic Python 3 environment or in Jupyter Notebook.
3. Do not compress your source code and data files. Make sure all your files are in the same folder when you run the code, so that graders can run your code successfully after downloading your homework without setting the path to the data file.

Questions
1. K-means clustering. You do not need to import any libraries or modules for K-means clustering because you will implement it from scratch. A code template is provided; you only need to write your code at the locations marked “your code is here”. Download the dataset ‘k_means_clustering_data.csv’ and save it in your working directory, alongside your source code for this homework. The dataset has two columns (‘x’ and ‘y’) and 42 records, which represent 42 points in a 2D plane. Your goal is to group them into K clusters using the K-means clustering algorithm.

The basic procedure of K-means clustering is simple. First, choose the number of clusters K and randomly select K points from the dataset as the initial centroids (cluster centers). The algorithm then repeats the following steps until convergence, that is, until the locations of all centroids stop changing:
a. Measure the distance from each point in the dataset to each of the K centroids.
b. Assign each point to the cluster whose centroid is nearest.
c. Recalculate the location of each centroid as the mean of the data points in its cluster.
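
The loop in steps a to c can be sketched in plain Python as shown below. This is only a minimal illustration under stated assumptions, not the provided template: the function names `squared_distance` and `k_means`, the `seed` and `max_iter` parameters, the empty-cluster rule, and the toy points are all hypothetical and are not taken from the homework files.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, seed=0, max_iter=100):
    """Cluster 2D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k data points at random as the starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Steps a and b: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step c: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                cx = sum(p[0] for p in members) / len(members)
                cy = sum(p[1] for p in members) / len(members)
                new_centroids.append((cx, cy))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data (not the homework dataset): two well-separated groups.
toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
              (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(toy_points, k=2)
print(centroids)
print(labels)
```

Squared distance is used instead of the true Euclidean distance because it gives the same nearest centroid while avoiding a square root. The `max_iter` cap is a safety guard in case the centroids never stop moving exactly; the homework template may handle convergence differently.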