- Course Code: CSC 411
- University: University Of Toronto Mississauga
- Country: Canada

In this question you will generate and plot 2-dimensional data for a binary classification problem. We will call the two classes Class 0 and Class 1 (for which the target values are t = 0 and t = 1, respectively).

(a) (6 points) Write a Python function genData(mu0,mu1,Sigma0,Sigma1,N) that generates two clusters of data, one for each class. Each cluster consists of N data points. The cluster for class 0 is centred at mu0 and has covariance matrix Sigma0. The cluster for class 1 is centred at mu1 and has covariance matrix Sigma1. Note that mu0 and mu1 and all the data points are 2-dimensional vectors. Sigma0 and Sigma1 are 2 × 2 symmetric matrices that describe the shape of the clusters: the diagonal entries specify the variance of a cluster along each of the two dimensions, and the off-diagonal entries describe how correlated the two dimensions are. The function should return two arrays, X and t, representing data points and target values, respectively. X is a 2N × 2 dimensional array in which each row is a data point. t is a 2N-dimensional vector of 0s and 1s. Specifically, t[i] is 0 if X[i] belongs to class 0, and 1 if it belongs to class 1. The data for the two classes should be distributed randomly in the arrays. In particular, the data for class 0 should not all be in the first half of the arrays, with the data for class 1 in the second half.

We will model each cluster as a multivariate normal distribution. Recall that the probability density of such a distribution is given by

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\mu)^T \Sigma^{-1} (\mathbf{x}-\mu)\right)$$

where µ is the mean (cluster centre), Σ is the covariance matrix, and k is the dimensionality of the data (2 in our case). To generate data for a cluster, use the function multivariate_normal in numpy.random. Use the function shuffle in sklearn.utils to distribute the data randomly in the arrays.
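Below is a minimal sketch of genData. It assumes NumPy and scikit-learn are installed and uses the two functions named above. Storing t as an integer array is an assumption. This is one possible solution, not an official one.

```python
import numpy as np
from numpy.random import multivariate_normal
from sklearn.utils import shuffle


def genData(mu0, mu1, Sigma0, Sigma1, N):
    """Return (X, t): 2N shuffled 2-D points and their 0/1 class labels."""
    # Draw N points from each class's multivariate normal distribution.
    X0 = multivariate_normal(mu0, Sigma0, N)   # class 0, shape (N, 2)
    X1 = multivariate_normal(mu1, Sigma1, N)   # class 1, shape (N, 2)
    # Stack the two clusters and build the matching target vector.
    X = np.vstack((X0, X1))                    # shape (2N, 2)
    t = np.concatenate((np.zeros(N, dtype=int), np.ones(N, dtype=int)))
    # Shuffle the rows of X and the entries of t with the same permutation,
    # so the two classes end up mixed throughout the arrays.
    X, t = shuffle(X, t)
    return X, t
```

For example, `genData(np.array([0, -1]), np.array([-1, 1]), np.eye(2), np.eye(2), 5)` returns X with shape (10, 2). It also returns t with five 0s and five 1s in random order.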

(b) (1 point) Use your function from part (a) to generate two clusters with 10,000 points each, with mu0 = (0, −1), mu1 = (−1, 1), and

$$\Sigma_0 = \begin{pmatrix} 2.0 & 0.5 \\ 0.5 & 1.0 \end{pmatrix}, \qquad \Sigma_1 = \begin{pmatrix} 1.0 & -1.0 \\ -1.0 & 2.0 \end{pmatrix}.$$

You will have to encode these argument values as Numpy arrays.
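The part (b) arguments might be encoded as shown below. The sketch repeats a compact version of the part (a) generator so that it runs on its own. It also adds an optional check, not required by the question, that both matrices are symmetric with positive eigenvalues, i.e. valid covariance matrices.

```python
import numpy as np
from sklearn.utils import shuffle

# Question 1(b) arguments encoded as NumPy arrays.
mu0 = np.array([0.0, -1.0])
mu1 = np.array([-1.0, 1.0])
Sigma0 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
Sigma1 = np.array([[1.0, -1.0],
                   [-1.0, 2.0]])

# Optional sanity check: a covariance matrix should be symmetric and
# (here) positive definite, i.e. all eigenvalues strictly positive.
for S in (Sigma0, Sigma1):
    assert np.allclose(S, S.T)
    assert np.all(np.linalg.eigvalsh(S) > 0)


def genData(mu0, mu1, Sigma0, Sigma1, N):
    # Compact version of the part (a) sketch: N points per class, shuffled together.
    X = np.vstack((np.random.multivariate_normal(mu0, Sigma0, N),
                   np.random.multivariate_normal(mu1, Sigma1, N)))
    t = np.concatenate((np.zeros(N, dtype=int), np.ones(N, dtype=int)))
    return shuffle(X, t)


X, t = genData(mu0, mu1, Sigma0, Sigma1, 10000)

# With 10,000 points per class, sample statistics should be close to the parameters.
print(X[t == 0].mean(axis=0))            # should be close to mu0
print(np.cov(X[t == 1], rowvar=False))   # should be close to Sigma1
```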

(c) Display the data from part (b) as a scatter plot, using red dots for points in cluster 0 and blue dots for points in cluster 1. Use the function scatter in matplotlib.pyplot. Specify a relatively small dot size by using the named argument s=2. Use the functions xlim and ylim to extend the x and y axes from -5 to 6. Title the plot “Question 1(c): sample cluster data (10,000 points per cluster)”. If you have done everything correctly, the scatter plot should look something like Figure 1, which shows two heavily overlapping clusters. In particular, the …

2. (?? points) Binary Logistic Regression.

In this question you will use logistic regression to generate a classifier for cluster data. You will also generate a precision-recall curve for the classifier. Use the Python class LogisticRegression in sklearn.linear_model to do the logistic regression. Instantiating this class creates a Python object, much as the class Ridge did in Question 5 of Assignment 1. The class comes with a number of attributes and methods that you will find useful for answering the questions below.

(a) Use genData to generate training data consisting of two clusters with 1000 points each. Use the same cluster centers and covariance matrices as in Question 1(b).

(b) Carry out logistic regression on the data in part (a). Print out the values of the bias term, w0, the weight vector, w, and the mean accuracy of the classifier on the training data. (Accuracy is the fraction of predictions that are correct.)
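Here is a sketch of parts (a) and (b), assuming the default settings of LogisticRegression. Those defaults include L2 regularization (C=1.0), so the printed weights are regularized estimates. The compact genData is repeated only to keep the block self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle


def genData(mu0, mu1, Sigma0, Sigma1, N):
    # Compact version of the Question 1(a) generator.
    X = np.vstack((np.random.multivariate_normal(mu0, Sigma0, N),
                   np.random.multivariate_normal(mu1, Sigma1, N)))
    t = np.concatenate((np.zeros(N, dtype=int), np.ones(N, dtype=int)))
    return shuffle(X, t)


mu0, mu1 = np.array([0.0, -1.0]), np.array([-1.0, 1.0])
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma1 = np.array([[1.0, -1.0], [-1.0, 2.0]])

X, t = genData(mu0, mu1, Sigma0, Sigma1, 1000)   # Question 2(a): 1000 points per class

clf = LogisticRegression()
clf.fit(X, t)

w0 = clf.intercept_[0]       # bias term
w = clf.coef_[0]             # weight vector, shape (2,)
accuracy = clf.score(X, t)   # mean accuracy on the training data

print("w0 =", w0)
print("w =", w)
print("training accuracy =", accuracy)
```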

(c) Generate a scatter plot of the training data as in Question 1(c), and draw the decision boundary of the classifier as a black line on top of the data. Title the figure “Question 2(c): training data and decision boundary”.
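The decision boundary w1·x1 + w2·x2 + w0 = 0 is a straight line. Provided the second weight w2 is non-zero, it can be drawn by solving for the second coordinate: x2 = −(w0 + w1·x1)/w2. The sketch below assumes w and w0 come from the fitted model (clf.coef_[0] and clf.intercept_[0]). The helper names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def boundary_x2(x1, w, w0, thresh=0.0):
    """Solve w[0]*x1 + w[1]*x2 + w0 = thresh for x2 (requires w[1] != 0)."""
    return (thresh - w0 - w[0] * x1) / w[1]


def plot_data_and_boundary(X, t, w, w0, title):
    plt.figure()
    # Class 0 in red, class 1 in blue, small dots as in Question 1(c).
    plt.scatter(X[t == 0, 0], X[t == 0, 1], color="red", s=2)
    plt.scatter(X[t == 1, 0], X[t == 1, 1], color="blue", s=2)
    # A straight line needs only two points; span the plotted x range.
    x1 = np.array([-5.0, 6.0])
    plt.plot(x1, boundary_x2(x1, w, w0), color="black")
    plt.xlim(-5, 6)
    plt.ylim(-5, 6)
    plt.title(title)
```

With a fitted classifier, call `plot_data_and_boundary(X, t, clf.coef_[0], clf.intercept_[0], "Question 2(c): training data and decision boundary")` followed by `plt.show()`.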

(d) Recall that the standard decision boundary tends to make the number of false positives equal to the number of false negatives. However, these two kinds of error may have different costs, and we may want to shift the decision boundary to account for this. That is, instead of defining the decision boundary by $\mathbf{w}^T\mathbf{x} + w_0 = 0$, we may want to define it by $\mathbf{w}^T\mathbf{x} + w_0 = t$ for some threshold, t. Generate a scatter plot of the data, and plot seven different decision boundaries on top of it, for t = 3, 2, 1, 0, −1, −2, −3. Plot the decision boundary as a blue line when t is positive, as a red line when t is negative, and as a black line when t is 0. Title the figure “Question 2(d): decision boundaries for seven thresholds”.
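Shifting the threshold changes only the right-hand side of the line equation, so all seven boundaries are parallel. The sketch below names the threshold `thresh` to avoid a clash with the target vector t. The function names are hypothetical, and w and w0 are assumed to come from the fitted model.

```python
import numpy as np
import matplotlib.pyplot as plt


def threshold_color(thresh):
    # Blue for positive thresholds, red for negative, black for zero.
    if thresh > 0:
        return "blue"
    if thresh < 0:
        return "red"
    return "black"


def plot_threshold_boundaries(X, t, w, w0, thresholds=(3, 2, 1, 0, -1, -2, -3)):
    plt.figure()
    plt.scatter(X[t == 0, 0], X[t == 0, 1], color="red", s=2)
    plt.scatter(X[t == 1, 0], X[t == 1, 1], color="blue", s=2)
    x1 = np.array([-5.0, 6.0])
    for thresh in thresholds:
        # Points on this line satisfy w[0]*x1 + w[1]*x2 + w0 = thresh.
        x2 = (thresh - w0 - w[0] * x1) / w[1]
        plt.plot(x1, x2, color=threshold_color(thresh))
    plt.xlim(-5, 6)
    plt.ylim(-5, 6)
    plt.title("Question 2(d): decision boundaries for seven thresholds")
```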

(e) Which of the seven values of t in part (d) gives the greatest number of false positives (i.e., false blue predictions)? Explain your answer.

(f) For t = 1, what is the probability of a point on the decision boundary being in class 1 (i.e., blue)?
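The probability in part (f) follows from the logistic model itself: P(class 1 | x) = σ(wᵀx + w0), with σ(z) = 1/(1 + e^(−z)). Every point on the boundary wᵀx + w0 = t therefore has probability σ(t). The sketch computes σ(1). On small hypothetical synthetic data, it also checks that predict_proba for a binary LogisticRegression agrees with σ applied to decision_function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# P(class 1 | x) = sigmoid(w^T x + w0), so on the line w^T x + w0 = thresh
# every point has probability sigmoid(thresh).
print(sigmoid(1.0))   # about 0.731

# Sanity check on hypothetical synthetic data (not the assignment's clusters).
rng = np.random.default_rng(0)
X = np.vstack((rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))))
t = np.repeat([0, 1], 100)
clf = LogisticRegression().fit(X, t)
p1 = clf.predict_proba(X)[:, 1]
print(np.allclose(p1, sigmoid(clf.decision_function(X))))
```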

(g) Use genData to generate test data consisting of two clusters with 10,000 points each. Use the same cluster centers and covariance matrices as for the training data.

(h) Use the test data to compute and print out the following values for t = 1:

• The number of predicted positives (i.e., points predicted to be in class 1)

• The number of predicted negatives (i.e., points predicted to be in class 0)

• The number of true positives (i.e., predictions for class 1 that are correct).

• The number of false positives (i.e., predictions for class 1 that are incorrect)

• The number of true negatives (i.e., predictions for class 0 that are correct)

• The number of false negatives (i.e., predictions for class 0 that are incorrect).

• The precision.

• The recall.

The number of predicted positives should be less than the number of predicted negatives. The number of true positives should be much greater than the number of false positives. (Explain both of these points, generating an appropriate figure to simplify your explanation. Title the figure, “Question 2(h): explanatory figure”.)
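Below is a sketch of the part (h) bookkeeping. It assumes the scores are the values wᵀx + w0 returned by clf.decision_function on the test data. The function name and the dictionary keys are assumptions. So is treating a score exactly equal to the threshold as a negative prediction.

```python
import numpy as np


def threshold_metrics(scores, y_true, thresh):
    """Confusion counts, precision and recall when predicting class 1 iff score > thresh."""
    pred = scores > thresh          # predicted class 1
    actual = y_true == 1            # actually class 1
    tp = int(np.sum(pred & actual))
    fp = int(np.sum(pred & ~actual))
    tn = int(np.sum(~pred & ~actual))
    fn = int(np.sum(~pred & actual))
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return {
        "predicted positives": tp + fp,
        "predicted negatives": tn + fn,
        "true positives": tp,
        "false positives": fp,
        "true negatives": tn,
        "false negatives": fn,
        "precision": precision,
        "recall": recall,
    }
```

For the test data, something like `for name, value in threshold_metrics(clf.decision_function(X_test), t_test, 1.0).items(): print(name, value)` would print the requested values. Here X_test and t_test are hypothetical names for the part (g) data.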


(i) Use the test data to generate a precision/recall curve for the classifier. That is, plot precision vs recall for 1000 different values of the threshold, t. You should choose the range of t values so that the curve is as long as possible. You should find that 0.5 ≤ precision ≤ 1 and 0 ≤ recall ≤ 1. The result should look something like Figure 2 (although the minimum precision in this curve is different). Label the axes, and title the figure, “Question 2(i): precision/recall curve”.
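One way to sweep 1000 thresholds is shown below, assuming the same decision-function scores as in part (h). Spacing the thresholds evenly between the smallest and largest score makes the curve as long as the data allow. The function names are hypothetical. Using endpoint=False is a design choice: it keeps at least one predicted positive at every threshold, so precision is always defined (this assumes the scores are not all equal).

```python
import numpy as np
import matplotlib.pyplot as plt


def precision_recall_sweep(scores, y_true, n=1000):
    # Evenly spaced thresholds from the smallest score up to (but excluding)
    # the largest, so the top-scoring point is always predicted positive.
    thresholds = np.linspace(scores.min(), scores.max(), n, endpoint=False)
    actual = y_true == 1
    precision = np.empty(n)
    recall = np.empty(n)
    for i, thresh in enumerate(thresholds):
        pred = scores > thresh
        tp = np.sum(pred & actual)
        precision[i] = tp / np.sum(pred)
        recall[i] = tp / np.sum(actual)
    return thresholds, precision, recall


def plot_pr_curve(precision, recall):
    plt.figure()
    plt.plot(recall, precision)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("Question 2(i): precision/recall curve")
```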

(j) Explain why the minimum precision is 0.5.

(k) Compute and print the area under the precision/recall curve. The area should be between 0.5 and 1.0. (Recall that the area under a curve (AUC) is the area between the curve and the x axis.)
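Here is a trapezoid-rule sketch for part (k). It assumes recall and precision arrays ordered by threshold, as a sweep like the one described in part (i) would produce. Because recall changes monotonically with the threshold, the absolute value of the signed sum gives the area in either direction. sklearn.metrics.auc implements the same rule and could be used instead. The function name here is hypothetical.

```python
import numpy as np


def area_under_curve(x, y):
    """Trapezoid-rule area between the curve y(x) and the x axis; x must be monotonic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.diff(x)
    if not (np.all(dx >= 0) or np.all(dx <= 0)):
        raise ValueError("x must be monotonic (recall is, when ordered by threshold)")
    # Sum of (width * average height) over consecutive pairs of points.
    return float(abs(np.sum(dx * (y[:-1] + y[1:]) / 2.0)))
```

Usage would be `print(area_under_curve(recall, precision))`. Because a finite sweep may not reach recall exactly 0, the computed area covers only the recall range the curve actually spans.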

(l) Explain why the area under the curve must be between 0.5 and 1.0. (You may want to include a figure in your explanation. If so, title it “Question 2(l): explanatory figure”.)

3. (?? points total) Multi-class Classification. In this question, you will use logistic regression and K nearest neighbors (KNN) to classify images of handwritten digits. There are ten different digits (0 to 9), so you will be using multi-class classification. To start, download and uncompress (if necessary) the MNIST data file from the course web page. The file, called mnist.pickle.zip, contains training and test data. Next, start the Python interpreter and import the pickle module. You can then read the file mnist.pickle with the following command ('rb' opens the file for reading in binary):

    import pickle

    with open('mnist.pickle', 'rb') as f:
        Xtrain, Ytrain, Xtest, Ytest = pickle.load(f)

The variables Xtrain and Ytrain contain training data, while Xtest and Ytest contain test data. Use this data for training and testing in this question and in the rest of this assignment. Xtrain is a Numpy array with 60,000 rows and 784 columns. Each row represents a handwritten digit. Although each digit is stored as a row vector with 784 components, it actually represents an array of pixels with 28 rows and 28 columns (784 = 28 × 28). Each pixel is stored as a floating-point number, but has an integer value between 0 and 255 (i.e., the values representable in a single byte). The variable Ytrain is a vector of 60,000 image labels, where a label is an integer between 0 and 9. For example, if row n of Xtrain is an image of the digit 7, then Ytrain[n] = 7. Likewise for Xtest and Ytest, which represent 10,000 test images.

To view a digit, you must first convert it to a 28 × 28 array using the function numpy.reshape. To display a 2-dimensional array as an image, you can use the function imshow in matplotlib.pyplot. To see an image in black-and-white, add the keyword argument cmap='Greys' to imshow. To remove the smoothing and see the 784 pixels clearly, add the keyword argument interpolation='nearest'. Try displaying a few digits as images. (Figure 3 shows an example.) For comparison, try printing them as vectors. (Do not hand this in.)
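Below is a small helper for viewing digits. It follows the steps described above: numpy.reshape, then imshow with cmap='Greys' and interpolation='nearest'. The function name is hypothetical. It returns the reshaped array so the same image can also be printed or inspected.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_digit(X, n):
    """Display row n of X (a 784-pixel MNIST image) as a 28 x 28 greyscale image."""
    image = np.reshape(X[n], (28, 28))   # row vector -> 28 rows x 28 columns
    plt.figure()
    plt.imshow(image, cmap="Greys", interpolation="nearest")
    return image
```

After loading the pickle, `show_digit(Xtrain, 0)`, `print(Ytrain[0])` and then `plt.show()` would display the first training digit alongside its label.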
