
### Question:

1-Explain the concept of imbalanced data in classification and how it should be handled when developing classification models.

2-Explain the concept of overfitting and how it can be avoided.

3-Give two examples of how logistic regression can be used. You only need to explain the problem. One example is a bank that uses logistic regression to classify its new customers for loan approval: the bank wants to identify customers who are more likely to default on their loan. Explain why you cannot use linear regression in your examples.
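As an illustration of the imbalanced-data question, the sketch below shows why plain accuracy is misleading when one class dominates, and one common remedy, random oversampling of the minority class. The toy labels (95 non-defaulters, 5 defaulters) and the function names are hypothetical and are not taken from the assignment files. Other remedies, such as undersampling or class weights, are equally valid.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class has equal size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 non-defaulters (0) and 5 defaulters (1).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A useless "always predict 0" model already looks accurate.
print(majority_baseline_accuracy(labels))

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should normally be applied to the training partition only, so that the validation data still reflects the real class proportions.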

The first sheet of the file Toy-Info contains 500 records of clients who have bought special toys from an e-business website. Each record includes data on types of product purchased (between 1 and 5), purchase amount (\$), age, gender, marital status, whether the client has a membership, and whether the client has a discount card.

A business analyst applied the k-means clustering method to all seven variables, increasing the number of clusters in order to recommend a suitable value of k. The test results for k=5 and k=6, shown in the following sheets of the file, indicated that k=6 was the best value.

a)Explain how the analyst found that k=6 is a suitable number of clusters. Refer to the relevant sheet name, table name, and the values you compared.

b)Describe all 6 clusters by their average characteristics.
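One common way to compare two values of k is to look at the total within-cluster sum of squares, and the clusters themselves are usually described by their per-variable means. The sketch below computes both. It assumes the cluster assignments are already known, the four toy points are hypothetical, and the statistic actually reported in the Toy-Info sheets may be a different one.

```python
from collections import defaultdict


def cluster_means(points, assignments):
    """Average of every variable within each cluster."""
    groups = defaultdict(list)
    for point, cluster in zip(points, assignments):
        groups[cluster].append(point)
    return {
        cluster: [sum(column) / len(column) for column in zip(*members)]
        for cluster, members in groups.items()
    }


def within_cluster_ss(points, assignments):
    """Sum of squared distances from each point to its own cluster mean."""
    means = cluster_means(points, assignments)
    return sum(
        sum((x - m) ** 2 for x, m in zip(point, means[cluster]))
        for point, cluster in zip(points, assignments)
    )


# Hypothetical toy data with two obvious groups.
points = [[1, 1], [1, 3], [9, 9], [9, 11]]

one_cluster = [0, 0, 0, 0]
two_clusters = [0, 0, 1, 1]

print(within_cluster_ss(points, one_cluster))
print(within_cluster_ss(points, two_clusters))
print(cluster_means(points, two_clusters))
```

A lower within-cluster sum of squares means tighter clusters, but the value always falls as k grows, so the size of the improvement matters more than the raw number. Variables on different scales, such as purchase amount and age, are usually standardized before clustering.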

The following data are the results of a 4-year study conducted to assess how age, weight, and gender influence the risk of diabetes. Risk is interpreted as the probability that the patient will have diabetes over the next 4-year period.

a)Develop a multiple regression model that relates the risk of diabetes to the person’s age, weight, and gender. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.

b)Develop an estimated multiple regression model that relates the risk of diabetes to the person’s age, weight, gender, and lifestyle. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.

c)What is the percentage risk of diabetes over the next 4 years for a 55-year-old man who weighs 70 kg and lives in a big city?

a)Determine the selected input variables in each model and explain why the analyst has changed one of the input variables.

b)Write the logistic regression equation obtained for the first model, shown in worksheet “4-1-1”, and predict whether a customer with Contract duration of 16 months, Bonus data of 63 GB, and Usage of 237 GB will decide to buy the new service. Explain how you found the prediction.

c)Find the class 1 and class 0 errors based on the sheet “4-1-2” and compare your results with the confusion matrix. Explain which of these errors is more undesirable in this model.

d)Compare the accuracy of the second model (shown in worksheet “4-2-1”) with that of the first model. Which one do you recommend?
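The mechanics behind parts b) to d) can be sketched as follows: a logistic regression equation turns the inputs into a probability, a cutoff turns the probability into a class, and the class 1 and class 0 error rates are read off the confusion matrix. The coefficients, the 0.5 cutoff, and the confusion-matrix counts below are hypothetical placeholders, not values from the worksheets. The real ones must be taken from sheets “4-1-1” and “4-1-2”.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """P(class 1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


def class_errors(tp, fn, fp, tn):
    """Return (class 1 error, class 0 error, accuracy) from confusion-matrix counts."""
    class1_error = fn / (tp + fn)  # actual 1 predicted as 0
    class0_error = fp / (fp + tn)  # actual 0 predicted as 1
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return class1_error, class0_error, accuracy


# Hypothetical coefficients for Contract duration, Bonus data, and Usage.
intercept = -2.0
coefficients = [0.05, 0.01, 0.002]
customer = [16, 63, 237]  # months, GB, GB (from the question)

probability = logistic_probability(intercept, coefficients, customer)
cutoff = 0.5  # assumed cutoff; use the one stated in the worksheet
predicted_class = 1 if probability >= cutoff else 0
print(probability, predicted_class)

# Hypothetical confusion-matrix counts.
print(class_errors(tp=40, fn=10, fp=5, tn=45))
```

Which error is worse depends on the business cost: missing a likely buyer (class 1 error) and targeting an unlikely buyer (class 0 error) rarely cost the same.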

### Cite This Work

My Assignment Help. 'Business Analytics' (My Assignment Help, 2020) <https://myassignmenthelp.com/free-samples/isys3375-business-analytics/investigate-a-polynomial-regression-model.html> accessed 20 August 2022.
