Statistics and Data Analysis Questions

Question 1. Consider the following data set containing the age and serum creatine (sc) levels for a set of people: (16 points)
 
age   23    23    27    27    39    41    47    49    50    52    54    54    56    57    58    58    60    61
sc    9.5   26.5  7.8   17.8  31.4  25.9  27.4  27.2  31.2  34.6  42.5  28.8  33.4  30.2  34.1  32.9  41.2  35.7


a) Calculate the mean, median and standard deviation of age and creatine level. (6 points)

b) Draw a scatter plot of these two variables. (4 points)

c) Normalize the two variables based on the z-score normalization technique. (6 points)
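
For parts a) and c), a minimal sketch using only Python's standard library may help show how the quantities relate. It assumes the population standard deviation (`statistics.pstdev`); if the sample standard deviation is intended, `statistics.stdev` would be swapped in. The helper names `describe` and `z_scores` are illustrative, not part of the assignment.

```python
import statistics

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
sc = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
      34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

def describe(values):
    """Return (mean, median, standard deviation) for a list of numbers."""
    # Assumption: population standard deviation (divide by n, not n - 1).
    return (statistics.mean(values),
            statistics.median(values),
            statistics.pstdev(values))

def z_scores(values):
    """z-score normalization: z = (v - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

for name, values in (("age", age), ("sc", sc)):
    mean, median, std = describe(values)
    print(f"{name}: mean={mean:.2f} median={median:.2f} std={std:.2f}")
    print([round(z, 2) for z in z_scores(values)])
```

After normalization each variable has mean 0 and standard deviation 1, which is a quick way to check the result.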

 

Question 2. Consider data for analysis that includes the attribute length whose recorded values are: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. (10 points)

 

 a) Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate your steps. (8 points) 

 b)  Discuss how you would determine if there are outliers in the data. (2 points)
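
The two parts of this question can be sketched as follows. The first function performs equal-depth binning and replaces each value with its bin mean. The second applies the 1.5 × IQR rule, which is only one possible choice for flagging outliers (clustering or inspecting a boxplot are alternatives). The function names and the default multiplier `k=1.5` are assumptions made for illustration.

```python
import statistics

length = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
          30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def smooth_by_bin_means(values, depth):
    """Sort, split into bins of `depth` values, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not the only one)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print([round(v, 2) for v in smooth_by_bin_means(length, 3)])
print(iqr_outliers(length))
```

With 27 values and a bin depth of 3, the data split evenly into 9 bins, so no partial bin needs special handling.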

 

Question 3. Consider the data set D = {(1.5, 1.7), (2, 1.9), (1.6, 1.8), (1.2, 1.5), (1.5, 1.0)}, where each element is a two-dimensional point in Euclidean space. (18 points)


 a) Given a new data point, (1.4, 1.6), rank the points in D by their similarity to the new point, using each of the following as the similarity measure: (8 points)

•   a.1. the Euclidean distance

•   a.2. the cosine similarity

 

b) Normalize the data set, including the point (1.4, 1.6), so that the Euclidean norm of each data point equals 1. Rank the normalized points by their similarity to the normalized (1.4, 1.6), using the Euclidean distance as the similarity measure. (10 points)
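
The three rankings asked for in this question can be computed with a short sketch like the one below. It uses only the standard `math` module; the helper names (`euclidean`, `cosine_similarity`, `unit_normalize`) are illustrative choices rather than anything prescribed by the question.

```python
import math

D = [(1.5, 1.7), (2.0, 1.9), (1.6, 1.8), (1.2, 1.5), (1.5, 1.0)]
query = (1.4, 1.6)

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_similarity(p, q):
    """Cosine of the angle between p and q: (p . q) / (|p| |q|)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.hypot(*p) * math.hypot(*q))

def unit_normalize(p):
    """Scale p so that its Euclidean norm equals 1."""
    norm = math.hypot(*p)
    return tuple(a / norm for a in p)

# a.1: smaller distance means more similar
by_distance = sorted(D, key=lambda p: euclidean(p, query))
# a.2: larger cosine means more similar
by_cosine = sorted(D, key=lambda p: cosine_similarity(p, query), reverse=True)
# b: Euclidean distance between unit-normalized points
unit_query = unit_normalize(query)
by_unit_distance = sorted(D, key=lambda p: euclidean(unit_normalize(p), unit_query))

print(by_distance)
print(by_cosine)
print(by_unit_distance)
```

For unit vectors, the squared Euclidean distance equals 2 − 2·cos(θ), so the ranking in part b) coincides with the cosine ranking in a.2 even though it differs from the raw Euclidean ranking in a.1.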

 

Question 4. Design an algorithm, and describe it in pseudocode, for the automatic generation of a concept hierarchy for categorical data based on the number of distinct values of the attributes in a given schema. Describe how an arbitrary schema would be represented in your framework and how the algorithm would generate the hierarchy from that representation. (6 points)
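
One common heuristic is to place the attribute with the fewest distinct values at the top of the hierarchy and the attribute with the most at the bottom. The sketch below assumes a schema is represented as a mapping from attribute name to its column of values; that representation, the function name, and the sample location data are all hypothetical and chosen only for illustration.

```python
def generate_concept_hierarchy(schema):
    """Order attributes from the top level (fewest distinct values)
    to the bottom level (most distinct values)."""
    distinct_counts = {attr: len(set(values)) for attr, values in schema.items()}
    # sorted() is stable, so attributes with equal counts keep their input order
    return sorted(distinct_counts, key=distinct_counts.get)

# Hypothetical schema: attribute name -> column of categorical values
location = {
    "street": ["1 Main St", "2 Oak Ave", "3 Pine Rd", "4 King St", "5 Bay St", "6 Elm St"],
    "city": ["Vancouver", "Vancouver", "Victoria", "Toronto", "Toronto", "Ottawa"],
    "province": ["BC", "BC", "BC", "ON", "ON", "ON"],
    "country": ["Canada"] * 6,
}

hierarchy = generate_concept_hierarchy(location)
print(" < ".join(reversed(hierarchy)))
```

The heuristic is not reliable for every schema. A time dimension, for instance, may have more distinct years than months or weekdays, so the generated order should be reviewed and adjusted by hand where needed.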

 

Question 5. Consider the following data set containing information about participants in an online test. The dataset contains, for each participant, their age and the number of minutes it took them to complete the test: (12 points)
 
age             20    24    32    38    44    46    47    49    50
test duration   27    30    29    23    25    25    30    34    32
 
a) Calculate the median and standard deviation of the test duration variable. (6 points)

b) Normalize the test duration variable based on the z-score normalization technique. (6 points)

 

Question 6. Consider the following data set: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46. Use smoothing by bin medians to smooth the above data, using a bin depth of 5. Illustrate your steps. (10 points)
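
To illustrate the steps, the sketch below keeps the bins separate so that each original bin can be shown next to its smoothed version. It assumes equal-depth bins taken from the sorted data; the function name is an illustrative choice.

```python
import statistics

data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25,
        25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46]

def smooth_by_bin_medians(values, depth):
    """Return (original_bin, smoothed_bin) pairs using equal-depth bins."""
    ordered = sorted(values)
    bins = [ordered[i:i + depth] for i in range(0, len(ordered), depth)]
    # Every value in a bin is replaced by that bin's median
    return [(b, [statistics.median(b)] * len(b)) for b in bins]

steps = smooth_by_bin_medians(data, 5)
for number, (original, smoothed) in enumerate(steps, start=1):
    print(f"Bin {number}: {original} -> {smoothed}")
```

Compared with bin means, medians are less sensitive to an extreme value inside a bin, which is one reason to prefer them when outliers are suspected.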

 

Question 7. What is Cluster Analysis?

 

Question 8. What is data?

• How is data structured?

• How can basic statistical descriptions be used to study or infer data characteristics?

 

Question 9. When should data pre-processing be done?
