Question 1
Classify each variable in this data table as categorical or numeric (otherwise called continuous).
Column name and description
 ID: participant identification number
 Sex: Gender coded 1=male, 2=female
 Age group: Age coded as 1=15 to 17 years, 2=18 to 29 years, 3=30 to 64 years, 4=65 or more years
 Activities: Respondent's answer to the question: “How many activities did you attend this week which required you to leave your house? Include sporting, cultural, social and community activities”.
 Transport: Most commonly used transport to attend these activities coded 1=drive car/ride motorcycle, 2=passenger in car or motorcycle, 3=public transport, 4=other
Excerpt from the data set:
ID | Sex | Age group | Number of activities attended this week | Most commonly used transport
1 | 1 | 1 | 1 | 1
2 | 1 | 2 | 2 | 2
3 | 2 | 3 | 3 | 3
4 | 2 | 4 | 4 | 4
Question 2 Note: Each student will get different answers as the data sets differ.
 a. Using the assignment data file allocated to you and R Commander, tabulate the relationship between gender (sex) and most frequently used mode of transport in the past month (transport). Please use the results from R Commander to create a table in Word with appropriate headings (the output from R Commander is poorly labelled and will not be accepted).
 b. Using row or column percentages, describe the relationship between gender and most frequently used mode of transport in the past month.
Question 3 Note: Each student will get different answers as the data sets differ.
 a. Using the assignment data file allocated to you and R Commander, graph the relationship between the number of activities attended in the past month (activities) and driver's licence status in this sample of 17-year-old Australians. This figure should be prepared in R Commander with appropriate axis labels, then copied and pasted into your assignment answers with an appropriate title.
 b. Use appropriate statistics to describe the centre, spread and shape of the distribution of number of activities attended per month for each category of driver's licence status separately. You must clearly indicate which statistics describe the centre, which describe the spread and which describe the shape. Copying R Commander output is insufficient and should be avoided.
 c. Using the results in parts a. and b. above, describe the relationship between number of activities attended in the past month and driver's licence status.
Question 4 Note: Each student will get different answers as the data sets differ.
 a. Using the assignment data file allocated to you and R Commander, draw an appropriate graph of the relationship between self-reported sedentary hours per week and number of activities attended in the past month. When preparing the graph in R Commander, don’t forget to provide meaningful labels on the axes.
 b. Using the graph in part a., describe the form, direction and strength of the relationship between self-reported sedentary hours per week and number of activities attended in the past month.
Question 5
A group of 8 students were asked about their age, gender and area of study. The responses (sorted on age) are shown in the following table:
Initials | Age in years | Gender | Area of study
HP | 17 | Male | Nursing
RT | 19 | Male | Accounting
SK | 20 | Female | Psychology
KZ | 20 | Male | Psychology
AN | 21 | Female | Nursing
KK | 22 | Female | Psychology
JH | 22 | Male | Psychology
PV | 25 | Female | Nursing
 a. If you select one person at random from this group, what is the probability this person will be 18 or more years of age?
 b. If you select one person at random from this group, what is the probability they will be a female who is studying psychology?
 c. If you select one female at random from this group, what is the probability they will be 21 or more years of age and studying nursing?
Question 6
In Australia, the probability of having blood type B is 0.1.
 a. Blood type was recorded for a random sample of 250 Australian adults. Using R Commander, what is the probability that this random sample of 250 adults will contain 25 or fewer people whose blood group is B?
 b. Suppose 200 random samples were drawn and each of these 200 samples contained exactly 250 people. We would predict 12% of all samples to contain fewer than how many people with type B blood?
 c. Estimate the mean number of people with type B blood per sample. Show any working.
Question 7
The hours of sleep per night for 17-year-olds are known to be Normally distributed with a mean of 8.2 hours and a standard deviation of 0.6 hours. Use this information to address the following questions. Show any working.
 a. In 17-year-olds, how many hours of sleep correspond to a z-score of 1?
 b. Choose one 17-year-old at random from this population. Using R Commander, estimate the probability that this person sleeps between 7.5 and 8.0 hours per night.
 c. Choose a random sample of sixteen 17-year-olds. Using R Commander, estimate the probability that the sample mean hours of sleep will lie between 7.5 and 8.0 hours per night. Show any working.
 d. Choose a random sample of sixteen 17-year-olds. How many of this group would you expect to sleep between 7.5 and 8.0 hours?
Categorical and Numeric Variables
The following figure shows an excerpt of the recoded data out of the available data. The data for age was missing from the given data set and the column was thus left blank.
Frequency Table: TRANSPORT in past month by GENDER
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport)
female | 23 | 59 | 37
male | 45 | 76 | 31
Frequency Table: Row Percentage (TRANSPORT in past month by GENDER)
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport) | Total
female | 19.33% | 49.58% | 31.09% | 100%
male | 29.61% | 50.00% | 20.39% | 100%
Frequency Table: Column Percentage (TRANSPORT in past month by GENDER)
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport)
female | 33.82% | 43.70% | 54.41%
male | 66.18% | 56.30% | 45.59%
Total | 100% | 100% | 100%
The row percentages show that 19.33% of females mainly drove themselves in the past month, while the largest share, 49.58%, were mainly passengers and 31.09% reported some other means of transport. Among males, 50% were mainly passengers, 29.61% drove themselves and 20.39% travelled by some other means. In both groups the most common mode was travelling as a passenger; males were more likely than females to drive themselves (29.61% versus 19.33%), whereas females were more likely to use other means (31.09% versus 20.39%).
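The row and column percentages discussed above can be reproduced from the raw counts. The following is a minimal Python sketch rather than the R Commander procedure used for the assignment; the variable and function names are illustrative assumptions, and the counts are those in the frequency table.

```python
# Counts from the frequency table (gender by transport in the past month).
# Column order: driver/rider, passenger, others (excluding public transport).
counts = {
    "female": [23, 59, 37],
    "male": [45, 76, 31],
}


def row_percentages(table):
    """Percentage of each gender (row) falling in each transport category."""
    return {
        group: [100 * c / sum(row) for c in row]
        for group, row in table.items()
    }


def column_percentages(table):
    """Percentage of each transport category (column) made up by each gender."""
    column_totals = [sum(col) for col in zip(*table.values())]
    return {
        group: [100 * c / t for c, t in zip(row, column_totals)]
        for group, row in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
print([round(p, 2) for p in row_pct["female"]])
print([round(p, 2) for p in col_pct["male"]])
```

Row percentages answer "of the females, what share were drivers?", while column percentages answer "of the drivers, what share were female?"; either can be used to describe the relationship, as long as the direction is stated.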
Statistical measures of number of activities in the past month, grouped by licence status
Licence status | Mean | Standard deviation | Pearson's skewness
not licensed | 6.269 | 2.164326 | 0.14226
learners permit | 6.405 | 2.181031 | 0.10214
licensed | 8.408 | 2.110251 | 0.327959
The statistical measure which describes the centre of the distribution of the number of activities in the last month is the mean. The mean number of activities was 6.269 for the “not licensed” group, 6.405 for those with a “learner’s permit” and 8.408 for those who were “licensed”. The measure of spread for each group is the standard deviation: 2.16 for the not licensed, 2.18 for those with a learner's permit and 2.11 for the licensed. The measure which describes the shape of a distribution is the skewness.
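Pearson’s skewness coefficient describes asymmetry: a value above 0 indicates a right-skewed (positively skewed) distribution, a value below 0 indicates a left-skewed distribution, and a value of 0 indicates a symmetric shape such as the Gaussian or normal distribution. The further the distribution departs from symmetry, the larger the absolute value of the coefficient. All three coefficients are positive, so each group is slightly right-skewed; the skew is most pronounced for the licensed group (0.33) and weakest for those with a learner's permit (0.10). The table above shows the measures as described.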
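The sign rule for skewness can be applied mechanically to the tabulated values. This is a small Python sketch; the function name `describe_shape` and the dictionary layout are illustrative assumptions, and the coefficients are copied from the table above.

```python
# Pearson's skewness values from the summary table, by licence status.
skewness_by_group = {
    "not licensed": 0.14226,
    "learners permit": 0.10214,
    "licensed": 0.327959,
}


def describe_shape(skewness):
    """Classify the direction of skew from the sign of the coefficient."""
    if skewness > 0:
        return "right-skewed"
    if skewness < 0:
        return "left-skewed"
    return "symmetric"


shapes = {group: describe_shape(s) for group, s in skewness_by_group.items()}
most_skewed = max(skewness_by_group, key=lambda g: abs(skewness_by_group[g]))

for group, shape in shapes.items():
    print(f"{group}: {shape} ({skewness_by_group[group]:.2f})")
print("largest absolute skewness:", most_skewed)
```

The sign only gives the direction of the asymmetry; how much skew counts as "mild" or "strong" is a judgment call that this sketch does not attempt to make.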
The results from parts (a) and (b) imply that individuals with a licence attended the most activities, about 8.4 in the past month on average compared with about 6.3 to 6.4 for the other two groups. The spread is similar in all three groups (standard deviations between 2.11 and 2.18), and each distribution is slightly right-skewed.
b.
The number of sedentary hours per week and the number of activities attended in the past month were found to be negatively related. The line of best fit, depicted in blue in the figure in part (a), is described by the regression equation:
Number of activities = 11.819 − 0.434 × (sedentary hours)
The equation shows that each one-hour increase in sedentary hours is associated with a decrease of 0.434 in the number of activities. With zero sedentary hours, the predicted number of activities is 11.819.
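To make the interpretation of the slope and intercept concrete, the fitted line can be evaluated at a few values. This is a minimal Python sketch using the coefficients quoted above; the function name and the example inputs are illustrative assumptions, not values from the data set.

```python
# Coefficients of the fitted line reported in the answer.
INTERCEPT = 11.819  # predicted activities at zero sedentary hours
SLOPE = -0.434      # change in predicted activities per extra sedentary hour


def predicted_activities(sedentary_hours):
    """Predicted number of activities for a given number of sedentary hours."""
    return INTERCEPT + SLOPE * sedentary_hours


# Hypothetical inputs chosen only to illustrate the trend.
for hours in (0, 5, 10):
    print(hours, round(predicted_activities(hours), 3))
```

Predictions are only meaningful within the range of sedentary hours observed in the sample; extrapolating far beyond it can give implausible values such as negative activity counts.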
 The probability that a person chosen at random from the 8 students is at least 18 years of age is the ratio of the number of students aged 18 or more to the total number of students. Seven of the 8 students are 18 or older, so the probability is 7/8 = 0.875.
 The probability that a person chosen at random from the 8 students is female and studying psychology is the ratio of the number of students who are female and studying psychology to the total number of students, that is, 8. Two students (SK and KK) meet both conditions, so the probability is 2/8 = 0.25.
 The conditional probability that a student is at least 21 years of age and studying nursing, given that the student is female, is the ratio of the number of females who are aged 21 or more and studying nursing to the total number of females. Two of the 4 females (AN and PV) meet both conditions, so the probability is 2/4 = 0.5.
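The three probabilities above can be checked by counting directly from the table of students. The sketch below is one way to do this in Python; the helper `probability` and its `given` argument are illustrative names, not part of the assignment.

```python
from fractions import Fraction

# (initials, age in years, gender, area of study) from the Question 5 table.
students = [
    ("HP", 17, "Male", "Nursing"),
    ("RT", 19, "Male", "Accounting"),
    ("SK", 20, "Female", "Psychology"),
    ("KZ", 20, "Male", "Psychology"),
    ("AN", 21, "Female", "Nursing"),
    ("KK", 22, "Female", "Psychology"),
    ("JH", 22, "Male", "Psychology"),
    ("PV", 25, "Female", "Nursing"),
]


def probability(event, given=lambda s: True):
    """P(event | given) for one person drawn at random from the group."""
    pool = [s for s in students if given(s)]
    hits = [s for s in pool if event(s)]
    return Fraction(len(hits), len(pool))


p_adult = probability(lambda s: s[1] >= 18)
p_female_psych = probability(lambda s: s[2] == "Female" and s[3] == "Psychology")
p_older_nurse_given_female = probability(
    lambda s: s[1] >= 21 and s[3] == "Nursing",
    given=lambda s: s[2] == "Female",
)

print(p_adult, p_female_psych, p_older_nurse_given_female)
```

Restricting the pool with `given` is what distinguishes the conditional probability in the third part from the joint probability in the second: the denominator is the 4 females rather than all 8 students.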
 The probability that an Australian adult has blood type B is given as 0.1. The probability that a random sample of 250 people contains at most 25 people with blood type B is P(X ≤ 25), where X denotes the number of people in a sample of 250 who have blood type B. X follows a binomial distribution with size 250 and probability parameter 0.1. The required cumulative probability, which includes the case X = 25, is approximately 0.55.
 The task is to find the number x such that 12% of samples of 250 Australian adults contain fewer than x people with blood type B. This is the 12th percentile of X, that is, the value x where P(X ≤ x) is approximately 0.12 and X is binomial(250, 0.1). The value was computed using R Commander as 19, so about 12% of samples of size 250 are expected to contain 19 or fewer people with blood type B, that is, fewer than 20.
 The mean number of people with blood type B per sample is the expectation of a binomial distribution with size 250 and probability parameter 0.1. The mean is therefore 250 × 0.1 = 25.
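The binomial quantities in this question can be computed directly from the probability mass function. The following Python sketch uses only the standard library and stands in for the R Commander menus; the function names `pmf`, `cdf` and `quantile` are illustrative assumptions.

```python
import math

N = 250  # sample size
P = 0.1  # probability of blood type B


def pmf(k):
    """P(X = k) for X ~ Binomial(N, P)."""
    return math.comb(N, k) * P**k * (1 - P) ** (N - k)


def cdf(k):
    """P(X <= k), summing the pmf from 0 to k inclusive."""
    return sum(pmf(i) for i in range(k + 1))


def quantile(q):
    """Smallest k such that P(X <= k) >= q."""
    total = 0.0
    for k in range(N + 1):
        total += pmf(k)
        if total >= q:
            return k
    return N


p_at_most_25 = cdf(25)     # part a: 25 or fewer
p_exactly_25 = pmf(25)     # for comparison: exactly 25
percentile_12 = quantile(0.12)  # part b: 12th percentile
mean = N * P               # part c: expected count

print(round(p_at_most_25, 4), round(p_exactly_25, 4), percentile_12, mean)
```

Printing both `cdf(25)` and `pmf(25)` makes the difference between "25 or fewer" and "exactly 25" visible, which is an easy pair to confuse when choosing between the cumulative and point-probability options.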
 The z-score for a random variable X following a normal distribution is defined as
Z = (X − mean of X) / (standard deviation of X)
Setting Z = 1 with a mean of 8.2 and a standard deviation of 0.6 and solving for X gives X = 0.6 × Z + 8.2 = 8.8 hours.
 The probability that X, the hours of sleep of a 17-year-old, which follows Normal(8.2, 0.6), takes a value between 7.5 and 8.0 is given by:
P(7.5 < X < 8.0) = P(X < 8.0) − P(X < 7.5) ≈ 0.248
 The mean of a sample of size n drawn from a normal distribution with mean ‘m’ and standard deviation ‘s’ is itself normally distributed with mean ‘m’ and standard deviation ‘s’/√n. The mean hours of sleep of a sample of 16 students is therefore distributed as Normal(8.2, 0.6/√16) = Normal(8.2, 0.15). Let the sample mean be denoted by X_{bar}.
Then the probability that the mean lies between 7.5 and 8.0 is given by P(7.5 < X_{bar} < 8.0) = P(X_{bar} < 8.0) − P(X_{bar} < 7.5), which is approximately 0.091.
 The number of students expected to sleep between 7.5 and 8.0 hours among the 16 students is the expectation of a binomial distribution with size 16 and probability equal to the chance that a single 17-year-old sleeps between 7.5 and 8.0 hours, P(7.5 < X < 8.0) ≈ 0.248. The expected number of students is then 16 × 0.248 ≈ 3.96, or approximately 4.
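The four parts of this question can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch under the stated assumptions (mean 8.2, standard deviation 0.6, sample size 16), with illustrative variable names; it is not the R Commander workflow itself.

```python
import math
from statistics import NormalDist

MEAN = 8.2  # population mean hours of sleep
SD = 0.6    # population standard deviation
N = 16      # sample size

population = NormalDist(MEAN, SD)

# Part a: hours of sleep at a z-score of 1.
x_at_z1 = MEAN + 1 * SD

# Part b: one individual sleeping between 7.5 and 8.0 hours.
p_individual = population.cdf(8.0) - population.cdf(7.5)

# Part c: the sample mean has standard deviation SD / sqrt(N).
sample_mean_dist = NormalDist(MEAN, SD / math.sqrt(N))
p_sample_mean = sample_mean_dist.cdf(8.0) - sample_mean_dist.cdf(7.5)

# Part d: expected count among N individuals uses the individual probability.
expected_count = N * p_individual

print(round(x_at_z1, 1), round(p_individual, 3),
      round(p_sample_mean, 3), round(expected_count, 2))
```

The key design point is that parts c and d use different distributions: the sample mean is less variable than an individual (standard deviation 0.15 rather than 0.6), whereas the expected head count depends on how individuals, not means, are distributed.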