Classify each variable in this data table as categorical or numeric (otherwise called continuous).
Column name and description
- ID: participant identification number
- Sex: Gender coded 1=male, 2=female
- Age group: Age coded as 1=15 to 17 years, 2=18 to 29 years, 3=30 to 64 years, 4=65 or more years
- Activities: Respondent's answer to the question: “How many activities did you attend this week which required you to leave your house? Include sporting, cultural, social and community activities”.
- Transport: Most commonly used transport to attend these activities coded 1=drive car/ride motorcycle, 2=passenger in car or motorcycle, 3=public transport, 4=other
Excerpt from the data set:
ID | Sex | Age group | Number of activities attended this week | Most commonly used transport
1 | 1 | 1 | 1 | 1
2 | 1 | 2 | 2 | 2
3 | 2 | 3 | 3 | 3
4 | 2 | 4 | 4 | 4
Question 2 Note: Each student will get different answers as the data sets differ.
- Using the assignment data file allocated to you and R Commander, tabulate the relationship between gender (sex) and most frequently used mode of transport in the past month (transport). Use the results from R Commander to create a table in Word with appropriate headings (the raw output from R Commander is poorly labelled and will not be accepted).
- Using row or column percentages describe the relationship between gender and most frequently used mode of transport in the past month.
Question 3 Note: Each student will get different answers as the data sets differ.
- Using the assignment data file allocated to you and R Commander, graph the relationship between the number of activities attended in the past month (activities) and drivers licence status in this sample of 17-year-old Australians. This figure should be prepared in R Commander with appropriate axis labels then copied and pasted into your assignment answers with appropriate title.
- Use appropriate statistics to describe the centre, spread and shape of the distribution of number of activities attended per month for each category of drivers licence status separately. You must clearly indicate which statistics describe the centre, which describe the spread and which describe the shape. Copying R Commander output is insufficient and should be avoided.
- Using the results in parts a. and b. above, describe the relationship between number of activities attended in the past month and drivers licence status.
Question 4 Note: Each student will get different answers as the data sets differ.
- Using the assignment data file allocated to you and R Commander, draw an appropriate graph of the relationship between self-reported sedentary hours per week and number of activities attended in the past month. When preparing the graph in R Commander don’t forget to provide meaningful labels on the axes.
- Using the graph from part a., describe the form, direction and strength of the relationship between self-reported sedentary hours per week and number of activities attended in the past month.
Question 5
A group of 8 students were asked about their age, gender and area of study. The responses (sorted on age) are shown in the following table:
Initials | Age in years | Gender | Area of study
HP | 17 | Male | Nursing
RT | 19 | Male | Accounting
SK | 20 | Female | Psychology
KZ | 20 | Male | Psychology
AN | 21 | Female | Nursing
KK | 22 | Female | Psychology
JH | 22 | Male | Psychology
PV | 25 | Female | Nursing
- If you selected one person at random from this group, what is the probability this person will be 18 or more years of age?
- If you selected one person at random from this group, what is the probability they will be a female who is studying psychology?
- If you selected one female at random from this group, what is the probability they will be 21 or more years of age and studying nursing?
Question 6
In Australia, the probability of having blood type B is 0.1.
- Blood type was recorded for a random sample of 250 Australian adults. Using R Commander, what is the probability that this random sample of 250 adults will contain 25 or fewer people whose blood group is B?
- Suppose 200 random samples were drawn and each of these 200 samples contained exactly 250 people. We would predict 12% of all samples to contain fewer than how many people with type B blood?
- Estimate the mean number of people with type B blood per sample. Show any working.
Question 7
The hours of sleep per night for 17-year-olds is known to be Normally distributed with a mean of 8.2 hours and a standard deviation of 0.6 hours. Use this information to address the following questions. Show any working.
- In 17-year-olds, how many hours of sleep corresponds to a z-score of 1?
- Choose one 17-year-old at random from this population. Using R Commander, estimate the probability that this person sleeps between 7.5 and 8.0 hours per night.
- Choose a random sample of sixteen 17-year-olds. Using R Commander, estimate the probability that the sample mean for normal nights will lie between 7.5 and 8.0 hours per night. Show any working.
- Choose a random sample of sixteen 17-year-olds. How many of this group would you expect to sleep between 7.5 and 8.0 hours?
Categorical and Numeric Variables
The following figure shows an excerpt of the recoded data out of the available data. The data for age was missing from the given data set and the column was thus left blank.
Frequency Table: TRANSPORT in past month by GENDER
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport)
female | 23 | 59 | 37
male | 45 | 76 | 31
Frequency Table: Row Percentage (TRANSPORT in past month by GENDER)
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport) | Total
female | 19.32773% | 49.57983% | 31.09244% | 100%
male | 29.60526% | 50% | 20.39474% | 100%
Frequency Table: Column Percentage (TRANSPORT in past month by GENDER)
GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport)
female | 33.82353% | 43.7037% | 54.41176%
male | 66.17647% | 56.2963% | 45.58824%
Total | 100% | 100% | 100%
The row percentages show that 19.33% of females drove themselves to their destinations in the past month, while the largest share, 49.58%, were mainly passengers driven by someone else, and 31.09% reported some other means of transport. Of the males, 50% were passengers, 29.61% drove themselves and 20.39% travelled by some other means. For both genders, being a passenger was the most common mode of transport, and males were more likely than females to drive themselves (29.61% versus 19.33%).
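The row and column percentages can be checked directly from the counts in the frequency table. The following is a minimal Python sketch rather than the R Commander procedure the assignment asks for; the dictionary keys (`driver`, `passenger`, `other`) and function names are shorthand introduced here for illustration.

```python
# Counts copied from the frequency table (rows: gender, columns: transport).
counts = {
    "female": {"driver": 23, "passenger": 59, "other": 37},
    "male": {"driver": 45, "passenger": 76, "other": 31},
}


def row_percentages(table):
    """Percentage of each row total that falls in each column."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Percentage of each column total that falls in each row."""
    columns = list(next(iter(table.values())).keys())
    totals = {col: sum(table[row][col] for row in table) for col in columns}
    return {
        row: {col: 100 * n / totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)

print(round(row_pct["female"]["driver"], 2))  # 19.33
print(round(col_pct["male"]["driver"], 2))    # 66.18
```

Row percentages answer "of the females, what share drove?", whereas column percentages answer "of the drivers, what share were female?", so the choice depends on which variable is treated as the grouping variable.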
Statistical measures of number of activities in the past month, grouped by licence status
License Status | Mean | Standard Deviation | Pearson's Skewness
not licensed | 6.269 | 2.164326 | -0.14226
learners permit | 6.405 | 2.181031 | -0.10214
licensed | 8.408 | 2.110251 | 0.327959
The statistic that describes the centre of the distribution of the number of activities in the last month is the mean. The mean number of activities was 6.269 for the “not licensed” group, 6.405 for those with a “learner’s permit” and 8.408 for the “licensed” group. The statistic that describes the spread of the distribution is the standard deviation: 2.16 for the not licensed, 2.18 for those with a learner's permit and 2.11 for the licensed. The statistic that describes the shape of the distribution is the skewness.
A distribution with Pearson’s skewness greater than 0 is positively (right) skewed, one with skewness less than 0 is negatively (left) skewed, and one with skewness equal to 0 is symmetric, like the Gaussian or normal distribution. The further the distribution is from symmetric, the larger the absolute value of the coefficient. The licensed group is positively skewed (0.33), whereas the other two groups are slightly negatively skewed (-0.14 and -0.10). The table above shows the measures as described.
The results from parts (a) and (b) imply that individuals with a license attended the most activities, with a mean of 8.41 compared with 6.27 for the not licensed and 6.41 for those with a learner's permit. The spread is similar across the three groups (standard deviations of about 2.1 to 2.2), and all three distributions are close to symmetric.
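As an illustration of how a skewness coefficient of this kind behaves, the sketch below computes Pearson's second (median) skewness coefficient, 3 × (mean - median) / standard deviation. This is an assumption: the source does not say which Pearson coefficient was used, and the sample values below are hypothetical, not taken from the assignment data.

```python
from statistics import mean, median, stdev


def pearson_second_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Hypothetical activity counts with a long right tail (not assignment data).
sample = [2, 3, 3, 4, 4, 4, 5, 9]
skew = pearson_second_skewness(sample)

# A positive value indicates right skew, a negative value left skew,
# and zero a symmetric distribution.
print(round(skew, 3))
```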
b.
The number of sedentary hours and the number of activities attended last month were found to be negatively related. The line of best fit, depicted in blue in the figure in part (a), is described by the regression equation:
Number of activities = 11.819 - 0.434 × (sedentary hours)
The equation shows that each one-unit increase in sedentary hours is associated with a decrease of 0.434 in the number of activities. With zero sedentary hours, the predicted number of activities is 11.819.
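The fitted line can be used to predict the number of activities for a given number of sedentary hours. A minimal sketch using the slope (-0.434) and intercept (11.819) reported above; the function name is illustrative, and predictions are only meaningful within the range of sedentary hours observed in the data.

```python
INTERCEPT = 11.819  # predicted activities at zero sedentary hours
SLOPE = -0.434      # change in activities per additional sedentary hour


def predicted_activities(sedentary_hours):
    """Predicted number of activities from the fitted regression line."""
    return INTERCEPT + SLOPE * sedentary_hours


print(round(predicted_activities(0), 3))   # 11.819
print(round(predicted_activities(10), 3))  # 7.479
```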
- The probability that a person chosen at random from the 8 students is at least 18 years of age is the ratio of the number of students aged 18 or more to the total number of students. Seven of the 8 students are aged 18 or more, so the probability is 7/8 = 0.875.
- The probability that a person chosen at random from the 8 students is female and studying psychology is the ratio of the number of students who are female and studying psychology to the total number of students, that is, 8. Two students (SK and KK) meet both conditions, so the probability is 2/8 = 0.25.
- The conditional probability that a student is aged at least 21 and studying nursing, given that the student is female, is the ratio of the number of females who are aged 21 or more and studying nursing to the total number of females. Two of the 4 females (AN and PV) meet both conditions, so the probability is 2/4 = 0.5.
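The three probabilities for the group of 8 students can be obtained by counting rows of the table. A minimal Python sketch of that counting follows; the tuple layout and the `probability` helper are introduced here for illustration only.

```python
from fractions import Fraction

# (initials, age in years, gender, area of study), copied from the table.
students = [
    ("HP", 17, "Male", "Nursing"),
    ("RT", 19, "Male", "Accounting"),
    ("SK", 20, "Female", "Psychology"),
    ("KZ", 20, "Male", "Psychology"),
    ("AN", 21, "Female", "Nursing"),
    ("KK", 22, "Female", "Psychology"),
    ("JH", 22, "Male", "Psychology"),
    ("PV", 25, "Female", "Nursing"),
]


def probability(event, group):
    """Share of equally likely members of `group` for which `event` holds."""
    favourable = sum(1 for member in group if event(member))
    return Fraction(favourable, len(group))


# P(age >= 18) over all 8 students: 7/8
p_adult = probability(lambda s: s[1] >= 18, students)

# P(female and psychology) over all 8 students: 2/8 = 1/4
p_female_psych = probability(
    lambda s: s[2] == "Female" and s[3] == "Psychology", students
)

# P(age >= 21 and nursing | female): restrict the group to females first, 2/4 = 1/2
females = [s for s in students if s[2] == "Female"]
p_nursing_21_given_female = probability(
    lambda s: s[1] >= 21 and s[3] == "Nursing", females
)

print(float(p_adult), float(p_female_psych), float(p_nursing_21_given_female))
```

Restricting the group to females before counting is what makes the third value a conditional probability: the denominator is 4, not 8.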
- The probability that an Australian adult has blood type B is given as 0.1. Let X denote the number of people with blood type B in a random sample of 250; X then follows a binomial distribution with size 250 and probability 0.1. The probability that the sample contains 25 or fewer people with blood type B is the cumulative probability P(X ≤ 25), which is approximately 0.55. For comparison, the probability of exactly 25 people, P(X = 25), computed using R Commander is 0.0838.
- The second part asks for the value x such that 12% of samples of size 250 contain fewer than x people with blood type B. This is the 12th percentile of X, where X is binomial(250, 0.1), that is, the value x for which P(X ≤ x) is approximately 0.12. The value was computed using R Commander as 19, so about 12% of samples of 250 Australian adults are expected to contain at most 19 people with blood type B.
- The mean number of people with blood type B per sample is the expectation of a binomial distribution with size 250 and probability 0.1. The mean is 250 × 0.1 = 25.
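The binomial quantities for Binomial(250, 0.1) can be reproduced outside R Commander. The sketch below uses only the Python standard library; the function names are illustrative, and the quantile follows the convention of R's `qbinom` (smallest x with P(X ≤ x) at least the requested probability).

```python
from math import comb

N, P = 250, 0.1  # sample size and probability of blood type B


def binom_pmf(k, n=N, p=P):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n=N, p=P):
    """P(X <= k), summing the probability mass from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def binom_quantile(q, n=N, p=P):
    """Smallest x with P(X <= x) >= q."""
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += binom_pmf(x, n, p)
        if cumulative >= q:
            return x
    return n


p_at_most_25 = binom_cdf(25)         # cumulative probability, 25 or fewer
p_exactly_25 = binom_pmf(25)         # probability of exactly 25
x_12_percent = binom_quantile(0.12)  # 12th percentile of the count
mean_count = N * P                   # expected count per sample

print(round(p_at_most_25, 4), round(p_exactly_25, 4), x_12_percent, mean_count)
```

Keeping `binom_pmf` and `binom_cdf` separate makes the distinction between "exactly 25" and "25 or fewer" explicit, which is the easiest place to go wrong in this question.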
- The z-score for a random variable X following a normal distribution is defined as
Z = (X - mean of X)/standard deviation of X
Rearranging for X with Z = 1, mean 8.2 and standard deviation 0.6 gives X = 0.6 × Z + 8.2 = 0.6 × 1 + 8.2 = 8.8 hours.
- The probability that X, the hours of sleep of a 17-year-old, which follows Normal(8.2, 0.6), takes a value between 7.5 and 8.0 is given by:
P(7.5 < X < 8.0) = P(X < 8.0) - P(X < 7.5) = 0.3694 - 0.1217 ≈ 0.248
- The mean of a sample of size n drawn from a normal distribution with mean ‘m’ and standard deviation ‘s’ is itself normally distributed with mean ‘m’ and standard deviation ‘s’/√n. The mean hours of sleep of a sample of 16 students therefore follows Normal(8.2, 0.6/√16) = Normal(8.2, 0.15). Let the sample mean be denoted by Xbar.
Then the probability that the mean lies between 7.5 and 8.0 is given by P(7.5 < Xbar < 8.0) = P(Xbar < 8.0) - P(Xbar < 7.5) = 0.0912 - 0.0000 ≈ 0.091.
- The number of students expected to sleep between 7.5 and 8.0 hours among the 16 students is the expectation of a binomial distribution with size 16 and probability 0.248, the probability that a single 17-year-old sleeps between 7.5 and 8.0 hours. The expected number is 16 × 0.248 ≈ 3.96, or about 4 students.
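The Question 7 calculations can be checked with the standard library's `statistics.NormalDist`. This is a sketch under the stated model (mean 8.2, standard deviation 0.6), with the standard deviation of the sample mean taken as σ/√n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD, N = 8.2, 0.6, 16

# Hours of sleep corresponding to a z-score of 1: X = mean + z * sd.
hours_at_z1 = MEAN + 1 * SD

# One 17-year-old chosen at random: X ~ Normal(8.2, 0.6).
sleep = NormalDist(mu=MEAN, sigma=SD)
p_individual = sleep.cdf(8.0) - sleep.cdf(7.5)

# Mean of a sample of 16: Xbar ~ Normal(8.2, 0.6 / sqrt(16)).
sample_mean = NormalDist(mu=MEAN, sigma=SD / sqrt(N))
p_sample_mean = sample_mean.cdf(8.0) - sample_mean.cdf(7.5)

# Expected number of the 16 who individually sleep 7.5 to 8.0 hours.
expected_count = N * p_individual

print(round(hours_at_z1, 1))     # 8.8
print(round(p_individual, 3))    # 0.248
print(round(p_sample_mean, 3))   # 0.091
print(round(expected_count, 2))  # 3.96
```

The expected count uses the individual probability, not the sample-mean probability, because it counts people rather than describing the average of the group.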