Classifying Variables in a Data Table

Classify each variable in this data table as categorical or numeric (otherwise called continuous).

Column name and description

ID: participant identification number
Sex: Gender coded 1=male, 2=female
Age group: Age coded as 1=15 to 17 years; 2=18 to 29 years; 3=30 to 64 years; 4=65 or more years
Activities: Respondent's answer to the question: "How many activities did you attend this week which required you to leave your house? Include sporting, cultural, social and community activities".
Transport: Most commonly used transport to attend these activities coded 1=drive car/ride motorcycle, 2=passenger in car or motorcycle, 3=public transport, 4=other

Excerpt from the data set:

ID	Sex (1=male, 2=female)	Age group (1=15 to 17 years, 2=18 to 29 years, 3=30 to 64 years, 4=65 or more years)	Number of activities attended this week	Most commonly used transport (1=drive car/ride motorcycle, 2=passenger in car or motorcycle, 3=public transport, 4=other)
1	1	1	1	1
2	1	2	2	2
3	2	3	3	3
4	2	4	4	4

Question 2 Note: Each student will get different answers as the data sets differ.

Using the assignment data file allocated to you and R Commander, tabulate the relationship between gender (sex) and most frequently used mode of transport in the past month (transport). Please use the results from R Commander to create a table in Word with appropriate headings (the output from R Commander is poorly labelled and will not be accepted).

Using row or column percentages, describe the relationship between gender and most frequently used mode of transport in the past month.

Question 3 Note: Each student will get different answers as the data sets differ.

Using the assignment data file allocated to you and R Commander, graph the relationship between the number of activities attended in the past month (activities) and driver's licence status in this sample of 17-year-old Australians. This figure should be prepared in R Commander with appropriate axis labels, then copied and pasted into your assignment answers with an appropriate title.

Use appropriate statistics to describe the centre, spread and shape of the distribution of number of activities attended per month for each category of driver's licence status separately. You must clearly indicate which statistics describe the centre, which describe the spread and which describe the shape. Copying R Commander output is insufficient and should be avoided.

Using the results in parts a. and b. above, describe the relationship between number of activities attended in the past month and drivers licence status.

Question 4 Note: Each student will get different answers as the data sets differ.

Using the assignment data file allocated to you and R Commander, draw an appropriate graph of the relationship between self-reported sedentary hours per week and number of activities attended in the past month. When preparing the graph in R Commander don’t forget to provide meaningful labels on the axes.

Using the graph in part a, describe the form, direction and strength of the relationship between self-reported sedentary hours per week and number of activities attended in the past month.

Question 5

A group of 8 students were asked about their age, gender and area of study. The responses (sorted by age) are shown in the following table:

Initials	Age in years	Gender	Area of study
HP	17	Male	Nursing
RT	19	Male	Accounting
SK	20	Female	Psychology
KZ	20	Male	Psychology
AN	21	Female	Nursing
KK	22	Female	Psychology
JH	22	Male	Psychology
PV	25	Female	Nursing

If you select one person at random from this group, what is the probability this person will be 18 or more years of age?

If you selected one person at random from this group, what is the probability they will be a female who is studying psychology?

If you selected one female at random from this group, what is the probability they will be 21 or more years of age and studying nursing?

Question 6

In Australia, the probability of having blood type B is 0.1.

Blood type was recorded for a random sample of 250 Australian adults. Using R Commander, what is the probability that this random sample of 250 adults will contain 25 or fewer people whose blood group is B?

Suppose 200 random samples were drawn and each of these 200 samples contained exactly 250 people. We would predict 12% of all samples to contain fewer than how many people with type B blood?

Estimate the mean number of people with type B blood per sample. Show any working.

Question 7

The hours of sleep per night for 17-year-olds are known to be Normally distributed with mean 8.2 hours and standard deviation 0.6 hours. Use this information to address the following questions. Show any working.

In 17-year-olds, how many hours of sleep correspond to a z-score of 1?

Choose one 17-year-old at random from this population. Using R Commander, estimate the probability that this person sleeps between 7.5 and 8.0 hours per night.

Choose a random sample of sixteen 17-year-olds. Using R Commander, estimate the probability that the sample mean for normal nights will lie between 7.5 and 8.0 hours per night. Show any working.

Choose a random sample of sixteen 17-year-olds. How many of this group would you expect to sleep between 7.5 and 8.0 hours?

Categorical and Numeric Variables

The following tables are based on an excerpt of the recoded data. The data for age were missing from the data set provided, so that column was left blank.

Frequency Table: Most commonly used transport in the past month, by gender
Gender	Driver/rider of car or motorcycle	Passenger in car	Other (excluding public transport)	Total
Female	23	59	37	119
Male	45	76	31	152
Total	68	135	68	271

Frequency Table: Row Percentages
Gender	Driver/rider of car or motorcycle	Passenger in car	Other (excluding public transport)	Total
Female	19.33%	49.58%	31.09%	100%
Male	29.61%	50.00%	20.39%	100%

Frequency Table: Column Percentages
Gender	Driver/rider of car or motorcycle	Passenger in car	Other (excluding public transport)
Female	33.82%	43.70%	54.41%
Male	66.18%	56.30%	45.59%
Total	100%	100%	100%

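The row percentages show that 19.33% of females mainly drove or rode themselves to their activities in the past month, while the largest share, 49.58%, were mainly passengers driven by someone else; the remaining 31.09% reported some other means of transport. Among males, 50.00% were mainly passengers, 29.61% drove or rode themselves and 20.39% travelled by some other means. Travelling as a passenger was therefore the most common mode for both genders, at about half of each group. The main difference is that males were more likely than females to drive or ride themselves (29.61% vs 19.33%), whereas females were more likely to use other means of transport (31.09% vs 20.39%).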
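The row and column percentages in the tables above can be reproduced from the raw counts. The Python sketch below is only an illustration (the assignment itself uses R Commander); the dictionary keys are shortened column labels chosen for this example.

```python
# Counts from the frequency table (rows: gender; columns: transport mode).
counts = {
    "female": {"driver/rider": 23, "passenger": 59, "other": 37},
    "male": {"driver/rider": 45, "passenger": 76, "other": 31},
}

def row_percentages(table):
    """Share of each row total that falls in each column, as a percentage."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Share of each column total that falls in each row, as a percentage."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {c: sum(table[r][c] for r in table) for c in columns}
    return {
        row: {c: 100 * table[row][c] / col_totals[c] for c in columns}
        for row in table
    }

for label, pcts in (("Row %", row_percentages(counts)),
                    ("Column %", column_percentages(counts))):
    print(label)
    for row, cells in pcts.items():
        print(" ", row, {c: round(v, 2) for c, v in cells.items()})
```

Row percentages answer "how do females (or males) split across transport modes?", which is the comparison used in the paragraph above; column percentages answer "what share of each mode's users are female?".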

Statistics for number of activities in the past month, by licence status
Licence status	Mean	Standard deviation	Pearson's skewness
Not licensed	6.269	2.164	-0.142
Learner's permit	6.405	2.181	-0.102
Licensed	8.408	2.110	0.328

The centre of the distribution of the number of activities in the past month is described by the mean: 6.27 for those who are not licensed, 6.41 for those with a learner's permit and 8.41 for those who are licensed. The spread of each distribution is described by the standard deviation: 2.16 for the not licensed, 2.18 for those with a learner's permit and 2.11 for the licensed. The shape of each distribution is described by Pearson's skewness coefficient.

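A Pearson's skewness coefficient greater than 0 indicates a right-skewed (positively skewed) distribution with a longer tail to the right; a value less than 0 indicates a left-skewed (negatively skewed) distribution with a longer tail to the left; and a value of 0 indicates a symmetric distribution, such as the normal (Gaussian) distribution. The further the coefficient is from 0, the more asymmetric the distribution. The distribution for licensed individuals is slightly right-skewed (0.328), whereas the distributions for the other two groups are slightly left-skewed (-0.142 and -0.102). All three coefficients are small in magnitude, so none of the distributions is strongly skewed. The table above shows these measures.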
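As a sketch of how the centre, spread and shape statistics could be computed for one group, the Python function below returns the mean, median, sample standard deviation and a Pearson skewness coefficient. It assumes Pearson's second (median) coefficient, 3(mean - median)/s; the source does not say which form of Pearson's skewness was reported, and the example data are hypothetical.

```python
import statistics

def describe(values):
    """Centre (mean, median), spread (SD) and shape (skewness) for one group."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    # ASSUMPTION: Pearson's second (median) skewness coefficient.
    skew = 3 * (mean - median) / sd
    return {"mean": mean, "median": median, "sd": sd, "pearson_skew": skew}

def shape_label(skew):
    """Direction of skew implied by the sign of a skewness coefficient."""
    if skew > 0:
        return "right-skewed (longer right tail)"
    if skew < 0:
        return "left-skewed (longer left tail)"
    return "symmetric"

# Hypothetical activity counts for one licence-status group (not assignment data).
example = [3, 5, 6, 6, 7, 7, 8, 9, 10]
summary = describe(example)
print({k: round(v, 3) for k, v in summary.items()}, shape_label(summary["pearson_skew"]))

# Sign interpretation of the coefficients reported in the table above.
for group, coef in [("not licensed", -0.142), ("learner's permit", -0.102), ("licensed", 0.328)]:
    print(group, shape_label(coef))
```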

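The results from parts (a) and (b) imply that licensed individuals attend the most activities: their mean (8.41) is about two activities higher than those of the unlicensed (6.27) and learner's permit holders (6.41), whose means are similar to each other. The spread is similar across the three groups (standard deviations 2.11 to 2.18). The distribution for licensed individuals is slightly right-skewed, while those for the other two groups are slightly left-skewed.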

The relationship between self-reported sedentary hours per week and the number of activities attended in the past month is negative. The line of best fit, shown in blue in the figure in part (a), is given by the regression equation:

$$\widehat{\text{activities}} = 11.819 - 0.434 \times \text{sedentary hours per week}$$

The equation shows that each additional sedentary hour per week is associated with an average decrease of 0.434 in the number of activities attended. The intercept of 11.819 is the predicted number of activities for a person with no sedentary hours.
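The slope and intercept of a least-squares line are computed from the sample means and the sums of cross-products. The sketch below shows ordinary least squares in plain Python; the function names are hypothetical, and because the assignment data are not reproduced here, the usage line simply plugs in the coefficients reported above.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                  # change in y per unit change in x
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope

def predict(intercept, slope, hours):
    """Predicted number of activities for a given number of sedentary hours per week."""
    return intercept + slope * hours

print(round(predict(11.819, -0.434, 10), 3))  # 7.479
```

For example, someone reporting 10 sedentary hours per week would be predicted to attend about 7.48 activities. The slope's sign gives the direction of the relationship; its strength would need the correlation coefficient, which the sketch does not compute.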

The probability that a person chosen at random from the 8 students is at least 18 years of age is the ratio of the number of students aged 18 or over to the total number of students. Seven of the 8 students are aged 18 or over, so the probability is 7/8 = 0.875.

The probability that a person chosen at random from the 8 students is female and a psychology major is the ratio of the number of students who are female and study psychology to the total number of students, that is, 8. Two students (SK and KK) are female psychology students, so the probability is 2/8 = 0.25.

The conditional probability that a student is aged at least 21 and studies nursing, given that the student is female, is the ratio of the number of females who are aged at least 21 and study nursing to the total number of females. Of the 4 females, 2 (AN and PV) are aged 21 or over and study nursing, so the probability is 2/4 = 0.5.
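The three probabilities above can be checked by counting rows of the table. The sketch below encodes the eight students and uses exact fractions; the helper name prob is illustrative.

```python
from fractions import Fraction

# (initials, age in years, gender, area of study), copied from the table above
students = [
    ("HP", 17, "Male", "Nursing"),
    ("RT", 19, "Male", "Accounting"),
    ("SK", 20, "Female", "Psychology"),
    ("KZ", 20, "Male", "Psychology"),
    ("AN", 21, "Female", "Nursing"),
    ("KK", 22, "Female", "Psychology"),
    ("JH", 22, "Male", "Psychology"),
    ("PV", 25, "Female", "Nursing"),
]

def prob(event, group):
    """Classical probability: favourable cases / all (equally likely) cases in group."""
    favourable = sum(1 for s in group if event(s))
    return Fraction(favourable, len(group))

p_adult = prob(lambda s: s[1] >= 18, students)
p_female_psych = prob(lambda s: s[2] == "Female" and s[3] == "Psychology", students)

# Conditioning on "female" restricts the sample space to the females only.
females = [s for s in students if s[2] == "Female"]
p_21_nursing_given_female = prob(lambda s: s[1] >= 21 and s[3] == "Nursing", females)

print(p_adult, p_female_psych, p_21_nursing_given_female)  # 7/8 1/4 1/2
```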

The probability that an Australian adult has blood type B is given as 0.1. Let X denote the number of people with blood type B in a random sample of 250; then X follows a binomial distribution with size 250 and probability parameter 0.1. The probability that the sample contains 25 or fewer people with blood type B is the lower-tail probability P(X ≤ 25), which is approximately 0.553. Note that P(X = 25), the probability of exactly 25 people with blood type B, is only about 0.0838.
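The binomial probability can be checked outside R Commander by summing the exact binomial probability mass function. This Python sketch uses only the standard library (math.comb needs Python 3.8 or later) and also prints P(X = 25) to show how it differs from the cumulative probability.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the exact pmf."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 250, 0.1
print(round(binom_cdf(25, n, p), 3))   # P(X <= 25): 0.553
print(round(binom_pmf(25, n, p), 4))   # P(X = 25):  0.0838
```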

We want the number of people with blood type B that 12% of samples of 250 Australian adults fall below. This means finding the value x for which P(X ≤ x) ≈ 0.12, where X is Binomial(250, 0.1); that is, the 12th percentile (a lower-tail quantile) of the distribution. R Commander gives x = 19. So about 12% of the samples (roughly 24 of the 200) would contain 19 or fewer, that is, fewer than 20, people with blood type B.
The mean number of people with blood type B per sample is the expectation of a binomial distribution with size 250 and probability parameter 0.1: 250 × 0.1 = 25.
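A lower-tail binomial quantile can be found by accumulating probabilities until the running total reaches the target. The sketch below assumes the same convention as R's qbinom (the smallest k with P(X ≤ k) ≥ q); the function name binom_quantile is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def binom_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q (the convention used by R's qbinom)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n

n, p = 250, 0.1
k12 = binom_quantile(0.12, n, p)
print(k12, round(binom_cdf(k12, n, p), 3))  # 12th percentile and its cumulative probability
print("Mean per sample (n * p):", n * p)
```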
The z-score for a value X of a normally distributed random variable with mean μ and standard deviation σ is defined as

$$Z = \frac{X - \mu}{\sigma}$$

Rearranging gives X = μ + Zσ. With Z = 1, μ = 8.2 and σ = 0.6, X = 8.2 + 1 × 0.6 = 8.8 hours.

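Let X denote the hours of sleep per night of a 17-year-old, so that X follows Normal(8.2, 0.6). The z-scores for 8.0 and 7.5 hours are (8.0 - 8.2)/0.6 = -0.33 and (7.5 - 8.2)/0.6 = -1.17, so the probability that X lies between 7.5 and 8.0 is:

P(7.5 < X < 8.0) = P(X < 8.0) - P(X < 7.5) ≈ 0.3694 - 0.1217 ≈ 0.248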
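Both normal-distribution calculations for a single 17-year-old can be sketched with Python's statistics.NormalDist (Python 3.8+), used here as a stand-in for R Commander's normal probabilities.

```python
from statistics import NormalDist

sleep = NormalDist(mu=8.2, sigma=0.6)   # hours of sleep per night, 17-year-olds

# Part (a): invert z = (x - mu) / sigma, giving x = mu + z * sigma
z = 1
x_at_z1 = sleep.mean + z * sleep.stdev
print(round(x_at_z1, 2))  # 8.8

# Part (b): P(7.5 < X < 8.0) = P(X < 8.0) - P(X < 7.5)
p_individual = sleep.cdf(8.0) - sleep.cdf(7.5)
print(round(p_individual, 3))  # 0.248
```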

The mean of a sample of size n drawn from a normal distribution with mean 'm' and standard deviation 's' is itself normally distributed, with mean 'm' and standard deviation 's'/√n. The mean hours of sleep of a sample of 16 students, denoted X_bar, therefore follows Normal(8.2, 0.6/√16) = Normal(8.2, 0.15).

The z-scores for 8.0 and 7.5 hours are (8.0 - 8.2)/0.15 = -1.33 and (7.5 - 8.2)/0.15 = -4.67, so the probability that the sample mean lies between 7.5 and 8.0 is P(7.5 < X_bar < 8.0) = P(X_bar < 8.0) - P(X_bar < 7.5) ≈ 0.0912 - 0.0000 ≈ 0.091.
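The sampling-distribution calculation can be sketched the same way; the only change from the individual case is the standard deviation, σ/√n. This is an illustrative Python check, not R Commander output.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.2, 0.6, 16
# Standard error of the mean is sigma / sqrt(n), not sigma / n.
mean_dist = NormalDist(mu, sigma / sqrt(n))   # Normal(8.2, 0.15)
p_mean = mean_dist.cdf(8.0) - mean_dist.cdf(7.5)
print(round(mean_dist.stdev, 2), round(p_mean, 3))  # 0.15 0.091
```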

The number of students among the 16 who are expected to sleep between 7.5 and 8.0 hours is the expectation of a binomial distribution with size 16 and probability equal to the individual probability from part (b), about 0.2478. The expected number of students is therefore 16 × 0.2478 ≈ 3.96, or about 4 students.
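The expected count is n times the probability for one randomly chosen individual. A short Python sketch, reusing the Normal(8.2, 0.6) model from part (b):

```python
from statistics import NormalDist

# Probability that ONE randomly chosen 17-year-old sleeps 7.5 to 8.0 hours (part b).
sleep = NormalDist(8.2, 0.6)
p = sleep.cdf(8.0) - sleep.cdf(7.5)

n = 16
expected = n * p          # mean of Binomial(n, p)
print(round(expected, 2))  # 3.96
```

The individual probability is used here, not the sample-mean probability from part (c), because the question counts how many people in the group fall in the interval.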

Cite This Work

My Assignment Help. (2021). Classifying Variables In A Data Table. Retrieved from https://myassignmenthelp.com/free-samples/401077-introduction-to-biostatistics/skewness-measure.html.