Categorical Data
Data are the small fragments of raw information collected for study and analysis so that an informed conclusion can be drawn. Data are of three types: categorical, discrete numerical and continuous numerical. Categorical data, also called qualitative data, represent characteristics such as gender, marital status or city. Such data may be coded with numbers, but the numbers have no mathematical meaning.
Numerical data, as the name suggests, record a quantitative measurement such as height or weight. They divide into two kinds: discrete data, which can be counted and whose possible values are either finite or run on without limit, and continuous data, which represent measurements that can take any value within an interval.
According to Cressie (2015), there are four measurement scales: nominal, ordinal, interval and ratio. The nominal scale places a variable in descriptive categories that have no natural numerical value, so it has only the identity property. The ordinal scale has both the identity and magnitude properties. The interval scale has identity, magnitude and equal intervals as its properties (Willer and Lernoud, 2016). The ratio scale has all four properties of measurement: identity, magnitude, equal intervals and a minimum value of zero.
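The way each scale adds one property to the scale before it can be sketched in a few lines of Python. This is only an illustration of the description above; the names `PROPERTIES`, `SCALES` and `scale_properties` are assumptions made for the example.

```python
# Each measurement scale keeps the properties of the previous one and adds one more.
# The names below are illustrative only, not part of any library.
PROPERTIES = ["identity", "magnitude", "equal intervals", "minimum value of zero"]
SCALES = ["nominal", "ordinal", "interval", "ratio"]


def scale_properties(scale):
    """Return the list of properties satisfied by the given scale."""
    level = SCALES.index(scale) + 1  # nominal -> 1 property, ratio -> 4
    return PROPERTIES[:level]


for scale in SCALES:
    print(f"{scale:8s} -> {', '.join(scale_properties(scale))}")
```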
What is your gender? (Male = 0, Female = 1)
Data Type: Categorical data describing the qualitative characteristic of gender; the codes 0 and 1 are only labels.
Measurement Level: Nominal scale of measurement, as it satisfies only the identity property.
What is your approximate undergraduate college GPA? (1.0 to 4.0)
Data Type: Continuous numerical data, since a GPA can take any value between 1.0 and 4.0.
Measurement Level: Usually treated as interval scale of measurement, with identity, magnitude and (approximately) equal intervals between grade points.
About how many hours per week do you expect to work at an outside job this semester?
Data Type: Numerical data; discrete if answered in whole hours, although time itself is continuous.
Measurement Level: Ratio scale of measurement, since zero hours means no outside work and 20 hours is twice as much as 10 hours.
What do you think is the ideal number of children for a married couple?
Data Type: Discrete data with finite possible values.
Measurement Level: Ratio scale of measurement, with identity, magnitude, equal intervals and a minimum value of zero (zero children is a meaningful answer).
On a 1 to 5 scale, which best describes your parents? (1 = Mother clearly dominant to 5 = Father clearly dominant)
Data Type: Discrete data (five ordered codes).
Measurement Level: Ordinal scale of measurement, satisfying both the identity and magnitude properties.
Discrete Data
No. of students (N): 30
Monthly rent paid:
730 730 730 930 700 570
690 1,030 740 620 720 670
560 740 650 660 850 930
600 620 760 690 710 500
730 800 820 840 720 700
(a)
Total of values = 21,740
Mean = x̄ = Σx / n
= 21,740/30 = 724.67
Median = average of the (n/2)th and (n/2 + 1)th values of the sorted data
= average of the 15th and 16th sorted values = (720 + 720)/2 = 720
Mode = 730, which occurs 4 times. Several other values occur twice (620, 690, 700, 720, 740 and 930), but 730 is the most frequent.
(b) Agreement of the measures of central tendency:
The mean (724.67), median (720) and mode (730) lie within about 10 of one another, so the three measures agree closely. The mean takes every value into account, the median gives the middle value and the mode gives the most frequent value; all three place a typical monthly rent between 720 and 730.
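The three measures can be recomputed directly from the 30 listed rents. The sketch below uses Python's standard `statistics` module; the variable names are assumptions chosen for the example.

```python
import statistics

# The 30 monthly rents listed in the data above.
rents = [
    730, 730, 730, 930, 700, 570,
    690, 1030, 740, 620, 720, 670,
    560, 740, 650, 660, 850, 930,
    600, 620, 760, 690, 710, 500,
    730, 800, 820, 840, 720, 700,
]

total = sum(rents)
mean = statistics.mean(rents)      # total divided by n
median = statistics.median(rents)  # sorts the data, then averages the 15th and 16th values
mode = statistics.mode(rents)      # most frequent value

print(f"n = {len(rents)}, total = {total}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Note that the median must be taken from the sorted values, not from the values in the order they were recorded.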
(c) Calculation of standard deviation (population formula; A - B uses the unrounded mean 21,740/30):
Value (A) | Mean (B) | A - B | Square (A - B)
560 | 724.67 | -164.67 | 27115.11
600 | 724.67 | -124.67 | 15541.78
690 | 724.67 | -34.67 | 1201.78
730 | 724.67 | 5.33 | 28.44
730 | 724.67 | 5.33 | 28.44
730 | 724.67 | 5.33 | 28.44
1030 | 724.67 | 305.33 | 93228.44
620 | 724.67 | -104.67 | 10955.11
740 | 724.67 | 15.33 | 235.11
800 | 724.67 | 75.33 | 5675.11
730 | 724.67 | 5.33 | 28.44
740 | 724.67 | 15.33 | 235.11
650 | 724.67 | -74.67 | 5575.11
760 | 724.67 | 35.33 | 1248.44
820 | 724.67 | 95.33 | 9088.44
930 | 724.67 | 205.33 | 42161.78
620 | 724.67 | -104.67 | 10955.11
660 | 724.67 | -64.67 | 4181.78
690 | 724.67 | -34.67 | 1201.78
840 | 724.67 | 115.33 | 13301.78
700 | 724.67 | -24.67 | 608.44
720 | 724.67 | -4.67 | 21.78
850 | 724.67 | 125.33 | 15708.44
710 | 724.67 | -14.67 | 215.11
720 | 724.67 | -4.67 | 21.78
570 | 724.67 | -154.67 | 23921.78
670 | 724.67 | -54.67 | 2988.44
930 | 724.67 | 205.33 | 42161.78
500 | 724.67 | -224.67 | 50475.11
700 | 724.67 | -24.67 | 608.44
Sum of Square of (A - B) = C = 378746.67
Mean of C = D = C/30 = 12624.89
Square Root of D = Standard deviation = 112.36
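The column-by-column calculation can be checked with a short script. This is a sketch that recomputes everything from the 30 listed rents using the population formula (dividing by n); `statistics.pstdev` serves only as a cross-check, and the variable names are assumptions.

```python
import math
import statistics

# The 30 monthly rents listed in the data above.
rents = [
    730, 730, 730, 930, 700, 570,
    690, 1030, 740, 620, 720, 670,
    560, 740, 650, 660, 850, 930,
    600, 620, 760, 690, 710, 500,
    730, 800, 820, 840, 720, 700,
]

n = len(rents)
mean = sum(rents) / n

# Column "Square (A - B)": squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in rents]
sum_sq = sum(squared_deviations)   # C
variance = sum_sq / n              # D, population variance
sd = math.sqrt(variance)           # population standard deviation

print(f"C = {sum_sq:.2f}, D = {variance:.2f}, SD = {sd:.2f}")
print(f"cross-check with statistics.pstdev: {statistics.pstdev(rents):.2f}")
```

If the 30 students were treated as a sample rather than the whole population, the sum of squares would be divided by n - 1 instead (`statistics.stdev`), giving a slightly larger value.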
(d) Sort and standardize the data.
Standardized value = (X - μ) / σ
Where:
X is the value
μ is the mean (724.67)
σ is the standard deviation (112.36)
Data is sorted from smallest to largest in the following table with their standardized values:
Value (A) | Mean (B) | SD (C) | Standardized Data (A - B)/C
500 | 724.67 | 112.36 | -2.00
560 | 724.67 | 112.36 | -1.47
570 | 724.67 | 112.36 | -1.38
600 | 724.67 | 112.36 | -1.11
620 | 724.67 | 112.36 | -0.93
620 | 724.67 | 112.36 | -0.93
650 | 724.67 | 112.36 | -0.66
660 | 724.67 | 112.36 | -0.58
670 | 724.67 | 112.36 | -0.49
690 | 724.67 | 112.36 | -0.31
690 | 724.67 | 112.36 | -0.31
700 | 724.67 | 112.36 | -0.22
700 | 724.67 | 112.36 | -0.22
710 | 724.67 | 112.36 | -0.13
720 | 724.67 | 112.36 | -0.04
720 | 724.67 | 112.36 | -0.04
730 | 724.67 | 112.36 | 0.05
730 | 724.67 | 112.36 | 0.05
730 | 724.67 | 112.36 | 0.05
730 | 724.67 | 112.36 | 0.05
740 | 724.67 | 112.36 | 0.14
740 | 724.67 | 112.36 | 0.14
760 | 724.67 | 112.36 | 0.31
800 | 724.67 | 112.36 | 0.67
820 | 724.67 | 112.36 | 0.85
840 | 724.67 | 112.36 | 1.03
850 | 724.67 | 112.36 | 1.12
930 | 724.67 | 112.36 | 1.83
930 | 724.67 | 112.36 | 1.83
1030 | 724.67 | 112.36 | 2.72
(e) Are there outliers or unusual data values?
Values whose standardized value (Z-score) lies beyond ±2 are unusual, and values beyond ±3 are outliers. Only 1030 (Z = 2.72) is unusual. The value 500 sits right at the boundary (Z = -1.9995 before rounding), 930 is within two standard deviations (Z = 1.83), and no value lies beyond ±3, so there are no outliers.
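A short sketch of the standardization and the unusual-value check, recomputed from the 30 listed rents. It assumes the population standard deviation and the common thresholds of 2 (unusual) and 3 (outlier); the variable names are illustrative.

```python
import statistics

# The 30 monthly rents listed in the data above.
rents = [
    730, 730, 730, 930, 700, 570,
    690, 1030, 740, 620, 720, 670,
    560, 740, 650, 660, 850, 930,
    600, 620, 760, 690, 710, 500,
    730, 800, 820, 840, 720, 700,
]

mu = statistics.mean(rents)
sigma = statistics.pstdev(rents)  # population standard deviation

# Z-score for each distinct rent, in ascending order.
z_scores = {x: (x - mu) / sigma for x in sorted(set(rents))}

unusual = [x for x, z in z_scores.items() if abs(z) > 2]   # beyond 2 SD
outliers = [x for x, z in z_scores.items() if abs(z) > 3]  # beyond 3 SD

for x, z in z_scores.items():
    print(f"{x:5d}  z = {z:6.2f}")
print("unusual:", unusual, "outliers:", outliers)
```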
(f) Using the Empirical Rule, do you think the data could be from a normal population?
For a normal distribution, the Empirical Rule states that about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations and about 99.73% within 3 standard deviations. With mean 724.67 and standard deviation 112.36, that means:
Expected % of values | Range | Lower Value | Higher Value | Observed
68% | mean ± sd | 612.31 | 837.03 | 21 of 30 (70.0%)
95% | mean ± 2 sd | 499.95 | 949.39 | 29 of 30 (96.7%)
99.73% | mean ± 3 sd | 387.59 | 1061.75 | 30 of 30 (100%)
The observed percentages are close to the expected ones, so by the Empirical Rule the data could plausibly come from a normal population.
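The Empirical Rule check amounts to counting how many rents fall within 1, 2 and 3 standard deviations of the mean. The sketch below recomputes this from the 30 listed rents, assuming the population standard deviation; the variable names are illustrative.

```python
import statistics

# The 30 monthly rents listed in the data above.
rents = [
    730, 730, 730, 930, 700, 570,
    690, 1030, 740, 620, 720, 670,
    560, 740, 650, 660, 850, 930,
    600, 620, 760, 690, 710, 500,
    730, 800, 820, 840, 720, 700,
]

mu = statistics.mean(rents)
sigma = statistics.pstdev(rents)  # population standard deviation

expected = {1: 68.0, 2: 95.0, 3: 99.73}  # Empirical Rule percentages
within = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    within[k] = sum(1 for x in rents if low <= x <= high)
    observed = 100 * within[k] / len(rents)
    print(f"mean ± {k} sd: {low:8.2f} to {high:8.2f}  "
          f"observed {observed:5.1f}%  expected {expected[k]}%")
```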
Find the mean, median, and mode for each quiz.
Scores (each column is one quiz, sorted in ascending order):
I | II | III | IV
60 | 65 | 66 | 10
60 | 65 | 67 | 49
60 | 65 | 70 | 70
60 | 65 | 71 | 80
71 | 70 | 72 | 85
73 | 74 | 72 | 88
74 | 79 | 74 | 90
75 | 79 | 74 | 93
88 | 79 | 95 | 97
99 | 79 | 99 | 98
Measures of center:
Measure | I | II | III | IV
Mean | 72 | 72 | 76 | 76
Median | 72 | 72 | 72 | 86.5
Mode | 60 | 65, 79 | 72, 74 | None
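The three measures for each quiz can be verified with the sketch below. The helper `modes` is an assumption written for this example: it returns every value tied for the highest frequency and an empty list when no score repeats.

```python
import statistics
from collections import Counter

# Quiz scores as listed in the table, one list per quiz.
quizzes = {
    "I":   [60, 60, 60, 60, 71, 73, 74, 75, 88, 99],
    "II":  [65, 65, 65, 65, 70, 74, 79, 79, 79, 79],
    "III": [66, 67, 70, 71, 72, 72, 74, 74, 95, 99],
    "IV":  [10, 49, 70, 80, 85, 88, 90, 93, 97, 98],
}


def modes(scores):
    """Return all most-frequent scores, or [] if every score occurs once."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


summary = {
    name: (statistics.mean(s), statistics.median(s), modes(s))
    for name, s in quizzes.items()
}
for name, (mean, median, mode) in summary.items():
    print(f"Quiz {name}: mean = {mean}, median = {median}, mode = {mode or 'none'}")
```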
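Do these measures of center agree? Explain.
Only partly. For Quizzes I and II the mean and median coincide at 72, but the mode sits away from them (60 for Quiz I; 65 and 79 for Quiz II). For Quiz III the median (72) and the modes (72 and 74) agree, while the mean (76) is pulled up by the two high scores, 95 and 99. For Quiz IV the mean (76) lies well below the median (86.5) because of the low scores 10 and 49, and there is no mode.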
For each data set, note strengths or weaknesses of each statistic of center.
Quiz | Strength | Weakness
I | Mean: a very useful measure that gives the average score of the class (72). Median: the middle value (72) limits the error caused by a skewed distribution. Mode: easy to spot (60). | Mean: pulled upward by the unusually high scores (88 and 99). Median: insensitive to how extreme the highest scores are. Mode: the least informative measure here, since 60 is the lowest score rather than a typical one.
II | Mean: the scores are symmetric, so the average (72) is representative. Median: equal to the mean (72), with no skew to distort it. Mode: easy to spot. | Mean: no student scored close to it, because the scores cluster at 65 and 79. Median: sensitive to new scores; one extra score in either cluster would move it to 70 or 74. Mode: two equally common scores (65 and 79) make the data bimodal.
III | Mean: a useful summary that uses every score (76). Median: the middle value (72) limits the error caused by a skewed distribution. Mode: easy to spot (72 and 74). | Mean: pulled upward by the two high scores (95 and 99). Median: most scores are bunched closely around it, so it says little about the high scores. Mode: each mode occurs only twice, a small frequency.
IV | Mean: blends all scores into a class average (76). Median: the middle value (86.5) limits the error caused by a skewed distribution. Mode: none could be identified. | Mean: the widely scattered scores, especially 10 and 49, drag it down, so it does not reflect the ability of most of the class. Median: lies far above the lowest scores. Mode: no score occurs twice, so there is no mode.
Are the data symmetric or skewed? If skewed, which direction?
Quiz II is symmetric: its scores mirror each other around 72. Quizzes I and III are skewed to the right, with a long tail toward the high scores (88 and 99 in Quiz I; 95 and 99 in Quiz III). Quiz IV is strongly asymmetric, with much more variation; its median (86.5) is higher than its mean (76), so the data are skewed to the left.
Continuous Data
Briefly describe and compare student performance on each quiz.
Student performance in Quiz II is symmetrical, with a small difference between the minimum (65) and maximum (79) scores. Performance in Quizzes I and III is more spread out, with larger differences between the lowest and highest scores (60 to 99 and 66 to 99). Quiz IV has the widest spread (10 to 98): most students were confident and well prepared, scoring 70 or above, while two students (10 and 49) performed poorly.
Total probability (exactly one alternator fails, both fail or neither fails) = 1
P(alternator 1 fails) = P(alternator 2 fails) = 0.02
P(alternator 1 works) = P(alternator 2 works) = 1 - 0.02 = 0.98
Assuming the two alternators fail independently:
Probability that both alternators fail = P(1 fails) * P(2 fails) (Anderberg, 2014)
= 0.02 * 0.02
= 0.0004
Probability that neither alternator fails = P(1 works) * P(2 works)
= 0.98 * 0.98
= 0.9604
Probability that one or the other alternator (but not both) will fail
= P(1 fails) * P(2 works) + P(2 fails) * P(1 works)
= 0.02 * 0.98 + 0.02 * 0.98
= 0.0392
Check: 0.0004 + 0.9604 + 0.0392 = 1. If "one or the other" is read as "at least one", the probability is 1 - 0.9604 = 0.0396.
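The alternator probabilities can be checked numerically. The sketch assumes that each alternator fails with probability 0.02 and that the two fail independently; the variable names are illustrative.

```python
# Failure probability of a single alternator, as given in the text.
p_fail = 0.02
p_work = 1 - p_fail

both_fail = p_fail * p_fail                        # both alternators fail
neither_fails = p_work * p_work                    # both alternators work
exactly_one = p_fail * p_work + p_work * p_fail    # 1 fails and 2 works, or the reverse
at_least_one = 1 - neither_fails                   # one or both fail

print(f"both fail     = {both_fail:.4f}")
print(f"neither fails = {neither_fails:.4f}")
print(f"exactly one   = {exactly_one:.4f}")
print(f"at least one  = {at_least_one:.4f}")

# The three mutually exclusive outcomes must add up to 1.
print(f"total         = {both_fail + neither_fails + exactly_one:.4f}")
```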
Mean = x̄ = Σx / n
= 59017/18
= 3278.722
X | Mean | X - Mean | Square (X - Mean)
3450 | 3278.722 | 171.278 | 29336.15
3363 | 3278.722 | 84.278 | 7102.781
3228 | 3278.722 | -50.722 | 2572.721
3360 | 3278.722 | 81.278 | 6606.113
3304 | 3278.722 | 25.278 | 638.9773
3407 | 3278.722 | 128.278 | 16455.25
3324 | 3278.722 | 45.278 | 2050.097
3365 | 3278.722 | 86.278 | 7443.893
3290 | 3278.722 | 11.278 | 127.1933
3289 | 3278.722 | 10.278 | 105.6373
3346 | 3278.722 | 67.278 | 4526.329
3252 | 3278.722 | -26.722 | 714.0653
3237 | 3278.722 | -41.722 | 1740.725
3210 | 3278.722 | -68.722 | 4722.713
3140 | 3278.722 | -138.722 | 19243.79
3220 | 3278.722 | -58.722 | 3448.273
3103 | 3278.722 | -175.722 | 30878.22
3129 | 3278.722 | -149.722 | 22416.68
Total: 59017 | | | Total: 160129.6
Standard deviation = Square Root [Σ(X - Mean)² / n] (Rohatgi and Saleh, 2015)
= Square Root [160129.6/18]
= 94.31908
Standard error = Standard deviation / Square Root of the number of observations (Allen, 2014)
= 94.31908 / Square Root [18]
= 22.23122
Margin of error E = 22.23122 * 1.96 = 43.57319, where 1.96 is the z-value for 95 percent confidence
95% confidence interval = (3278.722 - 43.57319) to (3278.722 + 43.57319)
= 3235.149 to 3322.295 steps
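The interval can be reproduced from the 18 step counts. The sketch follows the calculation in the text (standard deviation with divisor n and the z-value 1.96); the variable names are assumptions. Using the sample standard deviation (`statistics.stdev`, divisor n - 1) with a t critical value instead would give a somewhat wider interval.

```python
import math
import statistics

# Step counts for the 18 days listed in the table.
steps = [
    3450, 3363, 3228, 3360, 3304, 3407, 3324, 3365, 3290,
    3289, 3346, 3252, 3237, 3210, 3140, 3220, 3103, 3129,
]

n = len(steps)
mean = statistics.mean(steps)
sd = statistics.pstdev(steps)   # divisor n, as in the text
se = sd / math.sqrt(n)          # standard error of the mean

z = 1.96                        # z-value for 95 percent confidence
margin = z * se
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.3f}, sd = {sd:.3f}, se = {se:.3f}")
print(f"95% CI = {lower:.3f} to {upper:.3f} steps")
```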
Sample size to obtain an error of ± 20 steps with 95 percent confidence
= [(1.96 * Standard deviation)/20]^2
= [(1.96 * 94.31908)/20]^2
= 85.43804, which rounds up to 86 observations
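The sample-size formula can be evaluated in a few lines. The sketch takes the standard deviation from the text as given and rounds the result up, since a fraction of an observation cannot be collected; the variable names are illustrative.

```python
import math

sd = 94.31908        # standard deviation from the text
z = 1.96             # z-value for 95 percent confidence
target_error = 20    # desired margin of error, in steps

raw_n = (z * sd / target_error) ** 2
required_n = math.ceil(raw_n)  # always round up to the next whole observation

print(f"n = {raw_n:.5f}, rounded up to {required_n}")
```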
Line chart of the data
The chart shows that the number of steps Dave took while jogging fell over the first three days and then recovered gradually after the 3rd day. The count dropped again on the 15th, 17th and 18th days.
References
Books and Journals
Allen, A.O., 2014. Probability, statistics, and queueing theory. Academic Press.
Anderberg, M.R., 2014. Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks (Vol. 19). Academic Press.
Cressie, N., 2015. Statistics for spatial data. John Wiley & Sons.
Rohatgi, V.K. and Saleh, A.M.E., 2015. An introduction to probability and statistics. John Wiley & Sons.
Willer, H. and Lernoud, J., 2016. The world of organic agriculture. Statistics and emerging trends 2016 (Pp. 1-336). Research Institute of Organic Agriculture FiBL and IFOAM Organics International.