Normality test for cholesterol level dataset
Examine and comment on the normality/normal distribution of the following distribution of cholesterol values (mmol/L) using graphical and quantitative measures:
------------------------------------------------------------------------------------------------
5.3 6.2 3.4 5.8 7.5 7.2 7.9 6.2 8.1 7.4 7.3
7.5 6.5 5.2 8.5 5.5 5.6 7.9 4.3 3.8 5.6 8.5
3.5 4.8 5.1 5.5 6.8 5.5 4.3 5.7 7.3 8.4 6.9
7.8 4.2 7.6 3.9 5.5 3.1 3.0 4.5 5.4 6.9 6.8
------------------------------------------------------------------------------------------------
The data set used in this exercise contains the cholesterol levels of the participants in the experiment, measured in mmol per litre of blood (mmol/L). Cholesterol level is a ratio-level variable (Cohen & Holliday, 1996).
Figure 1.1: Cholesterol data histogram
The figure above compares the distribution of the cholesterol observations (histogram) with a normal distribution (line) that has the same mean and standard deviation.
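The comparison in the figure can be approximated numerically. The following is a minimal sketch, not the SPSS procedure used for Figure 1.1: it counts the observations in bins and compares them with the counts expected under a normal curve with the sample mean and SD. The bin width of 1 mmol/L is an assumption made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

# Cholesterol values (mmol/L) as listed in the exercise.
cholesterol = [
    5.3, 6.2, 3.4, 5.8, 7.5, 7.2, 7.9, 6.2, 8.1, 7.4, 7.3,
    7.5, 6.5, 5.2, 8.5, 5.5, 5.6, 7.9, 4.3, 3.8, 5.6, 8.5,
    3.5, 4.8, 5.1, 5.5, 6.8, 5.5, 4.3, 5.7, 7.3, 8.4, 6.9,
    7.8, 4.2, 7.6, 3.9, 5.5, 3.1, 3.0, 4.5, 5.4, 6.9, 6.8,
]

n = len(cholesterol)
normal = NormalDist(mean(cholesterol), stdev(cholesterol))

# Observed counts in bins [3, 4), [4, 5), ..., [8, 9).
# The 1 mmol/L bin width is an assumption, not taken from the report.
observed = {}
for value in cholesterol:
    lower = math.floor(value)
    observed[lower] = observed.get(lower, 0) + 1

# Expected counts under a normal curve with the sample mean and SD.
expected = {
    lower: n * (normal.cdf(lower + 1) - normal.cdf(lower))
    for lower in sorted(observed)
}

for lower in sorted(observed):
    print(f"{lower}-{lower + 1} mmol/L: observed {observed[lower]:2d}, "
          f"expected {expected[lower]:.1f}")
```

Large gaps between observed and expected counts in several bins would suggest a departure from normality; small gaps are consistent with the visual impression from the histogram.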
Table 1.1: Normality test for cholesterol level observations
The table above reports the normality tests for the data set. Given the small sample size (< 100), the Shapiro-Wilk test is the more suitable one. According to this test, the normality of the data set cannot be rejected (p = 0.121 > 0.05) (Gerald, 2018; Tsagris & Pandis, 2021).
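For readers who want to repeat the test outside SPSS, a minimal sketch with SciPy follows. Using SciPy here is an assumption (the report itself uses SPSS), and the statistic and p-value it prints may differ slightly from the SPSS output in Table 1.1.

```python
from scipy import stats

# Cholesterol values (mmol/L) as listed in the exercise.
cholesterol = [
    5.3, 6.2, 3.4, 5.8, 7.5, 7.2, 7.9, 6.2, 8.1, 7.4, 7.3,
    7.5, 6.5, 5.2, 8.5, 5.5, 5.6, 7.9, 4.3, 3.8, 5.6, 8.5,
    3.5, 4.8, 5.1, 5.5, 6.8, 5.5, 4.3, 5.7, 7.3, 8.4, 6.9,
    7.8, 4.2, 7.6, 3.9, 5.5, 3.1, 3.0, 4.5, 5.4, 6.9, 6.8,
]

ALPHA = 0.05  # significance level used in the report

# Shapiro-Wilk test: the null hypothesis is that the data are normal.
w_statistic, p_value = stats.shapiro(cholesterol)
reject_normality = p_value < ALPHA

print(f"W = {w_statistic:.3f}, p = {p_value:.3f}")
print("Reject normality" if reject_normality else "Cannot reject normality")
```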
Generate descriptive statistics for these data and 95% confidence intervals and comment on them. Also comment on and present which measures of central tendency and dispersion should be used to summarise this sample.
Table 1.2: Descriptive statistics
The descriptive statistics show a mean of 5.993 mmol/L and a range of 5.5 mmol/L (3.0-8.5 mmol/L). Choosing the right measures of central tendency and dispersion requires knowing the data type and the shape of the distribution. As discussed earlier, the variable is ratio-level. For ratio-level data that satisfy the normality assumption, the mean and the standard deviation (SD) are the appropriate measures of central tendency and dispersion (Orcan, 2020). Because the normality test does not reject normality, the sample is best summarised by the mean and the SD (mean = 5.993 mmol/L, SD = 1.572 mmol/L).
Descriptive statistics and confidence intervals
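At the 95% confidence level, the interval for the mean runs from 5.515 mmol/L (lower) to 6.471 mmol/L (upper). This is the range within which the population mean is expected to lie, and it spans roughly two standard errors (SD/√n) on either side of the sample mean.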
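The interval can be reproduced from the raw data. The sketch below uses only the Python standard library; the critical t value for 43 degrees of freedom is hard-coded from standard tables, which is an assumption of this example rather than something taken from the SPSS output.

```python
from math import sqrt
from statistics import mean, stdev

# Cholesterol values (mmol/L) as listed in the exercise.
cholesterol = [
    5.3, 6.2, 3.4, 5.8, 7.5, 7.2, 7.9, 6.2, 8.1, 7.4, 7.3,
    7.5, 6.5, 5.2, 8.5, 5.5, 5.6, 7.9, 4.3, 3.8, 5.6, 8.5,
    3.5, 4.8, 5.1, 5.5, 6.8, 5.5, 4.3, 5.7, 7.3, 8.4, 6.9,
    7.8, 4.2, 7.6, 3.9, 5.5, 3.1, 3.0, 4.5, 5.4, 6.9, 6.8,
]

n = len(cholesterol)
sample_mean = mean(cholesterol)
sample_sd = stdev(cholesterol)          # sample SD (n - 1 denominator)
standard_error = sample_sd / sqrt(n)    # SD of the sampling distribution of the mean

# Two-sided 95% critical value of Student's t for df = n - 1 = 43,
# hard-coded from standard t tables (assumption of this sketch).
T_CRIT_DF43 = 2.0167

margin = T_CRIT_DF43 * standard_error
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"mean = {sample_mean:.3f}, SD = {sample_sd:.3f}, SE = {standard_error:.3f}")
print(f"95% CI = [{ci_lower:.3f}, {ci_upper:.3f}] mmol/L")
```

The margin is built from the standard error, so the interval describes the uncertainty in the mean rather than the spread of individual observations.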
Analyse the two data sets below to test the hypothesis that there are no significant differences between the two groups of participants in the following variables: aerobic capacity (AC), systolic blood pressure (SBP) and body fat (BF).
Group 1 – Healthy weight
---------------------------------------------------------------------------------------------------------------------------
AC1 40.2 53.2 55.5 40.2 41.3 32.3 58.3 50.9 48.9 56.0 47.2 39.9 38.6
SBP2 121 125 120 122 128 118 122 125 122 123 121 118 131
BF3 24.0 19.0 22.5 18.0 32.0 28.5 15.0 15.5 17.5 18.5 26.5 22.5 20.0
---------------------------------------------------------------------------------------------------------------------------
Group 2 - Overweight
---------------------------------------------------------------------------------------------------------------------------
AC1 27.5 42.2 21.3 32.3 32.3 30.9 30.2 35.2 36.2 44.3 40.0 30.2 44.2
SBP2 135 142 130 135 130 142 135 132 144 129 135 139 147
BF3 27.0 34.0 32.0 28.5 31.0 29.5 28.0 35.0 37.0 30.0 34.0 32.0 39.5
---------------------------------------------------------------------------------------------------------------------------
1 ml/kg/min, 2 mmHg, 3 %.
The current data set of health measures for two groups (healthy-weight and overweight participants) contains three key variables:
- Aerobic capacity (AC) in ml/kg/min
- Systolic blood pressure (SBP) in mmHg
- Body fat (BF) in % of body weight
All the variables listed above are ratio-level numerical variables. Because two groups are involved in the analysis, the data set contains a total of six features (three health variables x two groups).
Repeated measures paired samples (for healthy weight and overweight samples)
There are no significant differences between the repeated measures
Table 2.1: Normality Test
According to the normality tests (the Shapiro-Wilk test is used because the sample size is < 100), the normality assumption cannot be rejected for any variable in either group (p > 0.05). A paired t-test is therefore chosen to compare the healthy-weight and overweight conditions (Warmenhoven et al., 2018).
Table 2.2: Paired sample t-test – a. Paired sample statistics, b. Test output
According to the test, the differences between the healthy-weight and overweight conditions are significant for all three variables (p < 0.05) (Coakes, 2007).
The paired sample statistics give the descriptive statistics for all six variables in the analysis. AC is higher in the healthy-weight group (mean = 46.35 ml/kg/min, SD = 8.14) than in the overweight group (mean = 34.37 ml/kg/min, SD = 6.87). SBP is lower in the healthy-weight group (mean = 122.77 mmHg, SD = 3.72) than in the overweight group (mean = 136.54 mmHg, SD = 5.8). BF is also lower in the healthy-weight group (mean = 21.5%, SD = 5.1) than in the overweight group (mean = 32.1%, SD = 3.7).
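The group means and the test statistics can be checked from the raw data with the standard library alone. The sketch below computes the paired t statistic, which is the test chosen in this report. Because the two groups appear to consist of different participants, it also computes a pooled-variance independent-samples t statistic as an alternative; that second statistic is an assumption of this sketch and not part of the SPSS output. The critical values are hard-coded from standard t tables.

```python
from math import sqrt
from statistics import mean, stdev

# Raw data from the exercise (13 participants per group).
healthy = {
    "AC": [40.2, 53.2, 55.5, 40.2, 41.3, 32.3, 58.3, 50.9, 48.9, 56.0, 47.2, 39.9, 38.6],
    "SBP": [121, 125, 120, 122, 128, 118, 122, 125, 122, 123, 121, 118, 131],
    "BF": [24.0, 19.0, 22.5, 18.0, 32.0, 28.5, 15.0, 15.5, 17.5, 18.5, 26.5, 22.5, 20.0],
}
overweight = {
    "AC": [27.5, 42.2, 21.3, 32.3, 32.3, 30.9, 30.2, 35.2, 36.2, 44.3, 40.0, 30.2, 44.2],
    "SBP": [135, 142, 130, 135, 130, 142, 135, 132, 144, 129, 135, 139, 147],
    "BF": [27.0, 34.0, 32.0, 28.5, 31.0, 29.5, 28.0, 35.0, 37.0, 30.0, 34.0, 32.0, 39.5],
}


def paired_t(a, b):
    """Paired t statistic: mean difference over its standard error (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def independent_t(a, b):
    """Student's t with pooled variance (df = len(a) + len(b) - 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Two-sided 5% critical values from standard t tables (hard-coded assumption).
T_CRIT_DF12 = 2.179  # paired test, df = 13 - 1
T_CRIT_DF24 = 2.064  # independent test, df = 13 + 13 - 2

results = {}
for name in healthy:
    a, b = healthy[name], overweight[name]
    results[name] = {
        "mean_healthy": mean(a),
        "mean_overweight": mean(b),
        "paired_t": paired_t(a, b),
        "independent_t": independent_t(a, b),
    }
    r = results[name]
    print(f"{name}: means {r['mean_healthy']:.2f} vs {r['mean_overweight']:.2f}, "
          f"paired t = {r['paired_t']:.2f}, independent t = {r['independent_t']:.2f}")
```

A statistic whose absolute value exceeds the matching critical value corresponds to p < 0.05 for that test.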
Hypothesis testing for two groups
For each test, the risk of a type I error is 5%, because the alpha level is 0.05.
Investigate the bi-variate relationships between the three variables.
Bear in mind the relevant diagnostic tests and the likely consequence of conducting multiple statistical tests on data from the same sample.
The data set analysed in this exercise contains three variables:
- Diastolic blood pressure (diastolic BP) in mmHg
- Systolic blood pressure (systolic BP) in mmHg
- Age in years
All three variables are ratio-level, so the bivariate relationships are examined on the ratio scale.
Independent group
Figure 3.1: Correlations analysis
The analysis above first presents the descriptive statistics of the three ratio-level variables, which show the central tendency and dispersion of each:
- Systolic BP: mean – 123.67, SD – 14.76
- Diastolic BP: mean – 78.86, SD – 9.87
- Age: mean – 36.29, SD – 11.46
Report your findings and comment on the key relationships observed.
According to the correlation analysis, there are significant bivariate relationships between diastolic and systolic blood pressure (p < 0.001) and between age and diastolic blood pressure (p = 0.01). The alpha level for the test is 0.05.
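The exercise asks about the consequence of running several tests on the same sample. A minimal sketch of that arithmetic follows, assuming three pairwise correlations among the three variables and, for the family-wise error formula, independent tests. The Bonferroni step is an illustration added here and is not part of the reported SPSS output; only the two p-values stated above are used.

```python
ALPHA = 0.05
N_TESTS = 3  # three pairwise correlations among three variables (assumption)

# Probability of at least one false positive across the family of tests,
# assuming the tests are independent.
familywise_error = 1 - (1 - ALPHA) ** N_TESTS

# Bonferroni-adjusted per-test threshold that keeps the family-wise rate near ALPHA.
bonferroni_alpha = ALPHA / N_TESTS

# p-values stated in the report; "p < 0.001" is entered as its upper bound.
reported_p = {
    "diastolic vs systolic": 0.001,
    "age vs diastolic": 0.01,
}
still_significant = {pair: p < bonferroni_alpha for pair, p in reported_p.items()}

print(f"Family-wise error rate without correction: {familywise_error:.4f}")
print(f"Bonferroni-adjusted alpha: {bonferroni_alpha:.4f}")
for pair, flag in still_significant.items():
    print(f"{pair}: {'significant' if flag else 'not significant'} after adjustment")
```

Under these assumptions both reported relationships stay below the adjusted threshold, so the conclusion does not depend on the correction.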
Using the “Exercise_4.sav” data file, select the appropriate statistical technique to test the hypothesis that there is no difference in systolic blood pressure due to an intervention of diet and aerobic exercise training lasting 6, 12 and 24 weeks. Group 1 performs the intervention for 6 weeks, group 2 for 12 weeks and group 3 for 24 weeks. Report and interpret your findings and consider the use of post hoc tests to provide additional information.
The data set for this exercise is used to analyse the effect of the diet intervention on systolic blood pressure. In the experimental set-up, three groups of participants (1, 2 and 3) follow the intervention for 6, 12 and 24 weeks respectively, and systolic blood pressure is recorded for every participant. The key variables in the data set are therefore as follows:
- Systolic blood pressure (systolic BP) in mmHg
- Diet intervention group (1 – 6 weeks intervention, 2 – 12 weeks intervention and 3 – 24 weeks intervention)
Systolic blood pressure is a ratio-level variable. The group variable, in contrast, is an ordinal variable that ranks the amount of intervention received by the participants; its codes run from 1 to 3 and represent the three intervention groups.
- Group (weeks of diet intervention) wise independent sample
- There is a significant difference between the diet intervention groups regarding systolic bp
Table 4.1: Descriptive statistics – a. systolic BP, b. group (intervention)
The output above gives the descriptive statistics of the two key variables, where systolic BP is a ratio-level variable and group is an ordinal variable. The overall summary statistics (central tendency and dispersion) are as follows:
- Systolic BP: mean – 178.61 mmHg, SD – 15.06 mmHg
- Diet intervention group: median – 2, range – 2 (minimum – 1, maximum – 3)
The group-wise descriptive statistics above summarise systolic BP within each intervention group:
- 6 weeks intervention group systolic BP: median – 185 mmHg, range – 40 mmHg (minimum – 170 mmHg, maximum – 210 mmHg)
- 12 weeks intervention group systolic BP: median – 176 mmHg, range – 41 mmHg (minimum – 159 mmHg, maximum – 200 mmHg)
- 24 weeks intervention group systolic BP: median – 164.5 mmHg, range – 14 mmHg (minimum – 155 mmHg, maximum – 169 mmHg)
The Kruskal-Wallis test shows a significant difference in systolic BP between the groups (p = 0.003 < 0.05) (Ďuriš & Tirpáková, 2020).
According to the post hoc Mann-Whitney tests, the difference is significant between the 6-week and 24-week groups (p = 0.001 < 0.017, where the Bonferroni-adjusted alpha is 0.05/3 = 0.017) (Mann-Whitney, 2017; Huynh et al., 2020). The difference in central tendency is visible in the group-wise descriptive statistics:
- 6 weeks intervention group systolic BP: median – 185 mmHg, range – 40 mmHg
- 24 weeks intervention group systolic BP: median – 164.5 mmHg, range – 14 mmHg
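The procedure described above (an omnibus Kruskal-Wallis test followed by pairwise Mann-Whitney tests at a Bonferroni-adjusted alpha) can be sketched with the standard library. The values below are HYPOTHETICAL placeholders, not the contents of Exercise_4.sav, so the printed numbers will not match the SPSS results. The helper functions are illustrative: they omit tie corrections and use a normal approximation for the Mann-Whitney p-value, which is coarse for small samples.

```python
from itertools import combinations
from math import exp, sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """H statistic (no tie correction) and p-value for exactly three groups."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # With three groups df = 2, and the chi-square upper tail is exp(-H / 2).
    return h, exp(-h / 2)


def mann_whitney_p(a, b):
    """Two-sided p-value from the normal approximation (no tie/continuity correction)."""
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    mu = len(a) * len(b) / 2
    sigma = sqrt(len(a) * len(b) * (len(a) + len(b) + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs((u - mu) / sigma)))


# HYPOTHETICAL systolic BP values (mmHg) for illustration only.
groups = {
    "6 weeks": [170, 178, 185, 192, 201, 210],
    "12 weeks": [159, 166, 174, 179, 188, 200],
    "24 weeks": [155, 158, 162, 165, 167, 169],
}

h_stat, p_kw = kruskal_wallis_3(list(groups.values()))
pairs = list(combinations(groups, 2))
bonferroni_alpha = 0.05 / len(pairs)  # 0.05 / 3, as in the report

significant_pairs = [
    (a, b) for a, b in pairs
    if mann_whitney_p(groups[a], groups[b]) < bonferroni_alpha
]

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4f}")
print(f"Pairs significant at alpha = {bonferroni_alpha:.3f}: {significant_pairs}")
```

Running the post hoc comparisons only after a significant omnibus result, and at the adjusted alpha, keeps the overall type I error rate close to 0.05.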
Open the data file “Exercise_5.sav” and identify the eight separate heart rate (HR) variables recorded during incremental treadmill running (four exercise levels x two trials). Also identify the between-subjects variable within this data set and incorporate these three factors into your analysis of variance procedure.
This exercise analyses heart rate in two groups of athletes: runners and rowers. Two trials were performed, each with four levels of physical exercise, and heart rate was recorded at each level. In total there are therefore eight heart rate variables (four exercise levels x two trials). The variables in the data set are as follows:
- Main Sports participated in (the experiment) - runner or rower (1 – runner, 2 – rower)
- Heart rate variables –total eight: four exercise levels x two trials
The heart rate variables are ratio-level variables. The “main sports participated in” variable is a nominal variable that identifies the sport group (runner or rower) of each participant.
- Group (main sports participated in) wise independent sample
- There is a significant difference in heart rate between the runner and rower group
The descriptive statistics above summarise the sport group variable:
Median – 1.5
According to the normality test (Shapiro-Wilk, used because the sample size is < 100), the normality hypothesis cannot be rejected for any group (p > 0.05 for all groups) (Pour-Aboughadareh et al., 2019).
The descriptive statistics above summarise the heart rate data for each sport group (runner and rower) as follows:
- HR Trial 1 Level 1 for runner: Median – 137.5, Range – 41 (minimum – 118, maximum – 159)
- HR Trial 1 Level 1 for rower: Median – 155, Range – 22 (minimum – 142, maximum – 164)
- HR Trial 1 Level 2 for runner: Median – 154, Range – 40 (minimum – 126, maximum – 166)
- HR Trial 1 Level 2 for rower: Median – 161.5, Range – 26 (minimum – 152, maximum – 178)
- HR Trial 1 Level 3 for runner: Median – 162.5, Range – 32 (minimum – 138, maximum – 170)
- HR Trial 1 Level 3 for rower: Median – 167.5, Range – 25 (minimum – 160, maximum – 185)
- HR Trial 1 Level 4 for runner: Median – 175, Range – 36 (minimum – 154, maximum – 190)
- HR Trial 1 Level 4 for rower: Median – 176.5, Range – 29 (minimum – 167, maximum – 196)
- HR Trial 2 Level 1 for runner: Median – 140.5, Range – 36 (minimum – 116, maximum – 152)
- HR Trial 2 Level 1 for rower: Median – 148.5, Range – 16 (minimum – 143, maximum – 159)
- HR Trial 2 Level 2 for runner: Median – 152.5, Range – 34 (minimum – 128, maximum – 162)
- HR Trial 2 Level 2 for rower: Median – 161, Range – 20 (minimum – 152, maximum – 172)
- HR Trial 2 Level 3 for runner: Median – 160, Range – 27 (minimum – 142, maximum – 169)
- HR Trial 2 Level 3 for rower: Median – 171.5, Range – 21 (minimum – 159, maximum – 180)
- HR Trial 2 Level 4 for runner: Median – 170.5, Range – 30 (minimum – 158, maximum – 188)
- HR Trial 2 Level 4 for rower: Median – 182.5, Range – 29 (minimum – 166, maximum – 195)
Figure 5.5: ANOVA (analysis of variance) Test outcome
According to the ANOVA, heart rate differs significantly between runners and rowers (p < 0.05) in every observation except trial 1 level 3 and trial 2 level 4 (p > 0.05) (George & Mallery, 2019).
References
Abdellatif, D., El Moutaouakil, K., & Satori, K. (2018). Clustering and Jarque-Bera normality test to face recognition. Procedia Computer Science, 127, 246-255.
Coakes, S. J. (2007). Analysis without anguish: Version 12.0 for Windows. John Wiley & Sons, Inc.
Cohen, L., & Holliday, M. (1996). Practical statistics for students: An introductory text. Sage.
Ďuriš, V., & Tirpáková, A. (2020). A survey on the global optimization problem using Kruskal–Wallis test. In Annales Mathematicae et Informaticae (Vol. 52, pp. 281-298). Eszterházy Károly Egyetem Líceum Kiadó.
George, D., & Mallery, P. (2019). IBM SPSS statistics 26 step by step: A simple guide and reference. Routledge.
Gerald, B. (2018). A brief review of independent, dependent and one sample t-test. International Journal of Applied Mathematics and Theoretical Physics, 4(2), 50-54.
Huynh, T. L. D., Nasir, M. A., Nguyen, S. P., & Duong, D. (2020). An assessment of contagion risks in the banking system using non-parametric and Copula approaches. Economic Analysis and Policy, 65, 105-116.
Mann-Whitney, U. (2017). SPSS. 17 Kruskal–Wallis Mann-Whitney U.
Orcan, F. (2020). Parametric or non-parametric: Skewness to test normality for mean comparison. International Journal of Assessment Tools in Education, 7(2), 255-265.
Pour-Aboughadareh, A., Yousefian, M., Moradkhani, H., Poczai, P., & Siddique, K. H. (2019). STABILITYSOFT: A new online program to calculate parametric and non-parametric stability statistics for crop traits. Applications in Plant Sciences, 7(1), e01211.
Tsagris, M., & Pandis, N. (2021). Normality test: Is it really necessary? American Journal of Orthodontics and Dentofacial Orthopedics, 159(4), 548-549.
Warmenhoven, J., Harrison, A., Robinson, M. A., Vanrenterghem, J., Bargary, N., Smith, R., ... & Pataky, T. (2018). A force profile analysis comparison between functional data analysis, statistical parametric mapping and statistical non-parametric mapping in on-water single sculling. Journal of Science and Medicine in Sport, 21(10), 1100-1105.