Answer:
Data Description:
The body temperature data set from StatCrunch was selected for statistical inquiry and analysis. The data set has three variables, one of which is categorical: Gender, with two categories, male and female. The other two variables are body temperature, measured in degrees Fahrenheit, and heart rate, measured in beats per minute (bpm). The sample size is 130.
Research Proposal:
One question of interest is whether body temperature in the given sample differs between males and females. A similar question can be asked about heart rate. Further, we can check whether there is any relationship between body temperature and heart rate.
Descriptive Statistics:
The following are the summary statistics for body temperature and heart rate, grouped by gender. The mean body temperatures of males (98.10) and females (98.39) do not appear to differ much, which needs to be confirmed by a test. For heart rate, the means for males and females are close, but the variances differ considerably: 65.69 for females versus 34.52 for males.
Summary statistics for Body Temp:
Group by: Gender
Gender | n | Mean | Variance | Std. dev. | Std. err. | Median | Range | Min | Max | Q1 | Q3
Female | 65 | 98.393846 | 0.55277404 | 0.74348775 | 0.092218306 | 98.4 | 4.4 | 96.4 | 100.8 | 98 | 98.8
Male | 65 | 98.104615 | 0.48825962 | 0.69875576 | 0.086669986 | 98.1 | 3.2 | 96.3 | 99.5 | 97.6 | 98.6
Summary statistics for Heart Rate:
Group by: Gender
Gender | n | Mean | Variance | Std. dev. | Std. err. | Median | Range | Min | Max | Q1 | Q3
Female | 65 | 74.153846 | 65.694712 | 8.1052274 | 1.0053297 | 76 | 32 | 57 | 89 | 68 | 80
Male | 65 | 73.369231 | 34.517788 | 5.8751841 | 0.7287269 | 73 | 28 | 58 | 86 | 70 | 78
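As a quick consistency check on the summary tables, the Std. dev. and Std. err. columns follow from the Variance and n columns: std. dev. = sqrt(variance) and std. err. = std. dev. / sqrt(n). The sketch below recomputes them in Python from the reported values. The dictionary layout and the function name are illustrative assumptions, not part of the StatCrunch output.

```python
import math

# (variance, n) copied from the summary tables for each variable and gender.
summary = {
    ("Body Temp", "Female"): (0.55277404, 65),
    ("Body Temp", "Male"): (0.48825962, 65),
    ("Heart Rate", "Female"): (65.694712, 65),
    ("Heart Rate", "Male"): (34.517788, 65),
}

def sd_and_se(variance, n):
    """Return (standard deviation, standard error of the mean)."""
    sd = math.sqrt(variance)
    return sd, sd / math.sqrt(n)

for (variable, gender), (variance, n) in summary.items():
    sd, se = sd_and_se(variance, n)
    print(f"{variable:10s} {gender:6s} sd={sd:.4f} se={se:.4f}")
```

The printed values should agree with the Std. dev. and Std. err. columns up to rounding.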
Box plots, which give a compact graphical view of the summary statistics, are shown below.
The following is the histogram for Body Temperature. The skewness and kurtosis values below indicate how far the data deviate from a normal distribution.
Column | Skewness | Kurtosis
Body Temp | -0.0044191312 | 0.7804574
The histogram of Heart Rate is shown next. Its deviation from a normal distribution is measured by the skewness and kurtosis values below.
Summary statistics:
Column | Skewness | Kurtosis
Heart Rate | -0.17835296 | -0.46302097
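The reported kurtosis for Heart Rate is negative, so the software must be reporting excess kurtosis (kurtosis minus 3), for which a normal distribution scores 0. The Python sketch below shows the simple moment-based versions of both measures. It is an illustration under assumptions: the software may apply small-sample bias corrections, so its values can differ slightly, and the sample readings are hypothetical, not the study data.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (dividing by n, no bias correction).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

# Hypothetical temperature readings, for illustration only.
sample = [97.8, 98.0, 98.2, 98.2, 98.4, 98.6, 98.8]
skew, kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness={skew:.4f} excess kurtosis={kurt:.4f}")
```

Values near 0 for both measures, as in the tables above, are consistent with an approximately normal shape.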
An easy way to assess normality is the Q-Q plot. The data points lie close to the reference line, which indicates that the data are approximately normally distributed.
A scatter plot of Body Temperature vs. Heart Rate is displayed below. It shows a weak positive correlation between the variables.
Correlation between Body Temp and Heart Rate: 0.2536564 (p-value = 0.0036)
The pair plot above shows how Body Temp and Heart Rate are related within each gender.
Correlation between Body Temp and Heart Rate for females = 0.28693115 (p-value = 0.0205)
Correlation between Body Temp and Heart Rate for males = 0.19558938 (p-value = 0.1184)
Inferential Statistics:
The first test checks whether there is a significant correlation between Body Temp and Heart Rate.
Null Hypothesis: There is no correlation between Body Temp and Heart Rate.
Alternative Hypothesis: There is a correlation between Body Temp and Heart Rate.
We use the Pearson correlation test, which is based on a t statistic. The p-value of the test is 0.0036 < 0.05, so we can infer that there is a significant correlation between Body Temp and Heart Rate.
When the same test is conducted separately within females and males, the p-values are 0.021 (< 0.05) and 0.118 (> 0.05) respectively, which means the correlation is significant among females but not among males.
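The t statistic behind the Pearson correlation test is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The Python sketch below applies it to the correlations reported above; the function name is an illustrative choice. Turning t into a p-value additionally needs the t-distribution CDF from a statistics library, which is left out here.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# r and n are the values reported in the text above.
groups = [
    ("All", 0.2536564, 130),
    ("Female", 0.28693115, 65),
    ("Male", 0.19558938, 65),
]
for label, r, n in groups:
    t = correlation_t_stat(r, n)
    print(f"{label:6s} r={r:.4f} t={t:.3f} df={n - 2}")
```

The larger the absolute t for a given df, the smaller the p-value, which is why the pooled sample and the female group reach significance while the male group does not.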
In the second test, we want to know whether the mean Body Temp differs significantly between males and females.
Null Hypothesis (H0): There is no difference in the mean body temperature of males and females.
Alternative Hypothesis (H1): There is a difference in the mean body temperature of males and females.
To test this hypothesis, we use the two-sample t-test. The output from the software is given below. The normality assumption is satisfied by both samples (Shapiro-Wilk test, p-values 0.0902 and 0.8545, both > 0.05), and the equal-variance assumption is also satisfied (Levene's test, p-value 0.8051).
The p-value of the t-test is 0.0239 < 0.05, so we have enough evidence to reject H0 and conclude that the mean body temperature differs between males and females.
Two sample T hypothesis test:
μ1 : Mean of Body Temp (Female)
μ2 : Mean of Body Temp (Male)
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
(without pooled variances)
Hypothesis test results:
Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value
μ1 - μ2 | 0.28923077 | 0.12655395 | 127.5103 | 2.2854345 | 0.0239
Shapiro-Wilk normality test results:
Sample | n | Stat | P-Value
var5 | 65 | 0.96797487 | 0.0902
var8 | 65 | 0.98940716 | 0.8545
Homogeneity of Variance results:
Data stored in separate columns.
Levene's Test for Homogeneity of Variance
Test Statistic | DF 1 | DF 2 | P-value
0.061118127 | 1 | 128 | 0.8051
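The standard error, t statistic and degrees of freedom in the body temperature test output can be reproduced from the group summary statistics alone. Because the output is labelled "without pooled variances", the Python sketch below uses the unpooled (Welch) form with the Welch-Satterthwaite degrees of freedom. The function name is illustrative, and the female group is entered first because that ordering reproduces the positive sample difference in the output.

```python
import math

def welch_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (difference, standard error, t statistic, Welch-Satterthwaite df)."""
    a, b = var1 / n1, var2 / n2          # per-group variance of the mean
    diff = mean1 - mean2
    se = math.sqrt(a + b)                # unpooled standard error
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff, se, diff / se, df

# Body temperature summary statistics reported earlier: female first, then male.
diff, se, t, df = welch_from_summary(
    98.393846, 0.55277404, 65,
    98.104615, 0.48825962, 65,
)
print(f"diff={diff:.4f} se={se:.4f} t={t:.4f} df={df:.2f}")
```

The results should match the Sample Diff., Std. Err., T-Stat and DF columns of the test output up to rounding.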
In the third test, we want to know whether the mean Heart Rate differs significantly between males and females.
Null Hypothesis (H0): There is no difference in the mean heart rate of males and females.
Alternative Hypothesis (H1): There is a difference in the mean heart rate of males and females.
Again we use the two-sample t-test, and the output from the software is given below. The assumption of normality is satisfied (Shapiro-Wilk test, p-values 0.1483 and 0.7912). The p-value of the test is 0.5287 > 0.05, so we fail to reject the null hypothesis. We conclude that there is no significant difference in the mean heart rates of males and females.
Two sample T hypothesis test:
μ1 : Mean of Heart Rate (Female)
μ2 : Mean of Heart Rate (Male)
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
Hypothesis test results:
Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value
μ1 - μ2 | 0.78461538 | 1.2416645 | 116.70438 | 0.6319061 | 0.5287
Shapiro-Wilk normality test results:
Sample | n | Stat | P-Value
var6 | 65 | 0.97206506 | 0.1483
var9 | 65 | 0.98813546 | 0.7912
Regression Analysis:
Consider the model y = a + b*x + error,
where y is body temperature and x is heart rate.
We want to test the significance of the model,
so we are interested in two pairs of hypotheses:
- Null: b = 0
Alternative: b ≠ 0
- Null: a = 0
Alternative: a ≠ 0
The following is the output from the software.
Simple linear regression results:
Dependent Variable: Body Temp
Independent Variable: Heart Rate
Body Temp = 96.306754 + 0.026334549 Heart Rate
Sample size: 130
R (correlation coefficient) = 0.2536564
R-sq = 0.064341571
Estimate of error standard deviation: 0.71196889
Parameter estimates:
Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-value
Intercept | 96.306754 | 0.65770318 | ≠ 0 | 128 | 146.4289 | <0.0001
Slope | 0.026334549 | 0.0088763359 | ≠ 0 | 128 | 2.9668265 | 0.0036
Analysis of variance table for regression model:
Source | DF | SS | MS | F-stat | P-value
Model | 1 | 4.4617613 | 4.4617613 | 8.8020594 | 0.0036
Error | 128 | 64.883162 | 0.5068997 | |
Total | 129 | 69.344923 | | |
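From the output, we can see that both the constant (a) and the slope (b) are significantly different from zero (p-values < 0.0001 and 0.0036). However, R-sq = 0.064 means that heart rate explains only about 6.4% of the variation in body temperature. The estimated line is given above, and the residuals are approximately normally distributed. The graphs indicate that the assumptions of simple linear regression are satisfied.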
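Several numbers in the regression output are tied together: R-sq = SS(Model) / SS(Total), F = MS(Model) / MS(Error), the error standard deviation is sqrt(MS(Error)), and in simple linear regression F equals the square of the slope's t statistic. The Python sketch below checks these relations using the reported values and evaluates the fitted line at one heart rate. The heart rate of 75 bpm and the function name are hypothetical, chosen only for illustration.

```python
import math

# Values copied from the regression output above.
intercept, slope = 96.306754, 0.026334549
ss_model, ss_error, ss_total = 4.4617613, 64.883162, 69.344923
df_model, df_error = 1, 128
slope_t = 2.9668265

ms_error = ss_error / df_error
r_squared = ss_model / ss_total
f_stat = (ss_model / df_model) / ms_error
error_sd = math.sqrt(ms_error)

def predict_body_temp(heart_rate_bpm):
    """Fitted body temperature (degrees Fahrenheit) for a given heart rate."""
    return intercept + slope * heart_rate_bpm

print(f"R-sq={r_squared:.4f} F={f_stat:.3f} t^2={slope_t ** 2:.3f}")
print(f"error sd={error_sd:.4f}")
# 75 bpm is a hypothetical input inside the observed heart rate range.
print(f"fitted temp at 75 bpm={predict_body_temp(75):.2f}")
```

Agreement between F and t squared explains why the slope test and the ANOVA F test report the same p-value of 0.0036.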