Application of Statistical Knowledge in Real Data Analysis essay.

Analysis of One Quantitative Variable

For this project, you must find some sort of published, existing data. Possible sources include almanacs, magazines, journal articles, textbooks, web resources, athletic teams, newspapers, reference materials, campus organizations, professors with experimental data, electronic data repositories, and the sports pages; alternatively, collect your own data from fellow students, neighbours or friends. The dataset you select must have at least 25 cases. It must also have at least two categorical variables and at least two quantitative variables. Choose or collect a dataset that interests you!

See the description below of what analysis should be included. Use technology to automate calculations. Write your report: cut and paste all relevant computer output with your analysis. Be sure to include both computer output and your discussion of that output in every case. As you discuss each analysis, be sure to interpret what you are finding in the context of your particular data situation. Include all of the following.
How did you find or collect your data? (If you found the data, give a clear reference. If you collected the data, describe clearly the data collection process you used.) What are the cases? What are the variables? What population do you believe the sample might generalize to? Is the sample data from an experiment or an observational study? Include a copy of the dataset.

• Analysis of One Quantitative Variable: For at least one of the quantitative variables, include summary statistics (mean, standard deviation, five number summary) and at least one graphical display. Are there any outliers? Is the distribution symmetric, skewed, or some other shape?
• Analysis of One Categorical Variable: For at least one of the categorical variables, include a frequency table and a relative frequency table.
• Analysis of One Relationship between Two Categorical Variables: Analyse your own data using a chi-square test for association between the two categorical variables. State the hypotheses of the test. Conduct the test, showing all details such as expected counts, contribution of each cell to the chi-square statistic, degrees of freedom used, and the p-value. State a clear conclusion in context. If the results are significant, which cells contribute the most to the chi-square statistic? For these cells, are the observed counts greater than or less than expected? Whether or not the results are significant, describe the relationship as if you were writing an article for your campus paper. If the results are significant, can we infer a causal relationship between the variables?

• Analysis of One Relationship between a Categorical Variable and a Quantitative Variable: Include a side-by-side histogram and describe it. Does there appear to be an association between the two variables? If so, describe it. Also, use some summary statistics to compare the groups.
• Analysis of One Relationship between Two Quantitative Variables: For at least one pair of quantitative variables, include a scatterplot and discuss it.
• Conclusion: Briefly summarize the most interesting features of your data.

This study sought to apply the statistical knowledge learnt in class to the analysis of real data. I obtained my dataset from the internet; the link to the dataset is given. The data contain 60 observations on four variables (two categorical and two numerical/quantitative), namely:

Variable	Type
Prior Sexual Experience	Categorical
Dose of Drug	Categorical
Sexual Activity Index	Numerical/Quantitative
Age of the respondent	Numerical/Quantitative

Analysis of One Quantitative Variable

The first analysis looks at the summary statistics of one of the quantitative variables, the sexual activity index. The measures reported include the mean, median, minimum, maximum, quartiles, standard deviation, skewness and kurtosis.

Table 1: Descriptive statistics (Quantitative data):
Statistic	Sexual Activity Index
Nbr. of observations	60
Minimum	9.370
Maximum	23.550
Range	14.180
1st Quartile	12.405
Median	15.020
3rd Quartile	17.323
Mean	15.152
Variance (n-1)	9.593
Standard deviation (n-1)	3.097
Variation coefficient	0.203
Skewness (Pearson)	0.297
Kurtosis (Pearson)	-0.547
Lower bound on mean (95%)	14.352
Upper bound on mean (95%)	15.952

As we can see from the table, the average sexual activity index of the respondents is 15.152 with a median of 15.02. The minimum and maximum sexual activity index are 9.37 and 23.55 respectively. The 95% confidence interval for the mean runs from 14.352 (lower bound) to 15.952 (upper bound).
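The internal consistency of Table 1 can be checked with a little arithmetic. The sketch below is only an illustration and not part of the original analysis; it uses the values reported in Table 1, and the variable names are assumptions chosen for readability.

```python
import math

# Values copied from Table 1 (Sexual Activity Index)
n = 60
minimum, maximum = 9.370, 23.550
mean = 15.152
variance = 9.593                    # sample variance (n - 1 denominator)
ci_lower, ci_upper = 14.352, 15.952

value_range = maximum - minimum          # range = maximum - minimum
std_dev = math.sqrt(variance)            # sample standard deviation
std_error = std_dev / math.sqrt(n)       # standard error of the mean
half_width = (ci_upper - ci_lower) / 2   # margin of error of the 95% interval
multiplier = half_width / std_error      # critical value implied by the table

print(f"range              = {value_range:.3f}")
print(f"standard deviation = {std_dev:.3f}")
print(f"standard error     = {std_error:.4f}")
print(f"implied multiplier = {multiplier:.3f}")
```

The implied multiplier comes out very close to 2.00, which is what a two-sided 95% interval based on the t distribution with 59 degrees of freedom would be expected to use.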

Also measured were the skewness and kurtosis. Skewness measures the asymmetry of the distribution, while kurtosis describes how heavy its tails are relative to a normal distribution. The skewness (0.297) and kurtosis (-0.547) are both fairly close to zero, implying that the data could have come from a normally distributed population.
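As a rough illustration of how such shape measures are obtained, the sketch below computes moment-based skewness and excess kurtosis. It assumes the "Pearson" versions in Table 1 are the ones built from population central moments, which the source does not confirm; the function name and the sample values are hypothetical and are not the study data.

```python
def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3              # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical sample, for illustration only
sample = [9.4, 11.8, 12.4, 13.9, 15.0, 15.6, 17.3, 18.9, 23.6]
skew, kurt = moment_skew_kurtosis(sample)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A positive skewness indicates a longer right tail, and a negative excess kurtosis indicates lighter tails than a normal distribution.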

Table 2 below gives the normality tests. Both the Kolmogorov-Smirnov test (p = .200) and the Shapiro-Wilk test (p = .226) were non-significant (p-value > 0.05). We thus fail to reject the null hypothesis of normality and conclude that the data are consistent with a normal distribution at the 5% level of significance.

Table 2: Tests of Normality
	Kolmogorov-Smirnov^a			Shapiro-Wilk
	Statistic	df	Sig.	Statistic	df	Sig.
Sexual Activity Index	.094	60	.200^*	.974	60	.226
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Next, a histogram of the sexual activity index is presented. As can be seen, the data appear to be approximately normally distributed, although the shape is not a perfect bell curve. The boxplot shows that there are no outliers in the dataset and is likewise consistent with an approximately normal distribution.

Analysis of One Categorical Variable

In analyzing one categorical variable, the study considered the dose of drug taken by the respondents. There were three dosage levels, namely vehicle, 10 mg and 15 mg. Table 3 below presents the frequency table for the categorical variable dose of drug. As can be seen in the table, equal numbers of respondents took each of the three dosages: 33.3% (n = 20) took vehicle, another 33.3% (n = 20) took 10 mg and the remaining 33.3% (n = 20) took 15 mg.

Table 3: Dose of Drug
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Vehicle	20	33.3	33.3	33.3
	10 mg	20	33.3	33.3	66.7
	15 mg	20	33.3	33.3	100.0
	Total	60	100.0	100.0
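A frequency and relative frequency table of this kind can be produced with a simple tally. The sketch below is a minimal illustration that rebuilds the dose labels from the counts in Table 3 (20 respondents per dose); the variable names are assumptions.

```python
from collections import Counter

# Dose labels reconstructed from the counts reported in Table 3
doses = ["Vehicle"] * 20 + ["10 mg"] * 20 + ["15 mg"] * 20

counts = Counter(doses)
total = sum(counts.values())

rows = []
cumulative = 0.0
for level in ["Vehicle", "10 mg", "15 mg"]:
    percent = 100 * counts[level] / total   # relative frequency in percent
    cumulative += percent                   # running (cumulative) percent
    rows.append((level, counts[level], percent, cumulative))

print(f"{'Dose':<8}{'Frequency':>10}{'Percent':>9}{'Cumulative':>12}")
for level, freq, percent, cum in rows:
    print(f"{level:<8}{freq:>10}{percent:>9.1f}{cum:>12.1f}")
print(f"{'Total':<8}{total:>10}{100.0:>9.1f}")
```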

The above can also be visualized in the bar chart presented below.

Analysis of One Relationship between Two Categorical Variables:

In analyzing the relationship between two categorical variables, the study considered Prior Sexual Experience and Dose of Drug.

The following hypotheses were tested using a chi-square test of association:

H₀: There is no association between Prior Sexual Experience and Dose of Drug.

H₁: There is an association between Prior Sexual Experience and Dose of Drug.

This was tested at 5% level of significance.

To test the above hypothesis, a Pearson Chi-Squared (χ²) test of independence (association) was used.

Table 4: Dose of Drug * Prior Sexual Experience: Cross tabulation
			Prior Sexual Experience:		Total
			No Sexual Experience	Prior Sexual Experience	Total
Dose of Drug	Vehicle	Count	10	10	20
	Vehicle	Expected Count	10.0	10.0	20.0
	10 mg	Count	10	10	20
	10 mg	Expected Count	10.0	10.0	20.0
	15 mg	Count	10	10	20
	15 mg	Expected Count	10.0	10.0	20.0
Total		Count	30	30	60
Total		Expected Count	30.0	30.0	60.0

As can be seen in Table 4 above, the distribution of prior sexual experience is identical across the three doses of drug. The observed counts and expected counts are equal in every cell.

Table 5: Chi-Square Tests
	Value	df	Asymp. Sig. (2-sided)
Pearson Chi-Square	.000^a	2	1.000
Likelihood Ratio	.000	2	1.000
Linear-by-Linear Association	.000	1	1.000
N of Valid Cases	60
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.00.

A chi-square test of association was performed to determine whether there was an association between the prior sexual experience of respondents and the dose of drug they took. The test found no association, χ²(2) = 0.000, p = 1.000 (> .05). We therefore fail to reject the null hypothesis: the data give no evidence that prior sexual experience and dose of drug are related. Each dose group contains exactly ten respondents with and ten without prior sexual experience, so every cell contributes zero to the chi-square statistic.
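The details the test relies on (expected counts, the contribution of each cell, and the degrees of freedom) can be reproduced directly from the observed counts in Table 4. The sketch below is illustrative only; the variable names are assumptions, and the closed-form p-value it uses is valid only when the test has exactly 2 degrees of freedom, as it does here.

```python
import math

# Observed counts from Table 4: rows are doses (Vehicle, 10 mg, 15 mg),
# columns are (No Sexual Experience, Prior Sexual Experience)
observed = [[10, 10], [10, 10], [10, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of a cell = (observed - expected)^2 / expected
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_square = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2)
if df != 2:
    raise ValueError("closed-form p-value below only holds for df = 2")
p_value = math.exp(-chi_square / 2)

print("expected counts:", expected)
print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```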

Analysis of One Relationship between a Categorical Variable and a Quantitative Variable

This section analyzes one relationship between a categorical variable and a quantitative variable. The relationship we looked at is that between prior sexual experience and the sexual activity index. The hypotheses we sought to test are:

H₀: The mean sexual activity index for the respondents with prior sexual experience is the same as that of the respondents with no prior sexual experience.

H₁: The mean sexual activity index for the respondents with prior sexual experience is different from that of the respondents with no prior sexual experience.

This was tested at 5% level of significance.

In testing the hypothesis, an independent samples t-test was used. This test is commonly used when comparing the means of two independent groups, as in our case.

Table 6: Group Statistics
	Prior Sexual Experience:	N	Mean	Std. Deviation	Std. Error Mean
Sexual Activity Index	No Sexual Experience	30	13.8967	2.66063	.48576
Sexual Activity Index	Prior Sexual Experience	30	16.4067	3.02956	.55312

Table 6 above gives the group statistics. As can be seen, the average sexual activity index for the respondents with prior sexual experience is 16.41 while that of the respondents with no sexual experience is 13.90. Respondents with prior sexual experience have a much higher sexual activity index when compared to the respondents with no prior sexual experience.

Table 7: Independent Samples Test
		Levene's Test for Equality of Variances		t-test for Equality of Means
		F	Sig.	t	df	Sig. (2-tailed)	Mean Difference	Std. Error Difference	95% Confidence Interval of the Difference
									Lower	Upper
Sexual Activity Index	Equal variances assumed	.194	.661	-3.410	58	.001	-2.51	.73614	-3.98	-1.036
	Equal variances not assumed			-3.410	57.05	.001	-2.51	.73614	-3.98	-1.036

An independent samples t-test was conducted to compare the mean sexual activity index of the respondents with prior sexual experience with that of the respondents with no prior sexual experience. Levene's test gave no evidence of unequal variances (F = 0.194, p = .661), so the equal-variances result is reported. There was a significant difference in the sexual activity index between respondents with prior sexual experience (M = 16.41, SD = 3.03) and respondents with no prior sexual experience (M = 13.90, SD = 2.66); t(58) = -3.41, p = 0.001 (< 0.05). These results suggest that prior sexual experience is associated with the sexual activity index. Specifically, respondents with prior sexual experience have, on average, a higher sexual activity index than respondents with no prior sexual experience (mean difference 2.51, 95% CI 1.04 to 3.98).
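The t statistics and degrees of freedom in Table 7 can be recovered from the group summaries in Table 6 alone. The following is a minimal sketch under that assumption; it reproduces both the pooled (equal variances assumed) and the Welch (equal variances not assumed) versions, and the variable names are illustrative.

```python
import math

# Group summaries from Table 6
n1, mean1, sd1 = 30, 13.8967, 2.66063   # no prior sexual experience
n2, mean2, sd2 = 30, 16.4067, 3.02956   # prior sexual experience

mean_diff = mean1 - mean2

# Equal variances assumed: pool the two sample variances
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = mean_diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: Welch standard error and Satterthwaite df
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = mean_diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"mean difference = {mean_diff:.2f}")
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.5f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}, SE = {se_welch:.5f}")
```

Because the two groups have the same size, the pooled and Welch standard errors coincide, which is why both rows of Table 7 show the same t statistic and differ only in their degrees of freedom.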

The boxplots further visualize the differences in the sexual activity index by prior sexual experience. As can be seen, the sexual activity index for those with prior sexual experience is typically higher than that of the respondents with no prior sexual experience. No outliers were observed in either of the two boxplots.

Analysis of One Relationship between Two Quantitative Variables

This section analyzes the relationship between two quantitative variables. We considered the age of the respondent and the sexual activity index. A Pearson correlation test was carried out to assess the strength and direction of the linear relationship between the two variables.

Table 8: Correlations
		Sexual Activity Index	Age of the respondents
Sexual Activity Index	Pearson Correlation	1	-.460^**
Sexual Activity Index	N	60	60
Age of the respondents	Pearson Correlation	-.460^**	1
	Sig. (2-tailed)	.000
	N	60	60
**. Correlation is significant at the 0.01 level (2-tailed).

As can be seen in Table 8 above, the Pearson correlation coefficient is -0.460 and the relationship is significant at the 5% level of significance (r = -0.460, p < 0.05); Table 8 flags it as significant at the 1% level as well. The negative coefficient means that there is a negative relationship between the two variables (sexual activity index and age of the respondents). A negative linear relationship means that older respondents tend to have a lower sexual activity index, while younger respondents tend to have a higher one. A correlation of this size indicates a moderate association and does not by itself establish that age causes the change.
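The significance of a Pearson correlation is usually judged by converting r into a t statistic with n - 2 degrees of freedom. The sketch below applies that standard conversion to the values in Table 8; it is an illustration rather than the procedure the original software necessarily used, and the variable names are assumptions.

```python
import math

# Values from Table 8
r = -0.460
n = 60

df = n - 2
r_squared = r ** 2                                    # share of variance in common
t_stat = r * math.sqrt(df) / math.sqrt(1 - r_squared) # t = r * sqrt(n-2) / sqrt(1-r^2)

print(f"r^2 = {r_squared:.4f}")
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The magnitude of this t statistic is well above the two-sided 5% critical value of roughly 2.0 for 58 degrees of freedom, in line with the significance flag in Table 8.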

A negative linear relationship can be observed between the two variables.

Regression model

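To further understand how the age of the respondent relates to the sexual activity index, a linear regression model was constructed. The linear model is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the sexual activity index, $X$ is the respondent's age, $\beta_0$ is the constant coefficient, $\beta_1$ is the coefficient for the independent variable “respondent’s age”, and $\varepsilon$ is the error term.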

The model summary table (Table 9) presents the value of R, R-Square, adjusted R-Square and the standard error of the estimate. The value of R-Square is 0.211, which means that 21.1% of the variation in the sexual activity index (dependent variable) is explained by the independent variable (age of the respondent). This value is quite small, implying that the larger proportion (78.9%) is left unexplained by the model and may be due to other variables outside it.

Table 9: Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.460^a	.211	.198	2.77418
a. Predictors: (Constant), Age

The regression model was found to be a statistically significant fit for predicting the sexual activity index from the explanatory variable “Age of the respondent” (p < 0.05); see Table 10 below.

Table 10: ANOVA^a
Model		Sum of Squares	df
1	Regression	119.587	1
	Residual	446.373	58
	Total	565.960	59
a. Dependent Variable: Sexual Activity Index
b. Predictors: (Constant), Age
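Table 10 as reproduced here lists only the sums of squares and degrees of freedom. As a hedged illustration, the remaining ANOVA quantities and the figures in Table 9 can be derived from those entries; the variable names below are assumptions, and the derived F statistic is a calculation from the table rather than a value quoted from the source.

```python
import math

# Sums of squares and degrees of freedom from Table 10
ss_regression, df_regression = 119.587, 1
ss_residual, df_residual = 446.373, 58
ss_total = ss_regression + ss_residual

ms_regression = ss_regression / df_regression   # mean square for regression
ms_residual = ss_residual / df_residual         # mean square error
f_stat = ms_regression / ms_residual            # overall F statistic

r_squared = ss_regression / ss_total
n = df_regression + df_residual + 1             # number of observations
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error_estimate = math.sqrt(ms_residual)     # standard error of the estimate

print(f"F({df_regression}, {df_residual}) = {f_stat:.2f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"standard error of the estimate = {std_error_estimate:.5f}")
```

The derived R-Square, adjusted R-Square and standard error of the estimate agree with Table 9, and with a single predictor the F statistic is approximately the square of the slope's t statistic in Table 11 (3.942² ≈ 15.54).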

Looking at Table 11 presented below, we observe that the intercept (constant) is 18.336; this is the sexual activity index the model predicts when the respondent's age is zero, so it mainly anchors the regression line rather than describing a realistic respondent. The coefficient of the explanatory variable (Age of the respondent) is -0.082; this implies that a unit increase in the age of the respondent is associated with a decrease of 0.082 in the predicted sexual activity index. Similarly, a unit decrease in the age of the respondent is associated with an increase of 0.082 in the predicted sexual activity index. It is important to note that the respondent's age was found to be significant in the model (p < 0.05).

Table 11: Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	Age of the respondent	-.082	.021	-.460	-3.942	.000
a. Dependent Variable: Sexual Activity Index
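To make the fitted equation concrete, the sketch below evaluates it at a few ages. The intercept (18.336) and slope (-0.082) are the values reported in the text; the function name and the example ages are hypothetical, and since the observed age range is not reported here, predictions may fall outside the range the data support.

```python
INTERCEPT = 18.336   # constant reported in the text
SLOPE = -0.082       # coefficient for age from Table 11


def predict_index(age):
    """Predicted sexual activity index for a given age under the fitted line."""
    return INTERCEPT + SLOPE * age


# Hypothetical ages chosen only to illustrate the calculation
for age in (20, 40, 60):
    print(f"age {age}: predicted index = {predict_index(age):.3f}")
```

Each additional unit of age lowers the predicted index by 0.082, so a 10-unit age difference corresponds to a predicted difference of 0.82.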

Conclusion

This study utilized data on prior sexual experience and dose of an androgen. The idea was to present a statistical analysis of the dataset. Summary statistics were computed to describe the sexual activity index, which was found to be consistent with a normal distribution, with a mean of 15.152.

The minimum and maximum sexual activity index were found to be 9.37 and 23.55 respectively, while the 95% confidence interval for the mean runs from 14.352 (lower bound) to 15.952 (upper bound). No outliers were observed. In terms of relationships, we observed that age is significantly and negatively related to the sexual activity index. Prior sexual experience was also found to be associated with a higher sexual activity index. There is, however, no association between Prior Sexual Experience and Dose of Drug; the chi-square test was not significant at the 5% level of significance.

Cite This Work

My Assignment Help. (2021). Essay: Statistical Knowledge In Real Data Analysis. Retrieved from https://myassignmenthelp.com/free-samples/bus708-statistics-and-data-analysis/statistical-knowledge.html.