Scatterplot and Correlation
1. For the following scatter plot, what would be your best estimate of the correlation coefficient?
2. Discuss which statistical tests to apply for different types of data and how to interpret the results.
3. Nonparametric statistical techniques are based on fewer assumptions about the population and the parameters compared to parametric statistical techniques. Discuss this statement with suitable examples and supportive references.
1. Scatterplot: A scatterplot is a visual representation of the relationship between two quantitative variables. It shows the direction of the relationship and how strongly the variables are associated. One variable can be treated as the explanatory variable and the other as the response variable.
A positive trend in the scatterplot indicates a positive association between the variables: as the value of one variable increases, the corresponding value of the other variable also increases.
A negative trend in the scatterplot indicates a negative association between the variables: as the value of one variable increases, the corresponding value of the other variable decreases.
The absence of any trend in the scatterplot indicates no association between the variables. (Steve DeGroof, 2017)
Correlation: Correlation is a measure of the relationship between two variables. The correlation coefficient measures the strength of the linear relationship between two normally distributed interval or ratio level variables. It is denoted by r, and its value lies between −1 and +1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
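As a minimal sketch of how r is obtained from paired data, the Python snippet below computes the coefficient directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample values are illustrative assumptions, not data from the question.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y falls by a fixed amount for every unit increase in x.
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0
# Hypothetical data with a looser upward trend.
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))   # 0.8
```

The first call shows the extreme case of total negative correlation; the second shows a positive but imperfect association.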
In general, the strength of the relationship is judged from the absolute value of r. If it lies between 0 and 0.19, the relationship between the two variables is very weak. If it lies between 0.20 and 0.39, the relationship is weak. If it lies between 0.40 and 0.59, the relationship is moderate. If it lies between 0.60 and 0.79, the relationship is strong. And if it lies between 0.80 and 1.00, the relationship is very strong. (Patricia Cohen, Stephen West and Leona S. Aiken, 2014)
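The bands above can be turned into a small lookup. The sketch below is only an illustration: the function name `describe_correlation` is hypothetical, and treating each band's lower bound as inclusive (with 0.80 as the start of "very strong") is an assumption.

```python
def describe_correlation(r: float) -> str:
    """Label the strength and direction of a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)  # strength depends only on the magnitude of r
    if size < 0.20:
        strength = "very weak"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    elif size < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(-0.93))  # very strong negative
print(describe_correlation(0.45))   # moderate positive
```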
According to the provided scatterplot, as the value of the variable on the horizontal axis increases, the corresponding value of the variable on the vertical axis decreases, and the points follow this downward trend very closely. Thus, there is a very strong negative association between the variables, so the value of the correlation coefficient will lie in the very strong negative range, between −0.80 and −1.00. The estimated value of the correlation coefficient will be about
Different Types of Data and Statistical Tests
2. Data can be categorized as qualitative or quantitative. Qualitative data can be further categorized into binary, nominal or ordinal levels of measurement, and quantitative data into interval or ratio levels of measurement. (Renata Tesch, 2013)
The choice of statistical test depends on the research design, the type of variable and the distribution of the data. If the data are normally distributed, parametric tests are used for the analysis; if the data are not normally distributed, nonparametric tests are used.
Parametric tests: Parametric tests make assumptions about the parameters of the population distribution from which the sample data are drawn. They are applied to interval or ratio data that are normally distributed. (Lorena Madrigal, 2012)
The parametric tests are described as follows:
Pearson correlation (association test): The Pearson correlation test is applied to test whether there is a linear relationship between two variables in the population. It is applied to quantitative samples in which both variables are measured at the interval or ratio level. The result of the test indicates whether or not the population correlation coefficient is 0.
Example: To examine the relationship between two thin rice populations that differ genetically. The Pearson correlation coefficient will indicate whether there is a positive, a negative or no relationship between the thin rice categories. (Gravetter Frederick and Wallnau Larry, 2010)
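To illustrate how the hypothesis that the population correlation is 0 can be checked, the sketch below converts a sample r into the usual t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The values r = 0.8 and n = 5 are made up for illustration, the function name is hypothetical, and the critical value is taken from a standard t table rather than computed.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least 3 pairs are needed")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Assumed sample result: r = 0.8 from n = 5 pairs.
t = correlation_t_statistic(0.8, 5)

# Two-sided 5% critical value for 3 degrees of freedom (standard t table).
CRITICAL_T_DF3 = 3.182

print(round(t, 3))              # about 2.309
print(abs(t) > CRITICAL_T_DF3)  # False: H0 is not rejected
```

Even a fairly large sample r is not statistically significant here because only five pairs were observed.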
Simple linear regression and logistic regression: Simple linear regression analysis is used on quantitative data to predict the value of a response variable from an explanatory variable.
Example: To predict the weight of a person from that person's age. The linear regression gives the predicted weight for a particular age. (Iain Pardoe, 2013)
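A minimal least-squares sketch of this kind of prediction follows. The ages and weights are invented, deliberately clean numbers, and `fit_line` is a hypothetical helper; real data would scatter around the fitted line.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: age in years (explanatory), weight in kg (response).
ages = [20, 30, 40, 50, 60]
weights = [60, 64, 68, 72, 76]

slope, intercept = fit_line(ages, weights)
print(round(slope, 2), round(intercept, 2))  # 0.4 52.0

# Predicted weight for a 45-year-old under this fitted line.
print(round(intercept + slope * 45, 1))      # 70.0
```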
Logistic regression is used when the response variable is categorical, typically binary. It predicts the probability of a response category from one or more explanatory variables.
t-test: The t-test is applied to test whether the mean of a sample is representative of the population. It is applied to quantitative samples in which the variables are measured at the interval or ratio level, typically when the sample size is small (conventionally less than 30) and the population standard deviation is unknown. (Cole Davis, 2013)
The 1-sample t-test is applied to quantitative data to test whether or not a sample mean is equal to a hypothesized population mean.
Parametric Statistical Tests
Example: To test whether the average mathematics score of class X differs from 90.
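The one-sample case can be sketched with the Python standard library alone. The ten scores below are invented for illustration, `one_sample_t` is a hypothetical helper, and the critical value is read from a standard t table instead of being computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least 2 observations are needed")
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical mathematics scores for ten students of class X.
scores = [88, 92, 85, 91, 87, 90, 86, 89, 93, 84]
t, df = one_sample_t(scores, 90)

# Two-sided 5% critical value for 9 degrees of freedom (standard t table).
CRITICAL_T_DF9 = 2.262

print(round(t, 2), df)          # about -1.57 with 9 degrees of freedom
print(abs(t) > CRITICAL_T_DF9)  # False: the mean is not shown to differ from 90
```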
The 2-sample t-test for independent samples is applied to quantitative data to test whether or not the means of two samples are equal to each other.
Example: To test whether the mean weights of male and female candidates in the age group 50-60 are significantly different from each other.
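A hedged sketch of the independent-samples comparison is shown below. It uses the pooled-variance (equal variances) form of the t statistic, which is an assumption about the data; the weights are invented and `pooled_two_sample_t` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance


def pooled_two_sample_t(a, b):
    """Return (t, degrees of freedom) for H0: the two population means are equal.

    Assumes independent samples with equal population variances.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least 2 observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical weights in kg for candidates aged 50-60.
males = [78, 82, 80, 85, 75]
females = [68, 72, 70, 66, 74]
t, df = pooled_two_sample_t(males, females)

# Two-sided 5% critical value for 8 degrees of freedom (standard t table).
CRITICAL_T_DF8 = 2.306

print(round(t, 2), df)          # about 4.52 with 8 degrees of freedom
print(abs(t) > CRITICAL_T_DF8)  # True: the mean weights differ significantly
```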
The paired-sample t-test for dependent samples is applied to quantitative data to test whether or not the mean difference between the paired sample values is equal to 0.
Example: To test whether the mean sugar level of 40 patients before treatment with a drug is equal to their mean sugar level after treatment.
Z-test: The z-test is applied to test whether the mean of a sample is representative of the population. It is applied to quantitative samples in which the variables are measured at the interval or ratio level, typically when the sample size is large (conventionally greater than 30) and the population standard deviation is known.
Example: A sample of 100 students has a mean score of 12, and the population standard deviation is 2. Test the hypothesis that the population mean score is 13.2 at the 5% level of significance.
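The example can be worked through directly: z = (12 − 13.2) / (2 / sqrt(100)) = −1.2 / 0.2 = −6. The sketch below repeats that arithmetic and obtains a two-sided p-value from the standard normal distribution; treating the test as two-sided is an assumption, since the example does not state the alternative hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: population mean = mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


z, p = one_sample_z(sample_mean=12, mu0=13.2, sigma=2, n=100)
print(round(z, 2))  # -6.0
print(p < 0.05)     # True: reject the hypothesis that the mean is 13.2
```

Since |z| = 6 is far beyond the two-sided 5% critical value of 1.96, the hypothesis that the population mean score is 13.2 is rejected.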
ANOVA: Analysis of variance is used to compare more than two sample means measured at a quantitative level of measurement. (Maxwell Roberts and Riccardo Russo, 2014)
One-Way ANOVA: One-way analysis of variance is used to test whether there is a significant difference between the means of unrelated groups defined by one factor with more than two levels.
Example: To test whether the mean of an outcome differs across three groups with different criminal records.
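As a rough sketch of the one-way calculation, the snippet below computes the F statistic as the ratio of the between-group mean square to the within-group mean square. The three groups of scores are invented, `one_way_f` is a hypothetical helper, and the critical value comes from a standard F table.

```python
def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three unrelated groups.
f_stat, df1, df2 = one_way_f([4, 5, 6], [6, 7, 8], [8, 9, 10])

# 5% critical value of F with (2, 6) degrees of freedom (standard F table).
CRITICAL_F_2_6 = 5.14

print(f_stat, df1, df2)         # 12.0 2 6
print(f_stat > CRITICAL_F_2_6)  # True: at least one group mean differs
```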
Two-Way ANOVA: Two-way analysis of variance is used to test whether the means of a quantitative variable differ across the levels of two independent factors, and whether the two factors interact.
Example: To know whether there is a difference in occupational stress across the age groups 20-30, 30-40 and 40-50.
Nonparametric tests: Nonparametric tests make fewer assumptions about the parameters of the population distribution from which the sample data are drawn. They are applied to binary (two categories), nominal (more than two categories) and ordinal (ordered categories) data, and to data that are not normally distributed. (Mark Harmon, 2011)
The nonparametric tests are described as follows:
Spearman’s correlation coefficient (association test): Spearman’s correlation is applied to test whether there is a monotonic relationship between paired data. It is applied when the two variables are measured at least at the ordinal level, so both the dependent and the independent variable are measured at the ordinal or continuous level. (Jesse Russell and Ronald Cohn, 2012)
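One way to see what "monotonic" means is to compute Spearman's coefficient as the Pearson correlation of the ranks. The sketch below does this with average ranks for ties; the helper names and the data are assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))         # 1.0
print(round(pearson_r(x, y), 3))  # 0.981
```

The relationship is perfectly monotonic, so Spearman's coefficient is 1, while the Pearson coefficient falls slightly short of 1 because the relationship is curved.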
Non-Parametric Statistical Tests
Run test: This is a test of randomness, used to check whether or not the order of observations is random. It can be applied to both large and small samples.
Example: To check whether a succession of observations with two possible outcomes, success and failure, occurs in random order.
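A sketch of the run test for a success/failure sequence follows. It counts the runs and standardizes the count with the usual normal approximation, which is better suited to larger samples; for small samples exact tables would normally be consulted. The sequence and the function name are illustrative assumptions.

```python
from math import sqrt


def runs_test(sequence):
    """Return (number of runs, z) for a sequence containing exactly two symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    n = n1 + n2
    # A new run starts wherever two neighbouring observations differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - expected) / sqrt(variance)


# Hypothetical sequence: ten successes followed by ten failures.
runs, z = runs_test("S" * 10 + "F" * 10)
print(runs)             # 2
print(round(z, 2))      # about -4.14
print(abs(z) > 1.96)    # True: too few runs for a random order
```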
Mann-Whitney U-Test: This test is applied to two independent samples in which the variable is measured at least at the ordinal level. It can be used for small samples (fewer than 10 observations) as well as for larger samples. The dependent variable is measured at the ordinal or continuous level and the independent variable at the binary level. (Ken Black, 2009)
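The U statistic itself is simple to compute by comparing every observation in one group with every observation in the other, as in the sketch below. The ratings are invented and the function name is hypothetical; judging significance would still require U tables or a statistics library.

```python
def mann_whitney_u(x, y):
    """Return (U, U_x, U_y) for two independent samples.

    U_x counts the pairs in which the x value exceeds the y value,
    with ties counted as one half; U is the smaller of U_x and U_y.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y), u_x, u_y


# Hypothetical ordinal ratings (1 = poor, 5 = excellent) from two groups.
group_a = [3, 4, 2, 5, 4]
group_b = [1, 2, 2, 3, 1]
print(mann_whitney_u(group_a, group_b))  # (2.5, 22.5, 2.5)
```

A small U means the two groups overlap very little in rank order.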
Wilcoxon matched-pairs signed-rank test: This test is used when the samples are dependent and the data are ranked or ordinal. It is an alternative to the matched-pairs t-test when the normality assumption is not met. The dependent variable is measured at the ordinal or continuous level and the independent variable at the binary level. (Chris Spatz, 2010)
Chi-Square test: The chi-square test is applied to two or more nominal or ordinal level variables containing qualitative data. It is used to test whether there is an association between the variables, and it is also used to test goodness of fit. (Alan Agresti, 2013)
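For the test of association, the statistic compares the observed counts in a contingency table with the counts expected under independence. The sketch below uses an invented 2×2 table and a hypothetical function name, and takes the critical value from a standard chi-square table.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are two groups, columns are "yes" / "no" answers.
observed = [[30, 10],
            [20, 40]]
chi2, dof = chi_square_statistic(observed)

# 5% critical value of chi-square with 1 degree of freedom (standard table).
CRITICAL_CHI2_DF1 = 3.841

print(round(chi2, 2), dof)       # about 16.67 with 1 degree of freedom
print(chi2 > CRITICAL_CHI2_DF1)  # True: the variables appear to be associated
```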
Kruskal-Wallis test: This test is an alternative to one-way ANOVA when the data are not normally distributed or are measured at the ordinal level. The dependent variable is measured at the ordinal or continuous level and the independent variable is categorical, with more than two independent groups. (Deborah Rumsey, 2007)
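A minimal sketch of the Kruskal-Wallis H statistic is given below: all observations are ranked together and H = 12 / (N(N + 1)) · Σ(R_i² / n_i) − 3(N + 1), where R_i is the rank sum of group i. For brevity the sketch assumes there are no tied values; the data and the function name are illustrative, and the chi-square comparison is only approximate for groups this small.

```python
def kruskal_wallis_h(*groups):
    """Return (H, degrees of freedom); assumes no tied values across the groups."""
    pooled = sorted(x for g in groups for x in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Sum of (rank sum squared / group size) over the groups.
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical scores for three independent groups.
h, dof = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])

# 5% critical value of chi-square with 2 degrees of freedom (standard table).
CRITICAL_CHI2_DF2 = 5.991

print(round(h, 2), dof)       # 7.2 2
print(h > CRITICAL_CHI2_DF2)  # True: the groups differ in location
```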
Friedman test: This test is an alternative to the randomized block design ANOVA when the data are not normally distributed or are measured at the ordinal level. In this test the blocks are independent and there is no interaction between the blocks and the treatments. The dependent variable is measured at the ordinal or continuous level in at least 3 different situations, and the independent variable is categorical, identifying those situations.
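The Friedman statistic can be sketched in the same spirit: values are ranked within each block, the ranks are summed per treatment, and Q = 12 / (n·k·(k + 1)) · ΣR_j² − 3n(k + 1) for n blocks and k treatments. The sketch assumes no ties within a block; the measurements and the function name are made up for illustration.

```python
def friedman_q(blocks):
    """Return (Q, degrees of freedom) for rows of repeated measurements.

    Each row is one block (for example one subject) measured under k conditions.
    Assumes no tied values within a row.
    """
    n = len(blocks)
    k = len(blocks[0])
    if k < 3 or any(len(row) != k for row in blocks):
        raise ValueError("every block needs the same 3 or more conditions")
    rank_sums = [0] * k
    for row in blocks:
        if len(set(row)) != k:
            raise ValueError("this sketch does not handle ties within a block")
        # Rank the conditions within this block from 1 (smallest) to k.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return q, k - 1


# Hypothetical measurements of 4 subjects in 3 situations.
data = [[5, 7, 9],
        [4, 6, 8],
        [6, 7, 10],
        [3, 5, 6]]
q, dof = friedman_q(data)

# 5% critical value of chi-square with 2 degrees of freedom (standard table).
CRITICAL_CHI2_DF2 = 5.991

print(q, dof)                 # 8.0 2
print(q > CRITICAL_CHI2_DF2)  # True: the situations differ
```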
3. Nonparametric tests make fewer assumptions about the parameters of the population distribution from which the sample data are drawn, and they are used when the data are not normally distributed. They are applied to binary (two categories), nominal (more than two categories) and ordinal (ordered categories) data. The basic nonparametric tests are:
- Spearman’s correlation coefficient (association test)
- Run test
- Mann-Whitney U-Test
- Wilcoxon matched-pairs signed-rank test
- Chi-Square test
- Kruskal-Wallis test
- Friedman test
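The above tests are based on distribution-free statistics and are used with nominal or ordinal data. For example, the 2-sample t-test assumes that the samples come from normally distributed populations, whereas its nonparametric alternative, the Mann-Whitney U-Test, uses only the ranks of the observations.
Nonparametric tests are appropriate under the following conditions:
- When the data are not normally distributed.
- When the data are ordinal or ranked.
Hence, the provided statement is true, that is, “Nonparametric statistical techniques are based on fewer assumptions about the population and the parameters compared to parametric statistical techniques”.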
References:
Agresti Alan. Categorical Data Analysis. John Wiley & Sons, 2013.
Black Ken. Business Statistics: Contemporary Decision Making. John Wiley & Sons, 2009.
Cohen Patricia, West Stephen, Aiken Leona. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Psychology Press, 2014.
Davis Cole. SPSS for Applied Sciences: Basic Statistical Testing. Csiro Publishing, 2013.
DeGroof Steve. Lulu.com. Fiction, 2017.
Gravetter Frederick, Wallnau Larry. Essentials of Statistics for the Behavioral Sciences. Cengage Learning, 2010.
Harmon Mark. Nonparametric Testing in Excel: The Excel Statistical Master. Mark Harmon, 2011.
Madrigal Lorena. Statistics for Anthropology. Cambridge University Press: Science, 2012.
Pardoe Iain. Applied Regression Modeling. John Wiley & Sons, 2013.
Roberts Maxwell, Russo Riccardo. A Student's Guide to Analysis of Variance. Routledge, 2014.
Rumsey Deborah. Intermediate Statistics For Dummies. John Wiley & Sons, 2007.
Russell Jesse, Cohn Ronald. Spearman’s Rank Correlation Coefficient. 2012.
Spatz Chris. Basic Statistics: Tales of Distributions. Cengage Learning, 2010.
Tesch Renata. Qualitative Research: Analysis Types and Software. Routledge: Education, 2013.
My Assignment Help. (2019). Understanding Scatterplot, Correlation & Statistical Testing. Retrieved from https://myassignmenthelp.com/free-samples/categorical-data-analysis.