Give a brief introduction about the assignment, including your research question. Include a short summary of a related article with a proper citation.
Dataset 1: Give a short description about this dataset. Is this primary or secondary data? What types of variable(s) is involved? Display the first 5 cases of your dataset.
Dataset 2: Explain how you collect the data and discuss its limitation (e.g. whether your sample is biased). Is this primary or secondary data? What type of variable(s) is/are involved? You don’t need to display your data in this section.
Section 1: Introduction
- Even though the representation of females in the total workforce has improved, one key concern remains: the difference in salary levels between the two genders. This is referred to as the gender gap and is supported by evidence in the Australian context, such as the recent report from the WGEA (Workplace Gender Equality Agency), which highlights that the salary levels of females are 15% lower than those of males. A further concern is that this gender gap is found not only in male-dominated occupations but also in professions where females are in the majority. The gap persists despite the government introducing various laws to prevent it (Livsey, 2017). Against this background, the provided dataset is analysed to determine whether a gender gap exists in Australia.
- Dataset 1 contains data for 1000 taxpayers and covers gender, salary levels, occupation and gift-related deductions. It is secondary rather than primary data, since it was collected by the ATO and only later obtained by the university (Flick, 2015). The gender variable is categorical since it can take two labels, i.e. male and female. Occupation is captured using codes that have no natural ordering, so it is also a categorical (nominal) variable. Salary/wages is a quantitative variable measured on a ratio scale. The gift-related deduction is also a quantitative variable measured on a ratio scale (Hair et al., 2015). The first five cases of dataset 1 are presented as follows.
- Dataset 2 was collected through convenience sampling: I contacted people whom I knew and ensured a fair representation of both genders so that salary levels could be compared. Even though this dataset is primary, owing to self-collection, it has some potential shortcomings which would affect the reliability of the results obtained (Hillier, 2016). The first drawback is that the sample may be biased since random sampling was not used. In addition, some of the data may be inaccurate, particularly the salary figures, as respondents may have overstated them knowing that the information was being given to me and that I knew them personally. In this primary dataset, only two variables are of interest for the research question. The first is gender, which is categorical and represented by the labels male and female. The second is salary, which is quantitative since it is represented by numerical values (Eriksson and Kovalainen, 2015). The sample size of dataset 2 is 34.
Section 2: Descriptive Statistics (Using Dataset 1)
- The requisite representation in graphical terms for the relationship between occupation and gender is summarised as follows.
The key observation from the graph above is that the representation of the two genders differs markedly across occupations. A case in point is code 7, which represents drivers and machine operators, where the proportion of female workers is very low. By contrast, codes 4 and 5 have a disproportionately high representation of females: one of these occupations is community services while the other comprises clerical and administrative workers. A possible explanation is that community work requires a high degree of empathy, and that administrative jobs are desk-based and do not require travel, which may make them preferred by females.
- The relationship between salary/wage and gender is graphically captured as indicated below.
The gender gap is captured by the bar chart above. For the two lowest salary levels, i.e. $0-$25,000 and $25,000-$50,000, the proportion of females is higher than that of males, which highlights that a disproportionate number of women have a low salary. For salary levels exceeding $50,000 per year, males have a higher representation than females. Further, the female proportion is negatively related to salary level: the higher the salary bracket, the lower the proportion of females. This is a disconcerting observation. A potential explanation is that the proportion of men in high-paying jobs is higher, which results in higher average salaries for men. However, an issue with this argument is that various studies highlight that the gender gap also exists in occupations where female representation is higher.
- The requisite summary between salary levels and gender in numerical terms is indicated as follows.
The striking finding from the above numerical summary is that almost 50% of the females in the sample have an annual salary below $25,000. The representation of females in an income group appears to be inversely related to the income level, since the proportion of females keeps falling as the salary bracket rises. This suggests the presence of a glass ceiling, whereby lower-paying jobs are held by females while the higher-paying and more senior jobs are held by males. To an extent, the above pattern can be explained by the occupational distribution of the two genders; however, it would be imprudent to attribute the entire salary gap to occupational distribution alone.
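A numerical summary of this kind can be sketched as follows. This is only an illustration: the records, the bracket boundaries above $50,000 and the function names are assumptions, not values taken from dataset 1.

```python
from collections import Counter

# Hypothetical (gender, annual salary) records; not taken from dataset 1.
records = [
    ("F", 18000), ("F", 24000), ("F", 32000), ("F", 61000),
    ("M", 22000), ("M", 48000), ("M", 75000), ("M", 120000),
]

# Upper bounds of the salary brackets (assumed); the last bracket is open-ended.
BOUNDS = [25000, 50000, 100000]
LABELS = ["0-25k", "25k-50k", "50k-100k", "100k+"]

def bracket(salary):
    """Return the label of the bracket that contains the salary."""
    for bound, label in zip(BOUNDS, LABELS):
        if salary < bound:
            return label
    return LABELS[-1]

def share_by_bracket(rows, gender):
    """Share of the given gender's records that falls in each bracket."""
    counts = Counter(bracket(s) for g, s in rows if g == gender)
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}

female_share = share_by_bracket(records, "F")
male_share = share_by_bracket(records, "M")
for label in LABELS:
    print(f"{label:>9}: female {female_share[label]:.2f}, male {male_share[label]:.2f}")
```

Each share is computed within a gender, so it answers the question "what fraction of females falls in this bracket", which is the form of the observation made above.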
- The underlying relationship between gift amount deduction and income levels is highlighted using the scatter diagram below.
The above scatterplot shows little or no linear relationship between income and donation deduction. Further evidence in this regard is provided by the R² value, which is close to zero and hence indicates that income explains almost none of the variation in donation deductions (Flick, 2015). This result is not surprising, since donations are likely to be driven less by the income earned and more by personal factors, such as the urge to donate and how closely an individual is attached to the cause concerned.
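For a simple linear fit, R² equals the squared Pearson correlation between the two variables. A minimal sketch of that calculation is given below; the income and gift values are hypothetical and are not taken from dataset 1.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical income and gift-deduction values; not taken from dataset 1.
income = [30000, 45000, 60000, 80000, 120000]
gift = [200, 0, 150, 50, 100]
print(f"R^2 = {r_squared(income, gift):.4f}")
```

A value near 0 means the fitted line explains almost none of the variation in the deductions, while a value near 1 means the points lie close to a straight line.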
Section 3: Inferential Statistics
- The sample data contain the salary levels for the different occupations. The median salary for each occupational code has been calculated, and this indicates that the highest-paying occupations by median salary are codes 2, 1, 7 and 3, in decreasing order.
To estimate the gender proportions in the occupations identified above, a 95% confidence interval is computed from the participation levels of both genders in the sample data. The confidence intervals for the specified occupations are determined as follows.
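The ranking step can be sketched as below. The records are hypothetical and were chosen only for illustration; they are not values from dataset 1, and the function name is an assumption.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (occupation code, annual salary) records; not from dataset 1.
records = [
    (1, 90000), (1, 70000), (2, 95000), (2, 85000), (2, 60000),
    (3, 55000), (3, 65000), (7, 62000), (7, 68000), (4, 40000),
]

def median_salary_by_occupation(rows):
    """Group salaries by occupation code and return each group's median."""
    groups = defaultdict(list)
    for code, salary in rows:
        groups[code].append(salary)
    return {code: median(values) for code, values in groups.items()}

medians = median_salary_by_occupation(records)
# Occupation codes ordered from highest to lowest median salary.
ranking = sorted(medians, key=medians.get, reverse=True)
print(ranking)
```

The median is used rather than the mean because it is less affected by a few very high salaries within an occupation.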
The result obtained above suggests, with 95% confidence, that female representation in occupation code 1 lies between 0.3123 and 0.5272 (Hillier, 2016).
The result obtained above suggests, with 95% confidence, that female representation in occupation code 3 lies between 0.0539 and 0.1879 (Hair et al., 2015).
The result obtained above suggests, with 95% confidence, that female representation in occupation code 2 lies between 0.4425 and 0.5928 (Flick, 2015).
The result obtained above suggests, with 95% confidence, that female representation in occupation code 7 lies between 0.0000 and 0.1258 (Eriksson and Kovalainen, 2015).
The confidence intervals above highlight the low representation of females in some of the occupations with a high median salary. Fair representation of females is visible in only two of these occupations (codes 1 and 2), while the other two (codes 3 and 7) are heavily male-dominated.
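A confidence interval of this kind can be sketched with the normal-approximation (Wald) formula, p̂ ± z·sqrt(p̂(1 − p̂)/n). The sketch assumes this is the formula behind the intervals above, which the source does not state, and the counts are hypothetical. The bounds are clipped to the range 0 to 1, which would be consistent with the lower bound of 0.0000 reported for code 7.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical counts: 40 women among 100 workers in one occupation.
low, high = proportion_ci(40, 100)
print(f"95% CI: {low:.4f} to {high:.4f}")
```

With small groups or proportions close to 0 or 1, the normal approximation is rough, so an interval obtained this way should be read with some caution.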
- The hypotheses considered for given hypothesis test are listed below.
Null Hypothesis (H0): p ≤ 0.8, i.e. the proportion of males in occupation code 7 is not greater than 80%.
Alternative Hypothesis (H1): p > 0.8, i.e. the proportion of males in occupation code 7 is greater than 80%.
The relevant test statistic for this hypothesis test is z, and a right-tailed test is conducted. This has been completed in Excel and the relevant output is pasted below.
The computations above show that the p value is 0.0057. The significance level for this hypothesis test is 5%. Since the p value is below the significance level, the statistical evidence leads to rejection of the null hypothesis in favour of the alternative hypothesis (Hillier, 2016). Therefore, it is appropriate to conclude that male representation among machine operators and drivers is higher than 80%.
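The right-tailed z test for a proportion can be sketched as follows. The counts are hypothetical, so the output will not match the p value of 0.0057 reported above, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_proportion_z_test(successes, n, p0):
    """Return (z, p_value) for H0: p <= p0 against H1: p > p0."""
    p_hat = successes / n
    # The standard error uses the hypothesised proportion p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts: 46 men among 50 workers in one occupation.
z, p_value = right_tailed_proportion_z_test(46, 50, 0.8)
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Because the test is right-tailed, the p value is the area of the standard normal distribution to the right of z.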
- The hypotheses considered for given hypothesis test are listed below.
Null Hypothesis (H0): µfemale = µmale, which implies that the average salary levels do not differ significantly between the two genders.
Alternative Hypothesis (H1): µfemale ≠ µmale, which implies that the average salary levels do differ significantly between the two genders.
Since the population standard deviation is not known in this case, the relevant test statistic is t (Flick, 2015). The hypothesis test has been completed using Excel and the output obtained is outlined as follows.
It is apparent from the alternative hypothesis that the test is two-tailed, and therefore the p value considered is the two-tailed p value, which is reported as 0.00. The significance level for this hypothesis test is 5%. Since the p value is below the significance level, the statistical evidence leads to rejection of the null hypothesis in favour of the alternative hypothesis (Eriksson and Kovalainen, 2015). Therefore, it is appropriate to conclude, based on the given sample data, that a gender gap does exist.
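A two-tailed, two-sample t test can be sketched as below. The sketch assumes Welch's unequal-variance form, which may differ from the Excel option used for the output above, and the salary values (in thousands of dollars) are hypothetical. The p value is obtained by numerically integrating the t density, so that only the standard library is needed.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value, integrating the density over [0, |t|] by Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

def welch_t_test(a, b):
    """Return (t, df, two-tailed p) for H0: the two population means are equal."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)

# Hypothetical salaries in thousands of dollars; not taken from either dataset.
female = [42.0, 38.0, 51.0, 45.0, 36.0, 40.0]
male = [55.0, 61.0, 48.0, 70.0, 58.0, 64.0]
t_stat, df, p_value = welch_t_test(female, male)
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The same function could be applied to a smaller sample, in which case the wider sampling variability makes a non-significant result more likely even when the sample means differ.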
- The objective is to perform a hypothesis test on dataset 2 to check whether a gender gap exists.
Since the population standard deviation is not known in this case, the relevant test statistic is t (Hair et al., 2015). The hypothesis test has been completed using Excel and the output obtained is outlined as follows.
The relevant p value is 0.1864, which is clearly higher than the assumed significance level of 5%. Thus, the available evidence does not warrant rejection of the null hypothesis. Hence, the conclusion to be drawn is that the gender gap is not significant in dataset 2 (Hastie, Tibshirani and Friedman, 2011).
- The analysis conducted above highlights the existence of a gender gap in terms of differing salaries for the two genders. It is also noted that males in general tend to have a higher representation in jobs with a higher median salary. In addition, the relationship between gift/donation deduction and salary level was not found to be significant, since donation depends more on the individual concerned than on the underlying salary. Finally, the gender representation in the various occupations was found to differ significantly between the two genders.
- An area of further research would be to test for the presence of a gender gap in occupations which are dominated by females, as this would provide stronger evidence that the gender gap exists. This is important because, although this study also points to the presence of a gender gap, that gap could be attributed to the difference in occupational representation between the two genders.
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed. London: Sage Publications.
Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., and Page, M. J. (2015) Essentials of business research methods. 2nd ed. New York: Routledge.
Hastie, T., Tibshirani, R. and Friedman, J. (2011) The Elements of Statistical Learning. 4th ed. New York: Springer Publications.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill Publications.
Livsey, A. (2017) Australia's gender pay gap: why do women still earn less than men? [online] Available at: https://www.theguardian.com/australia-news/datablog/2017/oct/18/australia-gender-pay-gap-why-do-women-still-earn-less-than-men [Accessed 12 May 2018].