- Overview Of The Assignment
This assignment will test your ability to collect and analyse data to answer a specific business problem. It will also test your understanding of statistical methods and your skill in using them to make inferences about business data and solve business problems, including constructing hypotheses, testing them and interpreting the findings.
- Task Description: Written Report
There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.
Dataset 1: You will receive an email that contains a dataset specifically allocated to you. This dataset is drawn from the 2013-2014 individual sample file provided by the ATO and has been edited to include only a subset of the cases and variables. The original dataset can be obtained from https://data.gov.au/dataset/taxation-statistics-individual-sample-files, and it is licensed under Creative Commons Attribution 3.0 Australia. The data dictionary of the edited dataset is given in the following table.
| Variable | Description | Values |
|---|---|---|
| Gender | Gender (sex) | Female or Male |
| Occ_code | Salary/wage occupation code | 0 = Occupation not listed/Occupation not specified; 1 = Managers; 2 = Professionals; 3 = Technicians and Trades Workers; 4 = Community and Personal Service Workers; 5 = Clerical and Administrative Workers; 6 = Sales workers; 7 = Machinery operators and drivers; 8 = Labourers; 9 = Consultants, apprentices and type not specified or not listed |
| Sw_amt | Salary/wage amount | All numeric |
| Gift_amt | Gifts or donation deductions | All numeric |
Dataset 2: Collect data (e.g. via a survey) that will answer your research question. There is no requirement about the number of variables, sampling methods and sample size, but you need to justify your approaches in Section 1 (see below).
Both datasets should be saved in an Excel file (one file, separate worksheets). All data processing should be performed in Excel or Statkey (http://www.lock5stat.com/StatKey).
Prepare a report in a document file (.doc or .docx) which includes all relevant tables and figures, using the following structure:
- Section 1: Introduction
- Give a brief introduction about the assignment, including your research question. Include a short summary of a related article with a proper citation.
- Dataset 1: Give a short description of this dataset. Is this primary or secondary data? What type(s) of variable are involved? Display the first 5 cases of your dataset.
- Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether your sample is biased). Is this primary or secondary data? What type(s) of variable are involved? You don’t need to display your data in this section.
- Section 2: Descriptive Statistics Use Dataset 1
- Using suitable graphical display, describe the relationship between the variables Gender and Occ_code for Dataset 1. Make sure your graph shows the distribution of Gender for each Occ_code.
- Using suitable graphical display, describe the relationship between the variables Gender and Sw_amt.
- Using suitable numerical summary, describe the relationship between the variables Gender and Sw_amt.
- Using suitable graphical display, describe the relationship between the variables Sw_amt and Gift_amt.
- Section 3: Inferential Statistics Use Dataset 1
- List the top 4 occupations based on median salary and find the proportion of each gender in those top 4 occupations.
- Perform a suitable hypothesis test at a 5% level of significance to test whether the proportion of machinery operators and drivers who are male is more than 80%.
- Perform a suitable hypothesis test at a 5% level of significance to test whether there is a difference in salary amount between genders.
Use Dataset 2
- Perform a suitable statistical analysis on dataset 2 (the one you collected) that will answer your research question.
- Section 4: Discussion & Conclusion
- What can you conclude from your findings in the previous sections?
- Give a suggestion for further research.
1.a)
This assignment demonstrates the techniques and skills of collecting and analysing data sets using “MS-Excel” software. The assignment deals with both a primary and a secondary data set. Various statistical methods are used in this analysis to produce descriptive and inferential statistics.
The data set gathered from the “Australian Taxation Office (ATO)” helps readers examine the “Gender gap” in salaries and wages. One proposed cause of the “Gender gap” is discrimination in hiring. It is also argued that salary and occupational preference vary significantly by gender. The data set summarises the job profiles of males and females across various occupations. The researcher treated this data file as the main data file; it contains 1000 cases in total.
To check whether the findings from the first data set hold in an independent sample, the researcher collected a second data set by simple random sampling. The research questions established for the second data set are:
- “Is there any difference in the mean salary amount between “Males” and “Females” in the second data set?”
- “Are the proportions of “Males” and “Females” employed in the occupation “Machinery Operators and drivers” equal or not?”
1. b)
The data set provided in the datasheet is “Secondary data”, collected by the Australian Taxation Office (ATO) while processing detailed taxpayer information for the 2013-2014 lodgement period.
The data set has four variables: 1) “Gender”, 2) “Occupation code”, 3) “Salary and Wage amount” and 4) “Gift amount”. The “Numerical” or “Quantitative” variables are “Salary and Wage amount” and “Gift amount”. The qualitative variables are “Gender” and “Occ_code”. “Gender” is a “Categorical” variable with two levels, “Male” and “Female”. “Occupation code” is also a “Categorical” variable, although its categories are coded as numbers.
Table 1: The first five cases of data set 1 are shown below
1. c)
The second data set was collected using the “Random Sampling Method”. The researcher surveyed 100 international students who are living in Australia. The data is primary data. The data set has three main variables: “Gender”, “Sw_amt” and “Occ_code”. “Sw_amt” is a quantitative variable and the other two variables are categorical.
2. a)
Figure 1: Visualization of “Occupational code” and “Gender”
“Female” employees are concentrated mainly in two occupations: “Clerical and Administrative workers” and “Professionals”. “Male” employees are concentrated mainly in “Technicians and Trade Workers” and “Professionals”. The number of female workers is lowest in the occupation “Machinery Operators and drivers”. The number of male workers is lowest in the occupation “Sales worker”.
Figure 2: Visualization of “Gender” and “Salary and Wage amount”
Both the mean and the standard deviation of salaries are higher for “Male” employees than for “Female” employees.
Figure 3: Histogram of “Salary and wage amount” of “Males”
(Jelen 2010)
Figure 4: Histogram of “Salary and Wage amount” of “Females”
Table 2: Table of “Numerical Summary” of “Salary and Wage amount” gender wise
- The lowest salary for both males and females is 0.
- The highest salaries of males and females are equal (308183).
- On average, males earn a higher salary and wage amount than females (48181.46 > 33841.72).
- The standard deviation of salaries is greater for males than for females (46863.41 > 33428.35).
- The total amount earned by males is much greater than that earned by females (25150721 > 16176341).
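The figures in this summary can be cross-checked against each other, since a group total should equal the group mean multiplied by the group size. The sketch below is only an illustrative check in Python (the report itself uses Excel); the group sizes of 522 males and 478 females are taken from the t-test discussion later in the report, and the names `summary` and `implied_total` are made up for this example.

```python
# Figures quoted in this report: means and totals from the numerical summary,
# group sizes from the two-sample t-test discussion.
summary = {
    "Male": {"n": 522, "mean": 48181.46, "total": 25150721},
    "Female": {"n": 478, "mean": 33841.72, "total": 16176341},
}

def implied_total(group):
    """Total implied by a group's size and mean (n * mean)."""
    return group["n"] * group["mean"]

for gender, stats in summary.items():
    gap = implied_total(stats) - stats["total"]
    print(f"{gender}: implied total {implied_total(stats):,.0f}, "
          f"difference from reported total {gap:+.2f}")

# Female mean expressed as a share of the male mean.
mean_ratio = summary["Female"]["mean"] / summary["Male"]["mean"]
print(f"Female mean as a share of male mean: {mean_ratio:.3f}")
```

Any small difference between the implied and reported totals is consistent with the means having been rounded to two decimal places.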
Figure 5: Scatter plot of “Gift amount” and “Salary and Wage amount”
According to the scatter plot, the two quantitative variables “Salary and wage amount” and “Gift amount” show no apparent linear relationship.
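A visual impression of "no linear relationship" can be backed up with Pearson's correlation coefficient, where a value near 0 indicates little linear association. The following is a minimal sketch of that calculation in Python; the six rows are hypothetical placeholders, not values from Dataset 1, and `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical placeholder rows (NOT taken from Dataset 1).
sw_amt = [0, 25000, 40000, 60000, 90000, 150000]
gift_amt = [0, 120, 0, 50, 0, 200]

r = pearson_r(sw_amt, gift_amt)
print(f"Pearson r = {r:.3f}")
```

In Excel, the same quantity is available through the `CORREL` worksheet function applied to the two columns.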
Table 3: The “Median” salaries of top four “Occupations”
Table 4: Pivot table of “Gender” and “Occupations with top four median salaries”
(Jelen and Alexander 2010)
Ranked by median salary, the top four occupations are:
- “Managers”.
- “Professionals”.
- “Machinery operators and drivers”.
- “Technicians and trade workers”.
Across all four occupations combined, the proportions of “Male” and “Female” employees are 0.65 and 0.35. The proportions of “Male” and “Female” “Managers” are 0.64 and 0.36 respectively. The proportions of “Male” and “Female” “Professionals” are 0.47 and 0.53 respectively. The proportions of “Male” and “Female” “Technicians and Trade Workers” are 0.81 and 0.19 respectively. The proportions of “Male” and “Female” “Machinery operators and drivers” are 0.96 and 0.04 respectively.
Hypotheses:
Null hypothesis (H0): “The proportion of “Male” employees in the profession of “Machinery operators and drivers” is 0.8”.
Alternative hypothesis (HA): “The proportion of “Male” in the profession of “Machinery Operators and drivers” is higher than 0.8”.
Table 6: “One sample proportional Z-test”
Among the 46 employees who work as “Machinery Operators and drivers”, 44 are “Male” and the remaining 2 are “Female”, so the sample proportion of males is 0.9565. Based on the margin of error, the 95% confidence interval for the proportion of male “Machinery operators and drivers” runs from 0.89759 to 1. A one-sample z-test for a proportion is applied at “Level of significance” = 0.05 to test the hypothesis. The calculated Z-statistic is 2.654. The critical value quoted at the 95% level, 1.959964, is the two-tailed value; because the alternative hypothesis is one-sided, the appropriate critical value is about 1.645. Since 2.654 exceeds both values, the null hypothesis is rejected at the 5% level of significance (Lehmann and Romano 2006), and the data support the alternative hypothesis.
Conclusion: It can be concluded that the proportion of males in the profession of “Machinery Operators and drivers” is higher than 80%.
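The one-sample proportion test above can be reproduced from the counts given in the text (44 males out of 46). The sketch below is an illustrative Python version of the calculation rather than the Excel workbook used for the report; the function name `one_sample_proportion_z` is made up for this example, and the one-sided critical value is used because the alternative hypothesis is "higher than 0.8".

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Return (sample proportion, z statistic) for H0: p = p0."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return p_hat, (p_hat - p0) / se_null

n_total = 46   # machinery operators and drivers in Dataset 1
n_male = 44    # of whom male
std_normal = NormalDist()

p_hat, z = one_sample_proportion_z(n_male, n_total, p0=0.80)
z_crit_one_sided = std_normal.inv_cdf(0.95)   # about 1.645
p_value = 1 - std_normal.cdf(z)               # upper-tail p-value

# 95% confidence interval based on the sample proportion, capped at 1.
margin = std_normal.inv_cdf(0.975) * sqrt(p_hat * (1 - p_hat) / n_total)
ci_lower, ci_upper = p_hat - margin, min(1.0, p_hat + margin)

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, one-sided p = {p_value:.4f}")
print(f"reject H0 at 5%: {z > z_crit_one_sided}")
print(f"95% CI: ({ci_lower:.5f}, {ci_upper:.5f})")
```

With only 2 females in the group, the normal approximation is rough, so an exact binomial test would be a reasonable cross-check.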
Hypotheses:
Null hypothesis (H0): “The difference between the mean “Salary and Wage” amounts of “Male” and “Female” employees is equal to 0”.
Alternative hypothesis (HA): “The difference between the mean “Salary and Wage” amounts of “Male” and “Female” employees is not equal to 0”.
Table 7: Table of “Two-sample t-test assuming unequal variances”
(Yuen 1974)
Among the 1000 sampled people, 478 are “Females” and 522 are “Males”. The mean “Salary and Wage” of females is 33841.72 and the mean “Salary and Wage” of males is 48181.46. A two-sample t-test with unequal variances is applied to test the null hypothesis at “Level of significance” = 0.05. The calculated t-statistic is -5.605 with 943 degrees of freedom. The calculated p-value is 0.0 to the precision reported. Because the two-tailed p-value of the t-statistic is less than 5%, the null hypothesis is rejected at the 5% level of significance, and the data support the alternative hypothesis.
Conclusion: It can be concluded at the 5% level of significance that the mean “Salary and Wage amount” differs between genders, with the mean for “Male” employees higher than the mean for “Females”.
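The t-statistic and degrees of freedom for Dataset 1 can be recovered from the summary statistics quoted in this report (means, standard deviations and group sizes), using the unequal-variance (Welch) formula. The sketch below is an illustrative Python calculation, not the Excel output itself; `welch_t` is a helper written for this example, and the female group is placed first so the sign matches the reported statistic.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # variance of each sample mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics quoted in the report (female group first).
t_stat, df = welch_t(
    mean1=33841.72, sd1=33428.35, n1=478,   # females
    mean2=48181.46, sd2=46863.41, n2=522,   # males
)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

With roughly 943 degrees of freedom the t distribution is very close to the standard normal, so a statistic of this size corresponds to a two-tailed p-value far below 0.05.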
Hypotheses:
Null hypothesis (H0): “The difference between the mean “Salary and Wage” amounts of “Male” employees and “Female” employees is 0 for the surveyed data”.
Alternative hypothesis (HA): “The difference between the mean “Salary and Wage” amounts of “Male” employees and “Female” employees is not equal to 0 for the surveyed data”.
Table 8: Tables of two-sample t-test assuming unequal variances
The second data set has a total of 100 cases. The mean “Salary and Wage” of “Male” employees is 61421.02 and the mean “Salary and Wage” amount of “Female” employees is 333423.79245. The “Two samples t-test” with unequal variances is applied to test the difference between the two means at “Level of significance” = 0.05. The calculated t-statistic is -2.714 with 64 degrees of freedom (Romano and Lehmann 2005). The calculated two-tailed p-value is 0.008, which is less than 5%. Hence, the null hypothesis is rejected at the 5% level of significance.
Conclusion: It can be concluded at the 5% level of significance that the mean “Salary and Wage” amount of “Male” employees is greater than the mean “Salary and Wage” amount of “Female” employees.
Hypotheses:
Null hypothesis (H0): “The difference between the proportions of males and females working as “Machinery drivers and operators” is 0”.
Alternative hypothesis (HA): “The proportion of “Male” employees working as “Machinery drivers and operators” is higher than the proportion of females working as “Machinery drivers and operators””.
Table 9: Table of two-samples proportions Z-test
Out of 53 females, only 2 (proportion = 3.774%) work as “Machinery drivers and operators”, and out of 47 males, only 6 (proportion = 12.766%) work as “Machinery drivers and operators”. A two-sample z-test for proportions is applied to test the difference between the proportions (Panik, 2012). The calculated “Z-statistic” is 1.654. This is below the two-tailed “Z-critical” value of 1.9599 at the 5% level of significance (Cressie and Whitford 1986), so a two-sided test would not be significant. Because the alternative hypothesis is one-sided, however, the appropriate critical value is about 1.645, which the calculated statistic narrowly exceeds. The researcher can therefore reject the null hypothesis at the 5% level, although only marginally.
Conclusion: It can be concluded that the proportion of males in the profession of “Machinery drivers and operators” is greater than the proportion of females, although the evidence in the surveyed data is borderline.
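The two-sample proportion test for the surveyed data can be reproduced from the counts in the text (6 of 47 males and 2 of 53 females). The sketch below is an illustrative Python version using the pooled standard error; `two_proportion_z` is a helper written for this example, and the one-sided p-value is reported because the alternative hypothesis states that the male proportion is higher.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Counts from the surveyed data: group 1 = males, group 2 = females.
z = two_proportion_z(x1=6, n1=47, x2=2, n2=53)

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)           # HA: male proportion is higher
p_two_sided = 2 * p_one_sided
z_crit_one_sided = std_normal.inv_cdf(0.95)   # about 1.645
z_crit_two_sided = std_normal.inv_cdf(0.975)  # about 1.960

print(f"z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
print(f"exceeds one-sided critical value: {z > z_crit_one_sided}")
print(f"exceeds two-sided critical value: {z > z_crit_two_sided}")
```

The result sits very close to the 5% boundary and rests on only 8 machinery operators in total, so the normal approximation is weak here and a larger sample or an exact test would give a firmer answer.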
Section 4: Discussion and Conclusion
4. a)
The main findings that emerged from the analysis are:
- Females most commonly work as “Clerical and administrative staff” and “Professional employees”.
- Males most commonly work as “Technicians and trade workers” and “Professional employees”.
- Of the nine occupations considered, the highest “Median” “Salary and Wages” are found in 1) “Managers”, 2) “Professionals”, 3) “Machinery operators and drivers” and 4) “Technicians and Trades Workers”, in that order.
- The amount of “Salary and Wage” is found to be higher for males than for females.
- The patterns seen in the graphical displays are supported by the inferential results from “Testing of Hypothesis”.
- This inference is also supported by the surveyed data set, as a significant difference between the mean salaries of “Male” and “Female” employees is found in the primary data analysis.
- The occupation “Machinery Operators and drivers” is dominated by “Male” employees, as the proportion of “Male” employees is higher than 80%.
- Hence, according to the analysis, both “Occupation” type and “Salary and wage amount” are important indicators when assessing “Gender discrimination”.
4. b)
The scope for future research is as follows:
- The sample size of the surveyed data could be increased (to about 1000) so that it can be compared directly with the secondary data set.
- The reasons behind the difference in “Salary and wages” between males and females could be identified and distinguished. The relevant variables and parameters should be included in the data sets.
- More information could be extracted if the variables “Age”, “Years of experience”, “Educational level” and “Monthly working hours” were included in the data set.
References:
Cressie, N.A.C. and Whitford, H.J., 1986. How to Use the Two Sample t-Test. Biometrical Journal, 28(2), pp.131-148.
Jelen, B. and Alexander, M., 2010. Pivot Table Data Crunching: Microsoft Excel 2010. Pearson Education.
Jelen, B., 2010. Charts and Graphs: Microsoft Excel 2010. Que Publishing.
Lehmann, E.L. and Romano, J.P., 2006. Testing statistical hypotheses. Springer Science & Business Media.
Panik, M.J., 2012. Testing Statistical Hypotheses. Statistical Inference: A Short Course, pp.184-216.
Romano, J.P. and Lehmann, E.L., 2005. Testing statistical hypotheses.
Yuen, K.K., 1974. The two-sample trimmed t for unequal population variances. Biometrika, 61(1), pp.165-170.
My Assignment Help. (2020). Analysis Of Gender Gap In Salaries And Wages Using Statistical Methods. Retrieved from https://myassignmenthelp.com/free-samples/bus708-statistics-and-data-analysis/skills-of-collecting-data-set-by-ms-excel-software.html.