The first two questions use educ and sibs1 (both quantitative variables). educ ranges from 0 to 20 and captures the respondent’s educational attainment in years. We recoded sibs into sibs1, in which values of 15 to 34 siblings are top-coded as 15 siblings. Use sibs1 for the rest of this assignment.
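If you are working outside Stata, the top-coding described above can be sketched in a few lines of Python. This is only an illustration of the recode logic: the function name top_code and the sample values are hypothetical and do not come from the assignment data.

```python
# Minimal sketch of top-coding, assuming sibling counts are integers
# and missing values are stored as None. Sample values are made up.
TOP_CODE = 15  # values of 15 and above are collapsed to 15


def top_code(value, cap=TOP_CODE):
    """Return the value unchanged below the cap, and the cap otherwise."""
    if value is None:  # keep missing values missing
        return None
    return min(value, cap)


sibs = [0, 2, 7, 15, 21, 34, None]      # hypothetical raw sibs values
sibs1 = [top_code(v) for v in sibs]     # top-coded version
print(sibs1)  # [0, 2, 7, 15, 15, 15, None]
```

Values below 15 pass through unchanged, so only the upper tail of the distribution is affected by the recode.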
1. Generate a frequency table showing the distribution of the top-coded number of siblings variable (sibs1). Briefly describe the distribution. Interpret (in words) the relative frequency of having 2 siblings, expressed as a ratio. (2-3 sentences) [5 pts]
2. Generate a histogram showing the distribution of respondent's educational attainment in number of years (educ). Paste your output. Briefly describe the distribution. (2-3 sentences) [5 pts]
3. Generate the mean, standard deviation, median, and range for respondent education, mother’s education, and father’s education, by gender. Place the results in a table and include the table in your response (you can copy/paste output directly from Stata or other software programs). Briefly describe key patterns and whether there are any noticeable differences by gender. (~3 sentences) [8 pts]
4. Perform an independent-samples t-test to answer the question: do women have, on average, a statistically significantly different level of education than men? Discuss key results. (2-3 sentences) [5 pts]
5. Perform another t-test to examine whether women have, on average, statistically significantly less income than men. Report the significance level. Briefly interpret the 95% confidence interval. (2-3 sentences) [5 pts]
If you wish, before answering the next two questions, you can use the tabulate command to inspect the categories and relative frequencies for fehire [tab fehire] and eqwlth [tab eqwlth]. (Do not include this output in your responses.)
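For those using Python instead of Stata, a rough equivalent of a one-way tabulate is sketched below. It assumes the variable is available as a plain list with missing values stored as None. The helper freq_table and the sample values are hypothetical, not part of the assignment data.

```python
from collections import Counter


def freq_table(values):
    """Return rows of (category, count, percent, cumulative percent)."""
    # Drop missing values, which is also what Stata's tab does by default.
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent
        rows.append((category, counts[category],
                     round(percent, 2), round(cumulative, 2)))
    return rows


# Made-up responses on a 1-7 scale, for illustration only.
eqwlth_demo = [1, 1, 4, 4, 4, 4, 7, 7, None]
for row in freq_table(eqwlth_demo):
    print(row)
```

The percent column is the relative frequency of each category among non-missing responses, which is the quantity to inspect before running the tests in the next two questions.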
6. Analyze the relationship between gender (sex) and views about female hiring and promotion in the workplace (fehire). Report the chi-square test statistic. Report key results and interpret the significance level (possible levels are “not significant”, p<.05, p<.01, or p<.001). (2-3 sentences) [6 pts]
7. Now, analyze the relationship between gender and views about the government’s role in reducing income inequality, as measured by the eqwlth variable (7 categories ranging from 1 to 7; 1=govt should reduce income differences; 7=no govt action). The eqwlth variable captures the level of agreement with the following statement: Government should reduce income differences.
8. Let's analyze bivariate relationships between sibship (sibs1), respondent's education (educ), and parents' education (maeduc and paeduc). Generate a correlation matrix. Paste your output. Briefly interpret. Should we be concerned about potential multicollinearity in a regression model that includes both father's and mother's education? Why would we or would we not want to test for that? (3-4 sentences) [8 pts]
9. Next, perform a bivariate regression of education on number of siblings (x1=sibs1). Paste the output. (Each part can be answered in 1 sentence.) [10 pts]
10. Let’s analyze how number of siblings and mother's education are related to educational attainment. This model simply adds mother’s education (x2=maeduc) to our previous model (in Question 9). Paste the output. [5 pts]
11. Perform a multiple linear regression of respondent’s income in constant dollars (realrinc) on education and gender. Paste the output. [8 pts]
12. Now, we'll run the same model that we ran in Question 10, except that we’ll now treat mother's education as a categorical variable rather than one measured in number of years of education. macoldeg is a binary variable coded 1 if the respondent's mother received a BA degree or higher and coded 0 otherwise. [5 pts]
13. Now, let's use a 4-category measure for mother's education; this one categorizes by highest degree.
14. Finally, let’s add father's education (x3=paeduc) to the model we ran earlier in Question 10, where we treated mother’s educational attainment as a quantitative variable (maeduc).
15. Find the standardized partial coefficients for the previous model. (Each part: 1-2 sentences)
16. Let’s say that we are analyzing the relationship between X and Y. Even though the relationship between the two variables is linear, we notice that the predicted Y is higher for secondary than for elementary students from high-SES families, but lower for secondary than for elementary students from lower-SES families. How is this possible? If we were modeling this relationship in OLS regression, what would we need to include? (2-3 sentences) [5 pts]