The National Health and Nutrition Examination Survey (NHANES) is a yearly survey conducted by the US Centers for Disease Control and Prevention. This question uses the nhanes.samp.adult.500 dataset in the oibiostat package, which consists of information on a subset of 500 individuals ages 21 years and older from the larger NHANES dataset.
Poverty (Poverty) is measured as a ratio of family income to poverty guidelines. Smaller numbers indicate more poverty, and ratios of 5 or larger were recorded as 5. Education (Education) is reported for individuals ages 20 years or older and indicates the highest level of education achieved: either 8th Grade, 9 - 11th Grade, High School, Some College, or College Grad. The variable HomeOwn records whether a participant rents or owns their home; the levels of the variable are Own, Rent, and Other.
a) Create a plot showing the association between poverty and educational level. Describe what you see.
b) Fit a linear model to predict poverty from educational level.
i. Interpret the model coefficients and associated p-values.
ii. Assess whether educational level, overall, is associated with poverty. Be sure to include any relevant numerical evidence as part of your answer.
c) Create a plot showing the association between poverty and home ownership. Based on what you see, speculate briefly about the home ownership status of individuals who responded with Other.
d) Fit a linear model to predict poverty from educational level and home ownership. Comment on whether this model is an improvement over the model in part b).
Do men and women think differently about their body weight? To address this question, you will be using data from the Behavioral Risk Factor Surveillance System (BRFSS).
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey of 350,000 people in the United States collected by the Centers for Disease Control and Prevention (CDC). As its name implies, the BRFSS is designed to identify risk factors in the adult population and report emerging health trends. For example, respondents are asked about diet and weekly physical activity, HIV/AIDS status, possible tobacco use, and level of healthcare coverage.
The cdc.sample dataset contains data on 500 individuals from a random sample of 20,000 respondents to the BRFSS survey conducted in 2000, on the following nine variables:
– genhlth: general health status, with categories excellent, very good, good, fair, and poor
– exerany: recorded as 1 if the respondent exercised in the past month and 0 otherwise
– hlthplan: recorded as 1 if the respondent has some form of health coverage and 0 otherwise
– smoke100: recorded as 1 if the respondent has smoked at least 100 cigarettes in their entire life and 0 otherwise
– height: height in inches
– weight: weight in pounds
– wt.desire: desired weight in pounds
– age: age in years
– gender: gender, recorded as m for male and f for female
a) Create a variable called wt.discr that is a measure of the discrepancy between an individual’s desired weight and their actual weight, expressed as a proportion of their actual weight:
weight discrepancy = (actual weight − desired weight) / actual weight
b) Fit a linear model to predict weight discrepancy from age and gender. Interpret the slope coefficients in the model.
c) Investigate whether the association between weight discrepancy and age is different for males versus females.
i. Fit a linear model to predict weight discrepancy from age, gender, and the interaction between age and gender. Write the model equation.
ii. Write the prediction equation for males and the prediction equation for females.
iii. Is there statistically significant evidence of an interaction between age and gender? Explain your answer.
d) Comment on whether the results from part c) suggest that men and women think differently about their body weight. Do you find the results surprising? Why or why not? Limit your response to at most five sentences.
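As a small illustration of the weight discrepancy defined in part a), the sketch below applies the formula to made-up values rather than to cdc.sample. The vectors reuse the dataset's variable names (weight, wt.desire) only for readability; the numbers are hypothetical.

```r
# Toy values for illustration only; these are NOT taken from cdc.sample.
weight    <- c(150, 200, 120)   # actual weight in pounds
wt.desire <- c(135, 200, 126)   # desired weight in pounds

# Discrepancy as a proportion of actual weight:
# (actual weight - desired weight) / actual weight
wt.discr <- (weight - wt.desire) / weight
wt.discr
```

A positive value means the person wants to weigh less than they do, a negative value means they want to weigh more, and 0 means the desired and actual weights match.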
Studies have indicated that several factors contribute to clinicians perceiving encounters with patients as difficult; such factors may relate to physicians or patients. For example, physicians may have negative bias toward specific health conditions; additionally, physicians involved in difficult encounters tend to be less experienced. Patients who exhibit personality disorders, non-adherence to medical advice, and self-destructive behaviors can also contribute to encounter difficulty.
A study was conducted at a university outpatient primary care clinic in Switzerland to identify factors associated with difficult doctor-patient encounters. The data consist of 527 patient encounters total, conducted by the 27 medical residents employed at the clinic during the time of the study. After each encounter, the attending medical resident completed two questionnaires: the Difficult Doctor Patient Relationship Questionnaire (DDPRQ-10) and the patient’s vulnerability grid (PVG). The data are in difficult_encounters.Rdata, stored as the diff.enc dataframe.
The DDPRQ-10 is a survey that measures the difficulty of a patient encounter, with a higher score indicating a more difficult encounter; the maximum possible score is 60 and encounters with scores 30 and higher are considered difficult. The DDPRQ-10 score for each encounter is stored as ddprq.
The PVG measures five dimensions of patient vulnerability: somatic determinants, mental health state, behavioral determinants, social determinants, and healthcare use. Each dimension has a certain number of associated characteristics; a patient receives 1 point for each characteristic. The total score within a dimension is stored as the variable ending with total, while the variable ending with bin is a binary variable where 1 corresponds to a score of 1 or greater for that dimension and 0 indicates the patient does not have any of the characteristics for that dimension.
– Somatic determinants (soma.bin, soma.total): factors related to physical impairment, such as severe chronic disease, physical disability, or pregnancy
– Mental health state (mental.bin, mental.total): factors related to mental health, such as mood disorder, post-traumatic stress disorder, or dementia
– Behavioral determinants (risk.bin, risk.total): factors related to risky behavior, such as substance abuse or physical violence
– Social determinants (social.bin, social.total): factors related to social difficulty, such as complex family situation, inadequate housing, or language barrier
– Healthcare use (health.bin, health.total): factors related to healthcare use, such as being a frequent user or lacking a primary care physician
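The relationship between the total and binary versions of each dimension can be sketched as follows. The vector names toy.total and toy.bin and the scores are hypothetical, chosen only to show the rule described above; they are not part of diff.enc.

```r
# Hypothetical dimension totals for five patients (not real study data).
toy.total <- c(0, 1, 3, 0, 2)

# Binary version: 1 if the patient has at least one characteristic
# in the dimension, 0 if the patient has none.
toy.bin <- as.integer(toy.total >= 1)
toy.bin
```

The binary version records only whether any characteristic is present, so patients with totals of 1 and 3 receive the same value.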
Features of the attending medical resident were also recorded: age in years (age), sex (sex, recorded as F for female and M for male), and years of training completed (yrs.train).
a) Use graphical and numerical summaries to explore the distribution of DDPRQ-10 score. Briefly summarize your findings. How many of the encounters are classified as difficult based on DDPRQ-10 score?
b) Fit a model for the association of DDPRQ-10 score with features of the attending medical resident. Is there evidence of a significant association between DDPRQ-10 score and any of the physician features?
c) Create a plot that shows the association between DDPRQ-10 score and patient mental health vulnerability score (mental.total). Describe what you see.
d) Fit a model for the association of DDPRQ-10 score with patient mental health vulnerability score while adjusting for physician features. Interpret the slope for mental.total.
e) Repeat part d), using the binary version of patient mental health vulnerability score (mental.bin). Based on your observations in part c), do you prefer this model to the one in part d), or would you prefer a model that treats patient mental health vulnerability score as a categorical variable with several levels? Explain your answer.
f) Fit a model for the association of DDPRQ-10 score with all five dimensions of patient vulnerability (using the binary versions of the variables), while adjusting for physician features.
i. Which patient features are significantly associated with DDPRQ-10 score?
ii. Interpret the coefficients for the features in part i.
g) Comment briefly on two limitations of this study, with respect to understanding factors associated with difficult doctor-patient encounters.
In the PREVEND study introduced in Unit 6, researchers measured various features of study participants, including data on BMI and diabetes status. Obesity is a known risk factor for Type 2 diabetes.
Organizations such as the Centers for Disease Control and the World Health Organization have defined weight status categories for particular BMI ranges. A BMI below 18.5 is considered underweight, while a BMI of 18.5 to less than 25.0 is considered healthy. A BMI of 25.0 to less than 30.0 is considered overweight, while a BMI of 30.0 or higher is considered obese.
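One way to express these categories in R is with cut(), sketched below on made-up BMI values. The vector names bmi and weight.status are hypothetical, and the sketch assumes that a value falling exactly on a cutoff (18.5, 25, or 30) belongs to the higher category.

```r
# Made-up BMI values for illustration; not taken from prevend.samp.
bmi <- c(17.2, 18.5, 24.9, 25.0, 29.9, 30.0, 36.4)

# right = FALSE makes each interval closed on the left, [lower, upper),
# so a BMI exactly at a cutoff falls in the higher category (an assumption).
weight.status <- cut(bmi,
                     breaks = c(-Inf, 18.5, 25, 30, Inf),
                     labels = c("underweight", "healthy", "overweight", "obese"),
                     right = FALSE)

table(weight.status)
```

Merging the first two categories, as might be done when very few individuals are underweight, would amount to dropping the 18.5 break and its label.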
For this problem, use prevend.samp to investigate the association between BMI weight status and diabetes status.
a) Diabetes prevalence in the United States is approximately 9.4%. Suppose that individuals in prevend.samp represent a random sample of individuals from the Netherlands. Assess whether there is evidence that diabetes prevalence in the Netherlands is different from that in the United States. Summarize your findings, including reporting and interpreting an appropriate confidence interval.
b) Run the code shown in the template to create a categorical version of the BMI variable named BMI_Cat. Since there are very few individuals with BMI below 18.5, individuals with BMI lower than 25 are grouped together.
c) Analyze the data to assess whether there is evidence of an association between BMI weight status and diabetes status. Summarize your findings.