A researcher has collected data for 7681 patients who had a procedure performed in 2009 at Clinic ABC, which serves the Greater Toronto Area regions (Excel file: Fall Assignment Data.xls). The data include the unique patient identifier, sex (male = 1; female = 2), age (years) and patient region (Durham = 1; Halton = 2; Peel = 3; York = 4). The researcher also collected some lifestyle data, including smoking frequency (every day = 1; some days = 2; not at all = 3) and whether the patient takes herbal supplements (yes = 1; no = 2). The researcher also collected data on the number of times in a week the patient performs vigorous activity (where 99 is the user-defined missing value with the label "unable to do vigorous activity"), the number of times in a week they perform moderate activity (where 99 is the user-defined missing value with the label "unable to do moderate activity"), and the number of times in a week they perform strength activity (where 99 is the user-defined missing value with the label "unable to do strength activity"). Note that for these three activity variables, there are also system-missing values in the dataset. Finally, the dataset contains the procedure order date/time, the procedure completion date/time and the procedure score.
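As a cross-check outside SPSS, the coding scheme described above can be written down as a small Python codebook. This is only a sketch: the names `VALUE_LABELS`, `USER_MISSING`, `label_for`, `missing_status` and `valid_values` are hypothetical, and representing a system-missing value as `None` is an assumption of this example rather than something the assignment specifies.

```python
# Value labels taken from the dataset description; all names are hypothetical.
VALUE_LABELS = {
    "sex": {1: "male", 2: "female"},
    "region": {1: "Durham", 2: "Halton", 3: "Peel", 4: "York"},
    "smoking_frequency": {1: "every day", 2: "some days", 3: "not at all"},
    "herbal_supplements": {1: "yes", 2: "no"},
}

# 99 is the user-defined missing code for the three activity variables.
USER_MISSING = 99


def label_for(variable, code):
    """Return the value label for a coded categorical value."""
    return VALUE_LABELS[variable].get(code, "unlabelled")


def missing_status(value):
    """Classify an activity value; None stands in for system-missing (assumption)."""
    if value is None:
        return "system-missing"
    if value == USER_MISSING:
        return "user-missing"
    return "valid"


def valid_values(values):
    """Keep only the values that are neither user-missing nor system-missing."""
    return [v for v in values if missing_status(v) == "valid"]
```

Keeping the two kinds of missing value distinct matters because a 99 left in as a real count would inflate any weekly activity average.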
Before proceeding with any analysis, be sure to:
•   Ensure that variables are of the correct measure (nominal, ordinal, scale). Remember to also do this for any newly created variables.
•   Add labels to all categorical variables (remember to also do this for any newly created categorical variables).
•   Add missing value labels for the three activity variables (this will be covered in Lab 11).
•   For the procedure order and completion date/time variables, in the Variable View, under the "Type" column, ensure that the Date format is set to dd-mmm-yyyy hh:mm.
1. Generate Variable Information and Variable Values tables for the dataset. Copy and paste these generated SPSS tables. Note: while you already know how to generate these tables, you may want to hold off on this question until you learn more about missing values so that you don't have to generate these tables twice.
2. Create a new variable, "Age_cat1", that groups the variable "Age" into two categories: <65 yrs and ≥65 yrs. Copy and paste the screenshots of the SPSS window(s) used to show how you created the new variable. Remember to add labels to all values. Generate a pie chart of this new variable where the slices represent the proportions belonging to each category. Show data labels to display the percentages in each slice (round to 2 decimal places and show the % symbol). Copy and paste the generated SPSS pie chart.
3. Create a new variable, "Smoking_binary", that groups the variable "Smoking Frequency" into two categories: yes (combines the categories every day and some days) and no (the category not at all). Copy and paste the screenshots of the SPSS window(s) used to show how you created the new variable. Remember to add labels to all values. Generate a pivot table that includes the observed counts of "Smoking_binary" (placed as rows) by "Region" (placed as columns), layered by "Sex". Copy and paste the generated SPSS pivot table. Answer the following question: How many female Halton region residents smoke?
4. Create a new variable called "Age_cat2" that groups the variable "Age" into the following categories: 18-24 yrs, 25-44 yrs, 45-64 yrs, 65+ yrs. Generate a stacked bar graph such that each bar represents a different "Age_cat2" category and the stacks represent the percentages of the various "Smoking Frequency" categories. Order the stacks such that the "every day" category is the bottom or base stack and the "not at all" category is the top stack. Add data labels (1 decimal place with % as the trailing character). Copy and paste the generated SPSS stacked bar graph.
5. Create a new variable, "zProcedurescore", that is the standardized version of "Procedure Score". Before you generate the z-scores for "Procedure Score", ensure that the variable is not markedly non-normal by copying and pasting the appropriate generated SPSS output table and showing your calculations to determine skewness. After you've created the new variable "zProcedurescore", generate case summaries of the variables "Procedure Score" and "zProcedurescore" for the first 10 cases in your dataset. Copy and paste the generated SPSS table and answer the following question: Is case 1's procedure score (.1578) above or below the mean procedure score? Justify your answer with the Case Summary table.
6. Create a new variable, "rndScoretwodecimal", that rounds "Procedure Score" to the nearest second decimal place. Copy and paste the SPSS compute variable window used to create this new variable. Generate a frequency histogram for this new variable, paneled by "Herbal Supplements" (rows) and overlaid with the normal curve. Set the bin width = 0.1. Display data labels in each bar to add clarity. Copy and paste the generated SPSS frequency histograms.
7. Create a new variable, "Yrs_till_retirement_peel", that is the number of years until a Peel region patient retires, with the condition that only Peel patients who are younger than retirement age (retirement age begins at age 65 yrs) are selected. Copy and paste the SPSS compute variable window used to create this new variable and highlight/place a box around the (1) numeric expression and (2) conditional expression used to create this variable. Generate a boxplot graph of "Yrs_till_retirement_peel" that contains the boxplots for males and females side-by-side. Copy and paste the generated SPSS boxplot graph.
8. Create a new variable called "Procedure_duration" that is the length of time between the procedure order date/time and the procedure completion date/time, measured in hours with the fractional part retained. Copy and paste the screenshots of the SPSS window(s) used to show how you created the new variable. Generate a descriptive statistics table for "Procedure_duration" that includes the mean, standard deviation, median, quartiles, 80th percentile and skewness. Copy and paste the generated SPSS output table.
9. Create a new variable called "Procedure_order_hour" that extracts the hour at which the procedure was ordered. Copy and paste the screenshots of the SPSS window(s) used to show how you created the new variable. Generate a frequency table for "Procedure_order_hour". Copy and paste the generated SPSS output table. Answer the following question: What is the mode of "Procedure_order_hour" (state using AM or PM)?
10. Generate a frequency table for "Weekly Vigorous Activity". Copy and paste the generated SPSS output table. Conduct a Missing Values Analysis with EM estimation for the variables "Weekly Vigorous Activity", "Weekly Moderate Activity" and "Weekly Strength Activity". Are values missing completely at random? Justify your answer by copying and pasting the appropriate SPSS output table.
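The recoding logic behind questions 2 to 4 can be sketched in Python as a way to sanity-check the groupings before building them in SPSS. The function names and the numeric codes assigned to the new categories (1, 2, ...) are assumptions made for illustration; the assignment leaves those codes to the student. Ages under 18 are returned as `None` because the listed bands start at 18.

```python
def age_cat1(age):
    """Two age groups: 1 = <65 yrs, 2 = >=65 yrs (codes are an assumption)."""
    if age is None:
        return None
    return 1 if age < 65 else 2


def age_cat2(age):
    """Four age bands: 1 = 18-24, 2 = 25-44, 3 = 45-64, 4 = 65+ (assumed codes)."""
    if age is None or age < 18:
        return None  # outside the bands listed in question 4
    if age <= 24:
        return 1
    if age <= 44:
        return 2
    if age <= 64:
        return 3
    return 4


def smoking_binary(frequency):
    """Collapse smoking frequency: every day (1) or some days (2) -> yes; not at all (3) -> no."""
    if frequency in (1, 2):
        return 1  # yes (assumed code)
    if frequency == 3:
        return 2  # no (assumed code)
    return None
```

The boundary cases are the ones worth checking by hand: a patient aged exactly 65 belongs to the ≥65 group in both age variables, and a patient aged 64 does not.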
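For question 5, the arithmetic behind standardizing and checking skewness can be sketched as follows. The functions are hypothetical helpers, and the formulas used (z-scores based on the sample standard deviation, the sample-adjusted skewness coefficient, and its standard error) are assumed here to correspond to what SPSS reports. How large a skewness-to-standard-error ratio counts as "markedly non-normal" depends on the rule given in the course, so no threshold is built in.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # n - 1 in the denominator
    return [(v - m) / s for v in values]


def skewness(values):
    """Sample-adjusted skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores(values))


def skewness_se(n):
    """Standard error of skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_ratio(values):
    """Skewness divided by its standard error, for comparison with a course rule."""
    return skewness(values) / skewness_se(len(values))
```

A negative z-score means the case lies below the mean and a positive z-score means it lies above, which is the reading needed for the case summary table.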
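Questions 6 and 7 each combine a numeric expression with, in the second case, a condition on which cases receive a value. A minimal Python sketch of both, using hypothetical function names, is given below. Note that Python's built-in `round` sends exact ties to the nearest even digit and works on binary floats, so it may differ from SPSS in rare borderline cases; that difference is flagged here as a caveat, not verified against SPSS.

```python
RETIREMENT_AGE = 65  # retirement age begins at 65 yrs, per question 7
PEEL = 3             # region code for Peel from the dataset description


def rnd_score_two_decimal(score):
    """Round a procedure score to two decimal places (None stays missing)."""
    if score is None:
        return None
    return round(score, 2)


def yrs_till_retirement_peel(age, region):
    """Numeric expression: 65 - age. Condition: Peel patients younger than 65 only."""
    if age is None or region is None:
        return None
    if region == PEEL and age < RETIREMENT_AGE:
        return RETIREMENT_AGE - age
    return None  # cases outside the condition are left missing
```

Patients who fail the condition are left missing, not set to zero, so they do not appear in the boxplots.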
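The date/time arithmetic in questions 8 and 9 can be illustrated with Python's `datetime` module. This is a sketch under stated assumptions: the timestamps in the usage note are invented for illustration and are not taken from the dataset, the helper names are hypothetical, and parsing month abbreviations with `%b` assumes an English locale.

```python
from datetime import datetime
from statistics import mode

# Mirrors the dd-mmm-yyyy hh:mm display format named in the assignment.
DATE_FORMAT = "%d-%b-%Y %H:%M"


def parse(text):
    """Parse a date/time string such as '05-Mar-2009 09:30'."""
    return datetime.strptime(text, DATE_FORMAT)


def procedure_duration(order_text, completion_text):
    """Hours between order and completion, with the fractional part retained."""
    delta = parse(completion_text) - parse(order_text)
    return delta.total_seconds() / 3600


def procedure_order_hour(order_text):
    """Hour of day (0-23) at which the procedure was ordered."""
    return parse(order_text).hour


def hour_label(hour):
    """Convert a 24-hour value to a 12-hour label such as '9 AM' or '1 PM'."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12} {suffix}"


def modal_hour_label(order_texts):
    """Most frequent order hour, stated with AM or PM."""
    return hour_label(mode(procedure_order_hour(t) for t in order_texts))
```

With the invented timestamps "05-Mar-2009 09:30" and "05-Mar-2009 14:15", `procedure_duration` returns 4.75 hours and `procedure_order_hour` returns 9, which `hour_label` renders as "9 AM".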