This assignment is a computer-based exercise using the data we collected at our first and second face-to-face (F2F) sessions. As stated at the data collection session, the survey is anonymised, will be used only for teaching purposes within this course, and will not be distributed anywhere else.

Students who came to the de-briefing sessions were asked the following 5 questions (see table below) twice. These questions are intended to determine Manchester Dentistry-PG students’ perception of statistics at the beginning of the course, and how it changed once the course had started. We take all PG students registered for the 2019-20 course as the sample; however, not everybody filled in the questionnaire. The questionnaires have been coded into a dataset (“Assignment2Data.csv”) using the following rules. Where a student did not provide an answer, the entry is coded as missing (“NA”). Variables collected at the 1st F2F have names ending with the digit “1” (“xxxx1”); variables collected at the 2nd F2F have names ending with the digit “2” (“xxxx2”).

You may use any statistical software (or calculator) you are familiar with to complete the exercises. Although you are required to demonstrate computer competency, you should focus more on correctly presenting and interpreting the data, analysis and results. The dataset “Assignment2Data.csv” can be downloaded from Blackboard.

Understand the data [11]

How many students are enrolled on the course? How many students responded to the first wave of the questionnaire, collected in the first F2F session? How many students responded to the second wave of data collection?

Create a subset dataset, which contains all students who had provided answers to both waves of the survey (1st and 2nd F2F Session). Name this new dataset “COMPLETE”. How many students had completed both waves of the questionnaire? How many variables are there in the COMPLETE dataset?
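As a rough illustration of the subsetting step, the sketch below uses only the Python standard library on a small made-up extract. The "ID" column, the toy values, and the rule that a student "responded" to a wave when at least one variable for that wave is not "NA" are all assumptions made for the example; any statistical software can do the same job.

```python
import csv
import io

# Hypothetical toy extract standing in for "Assignment2Data.csv".
# The "ID" column and every value below are invented for illustration.
TOY_CSV = """ID,Gender1,ConfStat1,UseStatJob1,ConfStat2,UseStatJob2
1,F,4,Yes,6,Yes
2,NA,NA,NA,5,Yes
3,F,7,No,NA,NA
4,M,3,No,5,Yes
5,NA,NA,NA,NA,NA
"""


def responded(row, wave):
    """Assumed rule: a student responded to a wave ("1" or "2") if at
    least one variable whose name ends with that digit is not "NA"."""
    return any(value != "NA" for name, value in row.items() if name.endswith(wave))


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

n_enrolled = len(rows)                         # one row per registered student
n_wave1 = sum(responded(r, "1") for r in rows)
n_wave2 = sum(responded(r, "2") for r in rows)

# Keep only students who answered in both waves.
COMPLETE = [r for r in rows if responded(r, "1") and responded(r, "2")]
n_variables = len(rows[0])                     # number of columns, including ID

print(n_enrolled, n_wave1, n_wave2, len(COMPLETE), n_variables)
```

If you prefer a stricter definition (for example, every variable in both waves must be answered), replace `any` with `all` in `responded` and state the rule you used in your report.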

The COMPLETE dataset will be used for the rest of this assignment. Before starting any statistical analysis, describe one strategy you will use to check this dataset. Identify any unusual data entries or values, if there are any, and propose a solution to the problem.

Descriptive statistics/Presenting data [22]

For the list of variables collected from the first F2F: “Gender1”, “LikeStat1”, “ConfStat1”, “Exercise1”, “UseStatJob1”

Identify the data type of each of these variables (e.g. qualitative/quantitative; discrete (categorical)/continuous/binary/ordinal/etc.).

State the most suitable summary statistics to explore each of these variables.

Identify the most appropriate potential plots/figures to explore each of these variables.

Produce a table with relevant summary/descriptive statistics, stratified by gender, to describe the current cohort of PG dentistry students and their attitude towards statistics before the course begins. Write a paragraph (no more than 250 words) to describe the data and any findings from this table. This table should contain the variables: “Gender1”, “LikeStat1”, “Exercise1”, “UseStatJob1”, “ConfStat1”.

Cross-tabulate the variables “UseStatJob1” and “UseStatJob2”. Include the table in your report.

What proportion of PG students reported that they were likely to use statistics in their job before the DENT70001 Biostatistics course started? What proportion of PG students reported that they were likely to use statistics in their job after the course had started? What is the change in proportion since the course started?
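The cross-tabulation and the two proportions can be sketched as follows, using only the Python standard library. The paired answers and the "Yes"/"No" coding are invented for illustration; check how “UseStatJob1” and “UseStatJob2” are actually coded in the dataset before adapting this.

```python
from collections import Counter

# Hypothetical paired answers (before, after) for students in COMPLETE.
# The "Yes"/"No" coding is an assumption, not taken from the dataset.
pairs = [("Yes", "Yes"), ("No", "Yes"), ("No", "No"), ("Yes", "Yes"),
         ("No", "Yes"), ("Yes", "No"), ("No", "Yes"), ("No", "No")]

table = Counter(pairs)          # cell counts keyed by (before, after)
levels = ["Yes", "No"]

# Print the 2x2 table: rows are the first wave, columns the second wave.
print("before \\ after".ljust(16) + "".join(level.rjust(6) for level in levels))
for before in levels:
    cells = "".join(str(table[(before, after)]).rjust(6) for after in levels)
    print(before.ljust(16) + cells)

n = len(pairs)
prop_before = sum(before == "Yes" for before, _ in pairs) / n  # row margin
prop_after = sum(after == "Yes" for _, after in pairs) / n     # column margin
change = prop_after - prop_before

print(f"before: {prop_before:.3f}  after: {prop_after:.3f}  change: {change:+.3f}")
```

The two proportions are the row and column margins of the same table, so they are computed on the same students and are not independent of each other.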

One is interested in learning whether students are more willing to use statistics in their career after attending the DENT70001 course. Conduct a statistical test to examine the following: is there sufficient evidence that PG students’ opinions about using statistics in their future career changed after the DENT70001 course started?

Write down the null and alternative hypotheses, conduct the analysis, report the estimates, and write a paragraph regarding your interpretation.
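One test commonly used for paired yes/no answers is McNemar's test, which looks only at the students who changed their answer between the two waves. The sketch below is an exact (binomial) version written with the Python standard library. The function name and the discordant counts are made up for illustration, and this is one possible choice rather than the required method.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (hypothetical helper).

    b = number who switched from "No" to "Yes",
    c = number who switched from "Yes" to "No".
    Under the null hypothesis a switch is equally likely in either
    direction, so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # nobody changed, so there is no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented discordant counts, not taken from the assignment data.
p_value = mcnemar_exact(b=3, c=1)
print(f"exact McNemar p-value: {p_value:.4f}")
```

Students who gave the same answer in both waves do not enter the calculation; only the two off-diagonal cells of the cross-tabulation are used.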

This question concerns confidence in using statistics.

A researcher would like to know whether there is a gender difference in confidence in using statistics before PG students start any biostatistics course (use “ConfStat1” and “Gender1”).

Calculate the mean scores of “ConfStat1” in each gender.

What is the difference in mean score between the two genders?

Calculate the standard error of the difference in mean score on “ConfStat1” between the two genders.

Construct the 95% confidence interval for the mean difference in the confidence score between the two genders. Is one gender more confident in using statistics than the other gender at week 1? Summarise your findings and justify your answer.
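The calculation of the mean difference, its standard error, and an approximate confidence interval can be sketched as below with the Python standard library. The scores are invented, the function name is hypothetical, and two simplifying assumptions are made: the standard error does not pool the two variances, and the critical value comes from the normal distribution. With small groups a t-distribution critical value would usually be more appropriate, so treat the interval as illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_diff_ci(x, y, level=0.95):
    """Difference in means (x minus y), its standard error, and a
    normal-approximation confidence interval (hypothetical helper)."""
    diff = mean(x) - mean(y)
    # Unpooled standard error: sqrt(s_x^2 / n_x + s_y^2 / n_y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    # Normal critical value (about 1.96 for 95%); an assumption, see lead-in.
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, se, (diff - crit * se, diff + crit * se)


# Invented confidence scores for two groups, not the assignment data.
group_a = [4, 6, 5, 7, 3]
group_b = [5, 7, 6, 8, 4, 6]

diff, se, (lower, upper) = mean_diff_ci(group_a, group_b)
print(f"difference = {diff:.2f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

If the interval contains zero, the data are compatible with no difference between the two groups at the chosen confidence level.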

A researcher would like to know whether PG students’ confidence in using statistics changes once they have started taking the Biostatistics course. For this question, use “ConfStat1” and “ConfStat2”.

Produce plots and report the summary statistics for the variables “ConfStat1” and “ConfStat2” (Hint: use your proposals from Questions 2.1/2.2).

Use the information you have just produced to write a paragraph (no more than 150 words) to summarise UoM PG Dentistry students’ confidence level at Week 1 and Week 6.

Create a new variable “ConfDiff” defined as follows: ConfDiff = ConfStat1 - ConfStat2.

Produce a histogram and a boxplot for the new “ConfDiff” variable.

Describe the different elements of the boxplot (e.g. line, box, symbols) using summary statistics calculated from the variable “ConfDiff” (e.g. calculate the mean value and state where on the boxplot it is).
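The numbers behind a boxplot can be computed directly, as in the sketch below (Python standard library only). The scores are invented, and two conventions are assumed: quartiles use the "inclusive" interpolation method, and whiskers follow Tukey's rule of 1.5 times the interquartile range. Statistical packages differ on both conventions, so your software may draw a slightly different box.

```python
from statistics import mean, median, quantiles

# Invented scores for eight students, not the assignment data.
conf1 = [4, 6, 5, 7, 3, 5, 6, 2]
conf2 = [6, 7, 5, 8, 6, 5, 9, 10]

# ConfDiff = ConfStat1 - ConfStat2, so negative values mean confidence rose.
conf_diff = [first - second for first, second in zip(conf1, conf2)]

q1, q2, q3 = quantiles(conf_diff, n=4, method="inclusive")  # box edges and median line
iqr = q3 - q1                                               # height of the box

# Tukey fences: points beyond them are drawn as individual symbols.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [d for d in conf_diff if lower_fence <= d <= upper_fence]
outliers = [d for d in conf_diff if d < lower_fence or d > upper_fence]

# Whiskers end at the most extreme observations still inside the fences.
whisker_low, whisker_high = min(inside), max(inside)

print("mean:", mean(conf_diff), "median:", median(conf_diff))
print("box:", q1, "to", q3, "whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

The mean is not drawn on a standard boxplot; comparing it with the median line gives a quick indication of skewness.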

Describe the distribution of this new variable “ConfDiff”.

The Head of Dentistry wants to evaluate the hypothesis “Starting DENT70001 has led PG Dental students to increase their confidence level in using statistics” using this year’s students’ data.

Propose a statistical test to test the above hypothesis. State the model assumptions behind your choice of test. Justify your choice against the data. [9]

Use the test you have proposed to test the hypothesis. Write down the null and alternative hypotheses, conduct the analysis, and write a paragraph regarding your interpretation. Set the type I error rate to 10%.
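As one possible illustration of a one-sided paired analysis at the 10% level, the sketch below runs an exact sign test with the Python standard library. The sign test is chosen here only because it needs few assumptions and no external libraries; a paired t-test or a Wilcoxon signed-rank test may suit the data better, and justifying that choice is part of the exercise. The function name and the differences are invented.

```python
from math import comb


def sign_test_decrease(diffs):
    """One-sided exact sign test (hypothetical helper).

    H0: a difference is equally likely to be negative or positive.
    H1: negative differences are more likely.
    With ConfDiff = ConfStat1 - ConfStat2, a negative difference means
    confidence increased. Zero differences are dropped.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    n_neg = sum(d < 0 for d in nonzero)
    # P(X >= n_neg) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(n_neg, n + 1)) / 2 ** n
    return n_neg, n, p_value


ALPHA = 0.10  # type I error rate stated in the question

# Invented ConfDiff values, not the assignment data.
toy_diffs = [-2, -1, 0, -1, -3, 0, -3, -8]
n_neg, n, p_value = sign_test_decrease(toy_diffs)

print(f"{n_neg} of {n} non-zero differences are negative, p = {p_value:.4f}")
print("reject H0" if p_value < ALPHA else "do not reject H0")
```

Because the hypothesis is directional (confidence increased), the test is one-sided; a two-sided version would double the tail probability, capped at 1.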

