Section 1: Goals and format of the assignment
Paraphrase the following. There will be similar discussions in the lectures and tutorials; you can paraphrase those instead.
“This assignment is supposed to help students understand the following concepts.
* You can ask a sample of people simple questions and give a numerical summary of the results.
* The summary is not fully reliable: different samples would give different answers. This lack of reliability can be well explained by theoretical distributions such as the z distribution.
* You can use hypothesis testing to answer questions about the population from a sample.
* The first sections of the assignment involve explaining and summarizing a large data set of simple data and interpreting the results. The later sections discuss hypothesis testing, the issues of collecting and working with large sets of data, and the fact that it is very hard to fully understand the applications of the z distribution.”
Section 2: Description of the data set
a) Expand upon and paraphrase the following. Be sure to describe each of the variables.
For each variable, answer the question: is it categorical or numerical?
“The data set is the survey responses of 100,000 people. Every student has their own sample of 100 people. Each person is shown one of 3 possible endings for a TV show (ending 1, ending 2 or ending 3) and is asked two questions:
Question 1: Do they like the TV show?
Question 2: How much would they pay for the DVD?”
b) Also do one or both of the following:
* Give some more questions that could be asked.
* Criticise the existing questions.
Section 3: Summary of the data set
You must do this section first because the descriptive statistics will help you understand the assignment. When you write up the assignment, the descriptive statistics section will go in the middle, but you really need to do it first.
Open the assignment data set (the file "computer assignment with filters").
You must only use your own sample of 100 numbers. Use the filter given in the Excel sheet to find them (this should become clear once you open the Excel file).
Read the instructions given in the Excel file and answer the following questions. Excel must give you your own data set; your numbers will be different from those of the other students.
Question 1
For the variable "Do they like the TV show":
- find
phat1 = proportion of people that liked the TV show with ending 1
phat2 = proportion of people that liked the TV show with ending 2
phat3 = proportion of people that liked the TV show with ending 3
- find
phat1 - phat2
phat2 - phat3
- Give a chart that compares the proportions for the 3 different TV show endings
Question 2
- find
xbar1 = the average amount people would pay for the DVD of the TV show with ending 1
xbar2 = the average amount people would pay for the DVD of the TV show with ending 2
xbar3 = the average amount people would pay for the DVD of the TV show with ending 3
- find
xbar1 - xbar2
xbar2 - xbar3
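The quantities requested in Questions 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the `records` list is hypothetical stand-in data (the real values come from each student's filtered Excel sample), and the function name `summarise` is an assumption made for this sketch.

```python
from collections import defaultdict

# Hypothetical records: (ending, liked, amount). Not the assignment data.
records = [
    (1, True, 8.0), (1, False, 0.5),
    (2, True, 9.0), (2, True, 10.0),
    (3, False, 1.0), (3, True, 12.0),
]

def summarise(records):
    """Return {ending: (phat, xbar)}: share who liked the show, mean amount."""
    groups = defaultdict(list)
    for ending, liked, amount in records:
        groups[ending].append((liked, amount))
    summary = {}
    for ending, rows in groups.items():
        n = len(rows)
        phat = sum(1 for liked, _ in rows if liked) / n
        xbar = sum(amount for _, amount in rows) / n
        summary[ending] = (phat, xbar)
    return summary

summary = summarise(records)
diff_p12 = summary[1][0] - summary[2][0]  # phat1 - phat2
diff_x12 = summary[1][1] - summary[2][1]  # xbar1 - xbar2
print(summary, diff_p12, diff_x12)
```

The same pattern gives phat2 - phat3 and xbar2 - xbar3 by swapping the group keys.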
The write-up uses the following concepts:
- Population and sampling techniques
- Calculating descriptive values: mean, mode, median, variance, maximum, range and coefficient of variation
- Drawing graphs: histograms and bar charts
- Sampling distributions
- Estimation theory – point estimates
- Hypothesis tests
This report is based on a sample drawn from a set of 10,000 survey responses. The sample size is 100, selected on the basis of student ID. Each respondent was shown one of 3 possible endings for a TV show (ending 1, ending 2 or ending 3) and was asked two questions.
Q1: Do they like the show?
The answer is recorded as YES/NO, so it is a categorical variable with 2 categories.
Q2: How much would they pay for the DVD?
This answer is a quantitative (numerical) variable that takes non-negative values.
Section III
In this section we present the required findings.
First we deal with the point estimates of the proportion of viewers in each group, defined by the ending shown.
- phat1 = 0.29
- phat2 = 0.44
- phat3 = 0.27
- phat1 - phat2 = -0.15
- phat2 - phat3 = 0.17
- The table below shows the distribution of viewers across groups by whether they liked the show. The majority of viewers in groups 2 and 3 liked the show (23/44 and 21/27), unlike group 1, where the majority did not (16 out of 29).
| Ending | YES | NO | Total |
|---|---|---|---|
| endg1 | 13 | 16 | 29 |
| endg2 | 23 | 21 | 44 |
| endg3 | 21 | 6 | 27 |
| Total | 57 | 43 | 100 |
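The within-group shares quoted above (16/29, 23/44 and 21/27) can be recomputed directly from the counts in the table. The following is a minimal sketch; the dictionary layout and the helper name `liked_share` are assumptions made for illustration.

```python
# Counts copied from the table above: ending -> (YES, NO)
counts = {1: (13, 16), 2: (23, 21), 3: (21, 6)}

def liked_share(yes, no):
    """Proportion of a group's respondents who answered YES."""
    return yes / (yes + no)

shares = {ending: liked_share(*c) for ending, c in counts.items()}
for ending, share in shares.items():
    print(f"ending {ending}: {share:.3f} liked the show")
```

Shares above 0.5 correspond to groups in which the majority liked the show.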
We present the point estimates of the average amount that each group is willing to pay:
- xbar1 = 4.73
- xbar2 = 5.77
- xbar3 = 8.21
- overall sample average = 6.127
- xbar1 - xbar2 = -1.033
- xbar2 - xbar3 = -2.439
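The overall sample average is the size-weighted mean of the three group means. A short sketch of that arithmetic follows, assuming the group sizes 29, 44 and 27 from the liking table and the full-precision group means from the descriptive statistics table in this report.

```python
# Group sizes (from the liking table) and group means (from the
# descriptive statistics table in this report).
group_sizes = {1: 29, 2: 44, 3: 27}
group_means = {1: 4.734483, 2: 5.768182, 3: 8.207407}

total_n = sum(group_sizes.values())
overall_mean = sum(group_sizes[g] * group_means[g] for g in group_sizes) / total_n
print(round(overall_mean, 3))
```

Because group 2 holds 44 of the 100 respondents, its mean carries the most weight in the overall figure.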
Next we consider each group separately and visualise the distribution of willingness to pay using histograms. The histograms show the frequency distribution together with its cumulative percentage. The table below gives the same frequencies numerically.
| Class | Frequency, ending 1 | Frequency, ending 2 | Frequency, ending 3 |
|---|---|---|---|
| 0 to less than 2 | 16 | 21 | 6 |
| 2 to less than 4 | 0 | 0 | 0 |
| 4 to less than 6 | 0 | 0 | 0 |
| 6 to less than 8 | 4 | 3 | 3 |
| 8 to less than 10 | 4 | 10 | 9 |
| 10 to less than 12 | 3 | 6 | 5 |
| > 12 | 2 | 4 | 4 |
We now provide descriptive statistics for the 3 categories outlined above.
| Statistic | ending 1 | ending 2 | ending 3 |
|---|---|---|---|
| Mean | 4.734483 | 5.768182 | 8.207407 |
| Standard Error | 0.926529 | 0.81876 | 0.884996 |
| Median | 0.9 | 8 | 9 |
| Mode | 8 | 9 | 9 |
| Standard Deviation | 4.989 | 5.431038 | 4.598572 |
| Skewness | 0.406286 | 0.290192 | -0.87268 |
| Range | 13 | 18 | 14 |
| Maximum | 13 | 18 | 14 |
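The standard error row is related to the standard deviation row by SE = SD / sqrt(n). The sketch below re-derives it under the assumption that the group sizes are 29, 44 and 27, as in the liking table; small differences from the reported figures can arise because the ending 1 standard deviation is quoted to only three decimals.

```python
import math

# Standard deviations from the table above; group sizes assumed from the liking table.
sd = {1: 4.989, 2: 5.431038, 3: 4.598572}
n = {1: 29, 2: 44, 3: 27}

standard_error = {g: sd[g] / math.sqrt(n[g]) for g in sd}
for g, se in standard_error.items():
    print(f"ending {g}: SE = {se:.4f}")
```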
Summary of data description:
- The average amount that a viewer is willing to pay for a DVD is highest for the show with ending 3, at 8.207.
- The average is lowest for the show with ending 1, at 4.73.
- The average for the show with ending 2 is 5.768, closer to the group 1 average than to the group 3 average.
- The overall sample average is 6.127.
- The maximum amount that a viewer is willing to pay differs across the shows: it is highest at 18 for group 2 and lowest at 13 for group 1.
- The three groups differ in skewness. Endings 1 and 2 are positively skewed, in contrast with the negative skewness of group 3. This is seen in the histograms as well.
- No respondent in any group stated an amount between $2 and $6.
- The range is of a similar order across categories (13, 18 and 14) and equals the maximum in each group, because range = maximum - minimum and the minimum is zero in all groups.
- Variation in absolute terms is captured by the standard deviation, which is highest for group 2 and lowest for group 3.
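The standard deviation measures variation in absolute terms. The coefficient of variation (standard deviation divided by the mean) expresses it relative to the size of the mean. The following sketch computes it from the reported means and standard deviations; it is offered as an illustration and is not a figure taken from the Excel output.

```python
# Means and standard deviations as reported in the descriptive statistics table.
means = {1: 4.734483, 2: 5.768182, 3: 8.207407}
sds = {1: 4.989, 2: 5.431038, 3: 4.598572}

# Coefficient of variation: relative spread, unit-free.
cv = {g: sds[g] / means[g] for g in means}
for g, value in cv.items():
    print(f"ending {g}: CV = {value:.3f}")
```

A higher value indicates more spread relative to the group's typical amount.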
The use of surveys to draw conclusions about populations is based on the theory of estimation and hypothesis testing. The usefulness of surveys is well established, especially when time and money are constrained, and they become unavoidable when the population itself is infinite or uncountable. The quality of the results (in terms of relevance and usefulness) and of the conclusions drawn from a sample depends on many considerations: the size of the sample compared with the population, the sampling technique used, the confidence level chosen and the questions put to the sample participants. Even after controlling for these aspects, some problems remain in the form of non-sampling errors, such as biased responses or participants giving wrong information. Some participants may give frivolous answers because they are not required to back their answers with actions. When we ask how much they are willing to pay for the DVD, they can sound magnanimous and state a large number, but may actually refuse to pay that amount. The questionnaire may also not be exhaustive. For example, for Q1 we could offer CAN'T SAY along with YES/NO.
Summary of the data set
Let us first consider the shows with endings 1 and 2:

| Ending | YES | NO |
|---|---|---|
| Ending 1 | 13 | 16 |
| Ending 2 | 23 | 21 |
We use a chi square test of association to test for association between liking the show and its ending for two pairs of endings: 1 and 2, and 2 and 3. The null hypothesis is that ending and liking are independent. The alternative hypothesis is that there is an association between liking a show and its ending.
First, for groups 1 and 2, we calculate the expected values. The chi square test value is 7.821 and the critical value at the 95% confidence level is 3.84 (with 1 degree of freedom). As the test value exceeds the critical value, we reject the null hypothesis.
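| Cell | observed | expected | (O-E)^2/E |
|---|---|---|---|
| Ending 1, YES | 13 | 10.44 | 0.627739 |
| Ending 1, NO | 16 | 10.73 | 2.588341 |
| Ending 2, YES | 23 | 15.84 | 3.236465 |
| Ending 2, NO | 21 | 16.28 | 1.368452 |
| Total | | | 7.820997 |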
Let us now consider the shows with endings 2 and 3:

| Ending | YES | NO |
|---|---|---|
| Ending 2 | 23 | 21 |
| Ending 3 | 21 | 6 |
Again using a chi square test, the test value is 14.915 and the critical value at the 95% confidence level is 3.84 (with 1 degree of freedom). As the test value exceeds the critical value, we reject the null hypothesis of independence between liking the show and its ending.
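| Cell | observed | expected | (O-E)^2/E |
|---|---|---|---|
| Ending 2, YES | 23 | 19.36 | 0.68438 |
| Ending 2, NO | 21 | 11.88 | 7.001212 |
| Ending 3, YES | 21 | 11.88 | 7.001212 |
| Ending 3, NO | 6 | 7.29 | 0.228272 |
| Total | | | 14.91508 |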
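For readers who want to re-check a test of this kind, the following is a minimal pure-Python sketch of the Pearson chi square statistic for a contingency table. It makes two assumptions that should be stated plainly: expected counts are computed as row total × column total / grand total within each two-ending subtable, and no continuity correction is applied. Its output therefore need not reproduce the figures reported above. The function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            statistic += (observed - e) ** 2 / e
        expected.append(expected_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected

# Observed YES/NO counts per ending, taken from the liking table.
endings_1_2 = [[13, 16], [23, 21]]
endings_2_3 = [[23, 21], [21, 6]]
for label, table in (("1 vs 2", endings_1_2), ("2 vs 3", endings_2_3)):
    stat, dof, _ = chi_square_independence(table)
    print(label, round(stat, 3), "dof =", dof)
```

The statistic is then compared with the chi square critical value for the chosen confidence level and the printed degrees of freedom.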
We now test for differences in the mean amounts that people are willing to pay for the shows with endings 1 and 2. We use a normal distribution for the estimated difference.
Ho: µ1 = µ2
H1: µ1 ≠ µ2
Test value = -1.033 / SE
SE = (5^2/29 + 4.9^2/44)^0.5 = 1.186
Test value = -0.871
At the 99% confidence level the critical z value is 2.576. As |test value| < critical value, we can say that there is NO significant difference in the amounts people are willing to pay for DVDs of the shows with endings 1 and 2.
Now for the shows with endings 2 and 3.
Ho: µ2 = µ3
H1: µ2 ≠ µ3
Test value = -2.439 / SE
SE = (4.9^2/44 + 4.6^2/27)^0.5 = 1.153
Test value = -2.1156. At the 99% confidence level the critical z value is 2.576. As |test value| < critical value, we can say that there is NO significant difference in the amounts people are willing to pay for DVDs of the shows with endings 2 and 3.
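The two z tests above can be sketched as follows. The inputs (differences, standard deviations and group sizes) are the values used in the calculations in this section; the helper name `two_sample_z` is an assumption, and the critical value is taken from the standard normal distribution for a two-sided test.

```python
import math
from statistics import NormalDist

def two_sample_z(diff, sd1, n1, sd2, n2):
    """Return (z, se) for a difference in means with unpooled standard error."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff / se, se

# Two-sided critical value at the 99% confidence level.
critical = NormalDist().inv_cdf(1 - 0.01 / 2)

z12, se12 = two_sample_z(-1.033, 5.0, 29, 4.9, 44)   # endings 1 and 2
z23, se23 = two_sample_z(-2.439, 4.9, 44, 4.6, 27)   # endings 2 and 3

for label, z in (("1 vs 2", z12), ("2 vs 3", z23)):
    print(label, round(z, 3), "reject Ho" if abs(z) > critical else "do not reject Ho")
```

Changing `0.01` to `0.05` in the critical-value line shows how the decision depends on the confidence level chosen.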
Sampling distributions are an integral part of estimation and hypothesis testing. This theory forms the basis of any analysis that uses a sample to draw conclusions about a population, and it rests on the mathematical theory of probability distributions.
Sampling theory derives the value of a statistic from a sample and considers how that value would vary over all the samples that could be drawn, ideally infinitely many. The statistic is then compared against probability distributions such as the normal (z), t, F and chi square distributions to test hypotheses. We have used two such distributions in our report:
- the chi square distribution, to check independence between liking and ending,
- the normal distribution, to check differences in the amounts that people are willing to pay across shows.
In real life we draw only one sample, which is why a theoretical concept that relies on an infinite number of samples is difficult to grasp.
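One way to make the idea of repeated sampling more concrete is to simulate it. The sketch below draws many samples of 100 from a hypothetical population of 10,000 amounts and looks at how the sample mean varies. The population here is synthetic (uniform between 0 and 15) and is not the assignment data; the seed, the number of repetitions and the function name `sample_means` are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)
# Hypothetical population of 10,000 willingness-to-pay amounts (synthetic).
population = [random.uniform(0, 15) for _ in range(10_000)]

def sample_means(population, sample_size, repetitions):
    """Mean of each of `repetitions` simple random samples drawn without replacement."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repetitions)
    ]

means = sample_means(population, sample_size=100, repetitions=2_000)
spread_of_means = statistics.stdev(means)
# Theoretical standard error: population SD divided by sqrt(sample size).
theory = statistics.pstdev(population) / 100 ** 0.5
print(round(statistics.mean(means), 2), round(spread_of_means, 3), round(theory, 3))
```

The spread of the simulated sample means should sit close to the theoretical standard error, which is the quantity the z test relies on when only one sample is available.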
We work with a sample that constitutes 1% of the population data (100 out of 10,000 data points). The conclusions are conditional on the student ID that is used to select the sample. The sample points are spread over three groups according to the show ending. Our sample has a disproportionately large number of data points in group 2, the show with ending 2 (44 out of 100).
The data show variation across the groups in descriptive statistics such as the mean, variance, skewness and standard deviation. There is some similarity in maximum willingness to pay and in range. There is also a statistically significant association between liking the show and its ending for both pairs considered (endings 1 and 2, and endings 2 and 3). The differences in the average amounts that viewers would pay are not statistically significant at the 99% confidence level. These results are conditional on the confidence level chosen (that is, on the Type I error rate); a lower confidence level may change the conclusions.