You will need to download the Excel dataset 'Major Assignment data Movie Downloads.xlsx' from Canvas. The data set contains customer details and the types of movies downloaded from an internet site for a given year. There are 4,815 customers in this data set and nine variables, as follows:

1. Customer No: the customer's number in this data set

2. State: the state of residence of the customer

3. City: city of residence

4. Gender: M = male, F = female

5. First choice: first-choice movie category (Action, Comedy, Drama or SciFi)

6. Second Choice: second-choice movie category

7. Age: age of the customer

8. Purchases: Number of purchases

9. DollarAmt: Total amount of purchases in dollars

Compare both intervals with their respective true means by calculating the actual population mean for all 4,815 customers and comparing it with the sample mean and confidence interval (note: this is not usually done in practice; it is requested here for the purpose of this assignment).

This report uses descriptive statistics to summarize all the variables. Further analysis was done to estimate the average dollar amount spent on all types of movies and the average number of purchases for first choice SciFi movies only. Hypothesis tests were also carried out to determine whether the average number of purchases differs for males and females, and whether the average dollar amount spent for first choice Comedy movies is more than the average spent for first choice Drama movies. Finally, a full regression analysis was performed to investigate the linear relationship between the age of customers and the total dollar amount spent.

Analysis

A random sample of 50 customers was extracted from the population of 4,815. The sample statistics are discussed below.

Descriptive Statistics

Of the eight states represented in the sample, 36% of the customers are located in FL, while only 6% are from LA.

The dollar amounts spent by the customers vary widely, from a minimum of $53 to a maximum of $368. The average dollar amount spent is $164.62, with a standard deviation of $81.47. The shape of the histogram indicates that the distribution of dollar amount spent is bimodal.
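The summary figures above (minimum, maximum, mean and standard deviation) can also be reproduced outside Excel. The sketch below uses Python's standard `statistics` module; the five amounts are hypothetical values chosen for illustration, not the 50 sampled customers, and `summarize` is a hypothetical helper name. `statistics.stdev` divides by n - 1, which matches Excel's `STDEV.S` sample standard deviation.

```python
import statistics

def summarize(amounts):
    """Return the summary statistics reported for DollarAmt."""
    return {
        "min": min(amounts),
        "max": max(amounts),
        "mean": statistics.mean(amounts),
        # Sample standard deviation (divides by n - 1), like Excel's STDEV.S
        "stdev": statistics.stdev(amounts),
    }

# Hypothetical dollar amounts, NOT the assignment sample
example_amounts = [53, 120, 164, 250, 368]
summary = summarize(example_amounts)
print(summary)
```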

Confidence intervals

The analysis in this report is based on a sample of 50 customers, so the sample statistics must be generalized to the whole population with a stated level of confidence. Confidence intervals were constructed for the average dollar amount spent on all types of movies and for the average number of purchases for first choice SciFi movies only, as reported below. We are 95% confident that the population mean number of purchases for first choice SciFi movies lies between 28.20 and 39.94. The true population mean for purchases of first choice SciFi movies was found to be 32.58, which lies within this 95% confidence interval.

We are 95% confident that the population mean dollar amount spent on all types of movies lies between $141.47 and $187.77. This interval is built around the sample mean dollar amount spent for all types of movies. The true population mean dollar amount spent, calculated from all 4,815 customers, was found to be $166.71, which lies within the 95% confidence interval from the sample.
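The dollar-amount interval can be reproduced from the sample summary (mean $164.62, standard deviation $81.47, n = 50) with the usual t-interval, mean ± t·s/√n. This sketch assumes the interval was computed that way with n - 1 = 49 degrees of freedom; the critical value 2.0096 is taken from t tables rather than from Excel, and `t_interval` is a hypothetical helper. The sample size behind the SciFi interval is not reported, so that interval is only checked against its population mean.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Confidence interval mean +/- t_crit * sd / sqrt(n) from summary statistics."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Two-tailed 5% critical value of t with n - 1 = 49 degrees of freedom (t tables)
T_CRIT_49 = 2.0096

lower, upper = t_interval(mean=164.62, sd=81.47, n=50, t_crit=T_CRIT_49)
print(f"95% CI for mean DollarAmt: ({lower:.2f}, {upper:.2f})")

# Compare each interval with the population mean computed from all 4,815 customers
checks = {
    "DollarAmt, all movies": (lower, upper, 166.71),
    "Purchases, first choice SciFi": (28.20, 39.94, 32.58),
}
for name, (lo, hi, mu) in checks.items():
    print(f"{name}: population mean {mu} inside CI -> {lo <= mu <= hi}")
```

With these inputs the first line printed is `95% CI for mean DollarAmt: (141.47, 187.77)`, matching the interval reported above.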

Hypothesis Testing

Several claims about the data were checked using hypothesis tests. The first hypothesis is that the average amount spent (in $) for first choice Comedy movies is more than the average for first choice Drama movies. The second is that the average number of purchases differs for males and females. Both hypotheses are tested at the 5% significance level and reported below.


Hypothesis 1

The null hypothesis is that the average dollar amount spent for first choice Comedy movies is not more than the average dollar amount spent for first choice Drama movies; the alternative is that it is more (a one-tailed test). A t-test was used because the population variance is unknown. The test statistic computed by Excel was 0.0443, with a p-value of 0.4826, which is greater than the significance level of 0.05. Therefore, at the 0.05 level of significance, we fail to reject the null hypothesis: there is insufficient evidence to state that the average dollar amount spent for first choice Comedy is more than the average dollar amount spent for first choice Drama movies.
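As a sketch of how a two-sample t statistic such as the 0.0443 above is formed, the code below uses Welch's unequal-variance formula with the Welch-Satterthwaite degrees of freedom. The report does not say whether Excel's equal-variance or unequal-variance t-test was used, so Welch's version is an assumption; `welch_t` is a hypothetical helper, and the Comedy and Drama amounts are hypothetical values, not the sample data.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and approximate df for mean(a) - mean(b)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_a2 = var_a / len(sample_a)  # squared standard error of mean(a)
    se_b2 = var_b / len(sample_b)  # squared standard error of mean(b)
    t = (mean_a - mean_b) / math.sqrt(se_a2 + se_b2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se_a2 + se_b2) ** 2 / (
        se_a2 ** 2 / (len(sample_a) - 1) + se_b2 ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical DollarAmt values (NOT the assignment data)
comedy = [120, 180, 150, 210, 90]
drama = [140, 160, 100, 200, 150, 170]
t, df = welch_t(comedy, drama)
print(f"t = {t:.4f}, df = {df:.1f}")
```

Because Hypothesis 1 is one-tailed, only a large positive t (Comedy mean well above the Drama mean) would count as evidence against the null hypothesis; a t near zero, as in the Excel result, does not.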

Hypothesis 2

The null hypothesis is that the average number of purchases does not differ for males and females; the alternative is that it differs (a two-tailed test). A t-test was used because the population variance is unknown. From the Excel output, the test statistic was 1.8592 and the p-value was 0.0697. Since the p-value is larger than the 0.05 significance level, we fail to reject the null hypothesis and conclude that there is insufficient evidence at the 5% significance level to state that the average number of purchases differs for males and females.
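Excel reports the p-value directly, but it is simply the probability of a t statistic at least as extreme as the one observed. The sketch below computes a two-tailed p-value with only the Python standard library, by integrating the Student t density numerically with Simpson's rule; `t_pdf` and `t_two_tailed_p` are hypothetical helper names. The degrees of freedom are not stated in the report: df = 48 (a pooled two-sample test on 50 customers) is an assumption, so the result is only expected to be close to Excel's 0.0697.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|]."""
    a, b = 0.0, abs(t_stat)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3  # P(0 <= T <= |t_stat|)
    return 2 * (0.5 - central)

# Hypothesis 2: t = 1.8592 from Excel; df = 48 is an ASSUMPTION (pooled test, 50 - 2)
p = t_two_tailed_p(1.8592, df=48)
alpha = 0.05
print(f"p = {p:.4f}; reject H0 at 5%? {p < alpha}")
```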

Correlation and Regression

Correlation and regression analysis are used to determine whether a relationship exists between two quantitative variables. This report investigates the relationship between the customer's age and the dollar amount spent, and whether the dollars spent on movies can be predicted from the age of the customer.

Scatterplot

The chart below shows the relationship between age (x-axis) and dollar amount spent (y-axis). The points show no clear pattern, so there is no clear relationship between the two variables; at most, the linear trendline on the chart suggests a weak negative correlation.

Linear regression

Although the two variables are only weakly correlated, the least-squares line of best fit was calculated using Excel. The equation for the linear regression model is:

$$\hat{y} = 183.9 - 0.45x$$

where ŷ represents the dollar amount spent and x represents the age of the customer. The slope indicates that for every one-year increase in the age of the customer, the total dollars spent on movies decreases by $0.45, hence the negative linear relationship. The intercept implies that a customer aged 0 would spend $183.9, which is unrealistic; the intercept has no practical meaning here because an age of 0 lies far outside the range of ages in the data.
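The fitted line can be used directly for prediction. A minimal sketch, using the rounded intercept (183.9) and slope (-0.45) reported above; `predicted_dollars` is a hypothetical helper, the ages are chosen only for illustration, and predictions are only sensible within the range of ages in the sample.

```python
def predicted_dollars(age):
    """Fitted line from the Excel regression: y-hat = 183.9 - 0.45 * age."""
    return 183.9 - 0.45 * age

for age in (20, 40, 60):
    print(age, round(predicted_dollars(age), 2))

# Each extra year of age lowers the predicted spend by the slope, $0.45
print(round(predicted_dollars(41) - predicted_dollars(40), 2))
```

Given an R-square of only 0.0087, these predicted values carry very little information about what an individual customer actually spends.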

Coefficients of Correlation and Determination

The coefficient of correlation, r, measures the direction and strength of the linear relationship between two variables. Its value lies between -1 and 1. The sign indicates the direction of the relationship (positive or negative), while the magnitude indicates its strength: the closer |r| is to 1, the stronger the relationship. The magnitude of the coefficient of correlation was 0.0933 (the square root of R-square, which Excel reports as Multiple R and which is never negative). Because the slope of the regression line is negative, r = -0.0933. This indicates a very weak negative relationship, since the value is negative and close to zero.
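For reference, r can be computed from paired observations as r = Sxy / √(Sxx·Syy). The sketch below uses hypothetical ages and dollar amounts (not the assignment data) and a hypothetical helper `pearson_r`, to show that the sign of r follows the direction of the relationship and that squaring r gives the coefficient of determination of a simple linear regression.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical (age, DollarAmt) pairs, NOT the assignment sample
ages = [25, 35, 45, 55]
dollars = [200, 150, 180, 120]
r = pearson_r(ages, dollars)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # negative r: spend tends to fall with age
```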

The coefficient of determination, R-square, indicates the proportion of the variation in the dependent variable, y, that can be explained by the independent variable, x. R-square was taken from the Excel regression output and was found to be 0.0087. This indicates that only 0.87% of the variation in dollar amount spent can be explained by the age of the customer. Therefore, the regression model is not a good predictor of dollar amount spent.
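Because Excel's Multiple R is the non-negative square root of R-square, the signed coefficient of correlation has to be recovered from the sign of the slope. A small sketch of that step, using the R-square (0.0087) and slope (-0.45) reported in this section; `signed_r` is a hypothetical helper.

```python
import math

def signed_r(r_square, slope):
    """Signed correlation from R-square; in simple regression r has the sign of the slope."""
    return math.copysign(math.sqrt(r_square), slope)

r = signed_r(0.0087, -0.45)
print(f"r = {r:.4f}, variation explained = {0.0087:.2%}")
```

This prints `r = -0.0933, variation explained = 0.87%`.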

Hypothesis 3

The null hypothesis is that there is no linear relationship between the age of customers and the dollar amount spent on movies (slope = 0); the alternative is that there is a linear relationship (slope ≠ 0). A t-test from the Excel regression output was used because the population variance is unknown. The figures quoted from that output, a t-score of -10.3435 and a p-value of 0.00, do not match the test of the slope: with r = -0.0933 and n = 50, the slope's t-statistic is t = r√(n - 2) / √(1 - r²) ≈ -0.65, which lies well within the critical values of ±2.01 for 48 degrees of freedom, so its p-value is far greater than 0.05. Hence we fail to reject the null hypothesis and conclude that there is insufficient evidence at the 5% significance level to state that there is a linear relationship between the age of customers and the dollar amount spent.
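In simple linear regression the t-test for the slope can be computed from r and n alone, t = r√(n - 2) / √(1 - r²). The sketch below applies this to r = -0.0933 and n = 50; the two-tailed 5% critical value for 48 degrees of freedom (2.0106) is taken from t tables, and `slope_t_from_r` is a hypothetical helper.

```python
import math

def slope_t_from_r(r, n):
    """t statistic for H0: slope = 0 in simple linear regression, from r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = slope_t_from_r(-0.0933, 50)
T_CRIT_48 = 2.0106  # two-tailed 5% critical value of t with 48 df (t tables)
print(f"t = {t:.2f}; reject H0 at 5%? {abs(t) > T_CRIT_48}")
```

The statistic is about -0.65, well inside ±2.0106, so the null hypothesis of no linear relationship is not rejected.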

Conclusion

It was found that the population mean dollar amount spent on all types of movies lies between $141.47 and $187.77 at the 95% confidence level. The actual population mean dollar amount spent was calculated as $166.71, which lies within this interval. It was also found that the population mean number of purchases for first choice SciFi movies only lies between 28.20 and 39.94 at the 95% level of confidence; the true population mean of 32.58 also lies within this interval. The first hypothesis, that the average dollar amount spent for first choice Comedy movies is more than the average dollar amount spent for first choice Drama movies, was not supported at the 5% significance level, as the p-value was 0.4826. However, the sample contained only 5 customers who spent on Comedy movies, so this result might not be a true reflection of the population.

The second hypothesis, that the average number of purchases differs for males and females, was not supported at the 0.05 significance level, as the p-value was 0.0697. The regression analysis indicated that there was no significant linear relationship between the age of customers and the dollar amount spent on all types of movies. This was evident from the scatterplot and was supported by the t-test for hypothesis testing at the 5% significance level.

