Discuss Statistical Data Analysis for Business Decision Makers.
Every business generates data. This data can be analysed to provide useful insights into the business, and the analysis is generally done using statistics. The results are important for the business manager, who takes decisions on the basis of the analysis. This report presents an analysis of data on the tourism sector that can be used to support business decisions.
The data come from a survey of 50 accommodation providers. The analysis shows that 2 accommodation providers did not have any guests, so the data for these 2 providers are missing. Hence, the data cover only 48 accommodation providers. The discrete random variable selected is the number of beds. The continuous random variable is the average time spent by guests at the accommodation.
The continuous variable selected is the average time spent by a guest in the accommodation. The descriptive statistics are shown in table 1 of the Appendices (Sheet “CI” in Excel File). From the table we find that the average time spent by a guest at a B&B is 7.7150 hours, which is more than the average time spent at a hotel, 7.5391 hours. The median time spent by a guest at a B&B is 7.41 hours, which is more than the median time spent at a hotel, 7.05 hours.
The minimum and maximum times spent at B&Bs are 5.96 hours and 11.95 hours. The minimum and maximum times spent at hotels are 6.25 hours and 11.46 hours.
The interquartile range of time spent at B&Bs is 1.8250 hours, which is more than the interquartile range of time spent at hotels, 1.1550 hours.
Figure 1: Distribution of data for type of Accommodation
(source created by author)
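From the figure (Sheet “CI” in Excel File) we find that the minimum time spent by a guest at a hotel (6.25 hours) is more than that at a B&B (5.96 hours), whereas the maximum time spent at a hotel (11.46 hours) is less than that at a B&B (11.95 hours).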
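The summary measures reported above (mean, median, minimum, maximum and interquartile range) can be reproduced with a few lines of Python. The sketch below is illustrative only: the `hours` list is made-up data, not the survey values, and the `inclusive` quartile method is an assumption, since spreadsheet tools may use a slightly different convention.

```python
import statistics

def describe(values):
    """Return the summary measures used in this report for one group."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range = Q3 - Q1
    }

# Hypothetical hours spent by guests (NOT the survey data).
hours = [6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 12.0]
summary = describe(hours)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running `describe` once per accommodation type gives the two columns that are compared in the text.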
Discrete Random Variable
A discrete random variable is a variable that takes only distinct, countable values, e.g. the star rating of the accommodations. We calculated the probability of each star rating (table 4 – sheet “4” in Excel File).
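As a rough illustration of how such probabilities can be obtained, the probability of each star rating is estimated by its relative frequency: the count of that rating divided by the number of accommodations. The ratings below are invented for the example and do not come from the survey; the function name `rating_probabilities` is also just a label chosen here.

```python
from collections import Counter

def rating_probabilities(ratings):
    """Map each distinct rating to its relative frequency."""
    counts = Counter(ratings)
    total = len(ratings)
    return {rating: counts[rating] / total for rating in sorted(counts)}

# Hypothetical star ratings (NOT the survey data).
ratings = [3, 4, 3, 5, 2, 3, 4, 4, 3, 5]
probabilities = rating_probabilities(ratings)
for rating, p in probabilities.items():
    print(f"P(stars = {rating}) = {p:.2f}")

# The probabilities of a discrete random variable sum to 1.
total_probability = sum(probabilities.values())
```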
A confidence interval (CI) is a range of values, calculated from the sample, that is likely to contain the population mean. With a 95% CI, about 95% of the intervals constructed in this way from repeated samples would contain the population mean. The 95% CIs are given in table 2 of the Appendices (Sheet “CI” in Excel File). From the table we find that the 95% CI for the average time spent by a guest at B&B accommodations is 6.9411 to 8.4889 hours, with a mean of 7.7150 hours. Hence, we can be 95% confident that the population mean time that a guest spends at a B&B lies between 6.9411 hours and 8.4889 hours.
The 95% CI for hotel accommodations is 7.0813 to 7.9968 hours, with a mean of 7.5391 hours. Hence, we can be 95% confident that the population mean time that a guest spends at a hotel lies between 7.0813 and 7.9968 hours.
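The reported limits are consistent with the usual interval for a mean. Assuming the intervals were computed with the t distribution (the report does not state the method), the general form and the margins of error implied by the reported limits are shown below, where \bar{x} is the sample mean, s the sample standard deviation and n the number of accommodations in the group.

```latex
\begin{align*}
\text{CI} &= \bar{x} \pm t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}} \\
\text{margin (B\&B)} &= \frac{8.4889 - 6.9411}{2} = 0.7739 \\
\text{margin (hotel)} &= \frac{7.9968 - 7.0813}{2} \approx 0.4578
\end{align*}
```

The wider margin for B&Bs indicates that their mean is estimated less precisely than the hotel mean.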
The normal distribution is used by statisticians in a wide range of data analyses. It is generally depicted as a bell-shaped curve and is also known as the Gaussian distribution. For a normal distribution the probability density function is:
\[ f(x) = \frac{1}{s\sqrt{2\pi}} \exp\left(-\frac{(x - m)^2}{2 s^2}\right) \]
In the above equation, m is the mean of the normal distribution and s is its standard deviation. In a normal distribution, the mean, median and mode are all equal; together they are known as measures of central tendency. In the special case of the standard normal distribution, the mean is equal to zero and the standard deviation is equal to 1. The normal distribution is a continuous distribution which is symmetric about the mean (Black 2016).
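The density of a normal distribution with mean m and standard deviation s can also be evaluated numerically. The sketch below is a minimal illustration in Python; the function name `normal_pdf` and the example values of m and s are chosen here for demonstration and are not estimated from the survey.

```python
import math

def normal_pdf(x, m=0.0, s=1.0):
    """Density of a normal distribution with mean m and standard deviation s."""
    if s <= 0:
        raise ValueError("s must be positive")
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

# Illustrative parameters only (not estimated from the survey).
m, s = 7.5, 1.5

# Symmetry about the mean: points equally far from m have equal density.
left = normal_pdf(m - 1.0, m, s)
right = normal_pdf(m + 1.0, m, s)
print(left, right)

# The peak of the standard normal (m = 0, s = 1) is 1 / sqrt(2 * pi).
peak = normal_pdf(0.0)
print(peak)
```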
Statisticians use the normal distribution to analyse sampling distributions. In a survey (as in the present tourism data) or in business data, if a variable is assumed to be normally distributed, then the mean and standard deviation of the sampling distribution of its sample mean can be established, and the properties of the normal distribution can be applied to analyse that sampling distribution.
In inferential statistics we try to make inferences about the population from the sample data. We can also use inferential statistics to make probability estimates about the population. The central limit theorem is the basis of inferential statistics (Lomax and Hahs-Vaughn 2012). Inferential statistics allows us to draw conclusions about a population with the help of sample data. However, the way in which the sample is collected determines what can validly be inferred. Simple random sampling is important for inference because it makes it easier to generalise from the sample to the population.
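The role of the central limit theorem can be illustrated with a small simulation. The sketch below draws repeated simple random samples from a deliberately skewed (exponential) population and looks at the distribution of the sample means. The population, sample size, number of repetitions and seed are all arbitrary choices made for this illustration, not features of the tourism survey.

```python
import math
import random
import statistics

def simulate_sample_means(sample_size, repetitions, seed=0):
    """Draw repeated samples from an exponential population with mean 1
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(repetitions)
    ]

sample_size = 30
means = simulate_sample_means(sample_size, repetitions=2000)

# Centre and spread of the simulated sampling distribution.
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# Theory: the centre is the population mean (1) and the spread is the
# population standard deviation (1) divided by sqrt(sample_size).
expected_spread = 1.0 / math.sqrt(sample_size)
print(f"centre = {centre:.3f}, spread = {spread:.3f}, "
      f"expected spread = {expected_spread:.3f}")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the theoretical standard error.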
Regression analysis is used to predict the dependent variable from the independent variable. In the present assignment linear regression is used to predict the response variable (revenue on sample night) from the predictor variable (number of guests taking dinner). The linear regression model used is Y = b0 + b1 * X.
In the above equation Y is the response variable, X is the predictor variable, b0 is the intercept and b1 is the slope.
The regression model can be described as:
Revenue on sample night = b0 + b1 * Number of guests taking dinner.
From the regression statistics we find that the regression equation is
Revenue on sample night = 713.3846 + 46.19 * Number of guests taking dinner (table 5)
In addition, the R² value is 0.677, which means that 67.7% of the variation in revenue is explained by the number of guests taking dinner (table 6 – Sheet “Tourism” in Excel Sheet).
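The coefficients b0 and b1 and the R² value are typically obtained by ordinary least squares. The sketch below shows that calculation in plain Python; the guest counts and revenues are invented for the example, so its output will not match tables 5 and 6, and `fit_line` is simply a name chosen here.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot

# Hypothetical data (NOT the survey data).
guests = [10, 20, 30, 40, 50]
revenue = [1200, 1650, 2050, 2600, 3000]
b0, b1, r_squared = fit_line(guests, revenue)
print(f"Revenue = {b0:.2f} + {b1:.2f} * guests, R2 = {r_squared:.3f}")
```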
Interpolation and extrapolation are important concepts in regression analysis. Interpolation means predicting within the range of the sample, and extrapolation means predicting outside that range. For example, suppose that on a particular night 80 guests take dinner.
Hence, from the regression equation, the revenue generated would be:
Revenue generated = 713.3846 + 46.19 * Number of guests taking dinner
= 713.3846 + 46.19 * 80 = 4408.58
Hence we can predict that, on a night with 80 guests taking dinner, the revenue generated would be about € 4408.58. This is a point estimate: the R² value of 0.677 means that the model explains 67.7% of the variation in revenue, so the actual revenue may differ from this figure.
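The prediction can be checked with a short calculation. The sketch below uses the intercept and slope from the regression equation reported with table 5 (713.3846 and 46.19). The observed range of guest numbers is not given in this report, so `observed_min` and `observed_max` are placeholders that only illustrate how interpolation is told apart from extrapolation.

```python
INTERCEPT = 713.3846   # b0, from the reported regression equation
SLOPE = 46.19          # b1, from the reported regression equation

def predict_revenue(guests):
    """Predicted revenue on a night with the given number of dinner guests."""
    return INTERCEPT + SLOPE * guests

def prediction_type(guests, observed_min, observed_max):
    """Label a prediction as interpolation or extrapolation."""
    if observed_min <= guests <= observed_max:
        return "interpolation"
    return "extrapolation"

# Placeholder range (an assumption; the survey range is not stated here).
observed_min, observed_max = 10, 120

guests = 80
revenue = predict_revenue(guests)
kind = prediction_type(guests, observed_min, observed_max)
print(f"{guests} guests -> predicted revenue {revenue:.2f} ({kind})")
```

Predictions labelled as extrapolation should be treated with more caution, because the linear relationship has not been observed in that region.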
In statistics, hypothesis testing is used to assess whether sample data provide enough evidence to support a claim about the population. Every hypothesis test sets a null hypothesis against an alternate hypothesis.
For the present data we tested the hypothesis that “the average time spent by a guest at B&B type accommodation is more than at Hotel accommodation.”
Thus the null hypothesis for the present test is: the average time spent at B&B accommodation = the average time spent at Hotel Accommodation.
The alternate hypothesis: the average time spent at B&B accommodation ≠ the average time spent at Hotel Accommodation.
To test the hypothesis we used the independent sample t-test assuming unequal variances.
From table 7 (Sheet “CI” in Excel File) we find that the p-value is 0.704 (two-tailed). The p-value is used to evaluate the null hypothesis: if it is more than the significance level (0.05), we fail to reject the null hypothesis; if it is less than 0.05, we reject the null hypothesis in favour of the alternate hypothesis. Since 0.704 is more than 0.05, we fail to reject the null hypothesis, and we find no statistically significant difference between the mean time spent by a guest at B&B accommodation and the mean time spent by a guest at a hotel.
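The test statistic for an independent-samples t-test with unequal variances (Welch's test) and the decision rule can be sketched as follows. The two samples are invented for the example and are not the survey data. Converting the statistic to a p-value requires the t distribution, which is left to a statistics package, so the `decide` function simply takes a p-value as input (here the reported 0.704).

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

def decide(p_value, alpha=0.05):
    """Apply the decision rule at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical hours (NOT the survey data).
bnb = [6.0, 7.0, 8.0, 9.0, 10.0]
hotel = [6.5, 7.0, 7.5, 8.0, 8.5]
t_stat, df = welch_t(bnb, hotel)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
print(decide(0.704))  # p-value reported in table 7
```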
In the present assignment we analysed a discrete random variable (DRV) and a continuous random variable (CRV). After analysing the DRV and CRV, we used regression analysis to predict the revenue on the sample night from the number of guests taking dinner. As an extension of the regression analysis, we predicted the revenue that would be collected with 80 guests. In addition, we tested the hypothesis that the average time spent by a guest at B&B accommodation is more than that at hotels.
From an analysis of the above data we find that two of the B&B accommodations did not have any guests. We also find that B&Bs have fewer guests than hotels, and that fewer guests take breakfast at B&Bs than at hotels. Thus B&B accommodations should investigate why fewer guests stay with them even though they are cheap, and why fewer of their guests take breakfast. More guests taking breakfast at B&B accommodations could increase their revenue.
Wang, M., Lu, Q., Chi, R.T. and Shi, W., 2015. How word-of-mouth moderates room price and hotel stars for online hotel booking: an empirical investigation with Expedia data. Journal of Electronic Commerce Research, 16(1), p.72.
Ladhari, R. and Michaud, M., 2015. eWOM effects on hotel booking intentions, attitudes, trust, and website perceptions. International Journal of Hospitality Management, 46, pp.36-45.
Black, K., 2016. Business Statistics. John Wiley.
Lomax, R. and Hahs-Vaughn, D., 2012. An introduction to statistical concepts. 1st ed. New York: Routledge.