Introduction
The objective is to obtain a model that can be used to estimate the valuation of a soccer franchise in Europe. To this end, sample data has been provided on 20 European soccer franchises, covering the annual revenue generated and the valuation of each. To establish a link between annual revenue and franchise valuation, linear regression has been used. The resulting regression equation can then be used to predict the value of a franchise from an estimate of its annual revenue. However, linear regression rests on several assumptions: that the residuals are independent and normally distributed, that the relationship is linear, and that the residuals are homoscedastic. The residuals obtained from the regression have therefore been analysed to assess whether linear regression is appropriate here and whether the model would give a reliable estimate of franchise value.
Part A: Summated Ratings and Costs
- a) The scatter plot with revenue as the independent variable and value as the dependent variable is indicated below.
From visual inspection of the plot, there is an upward-sloping trend, but the values scatter more widely as revenue increases, which hints that the relationship between the two variables is moderate to strong rather than perfect. Some degree of linearity is also observable from the plot [1]. The positive relationship indicates that franchises with higher revenue tend to command higher valuations, which seems logical.
- b) The regression output obtained from Excel is illustrated below.
From the above output, b0 = -202.96 and b1 = 3.25.
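For readers without the Excel output to hand, the least-squares coefficients could be reproduced along the following lines. This is a minimal sketch using only the standard closed-form formulas; the function name `fit_simple_ols` and the four data points are hypothetical illustrations and are not the 20 franchises in the sample, so the numbers it produces will not match b0 and b1 above.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1


# Hypothetical values in $ million, NOT the report's sample data
revenues = [100.0, 200.0, 300.0, 400.0]
values = [150.0, 400.0, 800.0, 1050.0]
b0, b1 = fit_simple_ols(revenues, values)
print(b0, b1)
```

Applied to the actual revenue and value columns, the same two formulas should give the intercept and slope that Excel reports.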
- c) The intercept, b0, indicates that a franchise generating no revenue would have a valuation of -$202.96 million [2]. Clearly, the lowest valuation any asset could possibly command is 0, so a negative franchise valuation does not make sense and the intercept has no practical interpretation here.
The slope coefficient, b1, indicates that the franchise value changes by $3.25 million for every $1 million change in annual revenue. Because the slope is positive, the two variables move in the same direction.
- d) The regression model derived above may be expressed using the following equation.
Franchise Value ($ million) = -202.96 + 3.25 * Annual Revenue ($ million)
The input value of annual revenue is given as $150 million.
Hence, franchise value = -202.96 + 3.25 * 150 = $284.54 million.
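The arithmetic above can be checked with a short sketch. The coefficients are the rounded values quoted in the report; the function name `predict_franchise_value` is a hypothetical helper introduced only for illustration.

```python
B0 = -202.96  # intercept quoted in the report ($ million)
B1 = 3.25     # slope quoted in the report ($ million of value per $ million of revenue)


def predict_franchise_value(annual_revenue):
    """Predicted franchise value ($ million) for a given annual revenue ($ million)."""
    return B0 + B1 * annual_revenue


predicted = predict_franchise_value(150.0)
print(f"{predicted:.2f}")  # 284.54
```

Because the coefficients are rounded to two decimal places, a prediction computed from the unrounded Excel coefficients may differ slightly.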
- e) The coefficient of determination for the given model is 0.7806. This indicates that variation in annual revenue accounts for 78.06% of the variation in franchise value, so revenue is clearly an important predictor [4]. Since about 22% of the variation in franchise value remains unexplained, there is a strong case for adding other predictor or explanatory variables.
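The coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that calculation follows; `r_squared` is a hypothetical helper, and the observed and fitted values are made-up illustrations rather than the report's data, so the result is not expected to equal 0.7806.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(fitted) or len(observed) < 2:
        raise ValueError("observed and fitted must have the same length (at least 2)")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical franchise values and fitted values ($ million), NOT the report's data
observed = [150.0, 400.0, 800.0, 1050.0]
fitted = [135.0, 445.0, 755.0, 1065.0]
r2 = r_squared(observed, fitted)
print(round(r2, 4))
```

The share of unexplained variation is simply `1 - r2`, which is the roughly 22% figure referred to above when the report's own data are used.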
Part B: Residual Analysis
- The plot of the residuals obtained is illustrated as follows.
For linearity to hold, the residual plot should show randomly scattered values with no discernible pattern. The presence of a pattern would suggest that a non-linear model would capture the relationship more accurately. In the given case, a pattern seems to be present among the positive residuals, which show an increasing trend as revenue increases. Hence, there is some degree of non-linearity in the relationship between the given variables. There also appears to be some dependence among the residuals, since with no interdependence the pattern would have been completely random.
Another assumption of linear regression is homoscedasticity, which implies that the residual (error) variance, when plotted against the independent variable (Annual Revenue), should be roughly constant, with a similar spread on the positive and negative sides. For the given regression and the resulting residual plot, this does not seem to be the case, as greater variance is observed on the positive side than on the negative side. Hence, it can be concluded that the homoscedasticity assumption does not hold.
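Homoscedasticity is usually stated as constant residual variance across the range of the independent variable. One informal way to supplement the visual reading of the residual plot is to compare the residual variance in the lower-revenue half of the sample with that in the higher-revenue half. The sketch below does this; it is not the method used in this report, the function name `spread_by_revenue_half` is hypothetical, and the data are made up for illustration.

```python
from statistics import variance


def spread_by_revenue_half(revenues, residuals):
    """Return (variance of residuals in low-revenue half, variance in high-revenue half)."""
    if len(revenues) != len(residuals) or len(revenues) < 4:
        raise ValueError("need at least 4 paired observations")
    pairs = sorted(zip(revenues, residuals))  # order observations by revenue
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[-half:]]  # middle point is dropped if n is odd
    return variance(low), variance(high)


# Hypothetical revenues and residuals ($ million), NOT the report's data
revenues = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0]
residuals = [-5.0, 5.0, -5.0, 5.0, -40.0, 40.0, -40.0, 40.0]
low_var, high_var = spread_by_revenue_half(revenues, residuals)
print(high_var / low_var)
```

A ratio far from 1 suggests that the spread of the residuals changes with revenue. With only 20 observations such a comparison is indicative at best and would not replace a formal test.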
- The residual histogram is as illustrated below.
It is evident from the histogram above that the residuals are concentrated on the negative side and that there is a positive skew, due to an outlier on the positive side [4]. Therefore, the residual distribution appears to be non-normal.
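The positive skew read off the histogram can also be expressed as a number. The following sketch computes the simple moment-based skewness (third central moment divided by the second central moment raised to the power 1.5); a value above zero indicates a longer right tail. The function name `sample_skewness` and the residuals are hypothetical, and this unadjusted formula may differ slightly from the small-sample-adjusted skewness that spreadsheet software reports.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, without small-sample correction."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("values must not all be equal")
    return m3 / m2 ** 1.5


# Hypothetical residuals ($ million): mostly negative with one large positive value
residuals = [-30.0, -25.0, -20.0, -15.0, -10.0, 5.0, 95.0]
skew = sample_skewness(residuals)
symmetric_skew = sample_skewness([-10.0, 0.0, 10.0])
print(skew, symmetric_skew)
```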
- The residuals boxplot is illustrated as follows.
The requisite residual five number summary is illustrated as follows.
From the boxplot above, together with the five-number summary, it is apparent that certain residuals are outliers, and these are not restricted to one side but occur on both the positive and the negative side [2]. The separation of the minimum and maximum values from the first and third quartiles respectively hints at the same [3]. Thus, the residual distribution appears to be non-normal.
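A five-number summary and a boxplot-style outlier check can be sketched as follows. The sketch assumes the common 1.5 x IQR fence rule and the "inclusive" quartile method of Python's `statistics.quantiles`; the report does not state which conventions were used, so both are assumptions. The function names and the residuals are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values):
    """Values lying beyond 1.5 * IQR from the quartiles (assumed boxplot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical residuals ($ million), NOT the report's data
residuals = [-40.0, -10.0, -5.0, 0.0, 5.0, 10.0, 60.0]
summary = five_number_summary(residuals)
outliers = iqr_outliers(residuals)
print(summary)   # (-40.0, -7.5, 0.0, 7.5, 60.0)
print(outliers)  # [-40.0, 60.0]
```

In this made-up example the fences fall at -30 and 30, so one value on each side is flagged, which mirrors the two-sided outlier pattern described above.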
- The requisite residual normal probability plot is illustrated as follows.
The plot above is indicative of a right-skewed distribution. Two observations from the plot point to this. Firstly, in the lower percentiles the increase in value is quite slow, indicating a higher concentration of values, whereas in the higher percentiles the rise in value is sudden, which implies that the residuals are less concentrated on the positive side [1]. Secondly, the surge towards the highest percentiles is abrupt, indicating the presence of outliers. Thus, the residual distribution appears to be non-normal.
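The coordinates of a normal probability plot can be generated by pairing each sorted residual with the standard normal quantile for its plotting position. The sketch below assumes the plotting position (i - 0.5) / n, which is one of several conventions and may not be the one behind the plot above; `normal_probability_points` is a hypothetical helper and the residuals are made up.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Return (theoretical z-score, sorted residual) pairs for a normal probability plot."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least 2 residuals")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(sorted(residuals), start=1):
        p = (i - 0.5) / n  # assumed plotting position, always strictly between 0 and 1
        points.append((std_normal.inv_cdf(p), value))
    return points


# Hypothetical residuals ($ million), NOT the report's data
residuals = [5.0, -20.0, 95.0, -10.0, -15.0]
points = normal_probability_points(residuals)
for z, value in points:
    print(f"{z:7.3f}  {value:7.1f}")
```

If the residuals were normally distributed, these points would fall close to a straight line; a slow rise at the low end followed by a sharp rise at the high end is the right-skew signature described above.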
Conclusion
The discussion above indicates that linear regression is convenient to use, but its assumptions must be satisfied. Although the given regression model has a reasonably high coefficient of determination, the residual analysis indicates that the model may be a misfit, because several of these assumptions do not hold. In particular, the homoscedasticity and linearity assumptions are not satisfied, as is apparent from the residual plot. Further, the histogram, boxplot and normal probability plot indicate that the residuals are not normally distributed, which is a matter of concern for the accuracy of the regression approach to estimating franchise value.
Bibliography
- Eriksson, P. and Kovalainen, A. Quantitative methods in business research. London: Sage Publications, 2015.
- Flick, U. Introducing research methodology: A beginner's guide to doing a research project. New York: Sage Publications, 2015.
- Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. New York: Springer Publications, 2011.
- Hillier, F. Introduction to Operations Research. New York: McGraw Hill Publications, 2006.