In hypothesis testing, we learned how to scientifically validate or refute a belief or a claim. In the next three lessons, we will shift gears to prediction. Regression analysis is a widely used statistical method that (1) helps us make predictions about a variable of interest based on known values of other variables, and (2) provides an objective measure of the change in the variable of interest as a result of change in other variables.
Can we predict future sales if we know our customers' age and income? How much more (or less) will we sell if the economy does well and our customers have $1,000 extra? Can we predict stock performance based on market information such as the S&P 500 index or the Dow Jones Industrial Average? What would the stock price be if the market moves up by 1 point? By 10 points?
Regression analysis is a modeling technique for answering these types of questions. Our variable of interest is called the dependent variable (DV): sales and stock price in the examples above. Variables that cause a change in the DV are called independent variables (IVs) or predictor variables. Customer age, income, and market performance are the independent variables in the same examples. Based on the data, we create a mathematical model that captures the relationship between the dependent and independent variables.
The efficacy of the model is judged based on its predictive power (known as R²). Measures such as the regression coefficient (b or β) inform us of the impact of a change in one IV on the DV. We incorporate both categorical and continuous variables into the models.
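As a rough sketch of what these two measures refer to, the simplest case (one IV) can be written as follows. The notation here is an assumption chosen for illustration and may differ from the notation used later in the lessons: x is the IV, y is the DV, ŷ is the model's prediction, b_0 is the intercept, and b_1 is the regression coefficient (slope).

```latex
% Simple linear regression model (one independent variable)
\hat{y} = b_0 + b_1 x

% R^2: the share of the variation in y that the model accounts for,
% where \bar{y} is the sample mean of y
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Read this way, b_1 is the predicted change in the DV for a one-unit increase in the IV, and a least-squares fit that includes an intercept gives an R² between 0 and 1, with values closer to 1 indicating stronger predictive power.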
After completing these three lessons, students should be able to evaluate the strength and direction of linear association (i.e., correlation) between two variables; construct a linear regression model from sample data; interpret the slope and intercept of a regression line; and use the regression equation to make predictions.
Regression Analysis: Opening Cases
Case I: Concession Stand
Let’s say you are the concession manager at Lincoln Financial Field, the home of the Philadelphia Eagles. You are in charge of stocking the concession stands. From experience, you know that fans buy more hot drinks in colder weather, but you want to know how much temperature affects coffee sales (coffee being the largest category of warm drinks at your stand). You have data from the last 50 games: the temperature and the number of cups of coffee sold at each game.
Can you use these coffee sales data to predict sales at the upcoming games? You plan to use weather.com for the day’s temperature to plan ahead.
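To make the question concrete, here is a minimal sketch in Python of what such an analysis might look like. The data are simulated for illustration only and are not the stadium's actual records; the variable names, the simulated relationship, and the helper `predict_cups` are all assumptions.

```python
import numpy as np

# HYPOTHETICAL data: we simulate 50 games in which colder days sell
# more coffee, plus random noise. Real records would replace these arrays.
rng = np.random.default_rng(seed=0)
temperature_f = rng.uniform(20, 80, size=50)   # game-day temperature (deg F)
cups_sold = 3000 - 25 * temperature_f + rng.normal(0, 150, size=50)

# Correlation: strength and direction of the linear association.
r = np.corrcoef(temperature_f, cups_sold)[0, 1]

# Least-squares line: cups_sold ~ intercept + slope * temperature_f.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(temperature_f, cups_sold, deg=1)

def predict_cups(temp_f):
    """Predicted cups of coffee sold at a given temperature (deg F)."""
    return intercept + slope * temp_f

print(f"correlation r = {r:.2f}, R^2 = {r**2:.2f}")
print(f"slope = {slope:.1f} cups per degree, intercept = {intercept:.0f} cups")
print(f"predicted sales at 35 F: {predict_cups(35):.0f} cups")
```

In this sketch the slope is the estimated change in cups sold per one-degree rise in temperature, and the forecast temperature for game day would be passed to `predict_cups` to plan stock.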
Case II: Rating Wine
Can we rate wines based on their chemical properties?
Rating wine is considered more of an art than a science. Wine critics evaluate a wine using a rating scale (scales vary from country to country, for example 0–100, 0–20, or 0–10). These ratings are highly subjective, as they depend on the individual critic’s taste, but they play an important role in setting the market price and sales.
Advances in information technology have made it possible to collect, store, and process huge amounts of data. Consequently, data about the different chemical properties of wine (such as alcohol content, pH level, sugar level, etc.) are now available.
Can these data about chemical properties of wine be used to predict wine ratings? Such objective predictors would help the wine producers in many ways. They will know the most important chemical properties that constitute a good wine. As the demand for wine grows worldwide (Cortez et al., 2009), this helps the wine producers to improve their production process as well as find niche markets that favor specific tastes.
Source: http://archive.ics.uci.edu/ml/datasets/Wine+Quality
Looking Ahead
In this lesson, we will learn how to use correlation and regression analysis to:
1. Case I
   1. explore whether and how temperature affects coffee sales
   2. predict coffee sales based on a given temperature
2. Case II
   1. explore whether and how the chemical properties of wine affect its rating
   2. predict wine rating based on its chemical properties
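Case II involves several predictors at once. The following is a minimal sketch, under stated assumptions, of fitting a regression with more than one IV using ordinary least squares in NumPy. The data are simulated and are not the UCI Wine Quality dataset; the predictor names, the made-up relationship, and the example wine are hypothetical.

```python
import numpy as np

# HYPOTHETICAL data: simulated chemical properties and ratings for 200 wines.
rng = np.random.default_rng(seed=1)
n = 200
alcohol = rng.uniform(8, 14, n)     # percent alcohol by volume
ph = rng.uniform(2.8, 3.8, n)       # pH level
sugar = rng.uniform(1, 15, n)       # sugar level (illustrative units)

# Made-up relationship plus noise, used only to generate example ratings.
rating = 2.0 + 0.35 * alcohol - 0.10 * ph + 0.02 * sugar + rng.normal(0, 0.5, n)

# Design matrix: a column of ones for the intercept, then one column per IV.
X = np.column_stack([np.ones(n), alcohol, ph, sugar])

# Ordinary least squares: coef = [intercept, b_alcohol, b_ph, b_sugar].
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)

# R^2: share of the variation in rating accounted for by the model.
fitted = X @ coef
ss_res = np.sum((rating - fitted) ** 2)
ss_tot = np.sum((rating - rating.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Predict the rating of a new (hypothetical) wine: 12.5% alcohol, pH 3.3, sugar 6.0.
new_wine = np.array([1.0, 12.5, 3.3, 6.0])
predicted_rating = new_wine @ coef

print("coefficients:", np.round(coef, 3))
print(f"R^2 = {r_squared:.2f}, predicted rating = {predicted_rating:.2f}")
```

Each fitted coefficient estimates the change in rating for a one-unit change in that property while the other properties are held fixed, which is how such a model could indicate which properties matter most.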