Your company, the maker of StackBook laptops (you may remember this scenario from MRT285), is interested in understanding customer loss and retention. The supplied file is a list of almost 22,000 customers spanning eight years’ worth of activity, containing information about their purchase history. Each record represents the purchase history for one customer, including the number of purchases and the dates of the first and last purchases made. (Most of the fields are self-explanatory; see notes below for the others.)
You will use this data to compute each customer’s tenure and whether that tenure is censored or uncensored. While the exact purchase dates are given in the file, we will be working in months for ease of understanding. I have added columns for you with formulas converting the dates of first and last purchase into month numbers (‘FirstMonth’ and ‘LastMonth’, respectively). Please use these columns, not the dates, in your analysis! Month numbers run from 0 (at the beginning of the time window) to 96 (the month just completed, i.e., now).
Calculating tenure would be as simple as subtracting LastMonth - FirstMonth, except for the fact that people do not buy laptops every day (or even every month!). The StackBook division is making the assumption that customers buy a new laptop every two years or less. Thus, a customer who makes a subsequent purchase within 24 months is a continuing customer, while after 24 months without a purchase, a customer is considered to be lost. (Note: For simplicity, we have not included the timing of every purchase a customer might have made, and we are not considering the case where a customer might be ‘lost’ and then won back. In this data, you can assume that the entire time between FirstMonth and LastMonth was a continuous relationship; we are only concerned with loss after LastMonth.)
To do survival analysis, you need to calculate the tenure and censored status of each customer. These will be new formula columns (making use of if statements).
A customer is considered active (censored) if they have made a purchase within the 24 months preceding today. You will need to add a Censored column that has a value of 1 if LastMonth + 24 is greater than 96, and 0 otherwise.
Because we don’t consider a customer lost until 24 months after their last purchase, the 24 months is included in their tenure. You need to add a new Tenure column that calculates tenure as 96 - FirstMonth if the customer is censored, and as LastMonth - FirstMonth + 24 otherwise.
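As a cross-check on your spreadsheet formulas, the two rules above can be sketched in Python. This is only an illustration of the stated rules. The function names and the three sample customers are hypothetical and do not come from the supplied file.

```python
WINDOW_END = 96  # month number for "now", as stated in the assignment
LOSS_GAP = 24    # months without a purchase before a customer counts as lost


def censored_flag(last_month: int) -> int:
    """Return 1 if the customer is still active (censored), 0 if lost."""
    return 1 if last_month + LOSS_GAP > WINDOW_END else 0


def tenure(first_month: int, last_month: int) -> int:
    """Tenure in months, including the 24-month gap for lost customers."""
    if censored_flag(last_month) == 1:
        return WINDOW_END - first_month
    return last_month - first_month + LOSS_GAP


# Hypothetical (FirstMonth, LastMonth) pairs, invented for illustration only.
examples = [(10, 80), (10, 40), (0, 72)]
rows = [(f, l, censored_flag(l), tenure(f, l)) for f, l in examples]
for row in rows:
    print(row)
```

The third pair is a boundary case: LastMonth + 24 equals 96 exactly, which is not greater than 96, so that customer is treated as lost rather than censored.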
Include in your report both the formulas you use for each column and the first ten rows of your data table (including the new columns).
Before completing the rest of the assignment, you may send me an email with a screen capture of the first few rows of your table, to check that your values look correct. (I would suggest that you do this at least five days before the assignment is due, because I can’t guarantee turnaround time!) I will do this only because if you get Question 1 wrong, all of your other work will be incorrect as well.
Using the resulting table from Question 1, do a survival analysis. Include the default output in your report.
Comment on the shape of the survival curve. What patterns are seen as time progresses? What is the median tenure? What is the mean tenure?
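To help interpret the survival output, here is a minimal sketch of the Kaplan-Meier (product-limit) calculation that usually sits behind a survival curve. It assumes the Censored convention above (1 = still active, 0 = lost). The five tenures are made up for illustration, and your software may handle ties or report the median slightly differently.

```python
from collections import Counter


def kaplan_meier(tenures, censored):
    """Return [(month, S(month))] at each month where a loss occurs."""
    losses = Counter(t for t, c in zip(tenures, censored) if c == 0)
    exits = Counter(tenures)  # everyone leaving the risk set at each month
    at_risk = len(tenures)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = losses.get(t, 0)
        if d:
            surv *= 1 - d / at_risk  # survive this month given at risk
            curve.append((t, surv))
        at_risk -= exits[t]  # drop both lost and censored customers
    return curve


def median_tenure(curve):
    """First month at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # survival never reaches 0.5 in the observed window


# Hypothetical data, invented for illustration only.
tenures = [30, 54, 60, 72, 86]
censored = [0, 0, 1, 0, 1]
curve = kaplan_meier(tenures, censored)
median = median_tenure(curve)
print(curve, median)
```

Censored customers never cause a drop in the curve. They only shrink the number at risk, which is why later losses produce larger steps.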
Using the results from Question 2 and the technique outlined in the exercises/tutorial, generate the hazard probabilities. Include only a line graph in your report.
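One common way to obtain hazard probabilities from a survival table is to take the proportional drop in survival at each step, h(t) = (S_prev - S(t)) / S_prev. The sketch below illustrates that idea only. It is an assumption about the general approach and may differ in detail from the technique in the exercises/tutorial. The survival values are hypothetical.

```python
def hazards_from_survival(survival):
    """survival: [(month, S)] sorted by month. Returns [(month, hazard)]."""
    hazards = []
    prev = 1.0  # survival before the first listed month
    for month, s in survival:
        # Share of customers still active just before this month who are lost in it.
        h = (prev - s) / prev if prev > 0 else float("nan")
        hazards.append((month, h))
        prev = s
    return hazards


# Hypothetical survival estimates, invented for illustration only.
survival = [(30, 0.8), (54, 0.6), (72, 0.3)]
hazards = hazards_from_survival(survival)
for month, h in hazards:
    print(month, round(h, 3))
```

Plotting the hazard values against month gives the line graph. Spikes mark the tenures at which customers are most likely to be lost.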
Re-run your analysis from Question 2, but this time, grouping by job. (Do not ‘test cross groups’.) Include only the survival curve plot and the summary table in your output. Which job has the highest survival? Which job has the lowest? (You can gauge these from the average tenures in the summary table.)
Now, run a Proportional Hazards analysis on the same data, using job as the only model effect. Is the model significant?
For the jobs you identified as the best and worst in Question 4, include in your report only the rows from the Risk Ratios table that compare the two. Use the risk ratios to interpret the relative risk of the two groups (in both directions). Does this seem consistent with what you observed in Question 4?
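When reading the two directions, it helps to remember that the risk ratio for A versus B is the reciprocal of the ratio for B versus A. A minimal sketch of that arithmetic, using a made-up ratio of 2.0 and hypothetical job labels (not results from the StackBook data):

```python
def describe(rr, group_a, group_b):
    """Phrase a risk ratio in both directions; the reverse is 1 / rr."""
    forward = f"{group_a} vs {group_b}: hazard is {rr:.2f} times as large"
    reverse = f"{group_b} vs {group_a}: hazard is {1 / rr:.2f} times as large"
    return forward, reverse


# Hypothetical ratio and labels, invented for illustration only.
forward, reverse = describe(2.0, "JobA", "JobB")
print(forward)
print(reverse)
```

A ratio above 1 means the first group is being lost faster at any given tenure, so it should correspond to the group with the lower survival curve.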