Scenario
Your company, the maker of StackBook laptops (you may remember this scenario from MRT285), is interested in understanding customer loss and retention. The supplied file is a list of almost 22,000 customers spanning eight years’ worth of activity, containing information about their purchase history. Each record represents the purchase history for one customer, including the number of purchases and the dates of the first and last purchases made. (Most of the fields are self-explanatory; see the notes below for the others.)
You will use this data to compute tenures for the customers, and their censored or uncensored state. While the exact purchase dates are given in the file, we will be working in months for ease of understanding. I have added columns for you with formulas converting the dates of first and last purchase into month numbers (‘FirstMonth’ and ‘LastMonth’, respectively). Please use these columns, not the dates, in your analysis! Month numbers run from 0 (at the beginning of the time window) to 96 (the month just completed, i.e., now).
Calculating tenure would be as simple as subtracting LastMonth - FirstMonth, except for the fact that people do not buy laptops every day (or even every month!). The StackBook division is making the assumption that customers buy a new laptop every three years or less. Thus, a customer who makes a subsequent purchase within 36 months is a continuing customer, while after 36 months without a purchase, a customer is considered to be lost. (Note: For simplicity, we have not included the timing of every purchase a customer might have made, and we are not considering the case where a customer might be ‘lost’ and then won back. In this data, you can assume that the entire time between FirstMonth and LastMonth was a continuous relationship; we are only concerned with loss after LastMonth.)
Question 1
To do survival analysis, you need to calculate the tenure and censored status of each customer. These will be new formula columns (making use of if statements).
A customer will be considered as active (censored) if they have made a purchase within the 36 months preceding today. You will need to add a Censored column that has a value of 1 if LastMonth + 36 is greater than 96, 0 otherwise.
Because we don’t consider a customer lost until 36 months after their last purchase, the 36 months is included in their tenure. You need to add a new Tenure column that calculates tenure as 96 - FirstMonth if the customer is censored, LastMonth - FirstMonth + 36 otherwise.
Include in your report both the formulas you use for each column, as well as the first ten rows of your data table (including the new columns).
Before completing the rest of the assignment, you may send me an email with a screen capture of the first few rows of your table, to check that your values look correct. (I would suggest that you do this at least five days before the assignment is due, because I can’t guarantee turnaround time!) I will do this only because if you get Question 1 wrong, all of your other work will be incorrect as well.
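As a cross-check on the two column formulas, the rules above can be sketched in Python. This is only an illustration of the logic, not the formula syntax of your analysis tool; the function names and the sample (FirstMonth, LastMonth) pairs are made up for the example and do not come from the supplied file.

```python
NOW_MONTH = 96     # the month just completed, per the scenario
LOSS_WINDOW = 36   # months without a purchase before a customer counts as lost


def censored(last_month: int) -> int:
    """Return 1 if the customer is still active (censored), 0 if lost."""
    return 1 if last_month + LOSS_WINDOW > NOW_MONTH else 0


def tenure(first_month: int, last_month: int) -> int:
    """Tenure in months, including the 36-month window for lost customers."""
    if censored(last_month):
        return NOW_MONTH - first_month
    return last_month - first_month + LOSS_WINDOW


# Hypothetical (FirstMonth, LastMonth) pairs, for illustration only.
examples = [(0, 10), (50, 70), (90, 96), (20, 60)]
for first, last in examples:
    print(first, last, censored(last), tenure(first, last))
```

Note the boundary case: a customer with LastMonth = 60 has LastMonth + 36 equal to 96, which is not greater than 96, so that customer is treated as lost.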
Question 2
Using the resulting table from Question 1, do a survival analysis. Include the default output in your report.
Comment on the shape of the survival curve. What patterns are seen as time progresses?
What is the median tenure? What is the mean tenure?
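For intuition about what the survival output reports, here is a minimal Kaplan-Meier (product-limit) sketch in Python. It assumes the usual product-limit definition and reads the median as the first tenure at which estimated survival falls to 0.5 or below; your software's defaults may differ in detail. The function names and the six-customer data set are hypothetical.

```python
from collections import Counter


def kaplan_meier(tenures, censored_flags):
    """Return a list of (tenure, at_risk, losses, survival) rows.

    censored_flags uses the Question 1 coding: 1 = censored (still active),
    0 = lost (the event of interest).
    """
    losses = Counter(t for t, c in zip(tenures, censored_flags) if c == 0)
    exits = Counter(tenures)      # everyone leaving the risk set at each tenure
    at_risk = len(tenures)
    survival = 1.0
    rows = []
    for t in sorted(exits):
        d = losses.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
        rows.append((t, at_risk, d, survival))
        at_risk -= exits[t]       # remove both lost and censored customers
    return rows


def median_tenure(rows):
    """First tenure at which survival is at or below 0.5, else None."""
    for t, _, _, s in rows:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data, for illustration only.
toy_tenures = [36, 40, 40, 50, 60, 60]
toy_censored = [0, 0, 1, 0, 1, 0]
toy_rows = kaplan_meier(toy_tenures, toy_censored)
for row in toy_rows:
    print(row)
print("median tenure:", median_tenure(toy_rows))
```

Censored customers reduce the number at risk without lowering the survival estimate, which is why the Censored column from Question 1 must be correct before this step.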
Question 3
Using the results from Question 2 and the technique outlined in the exercises/tutorial, generate the hazard probabilities. Include only a line graph in your report.
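The tutorial's technique is not reproduced here. As a rough sketch under the common discrete-time definition, the hazard probability at a given tenure is the share of customers who survived up to that point and are then lost, which can be recovered from consecutive survival estimates as 1 - S(t) / S(t-1). The function name and the survival values below are hypothetical.

```python
def hazard_from_survival(curve):
    """Convert (tenure, survival) pairs, sorted by tenure, to (tenure, hazard).

    Assumes survival starts at 1.0 before the first listed tenure.
    """
    hazards = []
    previous = 1.0
    for t, s in curve:
        h = 1 - s / previous if previous > 0 else 0.0
        hazards.append((t, h))
        previous = s
    return hazards


# Hypothetical survival estimates, for illustration only.
toy_curve = [(36, 0.9), (37, 0.9), (38, 0.72)]
toy_hazards = hazard_from_survival(toy_curve)
for t, h in toy_hazards:
    print(t, round(h, 4))
```

A flat stretch of the survival curve gives a hazard of zero, while a sharp drop shows up as a spike in the line graph.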
Question 4
Re-run your analysis from Question 2, but this time, grouping by job. (Do not ‘test cross groups’.) Include only the survival curve plot and the summary table in your output.
Which job has the highest survival? Which job has the lowest? (You can gauge these from the average tenures in the summary table.)
Question 5
Now, run a Proportional Hazards analysis on the same data, using job as the only model effect. Is the model significant?
For the jobs you identified as the best and worst in Question 4, include in your report only the rows from the Risk Ratios table that compare the two. Use the risk ratios to interpret the relative risk of the two groups (in both directions). Does this seem consistent with what you observed in Question 4?
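When reading the two directions of a risk ratio, it may help to remember that they are reciprocals of each other. The short Python sketch below uses a made-up ratio of 2.0 (not a result from this data) and a hypothetical helper name.

```python
def reverse_risk_ratio(risk_ratio: float) -> float:
    """Risk ratio for the comparison in the opposite direction."""
    if risk_ratio <= 0:
        raise ValueError("a risk ratio must be positive")
    return 1 / risk_ratio


# Hypothetical value: job A versus job B.
rr_a_vs_b = 2.0
rr_b_vs_a = reverse_risk_ratio(rr_a_vs_b)
print("A vs B:", rr_a_vs_b)
print("B vs A:", rr_b_vs_a)
```

With these made-up numbers, customers in job A would be lost at twice the rate of those in job B at any given tenure, and equivalently customers in job B at half the rate of those in job A.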