Congrats! Someone has signed up for your store’s newsletter, downloaded your app, or created an account on your site. Getting more users is cool, but it doesn’t pay the bills. How quickly do these users turn into paying customers?

Sure, you could find the average time to first purchase, but the average alone is misleading: it leaves out everyone who hasn’t purchased yet, and it hides when purchases actually cluster.
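To make that concrete, here is a minimal sketch with made-up numbers (the `days_to_purchase` list is hypothetical, not from any real dataset). Users with no purchase have no time to first purchase at all, so an average over buyers quietly drops them, and a single slow buyer drags the average far from the typical experience.

```python
from statistics import mean, median

# Hypothetical data: days from registration to first purchase.
# None means the user has not purchased yet.
days_to_purchase = [2, 3, 5, 7, 7, 90, None, None, None, None]

# The average can only be computed over users who actually bought.
buyers = [d for d in days_to_purchase if d is not None]

avg_days = mean(buyers)        # pulled up by the one slow buyer
median_days = median(buyers)   # closer to the typical buyer
share_never_bought = 1 - len(buyers) / len(days_to_purchase)

print(f"average: {avg_days:.1f} days, median: {median_days} days")
print(f"share with no purchase yet: {share_never_bought:.0%}")
```

In this toy example the average is 19 days while the median is 6, and neither number tells you that 4 in 10 users haven't bought at all.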

Visualizing the time to first purchase allows you to identify key points in your customer’s journey.

Then, you’re able to take action at these key points and potentially convert someone into a paying customer. The lifelines Python library makes all of this really easy. Let’s see what we can find.

Getting Your Data Together

You’ll want your data formatted like so:

User_ID | Time of Registration | Time of First Purchase | Source of Registration (optional)
XXXX | [A Date] | [Another Date] | Paid Ad Campaign, Organic, etc.

Don’t know how to get this data from your databases? Learn SQL! OK, a hint: your query will look something like this:

-- LEFT JOIN keeps users who never purchased; their first_purchase_date is NULL
SELECT u.user_id,
       u.created_at AS registration_date,
       MIN(o.order_date) AS first_purchase_date
FROM schema.user_table AS u
LEFT JOIN schema.order_table AS o
       ON u.user_id = o.user_id
GROUP BY u.user_id, u.created_at;

I will be working with a dataset provided by Prof McCarthy. His tutorial on visualizing user behavior in Excel is fantastic.

If you follow along with Prof McCarthy’s dataset, you'll notice that it's a transaction log of customer purchases, not time from registration to first purchase. I'll be pretending the first purchase is the customer's registration date and that the second purchase is the first purchase. Don't worry, the same principles apply!

First Pass at Analyzing the Data

OK, pip install lifelines, load the CSV into your Colab or Jupyter Notebook instance, and let’s go!

!pip install lifelines
import pandas as pd

df = pd.read_csv('/content/registrationToFirstPurchase.csv')

Now, let’s work that lifelines magic. Lifelines will automatically calculate the time to first purchase and add a boolean column to indicate if the user ever purchased.

from lifelines.utils import datetimes_to_durations
df['T'],df['E'] = datetimes_to_durations(df['Registration'], df['First Purchase'])
df.head()

The column df['T'] is the number of days from registration to first purchase (or, for users who haven’t purchased yet, from registration to today), while df['E'] marks whether the user ever purchased.
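If you're curious what that call is doing for each row, here is a rough plain-Python sketch. It is not lifelines' implementation; the function name `to_duration` and the `as_of` date are made up for illustration. The one thing to notice is that a user with no purchase still gets a duration, measured up to a cutoff date (lifelines calls this `fill_date` and defaults it to today).

```python
from datetime import date

def to_duration(registered, first_purchase, as_of):
    """Return (days, purchased) for one user.

    Users with no purchase are measured up to `as_of` and flagged False.
    """
    purchased = first_purchase is not None
    end = first_purchase if purchased else as_of
    return (end - registered).days, purchased

as_of = date(2023, 6, 30)  # hypothetical date the data was pulled
buyer = to_duration(date(2023, 1, 1), date(2023, 1, 8), as_of)
non_buyer = to_duration(date(2023, 1, 1), None, as_of)
print(buyer, non_buyer)  # (7, True) (180, False)
```

The non-buyer's 180 days means "we've watched this user for 180 days and they haven't bought," which is exactly the information an average throws away.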

Now, the moment you’ve been waiting for! We’ll use the Kaplan-Meier fitter to visualize our customers’ journey from registration to first purchase.

from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter(label="users")
kmf.fit(df['T'], df['E'])
kmf.plot(figsize=(8,8), loc = slice(0., 180))
Reading the chart, the curve sits at roughly 65% at day 180, so about 35% of our users will have purchased within 180 days of registration.
image

Figure 1 - Percent of Users Who Haven’t Purchased by Days After Registration

The above chart shows what is formally known as a survival function. The lifelines library leverages models from a field known as “survival analysis,” which has its roots in the insurance industry. I’ve found that it’s distracting to call it a “survival curve” to non-data folks, so let’s keep it between us.
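For the curious, here is a minimal plain-Python sketch of what a Kaplan-Meier estimate computes. It is an illustration of the idea, not lifelines' code, and the toy `T` and `E` lists are made up. At each day where someone buys, the share still "surviving" (not yet purchased) is multiplied by the fraction of at-risk users who did not buy that day.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one purchase occurred."""
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Users still being watched and not yet converted just before day t.
        at_risk = sum(1 for d in durations if d >= t)
        # Users whose first purchase landed exactly on day t.
        purchases = sum(1 for d, e in zip(durations, observed) if d == t and e)
        if purchases:
            survival *= 1 - purchases / at_risk
            curve.append((t, survival))
    return curve

# Toy data: days observed, and whether the user purchased (True) or not (False).
T = [3, 7, 7, 10, 30, 30]
E = [True, True, False, True, False, False]
curve = kaplan_meier(T, E)
for day, share in curve:
    print(f"day {day}: {share:.0%} have not purchased")
```

Users who haven't bought still count in `at_risk` for as long as we've observed them, which is how the curve uses their data instead of dropping it.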

Intervening in the Customer’s Journey

Unless you’re a consultant, you probably don’t get paid to say things—you get paid to do things. Let’s figure out what we can do.

My entering hypothesis is that it’s easiest to drive incremental new buyers where the customers are naturally making their first purchases. It’s probably too much of an uphill battle to do something about the users who are 180 days after registration. You can see the curve is flattening out, appearing to approach an asymptote.

Data scientifically, I want to know where the curve in Figure 1 is steepest. That’s where we are naturally minting new buyers at the fastest pace. I intend to intervene just a little after that point.

Why just a little bit after that point of peak activation? I think it’s the customers that make it through the natural activation period without buying that could use the encouragement.

You might be tempted to export the survival function, use Excel to calculate the slope between points in time, and then graph it in Excel. That’s what I did the first time I worked with the lifelines library. Fortunately, lifelines has some functions that will plot the slope of the survival function.

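For math reasons, we’re going to use the Nelson-Aalen fitter for this. It estimates the cumulative hazard, and the slope of that curve (the hazard rate) is the rate at which users who haven’t bought yet make their first purchase.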

import matplotlib.pyplot as plt
from lifelines import NelsonAalenFitter

naf = NelsonAalenFitter()
naf.fit(df['T'], event_observed=df['E'])
# bandwidth sets how many days the hazard estimate is smoothed over
naf.plot_hazard(bandwidth=3, loc=slice(0., 180))
plt.xlabel('Days After Registration')
plt.ylabel('Rate of New Buyers per Day')
image

Figure 2 - Rate of New Buyers per Day by Day After Registration
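To get a feel for the quantity being plotted, here is a small plain-Python sketch of the raw, unsmoothed numbers behind a Nelson-Aalen estimate. It is an illustration under toy data, not lifelines' implementation, and the helper name `nelson_aalen_increments` is made up. On each day, the increment is the number of first purchases divided by the number of users who had not yet purchased going into that day; lifelines then smooths these increments over the `bandwidth` window to draw the hazard curve.

```python
def nelson_aalen_increments(durations, observed):
    """Return {t: purchases on day t / users still unconverted going into t}."""
    increments = {}
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        purchases = sum(1 for d, e in zip(durations, observed) if d == t and e)
        if purchases:
            increments[t] = purchases / at_risk
    return increments

# Toy data: days observed, and whether the user purchased (True) or not (False).
T = [3, 7, 7, 10, 30, 30]
E = [True, True, False, True, False, False]
increments = nelson_aalen_increments(T, E)

# The cumulative hazard is just the running sum of these increments.
cumulative_hazard = sum(increments.values())
peak_day = max(increments, key=increments.get)
print(increments, cumulative_hazard, peak_day)
```

Because the denominator is only the users who are still unconverted, the rate can stay high late in the journey even when few purchases remain, which is worth keeping in mind when reading spikes far out on the x-axis.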

Here days ~10 and ~75 stand out to me. Around day 10 we have the highest rate of new buyers activating per day. Day 75 is a spike in an otherwise declining rate of activation. We can come back to day 75, but for now let’s zoom in on the first 25 days of our customers’ activation journey.

image

Figure 3 - Rate of New Buyers per Day by Day After Registration (First 25 Days)

OK, it looks like day 7 is the peak of new buyer activation. Here are two quick questions that won’t be found in the dataset:

  • Are our CRM channels sending out discount offers or other promotions one week after registration?
  • Is there something about our product or store that has a weekly rhythm to it? For instance, are we offering a product trial that lasts a week?

Let’s say the answers to these questions are “No, we are not deliberately doing anything to drive this behavior.” And if you’re the first person looking at this data at your company, that’s probably true.

Without knowing too much about your product (or the product in the fictional dataset), let's stage some interventions between days 10-15. The frothiness of the first week has started to cool down—maybe these are the customers that need a little extra encouragement.
  1. Is there anything the customer needs to do to set up their experience with your product or store? This includes things like selecting a size or identifying a favorite item category. If so, consider an experiment that drives them to take that action during days 10-15.
  2. Consider a re-targeting paid marketing campaign that finds users off platform. How does the purchase rate of the group exposed to the re-targeting campaign compare to the group not exposed to the campaign?
  3. Depending on your unit economics, brand position, etc., can you afford to offer a discount towards the first purchase? If so, consider sending an offer then.
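
Whichever intervention you try, you'll end up comparing the purchase rate of an exposed group against a holdout group. One simple way to do that, sketched below, is a two-proportion z-test. This is my suggestion rather than anything the dataset or lifelines prescribes, the counts are hypothetical, and it assumes users were randomly assigned to the two groups.

```python
from math import erf, sqrt

def two_proportion_z_test(buyers_a, users_a, buyers_b, users_b):
    """Return (lift, two-sided p-value) for group A vs. group B purchase rates."""
    rate_a = buyers_a / users_a
    rate_b = buyers_b / users_b
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (buyers_a + buyers_b) / (users_a + users_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a - rate_b, p_value

# Hypothetical counts: 1,000 users per group, targeted during days 10-15.
lift, p_value = two_proportion_z_test(80, 1000, 50, 1000)
print(f"lift: {lift:.1%}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but whether a lift of a few points is worth the cost of the campaign or discount is still a unit-economics call.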

Next Steps and Final Thoughts

We’ve explored your customers’ time from registration to first purchase and have a framework to understand where you might intervene to drive new buyers.

In my next post, we’ll use the fourth column, Source of Registration, to determine our best source of new buyers. Then you can double down on your best channel and cut off channels that may not be worth it.