If you work in e-commerce, you probably have these following questions:

  • Month-over-month, how are our sales doing?
  • How many new customers purchased last month? How many repeat customers purchased last month?
  • What percentage of users are still purchasing 12 months after initial purchase?

Luckily, there is a source of truth for all of these questions: your business's transaction logs! 🤩 Daniel McCarthy put together an excellent tutorial on how to characterize your business's revenue engine using Excel. But what if your transaction log history is too large for Excel? I know; that's a big flex. You'll need some other way to explore the data.

I took Prof McCarthy's PowerPoint as great excuse to start on my first Python package: Lifestream, the easiest way to make sense of your transaction log.

Walking Through Lifestream

Lifestream assumes you have the following available in your dataset:

  • order_id which is unique to each distinct order.
  • user_id which is unique to each customer.
  • monetary_val which is the $ value of the transaction.
  • order_date which is the timestamp of the transaction.

If your dataset looks a little bit different, there's a method in the package to create a compatible transaction log.

First, let's find with some data to play with. UC Irvine has a publicly available transaction log dataset found here. Download OnlineRetail.xlsx and convert it over to a CSV file.

When you explore the data, notice that the dataset shows each line item purchased. It's actually more detailed than the standard transaction log! Open up Colab or your local Jupyter Notebook and perform the following:

!pip install lifestream
 import lifestream
 import pandas as pd
 
 df = pd.read_csv({Path to Your Transaction Log})
 log = lifestream.create_transaction_log(df, 'InvoiceNo', 'InvoiceDate', 'Quantity', 'UnitPrice', 'CustomerID')

Boom! We now have a transaction log that we can work with. How were sales doing for this retailer? Let's make a bar chart of their sales.

lifestream.sales_chart(log, 'InvoiceDate', 'monetary_val', 'CustomerID')

Looks like the retailer had a nice pop in sales in Q4 of 2011.

But how many people who bought in Dec 2010 were still buying in Dec 2011? We can use the following method to plot cohort performance overtime:

lifestream.cohort_retention_chart(log, 'InvoiceDate', 'InvoiceNo', 'CustomerID', 'monetary_val', '2010-12', '2011-01', '2011-02')

You can trace the buying behavior of customers who bought in Dec 2010 by following the blue line. Notice the bump in those who returned in the 12 month mark? We might hypothesize that December is a key month for reactivating customers, since we also see corresponding "bumps" in the other cohorts for this month.

Stay Tuned for More

Please, please, please join me over at the Lifestream repo if you'd like to contribute! Help were some resources that were key for creating my first package and in transaction log analyses in general:

🔃 Git and Github for Beginners
🐍 Publishing (Perfect) Python Packages on PyPi
✍️ Cohort Analysis with Python by Greg Reda
👨‍🏫 Twitter thread on transaction log analysis by Prof Dan McCarthy