Calculating Customer Lifetime Value (LTV) in BigQuery SQL

Every organization doing business online, and especially every subscription-based business, should have a strong pulse on the behavior of its customers. In the subscription world, LTV, or Lifetime Value, is a KPI that has been used across industries for decades and isn’t going anywhere. Once an organization has mastered measuring its LTV, it can pivot toward predicting future LTV and start making changes to actively increase it. In this post, I’d like to show how a subscription-based business can measure its LTV with nothing more than raw subscription data.

In this post, I’ll use the Netflix Userbase Dataset found on Kaggle. This dataset provides a small snapshot of subscription data, including user IDs, start and end date timestamps, subscription types, monthly revenue, and some high-level demographics. For this exercise, I have trimmed the dataset down to subscription-specific fields and altered the end dates to better suit this type of analysis. With that said, the dataset is excellent for anyone looking to learn more about working with subscription data. For the sake of deeper analysis, I’d like to see how LTV changes over time rather than a single flat number for the full date range. The standard formula for calculating LTV takes the average monthly revenue per subscriber and multiplies it by the average length of time a subscriber stays subscribed, which yields one value covering all subscribers at once. Instead, this query produces a view of how LTV changes from month to month as subscribers are added and churn.
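
For reference, here is a rough sketch of that flat, single-number version, written against the column names described in the next section (exactly how you count “months subscribed” here is an assumption):

-- Flat LTV for the whole date range:
-- average monthly revenue × average number of months subscribed
SELECT
  ROUND(AVG(Monthly_Revenue) *
        AVG(DATE_DIFF(cancelmonth, joinmonth, MONTH) + 1), 2) AS flat_ltv
FROM `eb-project.sandbox.netflix`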

Netflix Userbase Dataset

I pulled this data into BigQuery for this exercise as most organizations will keep this type of data in some sort of database for analysts to access.

Each row represents a user’s entire membership duration.

  • Subscription Type denotes the level of membership (Basic, Standard or Premium)
  • Monthly Revenue is the monthly cost for the membership
  • Join Date is the start date
  • joinmonth is the first day of the first month of the subscription (used for monthly summary calculations)
  • Last_Payment_Date represents the last day the subscription was paid
  • cancelmonth is the first day of the last month of the subscription (used for monthly summary calculations; see the sketch after this list for how joinmonth and cancelmonth can be derived)
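
joinmonth and cancelmonth are derived columns rather than part of the raw Kaggle file. If your table only has the raw dates, something like the following would produce them (just a sketch; the raw column and table names are assumptions, and it assumes the dates are stored as DATE values):

-- Derive month-start columns from the raw dates
-- (Join_Date, Last_Payment_Date, and the netflix_raw table name are assumptions)
SELECT
  *,
  DATE_TRUNC(Join_Date, MONTH) AS joinmonth,
  DATE_TRUNC(Last_Payment_Date, MONTH) AS cancelmonth
FROM `eb-project.sandbox.netflix_raw`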

We want to build a query that calculates LTV from several intermediate metrics, each in its own CTE. I’ll walk through each CTE separately to avoid confusion. I also added a calendar table as a CTE so the date range can be expanded easily:

Monthly Recurring Revenue (MRR) = Total Revenue for the Month:

Query for LTV

If you’d like to jump straight to the full query, it’s on GitHub. A step-by-step explanation of each CTE is below.

First, we join the calendar with the Netflix data. The join expands each subscription into one row per month the subscriber was active. For the first month of each subscription, we mark a 1 in the “new_subs” column, which gets counted later.

WITH cal AS (
SELECT month
FROM `eb-project.sandbox.calendar`),

subs AS (
SELECT
  c.month,
  n.User_ID,
  n.Monthly_Revenue,
  CASE WHEN n.joinmonth = c.month THEN 1 ELSE 0 END AS new_subs,
  n.joinmonth
FROM `eb-project.sandbox.netflix` n
LEFT JOIN cal c ON c.month BETWEEN n.joinmonth AND n.cancelmonth),
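
If you don’t have a calendar table sitting in your project, you can generate one inline instead. Here’s a minimal sketch of the cal CTE built with GENERATE_DATE_ARRAY (the start and end dates are assumptions; adjust them to cover your data):

-- Generate month-start dates inline instead of reading a calendar table
-- (the date range below is an assumption)
cal AS (
SELECT month
FROM UNNEST(GENERATE_DATE_ARRAY('2021-01-01', '2023-12-01', INTERVAL 1 MONTH)) AS month),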

Average Revenue per Subscription (ARPS) = Monthly Recurring Revenue (MRR) / Number of Subscriptions:

Take the data above, then sum revenue and “new_subs” for each month. Count each distinct User_ID to get a total subscription count. Divide the total revenue by the distinct user count to get ARPS.

arps AS (
SELECT 
  month,
  SUM(Monthly_Revenue) AS monthlyrevenue,
  SUM(new_subs) AS new_subs,
  COUNT(DISTINCT User_ID) AS total_subscriptions,
  ROUND(SUM(Monthly_Revenue) / COUNT(DISTINCT User_ID),2) AS arps
FROM subs
GROUP BY month
ORDER BY month),

Churn Rate = Churned Subscriptions / Number of Subscriptions:

Join the Netflix table with the calendar month by the cancelmonth to create a churn CTE. Count distinct User_IDs as “Churns”.

churn AS (
SELECT 
  cancelmonth AS churnmonth,
  COALESCE(COUNT(DISTINCT User_ID),0) AS churns
FROM `eb-project.sandbox.netflix` n
LEFT JOIN cal c ON c.month = cancelmonth
GROUP BY cancelmonth),
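
One practical note: with real subscription data you’ll usually have active subscribers with no cancel date, and those rows shouldn’t count as churn. In my edited dataset every row has an end date, so it isn’t an issue here, but a sketch of that guard (assuming cancelmonth is NULL while a subscription is still active) would look like:

-- Sketch: exclude still-active subscribers from the churn counts
-- (assumes cancelmonth is NULL until a subscription ends)
churn AS (
SELECT
  cancelmonth AS churnmonth,
  COUNT(DISTINCT User_ID) AS churns
FROM `eb-project.sandbox.netflix` n
LEFT JOIN cal c ON c.month = cancelmonth
WHERE cancelmonth IS NOT NULL
GROUP BY cancelmonth),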

Create a second churn CTE to bring the churn counts together with the subscription totals. Use a LAG window function to pull last month’s total subscriptions. Then calculate the churn rate by dividing churns by total subscriptions minus churns.

churn2 AS (
SELECT
  a.month AS month,
  a.total_subscriptions,
  a.new_subs,
  a.monthlyrevenue,
  a.arps,
  LAG(a.total_subscriptions) OVER (ORDER BY a.month) AS subs_last_month,
  c.churns,
  ROUND(SAFE_DIVIDE(c.churns,(a.total_subscriptions - c.churns)),2) AS churnrate
FROM arps a
LEFT JOIN churn c ON c.churnmonth = a.month
ORDER BY month)

LTV = (MRR / Number of Subscriptions) / (Churned Subscriptions / Total Subscriptions in the Previous Month):

Take all the calculated metrics and add the LTV calculation.

SELECT
  month,
  monthlyrevenue,
  total_subscriptions,
  new_subs,
  subs_last_month,
  churns,
  churnrate,
  ROUND(SAFE_DIVIDE(arps, SAFE_DIVIDE(churns, subs_last_month)), 2) AS ltv
FROM churn2
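
One small caveat: the first month in the range has nothing for LAG to look back to, so subs_last_month and ltv come back NULL there. If that row is just noise for you, an optional filter on the final SELECT takes care of it (a small tweak, not part of the query above):

-- Optional: drop the first month, where LAG() has no prior value
SELECT *
FROM churn2
WHERE subs_last_month IS NOT NULL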

Here is the full query on GitHub!

Here are the results for my data:

We can also directly chart any time-series data with valid date values right in BigQuery with only one click!

I like BigQuery for its wealth of features like Sandbox (which I wrote about here), query scheduling, visualizations, and integrations with Google Sheets, Looker Studio, and Google Analytics. Being able to easily pull this data into BigQuery and get up and querying in a matter of seconds, all without a single credit card, is pretty awesome.

Like and share!
