How to use Snowplow Analytics for marketing attribution
A deep dive into the mechanics of a marketing analytics project
Last week I wrote about my new gig: Head of Data, Analytics and AI at Snowplow Analytics. It’s pretty exciting!
Although I did say that this newsletter isn’t going to become about all things Snowplow, and that’s true, this edition will be the exception …
Let’s talk marketing attribution
Today I would like to discuss a topic that I am particularly interested in: how to use Snowplow Analytics to build a very powerful marketing attribution system. If you're not familiar with marketing attribution, it is the practice of assigning credit for a successful outcome, such as a sale or a sign-up, to different marketing channels, ideally down to the level of individual campaigns.
This newsletter will discuss the mechanics of creating a marketing attribution system that integrates event data from Snowplow with data from backend production systems, third-party marketing providers, affiliate marketing services, and internal resources like … spreadsheets! Lots and lots of spreadsheets. (I’m sure you are familiar with spreadsheet bloat). Through such a method you can provide your marketing team with multiple attribution models that drill down to the individual campaign level and provide the customer acquisition cost (CAC) and return on investment (ROI), while also allowing them to roll everything up into their own custom channel calculations.
Pretty impressive, huh?
But how to get there?
The background
There are tons of frontend tracking tools on the marketplace, but perhaps my favorite of all is Snowplow’s paid service, the Snowplow Behavioural Data Platform. Why have I chosen to become a Snowplow customer in the past?
Data ownership: You can run Snowplow as a first-party service on your own cloud infrastructure
Data visibility: You see data at all stages of the process - no black boxes!
Data quality: Built-in QA and data validation tools
Data enrichment: No limits on how you can customize data
One data pipeline for many different platforms (app, web, server, email, etc)
Price: Extremely competitive with some of the better-known players
Snowplow is better known for its open source offering, but the paid version is an ideal choice if you don’t want to devote lots of DevOps resources to setting up and maintaining the pipeline. They run it for you (big deal!), and there is a lot of help with setup and ongoing support. I can confirm that the support is really excellent in comparison to some other analytics services I’ve used.
One of the things I like most about Snowplow is how incredibly extensible it is; you can do all kinds of interesting things with custom events and custom contexts. For the sake of this discussion, however, I am not going to cover that. I will only talk about what you can do with the standard implementation of Snowplow, what my friend Matthew Brandt calls ‘Essential Tracking’, namely those events that are automatically tracked, such as page views, page pings, link clicks, and form events.
Facing the final boss first
Let’s talk about a scenario where you have a variety of business needs to service, and limited resources to do so (quick side note: do consider using Snowplow’s in-house dbt packages for data modeling), so you have to make a decision where to focus first.
Some more information about this scenario:
The marketing department has limited visibility into the effectiveness of its campaigns
Large amounts are spent monthly with minimal understanding
Many different channels where money is spent
So marketing attribution is the logical first step from a business need perspective. However, this is one of the hardest problems in analytics, so doing this is like skipping straight to a video game final boss …
Probably the most common marketing attribution use case would be e-commerce, and this funnel can be relatively straightforward:
User sees search or social ad, clicks to website
User hits product page, scrolls around a little
User adds item to shopping basket
User checks out and pays
Depending on how smooth the UX design is, this can be done in as few as four or five distinct clicks, meaning that reconstructing the customer journey is relatively simple.
Not all funnels are so simple, of course. Depending on the product, you might have a scenario where customers can come and go from the onboarding process, switching devices and platforms as they go, and taking days or weeks or even months to successfully complete the process. SaaS vendors, in particular, can have extremely long onboarding cycles.
This becomes even more challenging when you try to layer in marketing information; multiple sources, sometimes inconsistent UTM codes, and then making the connection to things like discount codes, referral codes, affiliate marketing services, direct sales, and beyond. It can be very tricky!
How to approach the attribution process?
To kick things off, what would be a good starting point for metrics and dimensions to analyze marketing performance?
Metrics:
Conversion rate
Customer Acquisition Cost (CAC)
Customer Lifetime Value (CLTV)
Costs
Impressions / clicks
Churn
Dimensions:
Channel performance
Individual Source and Campaign performance
Promo codes
Referrals
Landing page performance
The easiest way to do attribution in Snowplow is via a simple last click attribution model based on groupings of UTM parameters, where you only consider the final session of the user before becoming a customer. Depending on the length and complexity of your sales funnel, this can leave you with a very high proportion of sales being attributed to the ‘direct’ channel, with 'direct' being the catch-all term for anything you can't attribute to a specific paid or organic channel. In such a scenario, you might find feedback on the results of such a model to be pretty mixed, as your stakeholders expect a much lower proportion of directs and a more sophisticated attribution model that factors in earlier marketing sessions. This might lead to a rejection of the results and continued reliance on some other method of assessing campaign performance.
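To make the last click idea concrete, here is a minimal sketch in Python. It assumes each session has already been reduced to a dictionary holding the mkt_medium and mkt_source values that Snowplow's campaign attribution enrichment fills from UTM parameters. The grouping table and the function names are my own illustration, not part of Snowplow, so adjust them to your own UTM conventions.

```python
from typing import Optional

# Hypothetical grouping rules from medium to channel; adapt to your own UTM conventions.
MEDIUM_TO_CHANNEL = {
    "cpc": "Paid Search",
    "paid_social": "Paid Social",
    "email": "Email",
    "affiliate": "Affiliate",
    "organic": "Organic Search",
}


def channel_for(mkt_medium: Optional[str], mkt_source: Optional[str]) -> str:
    """Map raw marketing values to a coarse channel; no values at all means 'Direct'."""
    if not mkt_medium and not mkt_source:
        return "Direct"
    medium = (mkt_medium or "").strip().lower()
    return MEDIUM_TO_CHANNEL.get(medium, "Other")


def last_click_channel(sessions: list) -> str:
    """Last click: only the final session before conversion gets the credit."""
    if not sessions:
        raise ValueError("need at least one session")
    final = sessions[-1]  # sessions are assumed to be sorted oldest first
    return channel_for(final.get("mkt_medium"), final.get("mkt_source"))


# Illustrative journey: a paid search visit followed by a direct visit.
journey = [
    {"mkt_medium": "CPC", "mkt_source": "google"},
    {"mkt_medium": None, "mkt_source": None},
]
print(last_click_channel(journey))  # Direct
```

Notice how the paid search visit gets nothing here, which is exactly why this model tends to inflate the 'direct' channel on longer funnels.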
So you have to keep going!
There are varying schools of thought about what is an ‘acceptable’ amount of directs, but realistically your work won't be used unless you can reliably attribute a high proportion of new conversions (customers, for example) to a distinct traffic source. The most famous analytics tool, Google Analytics, uses last touch non-direct as its default attribution model, meaning that it assigns responsibility for the 'conversion' (the result) to whatever was the last channel the customer used that was NOT direct, if one was known. So if you made five visits before becoming a customer, with sources like Google Adwords - Direct - LinkedIn Ad - Google Search - Direct, the model would assign Google Search the responsibility for it.
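The five-visit example can be sketched in a few lines of Python. This is only an illustration of the last touch non-direct rule as described here, with a function name and a "Direct" label that I made up for the purpose.

```python
def last_non_direct(touches: list, direct_label: str = "Direct") -> str:
    """Return the most recent touch that is not direct; fall back to direct if none exists."""
    for source in reversed(touches):
        if source != direct_label:
            return source
    return direct_label


visits = ["Google Adwords", "Direct", "LinkedIn Ad", "Google Search", "Direct"]
print(last_non_direct(visits))  # Google Search
```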
With Snowplow you certainly can replicate this approach, but it is also possible to go much deeper.
A 40-20-40 solution
One powerful solution is to move to a flexible points-based attribution model, with the default being what I call the '40-20-40' model, which works like this:
Each customer is given a point
40% is attributed to the first touch marketing source
40% is attributed to the last touch marketing source
Remaining 20% is divided among the rest of the touch points
If the last touch is direct the final 40% will go to the previous non-direct last touch
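Here is a rough Python sketch of these rules. The list above leaves a couple of edge cases open, so I am making two assumptions that you should feel free to change: every touch other than the first and the credited last one (including any trailing direct visits) shares the 20%, and when there are no such middle touches the 20% is split evenly between first and last. The function name and the "Direct" label are also just for illustration.

```python
from collections import defaultdict

DIRECT = "Direct"  # illustrative label for unattributed visits


def attribute_40_20_40(touches: list) -> dict:
    """Split one customer point across the sources in `touches` (sorted oldest first)."""
    if not touches:
        raise ValueError("need at least one touch")
    n = len(touches)

    # If the last touch is direct, the final 40% goes to the previous non-direct touch.
    last_idx = n - 1
    for i in range(n - 1, -1, -1):
        if touches[i] != DIRECT:
            last_idx = i
            break

    # Assumption: everything except the first and the credited last touch is a middle touch.
    middle = [i for i in range(n) if i not in (0, last_idx)]

    # Assumption: with no middle touches, first and last split the spare 20% evenly.
    edge_share = 0.4 if middle else 0.5

    credit = defaultdict(float)
    credit[touches[0]] += edge_share
    credit[touches[last_idx]] += edge_share
    for i in middle:
        credit[touches[i]] += 0.2 / len(middle)
    return dict(credit)


journey = ["Google Adwords", "Direct", "LinkedIn Ad", "Google Search", "Direct"]
for source, points in attribute_40_20_40(journey).items():
    print(source, round(points, 3))
```

Because every customer is worth exactly one point, summing the points per source or campaign across all customers gives you the attributed customer counts.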
If your conversion funnel involves sign-ups or subscription sales, it makes sense to use Snowplow data to check for existing customers and remove them from all funnels, so that you have a correct denominator for calculating conversions. You can do this with user stitching: pass a value from your backend into the user_id field in the Snowplow tracker, and then use the Snowplow domain_userid(s) from the first party cookie to connect users at varying stages of the user knowledge path.
Let's get a little technical to show exactly how this works:
STEP 1: Create a user_base CTE with relevant user information
STEP 2: Stitch together first party cookies (domain_userid) and return user_id where available, ELSE first party cookie information
STEP 3: Stitch in a new CTE and clearly label the different user_identities
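In the warehouse these steps would be SQL CTEs, but the logic is easy to show as a small Python sketch. It assumes events are dictionaries carrying the standard Snowplow domain_userid and user_id fields; the stitched_id and identity_type names, along with the function names, are hypothetical labels I picked for illustration.

```python
def build_identity_map(events: list) -> dict:
    """Steps 1 and 2: map each domain_userid to a user_id if that cookie was ever identified."""
    cookie_to_user = {}
    for event in events:
        if event.get("user_id"):
            cookie_to_user[event["domain_userid"]] = event["user_id"]
    return cookie_to_user


def stitch(events: list) -> list:
    """Step 3: attach the best available identity to every event and label where it came from."""
    cookie_to_user = build_identity_map(events)
    stitched = []
    for event in events:
        known_user = cookie_to_user.get(event["domain_userid"])
        stitched.append({
            **event,
            # user_id where available, else fall back to the first party cookie
            "stitched_id": known_user or event["domain_userid"],
            "identity_type": "user_id" if known_user else "domain_userid",
        })
    return stitched


# Illustrative events: cookie "c1" is anonymous at first and logs in later; "c2" never does.
events = [
    {"domain_userid": "c1", "user_id": None},
    {"domain_userid": "c1", "user_id": "u42"},
    {"domain_userid": "c2", "user_id": None},
]
for row in stitch(events):
    print(row["domain_userid"], row["stitched_id"], row["identity_type"])
```

The useful side effect is that the earlier anonymous event from cookie "c1" is now tied to the known customer, so you can exclude existing customers from the funnel denominator.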
Connecting it all together
With Snowplow data you can see where your customers came from, allowing you to build a flexible set of models for the marketing team to work with, such as:
customer attribution using a 40-20-40 last touch non-direct attribution model
user to customer conversion using a last touch non-direct attribution model
session to customer overview using a last touch non-direct attribution model
campaign performance using a 40-20-40 attribution model
You can then link this Snowplow data with other information to help optimise your marketing campaigns. For example, you could import marketing cost data into your data warehouse using a tool like Fivetran or Stitch, and then use the Snowplow marketing enrichment to link behavior to campaigns, and develop a holistic overview of marketing performance. Beyond that, the sky is the limit: you can also integrate other data sources, such as spreadsheets from marketing and finance, or CRM data, or, well, whatever you can imagine! The end result will be that your marketing team can make better informed decisions, and this will help drive your business forward.
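Once cost data sits next to attributed customers, CAC and ROI per campaign are simple arithmetic. The sketch below assumes the usual definitions (CAC as spend divided by attributed customers, ROI as revenue minus spend, divided by spend). The campaign names and all the numbers are made up purely for illustration.

```python
from typing import Optional


def cac(spend: float, customers: float) -> Optional[float]:
    """Customer acquisition cost: spend per attributed customer (None if no customers)."""
    return spend / customers if customers > 0 else None


def roi(spend: float, revenue: float) -> Optional[float]:
    """Return on investment as a ratio: (revenue - spend) / spend (None if no spend)."""
    return (revenue - spend) / spend if spend > 0 else None


# Made-up inputs: spend from the cost import, points from an attribution model,
# revenue from a backend system.
costs = {"spring_sale": 1200.0, "brand_search": 300.0}
points = {"spring_sale": 8.0, "brand_search": 2.5}
revenue = {"spring_sale": 3000.0, "brand_search": 450.0}

report = {
    campaign: {
        "cac": cac(costs[campaign], points.get(campaign, 0.0)),
        "roi": roi(costs[campaign], revenue.get(campaign, 0.0)),
    }
    for campaign in costs
}
print(report)
```

Fractional customers are expected here, since a points-based model can hand a campaign part of a customer.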
One last (music) thing
Every post ends with a DJ mix from my massive back catalogue, and so I will end this post with something that I made in 2013 - 10 years ago! This is a rare mix done on CDJs and not with vinyl, and it’s a little thing I made with some of my favorite dance CDs from the 90s, featuring some of the biggest electronic artists of that decade, such as The Chemical Brothers, Leftfield, Daft Punk, Armand van Helden, and more.