Beyond heuristics: Algorithmic multi-channel attribution

  • Looking to improve your attribution modeling for better insights into your analytics? Columnist David Fothergill shares his method, which uses a Markov model to determine the relative value of each channel in your conversion path.


    Marketing attribution continues to provoke many discussions, theories and debates these days. As stated in the intro to Christi Olson’s recent Search Engine Land column, “Proper attribution modeling is one of the biggest challenges facing marketers today.”

    Along with difficulties caused by holes in data (such as connecting user journeys across devices), Olson’s column highlights the oversimplification of traditional first- and last-click models.

    These models fall into a class of “heuristic” rules. A heuristic, by its nature, simplifies a problem into more of a “rule of thumb,” removing complexity in favor of a quick analysis. In the case of attribution modeling, this means assigning values across positions in the chain, regardless of actual impact on the completion of a sale.

    The step beyond this is algorithmic attribution: a complete analysis of the available data to determine the true impact of a given touch point on conversions. Rather than “shortcutting” and applying a blanket position or time-decay rule, algorithmic attribution involves having a custom model and weightings for each touch point based on your own user dynamics.

    Alongside a truer picture of channel value, the deeper understanding provides a starting point in progressing away from descriptive analytics towards the realm of predictive and prescriptive analytics. See Gartner’s useful visualization of the types of analytics and their value:


    Getting this fuller understanding of channel influence is a key step in moving towards both predictive and prescriptive analytics (as opposed to descriptive, which merely tells us what happened historically).

    This is something you may have encountered if you are a Google 360 (or formerly Adometry) customer through their “data-driven attribution” feature.

    Let me illustrate by showing an example approach, which uses a Markov model.

    What is a Markov model?

    In simplest terms, Markov chains are based on modeling the probabilities of transitioning from one state to another. An example would be forecasting tomorrow’s weather: if it’s sunny today (current state), what are the probable weather outcomes tomorrow (future state)? We can visualize some probabilities in the format below:


    If it’s sunny today, there’s a 10 percent chance of transitioning to a “rainy” state tomorrow. If it’s rainy today, 50 percent of the time it will be sunny tomorrow.
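
    The weather chain can be sketched in a few lines of Python. This is a minimal illustration that assumes only two states, “sunny” and “rainy,” so the 90 percent and 50 percent entries are simply the complements of the two figures quoted above. The names STATES, TRANSITIONS, step and forecast are made up for this sketch.

```python
# States of the toy weather chain, in matrix row/column order.
# Assumption: only two states exist, so each row's missing entry
# is the complement of the figure quoted in the text.
STATES = ["sunny", "rainy"]

# Row = today's state, column = tomorrow's state; each row sums to 1.
TRANSITIONS = [
    [0.9, 0.1],  # sunny -> sunny, sunny -> rainy
    [0.5, 0.5],  # rainy -> sunny, rainy -> rainy
]


def step(distribution, transitions):
    """Advance a probability distribution over the states by one day."""
    n = len(transitions)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def forecast(today, days):
    """Return {state: probability} for the weather `days` days after `today`."""
    distribution = [1.0 if state == today else 0.0 for state in STATES]
    for _ in range(days):
        distribution = step(distribution, TRANSITIONS)
    return dict(zip(STATES, distribution))


tomorrow = forecast("sunny", 1)
day_after = forecast("sunny", 2)
```

    Calling forecast with a larger number of days applies the same transition table repeatedly, which is all a Markov chain does: the next state depends only on the current one.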

    How does this tie to marketing attribution?

    Good question! Let me explain:

    We can consider our conversion journey as a series of states based on the referring channel. In other words, we have various states and paths such as:


    As with the weather, we can crunch the data we have on these paths (easily accessible in Google Analytics, AdWords and so on) and get some probabilities for transitioning between the channels, towards a successful conversion.

    An example analysis

    Our first step is to break these paths down into individual “transitions” so that we can do a simple count of occurrences. Then, dividing the number of instances of each transition by the total number of transitions out of the same starting state, we can work out the individual probabilities:
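
    As a rough sketch of that counting step, the Python below turns a list of journeys into transition probabilities. The four paths are hypothetical and are not the data behind the figures in this article; the “Start” state and the function name transition_probabilities are also assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical journeys (NOT the article's data). Each path ends in
# either "Conversion" or "Null" (the user is never seen again).
PATHS = [
    ["Organic", "CPC", "Conversion"],
    ["Organic", "Conversion"],
    ["Organic", "Null"],
    ["Organic", "CPC", "Null"],
]


def transition_probabilities(paths):
    """Count each state-to-state transition, then normalise per origin state."""
    counts = defaultdict(Counter)
    for path in paths:
        # Assumption: every journey begins from an artificial "Start" state.
        steps = ["Start"] + list(path)
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    return {
        state: {
            following: n / sum(outgoing.values())
            for following, n in outgoing.items()
        }
        for state, outgoing in counts.items()
    }


probabilities = transition_probabilities(PATHS)
```

    Each inner dictionary holds the outgoing probabilities for one state, so its values sum to 1.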


    Plotting this out into a graph, as per the weather example, gives us the following:


    For example, we can see that if the last visit was from the organic search channel, there is a 40-percent likelihood that this is followed by another visit via this channel, 20 percent that the user is never seen again (the Null node), 20 percent that the next engagement is via PPC and 20 percent that they convert at this stage.

    But we’ve still not cleared up how this helps with the attribution problem. Well, the application is based on assessing the impact of removing a specific node (or “channel”) and using our remaining probabilities to assess the drop in completed conversions in the absence of this touch point.

    We are essentially weighting on the basis of the “removal effect.” If the CPC channel is null, our channel interactions are simplified to the following:


    The “removal effect” of dropping the CPC touch points is the loss of one of our two measured conversions.

    If we do the same process for the organic channel, the effect is different. Neither conversion occurs, because the organic channel also contributes to the conversion completed via the CPC channel, so the “removal effect” is a reduction to zero conversions:
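
    The removal calculation can be sketched as follows. The transition probabilities here are hypothetical, chosen only so that the outcome mirrors the narrative above (removing CPC loses half of the conversions, removing organic loses all of them). The “Start” state and the function names are assumptions, and the simple repeated sweep is one of several ways to get the conversion probability, not necessarily what any particular tool does.

```python
# Hypothetical transition probabilities (NOT the article's figures).
# "Conversion" and "Null" are end states with no outgoing transitions.
TRANSITIONS = {
    "Start": {"Organic": 1.0},
    "Organic": {"CPC": 0.5, "Conversion": 0.25, "Null": 0.25},
    "CPC": {"Conversion": 0.5, "Null": 0.5},
}


def conversion_probability(transitions, removed=None, start="Start",
                           tolerance=1e-12, max_sweeps=10_000):
    """Probability of eventually reaching "Conversion" from `start`.

    If `removed` names a channel, every step into it is treated as a
    lost user, exactly as if it led to "Null".
    """
    value = {state: 0.0 for state in transitions}
    value["Conversion"] = 1.0
    value["Null"] = 0.0
    # Repeatedly re-estimate each state from its successors until stable.
    for _ in range(max_sweeps):
        largest_change = 0.0
        for state, outgoing in transitions.items():
            if state == removed:
                continue  # a removed channel never leads to a conversion
            updated = sum(
                p * value.get(following, 0.0)
                for following, p in outgoing.items()
                if following != removed
            )
            largest_change = max(largest_change, abs(updated - value[state]))
            value[state] = updated
        if largest_change < tolerance:
            break
    return value[start]


def removal_effect(transitions, channel):
    """Share of conversions lost when `channel` is removed from the chain."""
    baseline = conversion_probability(transitions)
    without = conversion_probability(transitions, removed=channel)
    return 1 - without / baseline


effects = {ch: removal_effect(TRANSITIONS, ch) for ch in ("Organic", "CPC")}
```

    With these illustrative inputs, the baseline conversion probability is 0.5, which drops to 0.25 without CPC and to 0 without organic.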


    We can then just calculate out a weighted score on the basis of the relative removal effect across the channels, thus reflecting the greater contribution of the organic channel:
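
    A minimal sketch of that weighting step, using the removal effects described above (removing CPC loses one of the two conversions, removing organic loses both). The function name attribution_credit is made up for this example, and scaling by the total number of conversions is one common convention rather than the only option.

```python
def attribution_credit(removal_effects, total_conversions):
    """Split conversions across channels in proportion to removal effect."""
    total_effect = sum(removal_effects.values())
    return {
        channel: total_conversions * effect / total_effect
        for channel, effect in removal_effects.items()
    }


# Removal effects taken from the worked example in the text:
# organic removes 2 of 2 conversions, CPC removes 1 of 2.
REMOVAL_EFFECTS = {"Organic": 1.0, "CPC": 0.5}

credit = attribution_credit(REMOVAL_EFFECTS, total_conversions=2)
shares = {channel: value / 2 for channel, value in credit.items()}
```

    Organic ends up with two-thirds of the credit and CPC with one-third, since organic's removal effect is twice as large.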


    In summary

    Obviously, it’s a non-trivial problem once you step beyond “toy” examples such as the one above, but the hope is that understanding the inside of this “black box” is useful for anyone working (or considering working) with an algorithmic approach.

    Additionally, we didn’t address the inherent problems of getting “clean” channel data as highlighted by Olson (cross-device and so on), but hopefully, this served as a good introduction to the concept of algorithmic vs. heuristic models.

    A note on implementation:

    Much of my analysis would be done in R, and there is a great package, ChannelAttribution, which does much of the heavy lifting. And incredibly usefully, there is also a web-based version with some dummy data to illustrate how this approach compares to the standard heuristic models. The format of data required also means it’s possible to upload pretty standard outputs from Google Analytics.

    [Article on MarTech Today.]

    Some opinions expressed in this article may be those of a guest author and not necessarily Marketing Land. Staff authors are listed here.

