Avoiding false positives: measuring the true incrementality of your performance marketing vendors

How do digital marketers measure the true impact of their advertising vendors? The most commonly cited solution to this problem is incrementality testing via an A/B test. Let's have a look at a typical case: an advertiser wants to compare two display ad platforms, DSP A and DSP B, to ascertain their impact on sales. Audiences with similar characteristics are split evenly and randomly between two groups.

The client’s analytics platform, such as GA or a mobile measurement platform, verifies that DSP A brings in revenue worth $200 while DSP B generates $260. Since DSP B delivers a 30% higher ‘lift’ in conversion value, it wins the test.
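As a quick sanity check on that arithmetic, here is a minimal Python sketch using only the hypothetical revenue figures from the example above to compute the relative lift:

```python
# Hypothetical revenue reported by the analytics platform for each test group.
revenue = {"DSP A": 200.0, "DSP B": 260.0}

baseline = revenue["DSP A"]
challenger = revenue["DSP B"]

# Relative lift of DSP B over DSP A in conversion value.
lift = (challenger - baseline) / baseline
print(f"DSP B lift over DSP A: {lift:.0%}")  # -> 30%
```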

While the results may seem clear and decisive, they can often be misleading, as vendor performance may not be consistent or may not meet your objectives over time. Several factors beyond our control amplify the challenge of testing, such as:

  • Cross-device activity: We can assign unique IDs to users through cookies, but with cross-device activity there is a risk of assigning different IDs to the multiple devices owned by a single user, who could then end up in both group A and group B, resulting in measurement inaccuracies (see the sketch after this list).
  • Privacy features: Apple’s ATT privacy features are not making things any easier, as only roughly 20% of users actually opt in to sharing their IDs.
  • Cookies: Most tracking today relies on cookies (which are fading away) to identify users on the web.
  • Miscellaneous: Many factors beyond the ad itself affect sales, for example seasonality, product discounts, stock-outs, and external advertising (billboards, TV, or search) that the client is running, and these effects may be shared unevenly by the ad vendors involved in the test.
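To see why the cross-device point matters, here is a minimal simulation sketch; the 30% two-device share and the group-assignment logic are assumptions purely for illustration:

```python
import random

random.seed(0)

NUM_USERS = 10_000
TWO_DEVICE_RATE = 0.3  # assumed share of users who browse on two devices

contaminated = 0
for _ in range(NUM_USERS):
    devices = 2 if random.random() < TWO_DEVICE_RATE else 1
    # Each device gets its own cookie ID and is randomised independently,
    # because we cannot tell that the devices belong to the same person.
    groups = {random.choice("AB") for _ in range(devices)}
    if len(groups) > 1:  # the same user ended up in both test groups
        contaminated += 1

print(f"Users exposed to both groups: {contaminated / NUM_USERS:.1%}")
```

Under these assumptions roughly 15% of users are exposed to both vendors, and whatever they buy can be credited to either side of the test.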

So what are we to conclude? Is giving up on testing better? Absolutely not: the best organizations are always testing and refining their approaches.

Know Your Vendor Tech

What makes up an ad tech partner's technology:

The bidding mechanism decides which users to bet on by scoring them on their conversion probability, and determines the bid amount.

The recommendation engine scores products on how relevant they are to a user, and also decides which formats will generate the desired result.
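A minimal sketch of how those two components might fit together is shown below; the field names, scoring formulas, and margin parameter are illustrative assumptions, not any specific vendor's actual logic:

```python
from dataclasses import dataclass

@dataclass
class User:
    conversion_probability: float  # estimated by the vendor's models
    expected_basket_value: float   # predicted order value if the user converts

@dataclass
class Product:
    name: str
    relevance: float  # how relevant the product is to this user, 0..1

def bid_for(user: User, margin: float = 0.3) -> float:
    """Bidding mechanism: bet more on users who are likely to convert,
    scaling the bid by the value we expect them to generate."""
    expected_value = user.conversion_probability * user.expected_basket_value
    return expected_value * margin

def recommend(products: list[Product], top_n: int = 3) -> list[Product]:
    """Recommendation engine: rank products by relevance to the user."""
    return sorted(products, key=lambda p: p.relevance, reverse=True)[:top_n]

user = User(conversion_probability=0.04, expected_basket_value=80.0)
catalogue = [Product("sneakers", 0.9), Product("socks", 0.4), Product("jacket", 0.7)]
print(f"Bid: ${bid_for(user):.2f}")
print("Recommended:", [p.name for p in recommend(catalogue, top_n=2)])
```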

While vendor technologies may sometimes sound like the ultimate future, as if they have come right out of a sci-fi movie, the final decision should ensure they are consistent, scalable, and profitable as a long-term investment and partner.

Avoiding the one-hit wonder

Take a holistic and methodical approach:

  • Overlapping test sets: Can we know for certain that the audiences are split evenly yet randomly between the two test sets? Could luck and chance be deciding how your budget is allocated? For practical purposes, running the test on overlapping sets is a viable option: both vendors essentially work on the whole audience, which may include existing customers and new visitors (including cross-device users).

  • Evaluation & goal setting: The vendors may or may not need to know about the test, depending on the advertiser's preference. You can also set goals for how you will evaluate vendors, for example giving more weight to bringing in new purchasers versus existing ones.

  • Longer testing duration: If the test runs long enough, we get a realistic idea of the best-performing vendor. A duration of 3 to 6 months gives the advertiser the best chance of arriving at the right conclusion.

  • Frequent testing: There is always a chance that the results are random, so conduct these tests frequently to weed out false positives (see the significance sketch after this list).

  • Test during non-peak seasons: Seasonality can over-represent the contribution of channels, since conversion rates climb when users arrive with high intent due to markdowns, holiday shopping, and so on.

  • Data set: Most importantly, run the test on a large population as opposed to a small one; the sketch below shows why.
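To make the sample-size and false-positive points concrete, here is a minimal sketch of a two-proportion z-test; the 1.0% and 1.3% conversion rates and the group sizes are invented for illustration. With a small sample the same 30% relative lift is indistinguishable from chance, while a larger sample makes it clearly significant:

```python
from math import sqrt, erfc

def z_test_two_proportions(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))

# Same 30% relative lift (1.0% vs 1.3% conversion rate) at two sample sizes.
for n in (2_000, 50_000):
    p_value = z_test_two_proportions(conv_a=round(0.010 * n), n_a=n,
                                     conv_b=round(0.013 * n), n_b=n)
    verdict = "significant" if p_value < 0.05 else "could easily be chance"
    print(f"n={n:>6} per group -> p-value {p_value:.2g} ({verdict})")
```

The same logic explains why repeating the test matters: a single "win" at a marginal p-value can easily be one of the false positives the test is supposed to rule out.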