The Danger Of Prediction Without Explanation

Dec 08, 2017


Data scientists and analysts are leveraging increasingly sophisticated tools to create models that are better and better at prediction. Take a look at Kaggle, for instance: its members, for the most part aspiring data scientists, are building ensembles of ensembles, mixing state-of-the-art machine learning with more traditional statistical modeling, and producing models with startling predictive accuracy.

On the one hand, it is very impressive that Kaggle members are able to achieve this kind of accuracy. And a Kaggle prize is nothing to sneeze at. On the other hand, the environment that Kaggle creates for these competitions is totally artificial. If these contests were actual client engagements, the winners would be hard-pressed to get buy-in from the business. And the most accurate model in the world isn’t going to create any value unless the business believes in it. In other words, most of the Kaggle-winning work would end up being useless.

And why am I so skeptical that these models would get buy-in? Unfortunately, in practice there’s a tradeoff between predictive accuracy and explanatory power. These highly accurate models end up becoming very opaque: sure, they can predict an impending change with impressive accuracy, but if you ask the model builder why that change is going to happen, or what you can do to stop it, you will be met with blank stares. A model can be highly predictive, but still do little to explain what’s happening in the business.

Which brings me to my main point: everyone is talking about prediction, but when it comes to solving business problems, explanation is usually the more important factor.


In June of 2014, Criteo, an ad retargeting company, ran a Kaggle competition. The question: given a user and the page he or she is visiting, what is the probability that the user will click on a given ad?

In a style that has since become typical on Kaggle, Criteo provided traffic logs with myriad features, all unexplained apart from their types (integer, double, binary, categorical, etc.). The response was the only thing that could be reasonably interpreted: did the user click or not?

It might be the case that Criteo only wanted a more accurate model so that they could better predict the click-through rates of the ad inventory that is purchasable on the exchanges. Selling retargeting media is really a pricing problem: you need to buy the inventory from the exchanges for less than you sell it for, and the difference is your margin. If you are purchasing ads by the impression from publishers on the exchanges, and selling clicks to your advertisers, then a good CTR model (which is precisely the outcome of the Kaggle competition) can help Criteo make money in the short term.
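To make the pricing arithmetic concrete, here is a minimal sketch of how a predicted CTR turns into a margin when you buy by the impression and sell by the click. The function names and the dollar figures are hypothetical, chosen only for illustration; nothing here reflects Criteo's actual pricing or systems.

```python
def expected_margin_per_impression(predicted_ctr, revenue_per_click, cost_per_thousand):
    """Expected profit on one impression bought by CPM and sold by the click.

    All inputs are illustrative assumptions:
      predicted_ctr      -- the model's click probability for this impression
      revenue_per_click  -- what the advertiser pays per click (CPC)
      cost_per_thousand  -- what the exchange charges per 1,000 impressions (CPM)
    """
    cost_per_impression = cost_per_thousand / 1000.0
    expected_revenue = predicted_ctr * revenue_per_click
    return expected_revenue - cost_per_impression


def break_even_cpm(predicted_ctr, revenue_per_click):
    """Highest CPM you could pay before the expected margin goes negative."""
    return predicted_ctr * revenue_per_click * 1000.0


if __name__ == "__main__":
    # Hypothetical numbers: 0.2% predicted CTR, $0.50 per click, $0.80 CPM.
    margin = expected_margin_per_impression(0.002, 0.50, 0.80)
    ceiling = break_even_cpm(0.002, 0.50)
    print(f"expected margin per impression: ${margin:.4f}")
    print(f"break-even CPM: ${ceiling:.2f}")
```

Under these assumptions, the whole business case rests on the predicted CTR being right: overestimate it and you overpay for inventory, underestimate it and you lose auctions you could have won profitably. Note that the calculation only says how much to bid; it says nothing about why the user clicks.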

Long term, however, there are some problems with this approach. A model like the ones that won the Criteo challenge can’t help you understand why users prefer some ads over others, or why they are more likely to click on some pages than on others. Indeed, one can imagine some very spammy pages that would have a very high predicted CTR, but the model would do very little to help you understand why, or whether it was a good customer experience.[1] A model like this also won’t help you understand how your advertisers can improve their creative to get a higher CTR. This is something that AdWords has spent a lot of time figuring out, and does extremely well. Their model, which relied heavily on understanding what made for great, relevant creative and giving advertisers feedback, helped them move the entire industry from a flat CPM model to a PPC auction model, and massively increased their revenue potential in the process.

[1] You don’t have to look far for a similar situation. In the mid-aughts, AdWords arbitrage was rampant: savvy PPC buyers would purchase cheap AdWords traffic and send it to pages loaded with AdSense ads, selling Google’s cheapest traffic back to it at a positive margin. Google had to add a new landing page quality score to fight back algorithmically against these landing pages.

And as I mentioned before, even though the winners of the Kaggle competition were very good at predicting the CTR of ads, they likely still would have had difficulty selling their models to the business in the real world.


I’m not sure if Criteo implemented any of the solutions, but put yourself in the shoes of the COO for a moment, and skim through this.

Alright. I’ll give you a moment…

Great. Now you’ve looked through it. You understand everything perfectly, right? How about we roll this thing out across the system? Theoretically, it’s going to be great for ROI!

/sarcasm. A good COO wouldn’t be ready to roll it out just yet, not when we can’t explain which ads perform best, or why. Which users click the most, or why. Don’t get me wrong here. It’s some very impressive data science work. Their approach actually constitutes a new type of model. But it’s blind modeling exercises like this, promoting prediction at the expense of explanation, that are driving some of the worst abuses in digital advertising right now. And implementing them without understanding, optimizing for an outcome without knowing why or how the model drives toward that outcome, is precisely how those abuses happen.
