Don’t Fear The Model
BY LIAM HANHAM – DATA SCIENCE TEAM LEAD
I’m a big fan of documentaries. One that has stuck with me for a long time is called “The Interrupters.” The documentary follows a group of three violence interrupters who work for a social outreach program in Chicago called CeaseFire. The goal of CeaseFire is to identify youths and young adults who are at risk of joining a gang, or are already part of one, and show them nonviolent paths to resolving problems.
By design, the violence interrupters are former gang members or people close to gang activity who have removed themselves from that lifestyle. The idea here is that people who have been through a similar experience are much more likely to be able to inspire those currently living a violent and dangerous lifestyle. It worked astoundingly well. In 2004, after 4 years of working in 15 communities and expanding from 20 to 80 workers, homicides in Chicago dropped by 25%. (Source: http://www.huffingtonpost.com/author/tio-h-124)
Since then, CeaseFire has grown to be part of an international organization that helps communities in a variety of ways, but it still maintains its roots in social “boots-on-the-ground” outreach programs. Given the great success of CeaseFire and organizations like it, the city of Chicago built a model, the Strategic Subject List (SSL), to help law enforcement identify at-risk people in all Chicago neighborhoods. The goal was to take a successful approach, optimize it with machine learning, and apply it at scale through an established organization: the Chicago Police Department (CPD).
We see this all the time in the businesses we work with: they have an established process, and they want to supercharge it with data and machine learning. We love this kind of work because it’s exciting and innovative. However, a lot of companies and people share a misconception: that the machine, or the machine-learned algorithm, is itself the solution. What’s missing is the recognition that execution and strategy are vital to a model’s success. A model in a vacuum really isn’t effective.
The CPD made that same mistake when they deployed the SSL model. The SSL model was built with the understanding that social workers would be deployed to areas of high concern (as per the model established by CeaseFire). But instead of deploying social workers to areas of high concern, the SSL was used to help identify suspects in possible crimes. (Source: https://link.springer.com/article/10.1007/s11292-016-9272-0)
To some outside observers, there are obvious moral and legal transgressions on the part of the CPD when we look at the results from this angle. However, the findings on what went wrong with the SSL model hit on one key theme that I would like to call out as important to all businesses.
“Overall…there was no practical direction about what to do with individuals on the SSL, little executive or administrative attention paid to the pilot, and little to no follow-up with district commanders.” (Source: http://www.theverge.com/2016/8/19/12552384/chicago-heat-list-tool-failed-rand-test)
The SSL model was never intended to be used this way, and when the misuse came to the attention of the CPD, they took steps to correct the processes involved. Regardless of the corrections that the CPD has made to their process (and the continued optimizations made to the model), there will always be some fear in the public because of how this model was misused.
A parallel can be drawn in any organization. A model built by an analytics team for a marketing team’s direct marketing campaign could subsequently be “tweaked” by the marketing department to help with an email campaign. However, the model was never intended to be used in this way, and it could hand the marketing team a poorly suited dataset that produces poor results for their email campaign.
This can in turn create a cascading effect in the organization: the marketing department doesn’t trust the model, and the analytics team doesn’t trust the marketing team to use the model correctly. To avoid situations like this, it’s important to get agreement on the question the model needs to answer. It is also important to have a strategy to wrap around the model once it is built. Both can be accomplished through clear communication between the builders and the users of the model. The builders should share how the model is being built throughout the process of building it. The users should discuss with the builders the ways in which they intend to use it.
Machine learning is often referred to as a “black box” (I’m definitely guilty of this), but it’s time to stop referring to it as such. We know exactly what we’re asking the algorithms to do, and we should know what the expected outcomes are. The more data scientists and marketers work together on developing innovative ways to solve problems, the better those solutions are going to be.