Do Analytics Have Limits?
BY ROB CAUSEY – DATA SCIENTIST
Two years ago, while I was earning my master’s degree in analytics, I saw Nate Silver speak. At the time, I had recently learned about applications of analytics in myriad industries, and my mind was racing with the possibilities. Self-driving cars, drug development, and Moneyball were all bouncing around my head. With data growing exponentially every day, I felt confident that the only thing the world lacked was a visionary like Silver to unveil the god-like powers of data: to cure every disease known to man and to predict virtually everything that would ever happen. I’ll admit that I was a bit naïve. Instead, what Silver did was dispel this very notion. When something seems to fix everything, check again! You might be overlooking a fatal flaw in your thinking.
I had looked to Silver as a visionary who would teach me the secrets to ruling the world with data. Why would he be qualified to rule? Nate Silver is the statistician who correctly predicted the winning party in 49 of 50 states in the 2008 presidential election, and then in all 50 states in 2012. Silver is also the founder and editor-in-chief of the now ESPN-owned analytics blog, FiveThirtyEight. In reading his book, The Signal and the Noise, I have reflected on the talk he gave and on how both taught me the opposite of what I expected to learn from him: do not underestimate your own biases, because there is no such thing as a truly perfect prediction.
Nate Silver’s accurate presidential election forecasts and the movie Moneyball make it natural to think that we might tackle a plethora of other problems the way Silver forecast elections and Billy Beane built an analytics-driven baseball team. However, it is very possible to predict badly, often by mispredicting our own ability to predict. Not every problem is as contained a setting as these two scenarios that have received so much media acclaim. Silver points out how our own biases can lead us to see false signals in the noise. More data means more noise, and more opportunities to express our biases by finding exactly what we were already looking for. However, if we are careful, collaborative, and self-aware, we can still catch the signals by remaining wary of our own biases.
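Silver’s warning about false signals can be made concrete with a tiny simulation (an illustrative sketch of the idea, not anything from his book; the sample size and predictor count are arbitrary): generate a few hundred purely random “predictors” and a purely random target, then go hunting for the strongest correlation.

```python
import random
import statistics

random.seed(0)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)

n_obs, n_predictors = 30, 500

# A target and 500 candidate predictors -- all pure random noise.
target = [random.gauss(0, 1) for _ in range(n_obs)]
predictors = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_predictors)]

# If we go hunting, the "best" noise predictor still looks like a signal.
best_r = max(abs(corr(p, target)) for p in predictors)
print(f"Strongest correlation found in pure noise: r = {best_r:.2f}")
```

With only 30 observations, the best of 500 noise variables will typically show a correlation strong enough to look meaningful, which is exactly the trap of mining ever-bigger data for what we already hoped to find.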
You might be asking at this point how Big Data has actually failed us. “Big Data” could easily be described as data that exceeds our ability to process or manage it, and an earlier form of this problem surfaced with a space shuttle mission in the 1980s. Although a similar mission had succeeded before and all of the data needed for another success were there, the sheer volume of data prevented us from mining the signal out of the noise before the mission failed. Sampling bias is an even older failure mode. In 1936, George Gallup correctly predicted Franklin D. Roosevelt’s election victory with a sample of about 50,000 people, while the Literary Digest missed badly with a sample nearly 50 times larger (about 2.4 million). What caused this conundrum? Gallup’s sample was representative of the American population, while the Literary Digest’s considerably larger sample, drawn largely from its own subscribers and from automobile and telephone lists, disproportionately represented wealthier voters.
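The Gallup versus Literary Digest lesson, that a small representative sample beats a huge biased one, can be sketched in a few lines of Python. The group sizes and support rates below are made-up illustrations, not the 1936 numbers:

```python
import random

random.seed(1936)

# Hypothetical population: two groups with different support for candidate A.
# Group 0 supports A at 70%, group 1 at 45%.
def draw_voter(group):
    rate = 0.70 if group == 0 else 0.45
    return 1 if random.random() < rate else 0

# True population is 60% group 0, 40% group 1,
# so overall support is 0.6*0.70 + 0.4*0.45 = 0.60.
def representative_sample(n):
    return [draw_voter(0 if random.random() < 0.6 else 1) for _ in range(n)]

# Biased sampling frame (think: a magazine's subscriber list) is 90% group 0.
def biased_sample(n):
    return [draw_voter(0 if random.random() < 0.9 else 1) for _ in range(n)]

small_good = sum(representative_sample(5_000)) / 5_000
huge_biased = sum(biased_sample(500_000)) / 500_000

print("True support:         0.600")
print(f"Small representative: {small_good:.3f}")
print(f"Huge but biased:      {huge_biased:.3f}")
```

The 5,000-person representative sample lands close to the true 60%, while the 500,000-person biased sample converges very precisely to the wrong answer (about 67.5% here): more data only buys you a smaller error bar around your bias.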
Both of those examples happened decades ago, but even in recent times, Big Data has proven less foolproof than we’ve cracked it up to be. Google was thought by some to have displaced the usefulness of the Centers for Disease Control and Prevention (CDC) when it launched Google Flu Trends in 2008 to predict flu outbreaks from the locations where users searched for health-related terms. Google’s model was initially successful, not just matching the CDC’s estimates but producing them more than a week faster. Within five years, however, the model had badly deteriorated, while the CDC’s slow-and-steady approach won the race. What had been a major selling point turned out to be the model’s defeat: very little human intuition was used, in the hope that the data would speak unhindered by human bias. So little human thought went in that the model’s makers hardly understood how it really worked. The model could rely on an abundant data supply, but without humans to interpret it, it could not survive the trials of time.
In graduate school, everyone got excited about nerdy subjects like random forests, support vector machines, neural networks, and ensembles; in a sense, we were biased towards topics that would turn off your partner on a first date. I remember hearing the quote: “Before dealing with ‘Big Data,’ we all need to get better at dealing with small data first.” At Elicit, I was shocked that people were not singing Hadoop’s praises and proclaiming that the world would soon abandon traditional relational databases forever for NoSQL stores and parallel processing. That quote came to mind, and I realized that before digging too deeply into machine learning algorithms, it was imperative to validate what we already understood about the data and to find ways to connect that understanding to business insights. My own biases had been preventing me from seeing the world outside of Hadoop, but they have since been overcome in a culture of cross-collaboration, information sharing, and tool agnosticism; challenging one another creatively breaks through some of those biases.
It is often our practice at Elicit to generate and present multiple solutions rather than hyper-focusing on the one we believe to be the answer. Nate Silver had a good run of superior predictions, but he cautions that humans love to predict things and underestimate how bad we are at doing so. A creative culture like Elicit’s helps offset individual bias by pooling new perspectives from the team. It is a delicate balance to draw on our humanity for insight while staying wary of our biases, but it is one that underscores the importance of camaraderie in creating a variety of perspectives. We look at many answers and use many tools, taking care that the insights we form pass through a varied series of creative minds before they are finally melded into action. At Elicit, we see the salience of Silver’s words as we challenge each other’s solutions every day, incorporating more tools and better techniques. It pays to be encouraged to think creatively and freely when the name of the game is avoiding individual bias!