Sympathy For The Weatherman

Sep 30, 2016

BY RYAN BOWER – SR. DATA SCIENTIST

To end our summer this year, our family loaded the car and headed to the Outer Banks for a relaxing week at the beach. Unfortunately, as we were heading for Nags Head, Tropical Storm Hermine was heading there too. When I am at home, I usually just do a daily umbrella check of the weather on my phone. But on our vacation, I became a Weather Channel junkie as our family watched the storm swoop in to ruin our perfect beach weather.

“This is ridiculous!” my wife exclaimed, looking at the forecast one evening. “The rain will end in exactly 33 minutes, but for every hour through the rest of the night there’s a 50% chance of rain.” This combination of overly specific and wildly vague forecasts is tough to understand. Is there really a 50% chance of rain throughout the night? Or is it a 100% chance of rain for half of the rest of the night? Or is each individual hour its own 50/50 coin flip where it could, theoretically, not rain at all?
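To see how different that last reading is, here is a rough back-of-the-envelope sketch in Python. It assumes every hour is an independent 50/50 draw, which real weather almost certainly is not, and the eight-hour night is my own made-up number, since the forecast never said how long "the rest of the night" was. The function name is just for illustration.

```python
def chance_of_dry_night(hourly_rain_chance: float, hours: int) -> float:
    """Probability that it never rains, ASSUMING each hour is independent."""
    return (1.0 - hourly_rain_chance) ** hours


# The hour counts below are assumptions; the forecast gave no length for the night.
for hours in (1, 4, 8):
    p_dry = chance_of_dry_night(0.5, hours)
    print(f"{hours} hour(s): {p_dry:.2%} chance of no rain at all")
```

Under those assumptions, a completely dry eight-hour night is possible but unlikely, well under one percent. The same "50% every hour" label could describe a very different night if the hours are not independent, which is part of why the forecast is so hard to read.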

Thinking more about this forecast, though, I had a bit more sympathy for the problem faced by the weatherman. It would be easy to deliver the forecast if we knew exactly what weather was coming. But weather is unpredictable and can change quickly. Not only does the weatherman have to tell us what he thinks the weather will be, he also has to tell us how confident he is in that prediction. What seems to be a simple weather prediction problem is just the type of problem a data scientist faces every day: how do you convey complex, predictive information to an unknown audience?

Of course, the forecaster could always just stick to weather emojis. The weather app on my phone consists of a couple of pictograms of the next day’s weather and a temperature. These provide a simple and effective forecast, but ultimately, many of us want to know more details. Simple averages won’t do the job. Just like an analysis of data that merely scratches the surface, this forecast has limited usefulness. More often, we want the most accurate and complete forecast possible. Right now, I want to know if it will rain, how much it will rain, and when, if ever, it will stop raining. Down to the hour, if possible.

Ideally, everyone looking at the weather forecast would be a meteorologist. The weatherman’s job would be far simpler if we all understood the nuance of complex meteorological charts. But expecting the audience to understand all the details of niche charts and graphs is asking too much.

This is an easy trap for data scientists to fall into. It would be so much simpler for us if we could just present ideas and insights in the way that makes the most sense to us. When we fall into that trap, we end up talking over the heads of our audience (keep in mind, a very intelligent audience in their own areas of expertise, just not data science) and can actually add more confusion to the situation than clarity. We want to give a complete and accurate forecast, so our instinct isn’t necessarily to provide it in layman’s terms.

So what does a good forecast look like? Typically, the weatherman starts with a high-level overview of the next several days. So far, so good. Then he dives into any alerts for severe weather that might be approaching. Barring anything extraordinary, he segues to a more detailed forecast of the next day or two and then a radar map to show neighboring areas. But here’s the important part: a good forecast comes with a description. There is a story involved, so that we know rain is coming sometime, but that there is some uncertainty as to when or how much. If you listen closely, you’ll hear a lot of “probablys” and “likelys” in there. He’s not giving us any guarantees. He’s just making educated guesses based on the information he has.

This is also, incidentally, the key to a good data analysis. The big picture is important for conveying the problem. The details may be useful, but it’s the narrative that holds the analysis together. I guess maybe the weatherman is doing an admirable job of presenting uncertain information. Maybe, as data scientists, we could learn something from him.

But honestly, right now I’d settle for a simple forecast with just a sunshine emoji.