I Wouldn’t Bet On It: Predicting Sports Outcomes with Big Data
A number of friends asked me to use my analytics knowledge to predict the outcome of the World Series this week. As I write this, Kansas City has tied the series at three games apiece. Who will win Game 7? It is probably best that I withhold my analysis until after the game. It turns out we’re not very good at predicting sports outcomes. But why? Let me start with basketball, whose season also opens this week.
Looking back to the Final Four of the 2014 NCAA basketball tournament, analytics professionals like me were confidently predicting that the Florida Gators would easily win another national title. Florida’s numbers on the four measures most correlated with winning basketball games (effective field goal percentage margin, turnover margin, rebounding margin and free throw margin) were historically excellent, and so was their consistency: the variance of each measure was low. The Gators were consistent and confident against opponents of varying styles and records. Florida had won 25 straight games and had reached the Elite Eight in each of the previous three years. This was a no-brainer.
A week later, UConn beat Kentucky for the title.
How could I have been wrong on this one? Sure, I might have been biased: I taught at the University of Florida for several years and served on several church committees with Coach Billy Donovan, whom I highly respect. However, as an analytics professional, surely I could lessen the impact of such biases. Surely my prediction would be informed only by data.
There are two primary obstacles to predicting sports outcomes.
- By nature we are emotional creatures and ultimately make decisions based on emotions. After an initial decision is made, we justify it with rational thinking.
- We often impose a ranking on things whose head-to-head relationships are not transitive.
Rankings: A Structural Problem
Consider a particular set of basketball outcomes from the 2012 season below. These games are often called upsets, but I maintain that there really aren’t upsets in sports. The problem is that we believe there should be a linear ranking, and when a result violates it, we give the violation a name: upset. But the violation is not the problem; it is evidence that there was no linear ranking to begin with. Our assumption is invalid.
Consider the NBA point guards Devin Harris (DH), Tony Parker (TP), and Steve Nash (SN). In direct matchups, DH > TP, TP > SN, and SN > DH, where “>” means “outplays.” There is no best point guard among the three, yet we feel we should be able to rank them and determine the best. It makes for good talk radio.
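In graph terms, a linear ranking is a topological order of the “outplays” graph, and one exists only when that graph has no cycles. The sketch below checks this with Python’s standard-library `graphlib` (Python 3.9+). It is a minimal illustration: the helper name `linear_ranking` is hypothetical, and the only data are the three matchups described above.

```python
from graphlib import CycleError, TopologicalSorter

# Head-to-head results from the text as (winner, loser) pairs,
# where the winner outplayed the loser in their direct matchups.
RESULTS = [("DH", "TP"), ("TP", "SN"), ("SN", "DH")]


def linear_ranking(pairs):
    """Return a best-to-worst ordering consistent with every (winner, loser) pair,
    or None if the pairs contain a cycle, in which case no linear ranking exists."""
    # graphlib maps each node to its predecessors; a winner must precede its loser.
    graph = {}
    for winner, loser in pairs:
        graph.setdefault(winner, set())
        graph.setdefault(loser, set()).add(winner)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(linear_ranking(RESULTS))                       # None: DH > TP > SN > DH
print(linear_ranking([("DH", "TP"), ("TP", "SN")]))  # ['DH', 'TP', 'SN']
```

Drop any one of the three results and a ranking appears. Keep all three and no ordering can satisfy them, which is why calling one of those games an “upset” misdescribes the data.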
In the 2014 NBA playoffs, San Antonio barely survived the early rounds to reach the Finals, then dominated one of the best NBA teams of all time in LeBron James’ Miami Heat. San Antonio simply matched up better against the Heat; in fact, the team had been built solely to match up against them.
Sports are best modeled as a set of matchups rather than through macro statistics such as field goal margin or turnover margin. This calls for a different approach to modeling, such as agent-based simulation rather than descriptive statistics, but the data needed to drive such simulations is becoming available. These are getting to be pretty good data days.
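A full agent-based simulation is beyond a blog sketch, but even a much simpler matchup-level model shows how macro statistics can mislead. The sketch below assumes three hypothetical teams, A, B and C, where each beats the next 60% of the time in a cycle (the probabilities are invented for illustration). It also treats games as independent. Every team’s aggregate win rate comes out identical, yet any particular best-of-seven series is lopsided. The helpers `p_win`, `aggregate_win_rate`, `series_win_prob` and `simulate_series` are hypothetical names.

```python
import random
from math import comb

# Invented per-game win probabilities, keyed (team, opponent).
# They form a cycle: A usually beats B, B usually beats C, C usually beats A.
MATCHUPS = {("A", "B"): 0.6, ("B", "C"): 0.6, ("C", "A"): 0.6}
TEAMS = ["A", "B", "C"]


def p_win(team, opp):
    """Per-game probability that `team` beats `opp` (games assumed independent)."""
    if (team, opp) in MATCHUPS:
        return MATCHUPS[(team, opp)]
    return 1.0 - MATCHUPS[(opp, team)]


def aggregate_win_rate(team):
    """Macro view: expected win rate averaged over every other team."""
    opps = [o for o in TEAMS if o != team]
    return sum(p_win(team, o) for o in opps) / len(opps)


def series_win_prob(p, wins_needed=4):
    """Exact chance of winning a best-of-(2 * wins_needed - 1) series:
    win the clinching game after k losses, for k = 0 .. wins_needed - 1."""
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * (1 - p) ** k
        for k in range(wins_needed)
    )


def simulate_series(p, wins_needed=4, trials=100_000, seed=7):
    """Monte Carlo estimate of the same quantity, playing each series game by game."""
    rng = random.Random(seed)
    series_won = 0
    for _ in range(trials):
        wins = losses = 0
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
    return series_won / trials


for team in TEAMS:
    print(team, aggregate_win_rate(team))  # 0.5 for every team
p = p_win("A", "B")
print(round(series_win_prob(p), 4))        # 0.7102
print(round(simulate_series(p), 2))        # close to 0.71
```

With these invented numbers, a ranking built from season totals calls the three teams equal, yet A wins a best-of-seven against B roughly 71% of the time and loses one to C just as often. An agent-based model would go further and simulate the player-level interactions that produce those per-game probabilities.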
This brings us back to the first obstacle. Recent brain-imaging research indicates that humans make decisions based on emotion; once a decision is made, the brain quickly engages its rational side to justify it.
We also tend to retain the memories we like (think of Indiana > Kentucky above, an exciting game at Assembly Hall in Bloomington). We are passionate about our sports, of course, so we tend to make emotional decisions based on those passions and then justify them with the analytics we have chosen to remember. This process leads to an inevitable result: we’re not very good at predicting sports outcomes. We are emotionally attached to a team, and we bias our memories toward its history of success, discounting the poor showings as “upsets.”
What can we learn from this for business? It comes down to matchups, or context. There is no best car, no best college and no best climate for everyone. There are narrow contexts in which we can rank, but those contexts are generally much narrower than easily collected macro statistics would suggest. This is where Big Data can help: we can filter out the noise of irrelevant contexts and focus on what has happened in the context most meaningful to us, the matchup. This is the essence of one-to-one marketing.
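Filtering to the relevant context can be as simple as restricting historical records before aggregating them. The sketch below is a minimal illustration with invented game records. The `Game` record and `win_rate` helper are hypothetical, and in a marketing setting the `opponent` field might stand in for something like a customer segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    team: str
    opponent: str
    won: bool


def win_rate(games, team, opponent=None):
    """Share of `team`'s games that it won, optionally restricted to one opponent
    (the "context"). Returns None when no games match."""
    relevant = [
        g for g in games
        if g.team == team and (opponent is None or g.opponent == opponent)
    ]
    if not relevant:
        return None
    return sum(g.won for g in relevant) / len(relevant)


# Invented history: X routinely beats Z but struggles against Y.
history = (
    [Game("X", "Z", True)] * 8
    + [Game("X", "Y", False)] * 3
    + [Game("X", "Y", True)]
)

print(win_rate(history, "X"))       # 0.75: the macro statistic
print(win_rate(history, "X", "Y"))  # 0.25: the matchup that matters
```

The aggregate says X is a strong favorite in any game. Filtered to the one matchup we care about, the same data says the opposite.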
We are wired to make decisions based on emotion. That will not change. The promise of Big Data is that we can be less biased in the rational support we give our decisions. Perhaps Kentucky is not the evil empire I was taught to hate while a student at Indiana University? I’ll have to check the data again.