CAR Conference Blog

Inside baseball: What data journalism can learn from sports

Photos by Travis Hartman

What if you could predict a hospital’s quality of care the same way baseball statisticians can predict a pitcher’s season or a team’s record?

The sports world has long been using statistics in creative ways -- and Ryan Pitts, Jeremy Bowers and Matt Waite say journalists can and should tap into that.

It starts with a single goal: to simplify complex comparisons.

Take basketball, for example. Who is the better player: one averaging 20 points and 3 rebounds per game, or one averaging 15 points and 8 rebounds per game?

Enter the player efficiency rating, a calculation basketball experts created that:

  • Adds for good things, subtracts for bad things
  • Adjusts for minutes played and team pace
  • Normalizes against league average
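To make those three steps concrete, here is a minimal Python sketch. It is not the real player efficiency rating formula, which is far more detailed; the stat categories, the equal weights, and the names `PlayerLine`, `raw_rating` and `normalized_ratings` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """Hypothetical per-game stat line; the fields are illustrative only."""
    name: str
    minutes: float
    points: float
    rebounds: float
    assists: float
    turnovers: float
    missed_shots: float
    team_pace: float  # possessions per game for the player's team


def raw_rating(player, league_pace):
    # Step 1: add for good things, subtract for bad things (equal weights assumed).
    credits = player.points + player.rebounds + player.assists
    debits = player.turnovers + player.missed_shots
    # Step 2: adjust for minutes played, then for team pace, so players on
    # fast-paced teams are not rewarded just for getting more possessions.
    per_minute = (credits - debits) / player.minutes
    return per_minute * (league_pace / player.team_pace)


def normalized_ratings(players, league_pace, league_average=15.0):
    # Step 3: rescale so the minutes-weighted league mean equals league_average.
    raw = {p.name: raw_rating(p, league_pace) for p in players}
    total_minutes = sum(p.minutes for p in players)
    mean_raw = sum(raw[p.name] * p.minutes for p in players) / total_minutes
    return {name: league_average * value / mean_raw for name, value in raw.items()}
```

With a single number per player, the scorer-versus-rebounder question above becomes a direct comparison, though the answer depends entirely on the weights you choose.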

Baseball stat experts have done similar things to develop better measures of the player. The key, Jeremy Bowers of NPR said, is recognizing that a past performance statistic isn’t the best predictor of future performance.

For instance, a baseball pitcher’s past ERAs (earned run averages) won’t accurately predict his future ERA -- they depend too much on things like which ballpark he pitches in and the fielders around him. Baseball statisticians have found that walks, strikeouts and home runs are much better predictors of how a pitcher will fare, and have created a more accurate index from them.
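One widely published index built from walks, strikeouts and home runs is fielding independent pitching (FIP). The panel write-up does not name the index, so treating it as FIP is an assumption. The sketch below uses the commonly cited FIP formula; the `league_constant` default is a placeholder, since the real constant is recalculated each season to put FIP on the same scale as league ERA.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched,
        league_constant=3.10):
    """Fielding independent pitching, using the commonly cited weights.

    league_constant is a placeholder; the actual value changes by season.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    # Only outcomes the fielders cannot influence go into the numerator.
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings_pitched + league_constant
```

For example, `fip(20, 50, 5, 200, 200)` describes a hypothetical pitcher with 20 home runs allowed, 50 walks, 5 hit batters and 200 strikeouts over 200 innings.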

“The lesson is that banks and hospitals and schools are baseball players,” Bowers said. “They are complex and they have all sorts of complex statistics. When you look at a hospital or bank, there is a fleet of statistics that you need to turn into one.”

In the case of hospitals, it may not be good to use one year's quality of care index to predict the next. The key is to find an index statistic, and to look for things that might explain variance.

Just like a pitcher’s ERA may depend on factors like where he plays, a hospital’s quality of care may be affected by factors like surrounding demographics or the number of people requiring emergency room care.
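One simple way to account for such an outside factor is to fit a line between the factor and the quality score, then compare each hospital with what the line predicts for it. The sketch below is an assumption about how that could be done, not a method described by the panel; `fit_line` and `context_adjusted` are hypothetical names, and a real analysis would likely need several factors at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def context_adjusted(factor_values, quality_scores):
    """Return each score minus the score predicted from the outside factor.

    A positive result means the hospital did better than its context alone
    would suggest; a negative result means it did worse.
    """
    slope, intercept = fit_line(factor_values, quality_scores)
    return [y - (intercept + slope * x)
            for x, y in zip(factor_values, quality_scores)]
```

Here `factor_values` might be the share of patients arriving through the emergency room, with one entry per hospital in the same order as `quality_scores`.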

In a few cases, these kinds of things have been done in journalism outside of sports. USA Today has created the diversity index, which measures the chance that any two people in an area are from different ethnic groups. Ryan Pitts, for his work with the ongoing IRE Census Reporter project, is attempting to create difference indexes. These would provide a simple way of tracking how the population of subgroups is changing compared to the larger population, and compared to how much they are statistically expected to change. You can see some examples of how that’s done in the slides from the presentation here.
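The core idea behind a diversity index of this kind fits in a few lines: the chance that two randomly chosen people belong to different groups is one minus the chance that they belong to the same group. The sketch below is that simplified version only. The published USA Today index has additional detail that this code does not attempt to reproduce, and the function name and input format are assumptions.

```python
def diversity_index(group_counts):
    """Probability that two people drawn at random are from different groups.

    group_counts maps a group label to its population count. The two draws
    are treated as independent, a close approximation for large populations.
    """
    total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("population must be positive")
    # Chance both people fall in the same group, summed over all groups.
    same_group = sum((count / total) ** 2 for count in group_counts.values())
    return 1.0 - same_group
```

An area with a single group scores 0, and an area split evenly between two groups scores 0.5; the score rises toward 1 as the population spreads across more groups.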

The final concept of the panel came courtesy of Matt Waite, and it was not a statistical application but hardware. Waite, who runs the drone journalism lab at the University of Nebraska, explained that drones are already being considered in sports for jobs like tracking player movement on the field. Already, there are tools in baseball stadiums for tracking pitch movement by measuring where the ball left the pitcher’s hand and where it hit the catcher’s mitt.

And in football, for concussion research, there are tools that can track neurological function in real time. For Waite, that kind of technology offers the opportunity to truly test web development and design, rather than relying on focus group answers.

As Waite said, you may wonder why this is in the discussion of what data journalism can learn from sports. Many of these things, like drones, are currently on the cusp of being implemented in sports. And sports are lucrative -- we spend a lot of money on sports and sports research. So when these things are implemented in real ways, sports is where it will happen first. And then, like statistical comparisons, they might make their way into data journalism.

Top right: Matt Waite, of the University of Nebraska-Lincoln, flies his tiny four-propeller drone across the room during his presentation about collecting your own data at the talk titled "Inside baseball: What data journalism can learn from sports."

Middle right: Jeremy Bowers, of NPR, explains why the baseball statistic ERA (earned run average) is not a good predictor of a pitcher's future success along with Ryan Pitts and Matt Waite.

Bottom right: Matt Waite's drone.
