Rachael Bale worked at The Center for Investigative Reporting, where she covered environmental stories about pesticides and mining as well as crime and justice, among other topics. She also worked as a freelance reporter for KQED public radio, the Bay Area’s NPR affiliate. She just moved to Washington, D.C., where she intends to freelance. In 2012 she attended the NICAR boot camp. Our interview follows.
How did you get into data journalism? Was there a moment when you realized that working with data would be important in your career?
I got interested in data journalism while I was covering campaign finance during the 2012 election at the Center for Public Integrity. I got tired of handing over the Federal Election Commission data to the data reporter to clean and analyze. I wanted to do it myself. So I signed up for NICAR boot camp. When I got back, I was able to do a lot of analysis myself. And when there were things I didn't know how to do, a more experienced data reporter and I would sit down together, and I'd learn.
What are your go-to tools / programs when working on a story that involves data?
I primarily use SQL Server or Access, depending on what computer I'm working on. And of course, Excel. I love pivot tables.
How do you get from an idea to a data journalism story? How do you find relevant data?
I don't necessarily set out to do a data story. I start with a question about a specific incident or story that I want to learn more about. Then I start asking what kinds of data might be associated with it. Does a government agency (or multiple agencies) track this sort of thing? Does any organization collect information on this topic? If there's paperwork filed, there's a good chance someone has a database somewhere where the information from that paperwork is digitized and tracked. That'll give you the big picture. The next step is finding a human story that will help readers care about all those numbers.
In your story "More than half of those killed by San Francisco police are mentally ill," you started with a story and then used a "By the numbers" section to introduce the data. Do you think this is a good way to integrate data with stories?
"By the numbers" sections of stories usually don't work because they tend to cram a lot of numbers into a few paragraphs that interrupt the flow of the narrative. The reader's eyes will glaze over. Integrating numbers with the narrative of the story is the smoothest way to present them. If I really want to single out the data, I'd prefer a sidebar with bullet points, an infographic or some other visual to emphasize the main data findings.
For example, with CIR's pesticides investigation, we had tens of millions of pesticide records from the state of California. Much of the data analysis, which was performed by others on the data team, was integrated into the main investigation narrative. But we knew there were more hidden gems. I mined the database for weeks, until I'd pulled out five strong facts. We put those in a standalone story that was essentially, "here are other interesting things the data told us." I think with good formatting and visuals, that kind of purely data-driven story can work well.
What are the important do's and don'ts for aspiring data-driven journalists?
I couldn't have gotten into data reporting without the NICAR boot camp. I was (and still am) afraid I'd mess up the data beyond repair. The boot camp taught me the skills I needed to avoid pitfalls, vet the data and backstop the numbers every step of the way. Data reporting, at least for me, isn't a solitary endeavor. There's always another person in the newsroom who is familiar with the data I'm working on. That person can call me out on misinformed assumptions and draw my attention to potential missteps. We can bounce ideas off each other, and I can get ideas for different ways to look at the data.
NICAR Database Library student Jinghong Chen interviewed Bale.
This interview has been edited for clarity.