Joanna Lin is a data reporter for The Center for Investigative Reporting (CIR). Previously, Joanna helped launch FairWarning, a nonprofit online publication covering safety, health, and related issues of corporate and government accountability. She reported for the Los Angeles Daily Journal and Los Angeles Times, where she covered breaking news, religion and legal affairs for the metro and national desks. She also covered public health issues for California Watch.
How did you get into data journalism? Was there a moment when you realized that working with data would be important in your career?
I got into data journalism really by necessity. When I was a reporter at CIR's California Watch project, I often wanted to write stories using census data and would have a miserable time trying to figure out things like how many cities had certain types of people. I would open up spreadsheets and basically try to count the number of cells that met what I was looking for by eyeballing them – a tedious, error-prone process! I knew I needed a better way to do these stories. I was fortunate to work with the very knowledgeable and generous Agustin Armendariz, who is now at The New York Times, and he taught me basic functions in Excel. He and my editor at the time, Mark Katches, suggested I go to CAR [computer-assisted reporting] boot camp. I went and came back evangelized.
In addition to giving me technical experience, working with data has allowed me to report and imagine stories that would otherwise be impossible – or, at the very least, be so impractical that I would never do them. It has also taught me more precise ways of thinking. That's been important in all the reporting I do – and in life in general. I got married last year and used spreadsheets and pivot tables for all my wedding planning!
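To illustrate the kind of counting Lin describes doing by eye, here is a minimal Python sketch that counts the rows of a table meeting a condition. It is not her actual workflow (she mentions learning Excel functions first); the column names, the city names, and every figure in `SAMPLE_CSV` are invented for this example.

```python
import csv
import io

# Hypothetical sample data: columns and numbers are invented for illustration,
# not taken from any real census file.
SAMPLE_CSV = """city,population,median_age
Alpha,12000,41.5
Beta,85000,33.2
Gamma,4300,47.8
Delta,230000,35.0
"""


def count_matching(rows, predicate):
    """Count the rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row))


# csv.DictReader turns each line into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# How many cities have a median age above 40?
older_cities = count_matching(rows, lambda r: float(r["median_age"]) > 40)

# How many cities have more than 50,000 residents?
large_cities = count_matching(rows, lambda r: int(r["population"]) > 50000)

print(older_cities, large_cities)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the condition would be whatever the story needs to measure.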
What are your go-to tools / programs when working on a story that involves data?
I am always encountering new problems that require learning new tools! I write all my code in Sublime Text. I use Google spreadsheets a lot to track data- and document-heavy projects – what I have, what I need to do next. I use csvkit often to clean files. I use Postgres for data analysis and run queries on Navicat. I use Python to help me run repetitive tasks like loading and dumping a database over and over again, scraping a regularly updated spreadsheet or parsing thousands of names. I recently started learning Django so I can house a project that involves entering and connecting a lot of data. There are also tools that I don't need often but am always grateful for when I do, like Cometdocs and regular expressions.
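As one example of the repetitive tasks mentioned above, parsing a long list of names can be sketched with Python's built-in `re` module. This is an illustration only, not Lin's code: the input format ("Last, First Middle"), the `parse_name` helper, and the sample names are all assumptions made for this sketch, and real name data is usually messier.

```python
import re

# Assumed input format: "Last, First" with an optional middle name or initial.
# Real-world names often break this assumption and need manual review.
NAME_PATTERN = re.compile(
    r"^\s*(?P<last>[^,]+?)\s*,\s*(?P<first>\S+)(?:\s+(?P<middle>\S.*?))?\s*$"
)


def parse_name(raw):
    """Split 'Last, First Middle' into parts; return None if it does not fit."""
    match = NAME_PATTERN.match(raw)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "first": parts["first"],
        "middle": parts["middle"] or "",  # the middle group is optional
        "last": parts["last"],
    }


# Hypothetical sample input, invented for this example.
raw_names = ["Doe, Jane Q.", "  Smith ,  John  ", "No Comma Here"]
parsed = [parse_name(name) for name in raw_names]

# Keep the entries that did not fit the pattern so they can be checked by hand.
unparsed = [name for name, result in zip(raw_names, parsed) if result is None]
```

Returning `None` for names that do not fit, instead of guessing, keeps the problem cases visible so they can be reviewed individually.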
How do you get from an idea to a data journalism story? How do you find relevant data?
"Working with data has allowed me to report and imagine stories that would otherwise be impossible – or, at the very least, be so impractical that I would never do them."
At CIR we focus on stories that almost always involve systemic problems. We think about how and when problems can be measured. Is it when someone enrolls in a public program? Files an annual report? Sells a product or gets inspected? We look for who is involved at each stage of a potential problem and what information they gather. The data doesn't always look like "data" that's ready for you to analyze; it could be in paper forms or emails, or it could come from different sources in different formats, and it could have no numbers at all. We often have to organize information into data.
What do you think is a good way to integrate data with stories?
Data is like any other source in your reporting – a human you might speak to or a document you might read. Sometimes a source gives you background information or directs you to other sources to further your reporting. Sometimes a source gives you great quotes that highlight or contextualize a problem. Data can do all the same things, so integrate data in your stories like you would any good source.
What’s your advice for newbie data journos? Do you have specific dos and don’ts?
Do: Meticulously document your data work! Would you go to an interview without recording or taking notes, and then expect to remember everything afterward? The same goes for your data. I write notes for myself like I'm leaving instructions for a stranger who needs to recreate what I did from scratch.
Don't: Feel that you need to be part of an official newsroom data desk to be a data journalist. There are always opportunities for you to use data in your reporting. Working with data regularly, even just a little bit, also helps you maintain your skills and learn new ones.
NICAR Database Library student Jinghong Chen interviewed Lin.
This interview has been edited for clarity.