How to build a bulletproof data story

By Meredith McGrath

Want to make sure your data is bulletproof and fact-checked so there aren’t any holes? Arm yourself with these tips from Tisha Thompson, investigative reporter for ESPN, and Sandhya Kambhampati, data reporter for ProPublica Illinois.

Get organized

When starting out, create a text file or a Word document and record basic information on the project. Make a file folder and name it as the story’s slug. Keep all your work related to the project in this folder, including PDFs of any emails you received from a FOIA officer. Save the raw data file here and make a copy of it. Work only on the copy and never touch the original, so you’ll always have the pure data on hand.
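As a rough illustration of that setup, here is a minimal Python sketch. The function name, the "raw" and "working" subfolders and the read-only step are assumptions made for this example, not something the panelists prescribed.

```python
import shutil
import stat
from pathlib import Path


def set_up_project(slug: str, raw_file: Path, root: Path) -> Path:
    """Create a folder named after the story slug and file the raw data in it."""
    project = root / slug
    (project / "raw").mkdir(parents=True, exist_ok=True)
    (project / "working").mkdir(exist_ok=True)

    # Keep one untouched original (hypothetical "raw" subfolder).
    original = project / "raw" / raw_file.name
    if not original.exists():
        shutil.copyfile(raw_file, original)
        # Mark it read-only so it is harder to change by accident.
        original.chmod(stat.S_IREAD)

    # All cleaning and analysis happens on this separate copy.
    working_copy = project / "working" / raw_file.name
    shutil.copyfile(original, working_copy)
    return working_copy
```

Calling `set_up_project("jail-deaths", Path("export.csv"), Path("projects"))` would return the path of the copy that is safe to edit; both the slug and the file name here are made up.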

Keep a data diary

While it may sound labor-intensive, keep a data log or journal and track the changes you make to your data set. This will help you reproduce your work, and if your data analysis is ever challenged, you’ll have a specific log of exactly what you did. Document how you clean your data before you start cleaning it.
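A data diary does not need special software. The sketch below, which assumes a plain text file and a hypothetical `log_step` helper, appends one timestamped line per change so the steps can be retraced later.

```python
from datetime import datetime
from pathlib import Path


def log_step(diary: Path, action: str, reason: str) -> str:
    """Append one timestamped entry describing a change and why it was made."""
    stamp = datetime.now().isoformat(timespec="seconds")
    entry = f"{stamp} | {action} | {reason}\n"
    # Append mode keeps every earlier entry intact.
    with diary.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

An entry might read like `log_step(Path("diary.txt"), "Trimmed spaces in agency column", "Names did not match across years")`; the wording of the entries is up to you.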

Check for smelly data

When you first get a data file, check to see what’s wrong with it. There’s no such thing as a perfectly clean data set. Always look for holes. Check for totals hidden in the bottom of your Excel file and extra characters hidden in cells. Look for nulls and missing values. Are there any spelling mistakes?
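Some of these checks can be automated before you read the file by eye. The following Python sketch flags missing values, stray spaces and rows that look like hidden totals; the sample data and the `smell_check` name are invented for illustration, and spelling mistakes still need a human reader.

```python
import csv
import io

# Invented sample: a stray space, a missing amount and a total row.
SAMPLE = "agency,amount\nParks,100\n Police,250\nFire,\nTotal,350\n"


def smell_check(rows):
    """Return (row number, field, problem) tuples for common data smells."""
    findings = []
    # Start at 2 because row 1 of the file is the header.
    for number, row in enumerate(rows, start=2):
        for field, value in row.items():
            if field is None:
                # csv.DictReader files surplus cells under the key None.
                findings.append((number, "", "extra columns"))
                continue
            text = value or ""
            if text.strip() == "":
                findings.append((number, field, "missing value"))
            elif text != text.strip():
                findings.append((number, field, "extra whitespace"))
            if "total" in text.lower():
                findings.append((number, field, "possible total row"))
    return findings


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = smell_check(rows)
```

Each finding points to a row and field to inspect by hand; the script only narrows down where to look.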

Obtain a data dictionary

Ask the agency that gave you the data for a dictionary, which will define fields. Don’t assume you know what each field is. If the agency won’t give you one, call them out on it and let them know you need this for accuracy.

Slow down, and don’t take shortcuts

After you’re done interviewing a person, you know them inside and out. Know your data the same way. If something stands out as an outlier, be aware that it could be a hidden mistake. Check on this. Be meticulous.

Talk through your work

Find someone who doesn’t use data (maybe your mom or grandma) and show your work to them. Explain what you did and why. Talking out loud helps you identify mistakes. Ask them what they want to know about the data.

Check in with academics

Consult with experts and researchers about your data analysis and any gaps you find. Have them poke holes in your analysis. Keep going back to them, especially if your work is complex.

Refrain from overloading numbers

Don’t overload a reader with a lot of numbers in your story. There’s a point where the reader glazes over and doesn’t think them through. Pick the most important information to include.

Replicate your work

Overestimate the amount of time it will take you to check your work. Re-import your data from scratch. Mistakes can happen while importing data. Have your colleagues replicate your queries. The more eyes that can look over your work, the better.
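One way to make replication routine is to script it. This is a minimal sketch, assuming the raw data is a CSV file and the analysis is a simple total by group; the function names are hypothetical. It fingerprints the raw file, recomputes the totals from a fresh import and lists any figures where two independent runs disagree.

```python
import csv
import hashlib
from collections import defaultdict
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 hash so you can confirm the raw file has not changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def totals_from_scratch(path: Path, group_field: str, value_field: str) -> dict:
    """Re-import the raw CSV and recompute totals without reusing earlier work."""
    totals = defaultdict(float)
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row[group_field]] += float(row[value_field])
    return dict(totals)


def discrepancies(mine: dict, colleague: dict) -> dict:
    """List every group where two independent runs disagree."""
    groups = set(mine) | set(colleague)
    return {
        g: (mine.get(g), colleague.get(g))
        for g in groups
        if mine.get(g) != colleague.get(g)
    }
```

An empty result from `discrepancies` means the two runs agree on every group; anything else is a figure to recheck before publication.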

Stand by what you publish

Be brave enough to stand up for what you put out for public consumption. Add a “nerd box” to your story on your site that explains how you obtained your data and summarizes your analysis.

Meredith McGrath is a journalism student at the University of Missouri.

By Tyler Wornell

The College Scorecard is a database with a treasure trove of data about higher education institutions, providing information about graduation rates, debt repayment rates and median income for career fields. There’s a wealth of story ideas sitting in the database, and knowing what data is there and how to use it can help you get started.

In Thursday’s panel “2,000+ data points on 7,000+ colleges: How to make stories & sense of the College Scorecard,” Sarah Butrymowicz from the Hechinger Report, Andrew Kreighbaum from Inside Higher Ed and Kim Clark from the Education Writers Association discussed the information available in the College Scorecard database and some key data errors to watch out for.

So, what exactly is the College Scorecard? It was created by the Department of Education during President Barack Obama’s administration to provide information about college outcomes and create a sort of college ranking system. It includes data on 7,000 colleges and is the largest-ever release of higher education data.

It shows earnings data for a typical graduate of each university, median debt held by graduates and repayment rates on graduates’ student loan debt, as well as other metrics. The data only represents students who received some sort of federal financial aid, which limits the sample size.

The data is great for comparing outcomes of schools across states and regions. It can answer questions such as, which colleges produce the biggest earners?

Additionally, the data can help raise further questions about institutions’ actions. Are they aware of their graduates’ struggles repaying loan debt compared to similar schools? What are they doing about it? Do they have information on whether graduates with certain majors struggle to repay loans? These are just some of the things to think about when analyzing the data.

There are flaws in the data to be aware of. Some fields are empty or suppressed based on enrollment at the college. If you’re looking at universities that are part of larger systems, the data gets clunky because of the codes by which schools are identified. For the debt data, systems that have multiple campuses are only tagged under one identifying code, making it impossible to disaggregate the campus-level data. Be aware of that if you see duplicates in the data.
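To spot campuses that share one identifying code, you can group the rows by that code before analyzing the debt figures. The sketch below is a hedged example: the column names `OPEID6` and `INSTNM` are assumptions about the file layout and should be checked against the data dictionary, and the sample rows are invented.

```python
from collections import defaultdict


def campuses_sharing_code(rows, code_field="OPEID6", name_field="INSTNM"):
    """Return each identifying code that appears on more than one campus."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[code_field]].append(row[name_field])
    # Keep only codes shared by two or more rows.
    return {code: names for code, names in groups.items() if len(names) > 1}


# Invented rows for illustration only; not real Scorecard records.
sample = [
    {"OPEID6": "000001", "INSTNM": "Example State University - Main Campus"},
    {"OPEID6": "000001", "INSTNM": "Example State University - North Campus"},
    {"OPEID6": "000002", "INSTNM": "Sample College"},
]
shared = campuses_sharing_code(sample)
```

Any code that comes back with several campus names is a candidate for the problem described above: system-level figures that cannot be split out by campus.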

With a little bit of cleaning, the College Scorecard is an easily accessible database that has stories ready to be written.

Tyler Wornell is a journalism student at the University of Missouri.

By Virginia Ward

In his CAR Conference session on demystifying data, Hadley Wickham said his job is to push R as far as it can possibly go.

The chief scientist at RStudio develops free tools to explore R, an open-source statistical language. He is also an adjunct professor of statistics at the University of Auckland in New Zealand, Stanford University and Rice University.

When first opening a database, Wickham said it’s common for the blinking cursor to intimidate users. His first tip in demystifying data science is to understand that programming languages are just languages with text. Not only can these texts be copied and pasted, but they can be reproduced and shared.

Wickham said the best way to learn a programming language is by joining an online community. Advanced and new users can work to troubleshoot together through organizations like https://rweekly.org or https://rladies.org.

Code-sharing platforms like GitHub, where many open-source projects are hosted, can enhance communication between developers and journalists, making the experience of learning R feel less daunting.

He said while learning a coding language for the first time can be difficult, it will pay off in the long run. Wickham encouraged data journalists to continue updating their knowledge because things keep changing.

“Embrace the change,” Wickham said. “You don’t want to end up stuck using technology from 30 years ago.”

It’s not just the individual pieces of R that give users the power, Wickham said. It’s the glue. R language can solve complex problems by combining simple pieces. These pieces can be learned along the way and help journalists solve problems in future projects.

While Wickham said R can be used to do just about anything, journalists can specifically use the language to tell great visual stories. Wickham hopes journalists can start asking what they want to do with their data rather than how to use their computers to work with it.

“The goal of RStudio is to have a positive impact on the world by creating open-source content,” Wickham said. “Giving people the tools to understand data is really important.”

Virginia Ward is a journalism student at the University of Missouri.

 

By Yue Yu

Kevin Collier from BuzzFeed News, Neena Kapur from the New York Times and Margot Williams from The Intercept shared experiences and tips at the CAR Conference on constructing a secure workstation while pursuing sensitive leads.

Collier talked briefly about the history of hackers who worked with journalists to produce big stories and ended up in jail. Although the threat of prosecution doesn’t stop hackers altogether, Collier said, it creates a chilling effect on them.

Skepticism is necessary when working with hackers, Collier said. He shared his experience reaching out to a hacker named “Guccifer 2.0” who claimed he had sensitive information from the Democratic National Committee. Guccifer 2.0 was later identified as a persona created by Russian intelligence.

Treat hackers the same way you treat every source, Collier said, and ask them for technical details about exactly how they got the information they are sharing with you. If you don’t speak the language of data, get help from a media-friendly expert, he said.

Kapur, an information security analyst for the New York Times, offered tips for reporters to set up secure workstations and protect personal data.

A security breach is serious because it could compromise devices, personal data and source identity, and could create misunderstanding for government agencies, Kapur said. One way to reduce the risk is to use a separate device for research, she said. Using MiFi at a coffee shop, using gift cards for purchases and using encrypted USBs are all ways to minimize the risk of being hacked.

Setting up separate and inconspicuous accounts with a complex password can also help, Kapur said. Burner phones, VPNs and browsers that generate random IP addresses and allow anonymous login also increase the security of a work environment.

Williams elaborated on safe searching and using secure browsers. The tools she has used on her research computer include:

Yue Yu is a journalism student at the University of Missouri.

By John Sadler

Keeping a focus on your local coverage area can be difficult in the current information climate — idea generation, watchdogging and source cultivation all need to be juggled.

In Thursday’s panel “Putting your town under a microscope — and keeping it there,” John Diedrich of the Milwaukee Journal Sentinel, Matt Kiefer of The Chicago Reporter and Kate Howard of the Kentucky Center for Investigative Reporting shared tips for comprehensively covering your community.

First, double-check your archives. There’s nothing worse than diving into a topic to find it was already done a year before you got the job.

“You’re probably super smart — I’m sure your ideas are amazing — but there are also other people who may have had it, so make sure you’re looking first to see who else has done your beat well,” Howard said.

Flood the zone

Howard said an invaluable method of ensuring widespread coverage is spreading your reporting efforts across your community. Ask for employee lists and salary data so you can add context if a crisis hits a certain agency or organization. Ask for emails, even if you don’t necessarily need them.

Record retention documents are also invaluable to request because they may give information on why records requests could be denied (and, therefore, may give you information on how to argue your case). Requesting lists of audits, both internal and external, completed and planned, will give you a sense of problems within the agency.

Reading meeting minutes and agendas is one way to stay up to date on things you may have missed, Howard said. She also stressed reading your competitors not with an eye for what you’ve been scooped on, but with an eye on what you can add to the story that’s now been brought to the public’s attention.

Fight for records

Kiefer, the data editor for the Chicago Reporter, said to make a courtesy phone call before the records request to try to clear up the format of the request, and reverse-engineer public forms to figure out what the records might look like.

If that fails, Kiefer gets creative. “If they give you pushback about exporting the data, like electronically, what I’ll often do is ask them what kind of software they use and then look up the user manual for that software and send it to them.”

Don’t neglect the human element

Follow up on your project — treat your stories like updating data sets and keep them relevant. And don’t hide from criticism.

“I think (engaging with critics) makes an impression on people that you own your stories and you’re willing to step up for them,” Howard said.

Diedrich, who worked on the Journal Sentinel’s “Burned” investigation into drum reconditioning facilities (which relied heavily on whistleblowers), said scheduling time for source cultivation works wonders.

“I really try to carve out time to go out with sources and do source lunches on a regular basis,” he said. “The best place, for not just cops but really any beat, is to hang around a courthouse because it is just like a feeding frenzy of sources and news tips.”

John Sadler is a journalism student at the University of Missouri.

By Kelsie Schrader

For many, data journalism is a complex and daunting task. It requires time, skill and access to data and sources. Data stories on hard-to-access, marginalized communities, then, can often seem unapproachable.

The perceived difficulties of reporting on marginalized communities have resulted in a lack of data stories about and for non-white, non-elite communities. Three journalists discussed this issue at the CAR Conference and offered tips for how to develop a data story on a marginalized community.

Adriana Gallardo, an engagement reporter for ProPublica, and Anjeanette Damon, a government watchdog reporter for the Reno Gazette-Journal, have both reported on sensitive, undercovered populations. Gallardo covered maternal mortality in the U.S., which has the highest rates of maternal deaths of any developed country. Damon reported on prison deaths in Reno, which increased significantly after a new sheriff took over. Both stories came with challenges such as difficulty accessing data and sources, but positively affected marginalized communities and inspired conversation and change on local and national scales.

Eva Constantaras, a data journalist at Internews, used these stories as examples of quality data stories on marginalized communities. She offered three main steps to help journalists begin developing data stories on similarly marginalized populations.

Start with the background

Before jumping into a story, see what’s already been reported in other outlets. Stories on marginalized communities often come from breaking news stories. Use headlines in other papers as an opportunity to reveal the systemic issues underlying the breaking news of the day. Analyze what is and isn’t being covered. See what data is available on the subject. Discover what you can add to the conversation.

Form a hypothesis

After you’ve analyzed what already exists on the topic, identified what’s missing and examined available data, write your hypothesis. What do you think your story is about? What could be the causes of inequality and discrimination? Be careful to avoid common mistakes with hypotheses, such as statements that are too simple, too broad, too narrow or unable to be proved with data. 

Develop questions that will prove whether your hypothesis is true

Questions can fall into four categories: problem questions, impact questions, cause questions and solution questions.

Ultimately, Constantaras advocated for stories that have an impact. Stories should have information that marginalized communities can act on and use to engage policy makers. “These problems are everywhere,” she said. “Governments will brag about policies they say are helping people, and it is our job to check it.”

Kelsie Schrader is a journalism student at the University of Missouri.

By Dariya Tsyrenzhapova

Location is a common thread that can lead a story and reveal meaningful findings to better serve a community. According to Victor Hernandez of Banjo, geodata also serves as a catalyst for “a technological and a reporting breakthrough” that helps tell hidden or overlooked stories in underserved communities.

Joe Yerardi, a data reporter at the Center for Public Integrity, said GIS tracking and mapping can provide a leg up in reporting stories about the environment, natural disasters, health care and education. “Journalists can really better understand and master the early indicators around the location of a story,” Yerardi said.

A co-founder of Bloom, a geotagging platform for local news, Stephen Jefferson sees geodata as going beyond just a hashtag. It’s real data that provides “a bridge from digital to reality,” he said. While it offers important insights into the lives of local communities at large, newsrooms can also benefit from using geodata to better understand what their audiences need and want at a given time.

This so-called “editorial intelligence” can equip news organizations with ideas for relevant news coverage of communities they serve, Jefferson said. Metadata can yield compelling insights into user behavior: “Is the local community actually aware of the story, or are people viewing this outside of the community? For readers who are engaging the most with the story, where are they?”

The term “location intelligence” initially emerged from the business side as a strategy to target consumers. But that same idea could be extended to journalistic storytelling, said Amy Schmitz Weiss, an associate professor at San Diego State University. At the same time, Schmitz Weiss warned that geodata should not be taken at face value – trust, but verify. “If in doubt, don’t go with it,” she said. “There are still opportunities for manipulation. Be skeptical. Interviewing geodata is like interviewing any source.”

To bulletproof your results, Yerardi suggests posing these questions:

A tipsheet put together by the panelists provides a list of mapping tools that will help perform geospatial analysis and enhance story presentation, and also offers story ideas for using geodata to elevate the quality of enterprise reporting and storytelling.

Dariya Tsyrenzhapova is a journalism student at the University of Missouri.

The 2018 CAR Conference begins on Thursday. Below you'll find a few bits of information to help you prepare for this great conference! For the latest up-to-date information about panels, speakers and special events at the conference, please visit our conference website at https://www.ire.org/conferences/nicar18/.

 

Hotel information

The conference is taking place at the Chicago Marriott Downtown Magnificent Mile, 540 N. Michigan Ave, Chicago, IL 60611.

Thank you to GO Airport Express for offering a discount to conference attendees. Details on making reservations can be found here

Driving to Chicago? Hotel parking rates are available here.

 

Registration

Registration opens Wednesday at 1 p.m. and will be open Thursday, Friday and Saturday on the 7th floor of the Marriott.

 

Weather

Weather looks to be in the 30s and 40s during the week. See the 10-day forecast thanks to weatherchannel.com.

 

Wireless internet during the conference

Stop by the registration desk or check the app for the wireless access code to access the complimentary internet offered throughout the meeting space during the CAR Conference. 

 

Have a question or need help in a session?

Room monitors will be stationed in the hallways during sessions and will be happy to answer your questions.

 

Hands-on classes

We have a big crowd this year, and it's exciting to have so many new faces. We've added a number of hands-on sessions, but seating is limited. If there's a hands-on class you really want to take, plan on getting there early.

 

CAR Conference mobile app

You're tech-savvy and care about the environment, and so do we. Rather than printing 1,000 schedules, we're giving you two ways to track the full schedule of panels, hands-on sessions and special events with accurate, up-to-the-minute details: 

Internet is not required for the app to work once it's downloaded. However, a connection is necessary to receive any updates sent by IRE. 

If you feel most comfortable with a printed schedule, a PDF version is available here. We've also added a downloadable csv version of the schedule this year.

 

Twitter/Student Blog

Use #NICAR18 during the conference and stop by the registration desk to see live tweets on the announcement monitor. A team of bloggers will be covering panels, and you can see their work online and in the conference app.

 

Special Events

Be sure to check out the list of special events taking place this week.

 

Updated IRE Principles (Code of Conduct)

Investigative Reporters & Editors is committed to providing a friendly, safe and welcoming environment for all, regardless of gender, ethnicity, sexual orientation, physical ability, age, appearance or religion.

IRE supports vigorous debate and welcomes disagreement, while maintaining a civil and respectful community.

IRE may take any action it deems appropriate to deal with those who violate our principles, including exclusion from our events, forums, listservs and the organization itself.

Anyone who feels threatened or in immediate jeopardy during an IRE event should call building security in Chicago (312-245-4761) or local police by dialing 911.

Additional concerns can be brought to the attention of IRE staff or board members in person. Contact information for both staff and board members can also be found on IRE's website.

 

We thank you for your continued support and are looking forward to seeing you in Chicago!

The 2018 CAR Conference app is now available through Guidebook!

We encourage you to download our mobile guide to enhance your experience at the 2018 CAR Conference. You'll be able to plan your day with a personalized schedule, browse maps and connect with other attendees.

The app is free and compatible with iPhones, iPads, iPod Touches and Android devices. Windows Phone 7 and Blackberry users can access the same information via our mobile site.

To get the guide, choose one of the methods below:

Investigative Reporters and Editors is now welcoming nominations for its annual Golden Padlock award recognizing the most secretive government agency in the United States.

"The techniques of government secrecy have been elevated into high art by determined civil servants," said Robert Cribb, chair of the Golden Padlock committee. "This award brings well-deserved recognition to those who have distinguished themselves with ingenious creativity in denying the public's right to know."

To nominate an agency, please fill out this short form. You'll be asked to provide the name of the government department or individual, your reasons for nominating, and links to media coverage or documents detailing the intransigence. Entries must be submitted to IRE by April 1.

Scott Pruitt, the Oklahoma Attorney General’s Office and the U.S. Environmental Protection Agency won the 2017 Golden Padlock Award for steadfastly refusing to provide emails in the public interest and removing information from public websites about key environmental programs.

Previous winners also include:
