Tucked away on reporters' computers are dozens of details that could benefit news coverage, if only other journalists knew where to look.
Newsrooms are swimming in data. Journalistic organizations big and small continue to collect data from local, state and federal governments, and dozens of other places. As the collection grows, making sense of that information can become more difficult.
That's what the PANDA project, a 2011 Knight News Challenge winner, wants to solve — make data analysis easier for journalists and make sharing data within a newsroom less complicated.
"We want this thing to work for all levels of organizations, regardless of what they can afford," PANDA's developer Christopher Groskopf said. "We want to be able to scale down really small. Small news organizations can run it on somebody’s desktop if they need to."
Groskopf, formerly a news applications developer at the Chicago Tribune, is PANDA's only full-time developer. Tribune news applications editor Brian Boyer, Tribune news applications developer Joe Germuska and Ryan Pitts, digital media editor for the Spokane Spokesman-Review, put together the news challenge proposal. They were part of the team that built census.ire.org. PANDA, like DocumentCloud, was an independent project, but is now part of IRE.
The idea for PANDA is to give news organizations a central location to upload data — voter registration rolls, restaurant inspections, teacher salaries — where it can be searched, analyzed and exported. If it sounds familiar, it's because the concept isn't new. Intranets, or internal searchable databases, have been a part of newsrooms for decades. IRE has been teaching how to set up an intranet since the mid-90s.
But intranets can be hard to install and difficult to scale. PANDA is designed to eliminate some of the initial hurdles, particularly for smaller newsrooms that don't have full-time developers on staff. PANDA was born out of the Tribune's own intranet. Former Tribune CAR reporter Darnell Little, now at Northwestern University, created DAVE, which allowed reporters to search about 10 databases collected by the paper. In trying to add features to DAVE, the Tribune development team decided that a new system was needed.
Uploading, interacting and more
Uploading data to PANDA is designed to be easy, Groskopf said, regardless of the table structure. In the current alpha stage of the project, users can upload data as a .csv or Excel file. Users can also search for specific data sets by type, such as crime or politics. (Test the search function in PANDA.)
“Rather than try to imply a structure on newsrooms for how they have to upload their data, which is really fraught, we are taking the approach that if we put all of this information in the same place and allow you to search your data in the way that you would search Google there is a lot of value,” he said.
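To make the idea concrete, here is a minimal Python sketch of schema-free search: two data sets with unrelated columns go into one store, and a keyword query looks across every cell of every row. It is an illustration only, not PANDA's actual implementation; the `DataStore` class, its methods and the sample data are all hypothetical.

```python
import csv
import io
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DataStore:
    """Hypothetical store: any CSV goes in, whatever its columns are."""

    def __init__(self):
        self.rows = []                 # list of (dataset_name, row_dict)
        self.index = defaultdict(set)  # word -> ids of rows containing it

    def upload_csv(self, name, csv_text):
        for row in csv.DictReader(io.StringIO(csv_text)):
            row_id = len(self.rows)
            self.rows.append((name, row))
            # Index every cell, so no column layout is assumed.
            for value in row.values():
                if isinstance(value, str):
                    for word in tokenize(value):
                        self.index[word].add(row_id)

    def search(self, query):
        """Return rows that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        ids = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.rows[i] for i in sorted(ids)]


# Made-up sample data with two different table structures.
salaries = "name,school,salary\nAnn Lee,Lincoln High,52000\nBob Ray,Central Middle,48000\n"
inspections = "restaurant,address,result\nLincoln Diner,12 Main St,pass\nRay's Grill,9 Oak Ave,fail\n"

store = DataStore()
store.upload_csv("salaries", salaries)
store.upload_csv("inspections", inspections)

hits = store.search("lincoln")  # matches one row in each data set
```

A reporter searching for "lincoln" gets both the teacher at Lincoln High and the Lincoln Diner inspection, without knowing which table either one lives in.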
Once newsrooms have a central depository for their data, individual reporters, regardless of their SQL knowledge, will be able to query for results and receive updates on their beat.
"Business reporters might want to have alerts on certain businesses they cover," he said. "So they won't have to keep searching the same data set uploaded by someone else, there could be some sort of notifications for that reporter… like e-mail alerts. I don’t think we know what shape something like that will take at this point, but that feature will be in the final product."
Groskopf is still working out how users will export the results they find. Users can already do this on census.ire.org, where census data has a consistent table structure. That won't be the case with PANDA, as users upload different data sets with different columns and data types.
“It’s really important to me that you’ll be able to export search results, but that’s not exactly clear to me how it will work,” Groskopf said.
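The difficulty is easy to see in code. The sketch below shows one possible way to export mixed results, assuming each result is a simple dictionary of column names to values: take the union of all columns and leave a cell blank wherever a row lacks that column. This is only an illustration of the problem, not how PANDA will handle it; the `export_results` function and the sample rows are hypothetical.

```python
import csv
import io


def export_results(rows):
    """Write rows with differing columns to a single CSV string."""
    # Collect every column name, in the order first seen.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in a blank for any column a row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up search results drawn from two unrelated data sets.
results = [
    {"name": "Ann Lee", "salary": "52000"},
    {"restaurant": "Lincoln Diner", "result": "pass"},
]
exported = export_results(results)
```

The output is valid CSV, but it grows wider and sparser with every new data set in the results, which is one reason a single export format is hard to settle on.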
For developers, PANDA's API can "be used to power custom applications or import/export data in novel ways."
Installing PANDA and securing your data
The easiest way to deploy PANDA will be to use Amazon's cloud hosting service.
"In the event that people are willing to deploy in the cloud it’s really easy," Groskopf said. "But as soon as you put it in the cloud you raise all of these discussions about privacy."
Security is the subject of some of the most frequent questions he receives, Groskopf said, particularly the legal issues of using a third-party hosting service. Newsrooms not comfortable with a hosted web service like Amazon can install PANDA internally, on their own server or even a desktop computer. The benefit of using Amazon will be the ease of use and support, Groskopf said.
"We are not going to host this service at this time because of financial constraints," Groskopf said. "We didn't want to build something, encourage people to upload their data and then have to shut it down because we ran out of grant money to pay for hosting."
What to expect at the Computer-Assisted Reporting Conference in St. Louis
Groskopf, Boyer, Germuska and Pitts will unveil the beta version of PANDA and provide a full demonstration at the 2012 CAR Conference in St. Louis.
"NICAR is our big coming out party," Groskopf said. "The incomplete nature of the product means there will still be updates, but by NICAR, we’ll have a minimal viable product, something that we’ll be able to give to people."
During the PANDA Project show & tell, the team will take questions and explain features.
The team will also host a "provisioning party," where those interested in setting up their own PANDA can come with a credit card and leave with a version installed on Amazon (EC2).