By Anna Boiko-Weyrauch
@AnnaBoikoW

Tweets are tempting but tricky for data journalists.

“Twitter data is probably some of the hardest data you can work with,” Jacob Harris, senior software architect at The New York Times, said at the “Capturing and analyzing Twitter feeds” session.

Harris said tweets are hard to collect and analyze, and the tools available at dev.twitter.com are not meant for analyzing large feeds.

Daniel Lathrop of The Dallas Morning News shared an easy way to curate Twitter feeds for an audience. His paper gave readers a way to follow the Twitter conversation about the World Series and the death of Osama bin Laden.

Lathrop said he used JavaScript to run a program on a server that pulled down tweets from a list of names provided to him by the paper’s editors.

There is a way to get a grip on the “firehose” of information contained in the world’s corpus of feeds, but it’s much more complex. During the session, The Guardian’s Alastair Dant demonstrated the paper’s efforts to track tweets as colorful balls that wiggle, swell and contract based on popularity.

Two projects, produced by a team of staffers working in sync, cover the World Cup and the Murdoch trials. The Guardian partnered with Datasift, which collects the full ocean of Twitter feeds. Then, one team member used a Ruby script to process the feeds and calculate the most common words over time.
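For readers curious what "the most common words over time" might look like in practice, here is a minimal sketch in Python. It is not The Guardian's Ruby script, which was not shown in the session; the sample tweets, the stopword list and the function name `top_words_by_hour` are all made up for illustration. The idea is simply to group tweets into hourly buckets and count the words in each one.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample data: (ISO timestamp, tweet text) pairs.
SAMPLE_TWEETS = [
    ("2011-07-19T14:05:00", "Murdoch hearing starts now"),
    ("2011-07-19T14:40:00", "Watching the Murdoch hearing live"),
    ("2011-07-19T15:10:00", "Pie thrown at the hearing"),
]

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "at", "of", "and", "to", "in", "is", "now"}


def top_words_by_hour(tweets, n=3):
    """Return {hour: [(word, count), ...]} with the n most common words per hour."""
    buckets = defaultdict(Counter)
    for timestamp, text in tweets:
        # Round each tweet down to the start of its hour.
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        # Lowercase the text and pull out runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        buckets[hour].update(w for w in words if w not in STOPWORDS)
    # Sort by hour so the result reads chronologically.
    return {hour: counts.most_common(n) for hour, counts in sorted(buckets.items())}


if __name__ == "__main__":
    for hour, words in top_words_by_hour(SAMPLE_TWEETS).items():
        print(hour, words)
```

Counts like these could then drive a visualization, for example by sizing each word's ball according to its count in the current hour.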

As popular as the projects were online, Dant said he wasn’t sure of the visualizations’ informational value; the two animations were instead good at showing “the roar of the crowd as it passes through Twitter.” Dant was more interested in using the tools to illustrate rumors, such as the ones flying around during the London riots in 2011. The Guardian’s team focused on a few rumors and showed how each one spread and eventually died. One rumor, that rioters broke into the zoo and let out all the animals, was started by a black-and-white photo of a penguin and an anonymous figure grabbing onto a fence. The rumor was tweeted and then picked up by people who had larger followings.

Dant said there are plans to “share the wealth” of these projects with other news organizations. The Miso Project wants to make interactive content easier by collecting and sharing visualization components among large and small newsrooms. There’s no website yet, but you can follow the project on Twitter: @themisoproject.

Anna Boiko-Weyrauch is a graduate student at the University of Missouri’s School of Journalism.