
Learning to liberate data

By Anna Boiko-Weyrauch
@AnnaBoikoW

Syntax error. What does this bit of code do? Syntax error. Let’s go back to the source. Syntax error. Maybe try this?

After two hours of educated guesses, trial, error and some friendly help, Pam Dempsey, of cu-citizenaccess.org, and I had finally scraped our first bit of text: the word “2011” from a page of Illinois nursing home inspection records.

The ScraperWiki session on Thursday night was aimed at “liberating” a number of data sets from their online prisons by working late into the night scraping real websites.

The session started with a two-hour lesson on ScraperWiki and the fundamentals of scraping. Thomas Levine, developer advocate with ScraperWiki, said that if you’d get bored copying and pasting information off multiple webpages for hours at a time, that’s a good reason to look into scraping, which pulls the information off the web more efficiently.
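To give a sense of what that looks like in practice, here is a minimal sketch in Python that pulls the text out of each table cell on a page instead of copying it by hand. It is not the code from the session: the HTML is a made-up stand-in for a downloaded inspection page, and the names `SAMPLE_HTML`, `CellCollector` and `scrape_rows` are assumptions chosen for illustration. It uses only Python's standard library rather than anything specific to ScraperWiki.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page; the facility names are placeholders.
SAMPLE_HTML = """
<table>
  <tr><th>Facility</th><th>Inspection year</th></tr>
  <tr><td>Example Care Center</td><td>2011</td></tr>
  <tr><td>Sample Nursing Home</td><td>2010</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>...</td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # header rows hold only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a data cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_rows(html):
    """Return the table's data rows as lists of strings."""
    parser = CellCollector()
    parser.feed(html)
    return parser.rows


rows = scrape_rows(SAMPLE_HTML)
print(rows)
```

On a real site the same idea applies, but the page would first have to be downloaded and the tag names matched to that site's actual markup, which is where most of the trial and error comes in.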

After the tutorial, small groups clustered around the room, bent over laptops, plugging in code to liberate data with the help of Levine's cheat sheet. Levine and Chief Data Scientist Julian Todd helped the journalists scrape data on fracking companies, the U.S. Defense Department budget and Skagit County jail records.

For those already familiar with programming in Python, Ruby, JavaScript or PHP, ScraperWiki lets you see and edit code in each language in the browser window without downloading any libraries or setting up any software. If you get lost, just click the “documentation” button to read up on what went wrong. When you’re ready to export your product, you can download your data as a CSV, feed it into an app or query it with SQL using the ScraperWiki platform. There are also collaborative features, which allow users to work on scraping the same sites from opposite sides of the globe. Even if you can’t scrape the data yourself, you can make a request to the ScraperWiki community, and 60% of the time they’ll respond, Chief Marketing Officer Aine McGuire said.
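For readers curious what "query it with SQL" and "download as a CSV" mean once the data is scraped, the following is a rough sketch using Python's built-in `sqlite3` and `csv` modules. It does not use ScraperWiki's own tools, whose interface is not described here. The table name `inspections`, the helper `to_csv` and the records themselves are hypothetical placeholders.

```python
import csv
import io
import sqlite3

# Hypothetical scraped records; facility names and years are placeholders.
records = [
    ("Example Care Center", 2011),
    ("Sample Nursing Home", 2010),
    ("Placeholder Manor", 2011),
]

# Store the records in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (facility TEXT, year INTEGER)")
conn.executemany("INSERT INTO inspections VALUES (?, ?)", records)

# Query with SQL: every facility inspected in 2011, in alphabetical order.
query = "SELECT facility, year FROM inspections WHERE year = ? ORDER BY facility"
matches = conn.execute(query, (2011,)).fetchall()
conn.close()


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(["facility", "year"], matches)
print(csv_text)
```

The point of keeping scraped data in a table like this is that the same question can be asked again later, or changed slightly, without going back to the website.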

The session was scheduled to last from 6 p.m. to 6 a.m., but the last web scrapers left around 1 a.m., Dempsey said. Levine said he hoped data enthusiasts of all levels would get something out of the exercise, from learning what ScraperWiki can do to encouraging beginners to “be less scared” of programming and try scraping on their own.

Anna Boiko-Weyrauch is a graduate student at the University of Missouri's School of Journalism. 

141 Neff Annex   |   Missouri School of Journalism Columbia, MO 65211   |   573-882-2042   |   info@ire.org