By Anna Boiko-Weyrauch
Syntax error. What does this bit of code do? Syntax error. Let’s go back to the source. Syntax error. Maybe try this?
After two hours of educated guesses, trial, error and some friendly help, Pam Dempsey, of cu-citizenaccess.org, and I had finally scraped our first bit of text: the word “2011” from a page of Illinois nursing home inspection records.
The ScraperWiki session on Thursday night was aimed at “liberating” a number of data sets from their online prisons by working late into the night scraping real websites.
The session started with a two-hour lesson on ScraperWiki and the fundamentals of scraping. Thomas Levine, developer advocate with ScraperWiki, said that if you would get bored copying and pasting information off multiple webpages for hours at a time, that's a good reason to look into scraping, which pulls the information off the web more efficiently.
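For readers who have never seen a scraper, here is a minimal sketch of the idea in Python. It is not the code from Levine's tutorial or cheatsheet. The sample page, the `CellText` class and the `extract_years` function are all made up for illustration, and the HTML is a stand-in rather than the real Illinois inspection site. It reads the table cells on a page and pulls out anything that looks like a year.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a page of inspection records (not the real site).
SAMPLE_PAGE = """
<html><body>
  <table>
    <tr><td class="facility">Example Nursing Home</td><td class="date">03/14/2011</td></tr>
  </table>
</body></html>
"""


class CellText(HTMLParser):
    """Collect the text found inside every <td> cell."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data  # add text to the current cell


def extract_years(html):
    """Return every four-digit year (19xx or 20xx) found in the table cells."""
    parser = CellText()
    parser.feed(html)
    years = []
    for cell in parser.cells:
        years.extend(re.findall(r"\b(?:19|20)\d{2}\b", cell))
    return years


print(extract_years(SAMPLE_PAGE))
```

A real scraper would first download each page and then run the same kind of extraction on it, one page after another, which is where the time savings over copying and pasting come from.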
After the tutorial, small groups clustered around the room, bent over laptops, plugging in code to liberate data. We used Levine's cheatsheet. Levine and Chief Data Scientist Julian Todd helped the journalists scrape data on fracking companies, the U.S. Defense Department budget and Skagit County jail records.
The session was scheduled to last from 6 p.m. to 6 a.m., but the last web scrapers left around 1 a.m., Dempsey said. Levine said he hoped data enthusiasts of all levels would get something out of the exercise, from learning what ScraperWiki can do to encouraging beginners to “be less scared” of programming and to try scraping on their own.
Anna Boiko-Weyrauch is a graduate student at the University of Missouri's School of Journalism.