Scraping, APIs and data extraction

  • Event: 2015 CAR Conference
  • Speakers: Cedric Sam of South China Morning Post; Nils Mulvad of Kaas & Mulvad
  • Date/Time: Friday, Mar. 6 at 2:10pm
  • Location: International 10

We'll take a look at a variety of issues and techniques for web scraping, including: how to deal with limitations in scraping from Google, Facebook and LinkedIn; how to scrape data and social media content using APIs; the challenges of repeated scraping and different approaches to it, for both smaller-scale and big-data scrapes; and how to scrape for stories, not just data.

This session is good for: Anyone. Experienced data users and beginners are welcome.

Speaker Bios

  • Nils Mulvad is a partner and CEO at Kaas & Mulvad. He specializes in getting data by extracting it from websites, negotiating, filing FOI requests and scraping. He started using scrapers in 2004 and has since worked mainly with Kapow, Helium Scraper and Python scrapers. Nils analyzes data, looking for patterns and the most interesting conclusions to be drawn from it. He has trained journalists and others in working with data for more than 20 years. @nmulvad

  • Cedric Sam is currently an interactive data journalist for the South China Morning Post in Hong Kong. He splits his time between database work and online production / data visualization. Originally a coder, Cedric has worked in journalism since 2007. He previously worked for CBC/Radio-Canada, the Journalism and Media Studies Centre at HKU and La Presse. @cedricsam

Related Tipsheets

No tipsheets have yet been uploaded for this event.