Web scraping without programming

  • Event: 2016 CAR Conference
  • Speakers: Robert Gebeloff of The New York Times; Tom Johnson of Institute for Analytic Journalism
  • Date/Time: Thursday, Mar. 10 at 4:45pm
  • Location: Penrose

The data's there. But it's trapped in an unfriendly format. In this session, we'll show you how to use free and easy tools to liberate information from various formats common to the Web (HTML, PDF, JSON, XML) and get it into a format better suited to data analysis.

Speaker Bios

  • Robert Gebeloff has been a staff editor for data projects at The New York Times since 2008. He was twice a Pulitzer Prize finalist and once produced a 59-part Census series, later published as the collection "Saginaw in the 1980s." @gebeloffnyt

  • Tom Johnson: Mgr. Dir., Institute for Analytic Journalism, Santa Fe, NM. Retired professor who taught at San Francisco State University, Columbia University and Boston University, in addition to training journalists in the U.S., Latin America, Africa and Europe in analytic journalism. He has been a contract reporter for TIME, a freelance contributor to many U.S. magazines, Deputy Editor of the St. Louis Post-Dispatch and coordinator of the It’s The People’s Data initiative.

Related Tipsheets

  • Web scraping without programming
    You can scrape websites without programming skills. The following slides and tipsheets detail how to use online tools such as Web Scraper and Import.io to extract information from the web. You can also get the tipsheets, slides and a practice set from this repo: https://github.com/gebelo/nicar2016.