Cleaning data with OpenRefine
OpenRefine is the best tool for cleaning really dirty data -- the kind of data in which the same name might be spelled 30 different ways. It has built-in cleaning tools aimed at analysts and journalists.
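To make the "same name spelled 30 different ways" problem concrete, here is a minimal Python sketch of key-collision clustering, the general idea behind OpenRefine's fingerprint clustering. It is an illustration under assumptions, not OpenRefine's actual implementation: the function names (fingerprint, cluster) and the sample names are hypothetical, and the normalization steps are a simplified approximation.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that spelling variants of one name collide.

    Simplified approximation of a fingerprint key, for illustration only.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Trim, lowercase, and remove ASCII punctuation.
    lowered = unaccented.strip().lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate the tokens, and sort them,
    # so word order and repeated spaces no longer matter.
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with several variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample data: variants of one name plus one genuine misspelling.
names = ["Smith, John", "John Smith", "john smith.", "JOHN  SMITH", "Jon Smith"]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

The first four variants collapse into a single cluster, while "Jon Smith" stays separate because a key-collision method only catches differences in case, punctuation, spacing and word order. Real misspellings need a fuzzier method, such as the nearest-neighbor clustering OpenRefine also offers.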
Robert Gebeloff has been a staff editor for data projects at The New York Times since 2008. He has twice been a Pulitzer Prize finalist and once produced a 59-part Census series, later published as the collection "Saginaw in the 1980s." @gebeloffnyt
Nils Mulvad is a partner and CEO at Kaas & Mulvad. He specializes in obtaining data by extracting it from websites, negotiating, filing FOI requests and scraping. He started using scrapers in 2004 and has since worked mainly with Kapow, Helium Scrapers, Import.io and Python scrapers. Nils analyzes data, looking for patterns and the most interesting conclusions to be drawn from it. He has trained journalists and others in working with data for more than 20 years. @nmulvad
Advanced data cleaning with OpenRefine
Sarah Cohen, computer-assisted reporting editor at the NYT, walks you through cleaning data using OpenRefine. Screenshots and step-by-step instructions included.
Cleaning data with OpenRefine - repo
Repo for Robert Gebeloff's hands-on presentation on OpenRefine. https://github.com/gebelo/nicar2016