Using machine learning to deal with dirty data: a Dedupe demonstration

  • Event: 2015 CAR Conference
  • Speakers: Jeff Ernsthausen of ProPublica; Derek Eder of DataMade; Eric van Zanten of DataMade; Forest Gregg of DataMade
  • Date/Time: Sunday, Mar. 8 at 10:10am
  • Location: International 10

Tired of popping the same dataset into OpenRefine every time you want to answer basic questions like “who gave the most money to politicians in Idaho this year?” Sure you are. We all are. Then come see a demonstration of Dedupe, a tool that uses machine learning to identify unique individuals, organizations and other entities in the kinds of messy datasets that journalists encounter most often. We’ll go over the basics of how the tool works and show how to use it to find the unique entities in your own datasets.


Speaker Bios

  • Derek Eder is an entrepreneur, technologist, organizer and one of the leaders of the civic tech community in Chicago. He is founder and partner at DataMade, a company that tells stories and builds tools with data, and the lead organizer for Chi Hack Night, Chicago’s premier weekly event for building, sharing and learning about civic tech.

  • Jeff Ernsthausen is a data reporter at ProPublica. He joined ProPublica from the Atlanta Journal-Constitution, where he worked as a data reporter on the investigative team. Prior to his time in journalism, he worked as an economic analyst and researcher at the Federal Reserve.

  • Forest Gregg is a partner at DataMade, a Chicago civic technology consulting firm that works to democratize access to information and power. DataMade works with governments, nonprofits, journalists, and grassroots organizations to gather, organize, analyze, and use data. @forestgregg

  • Eric van Zanten has spent the last several years working as a web developer to pay off his art school education. He currently works in Chicago for civic technology company DataMade, doing everything from data janitorial work to fighting with JavaScript. Mostly in Python. @evanzanten
