Using machine learning to deal with dirty data: a Dedupe demonstration
Tired of popping the same dataset into OpenRefine every time you want to answer basic questions like “who gave the most money to politicians in Idaho this year?” Sure you are. We all are. Then come see a demonstration of Dedupe, a tool that uses machine learning to identify unique individuals, organizations and other entities in the kinds of messy datasets that journalists encounter most. We’ll go over the basics of how the tool works and show how to use it to find the unique entities in your own datasets.
Derek Eder is an entrepreneur, technologist, organizer and one of the leaders of the civic tech community in Chicago. He is founder and partner at DataMade, a company that tells stories and builds tools with data, and the lead organizer for Chi Hack Night, Chicago’s premier weekly event for building, sharing and learning about civic tech.
Jeff Ernsthausen is a data reporter at ProPublica. He joined ProPublica from the Atlanta Journal-Constitution, where he worked as a data reporter on the investigative team. Prior to his time in journalism, he worked as an economic analyst and researcher at the Federal Reserve.
Forest Gregg is a partner at DataMade, a Chicago civic technology consulting firm that works to democratize access to information and power. DataMade works with governments, nonprofits, journalists, and grassroots organizations to gather, organize, analyze, and use data. @forestgregg