Advanced data cleaning with Python: Machine learning techniques

  • Event: 2016 CAR Conference
  • Speakers: Cathy Deng, independent journalist; Forest Gregg of DataMade
  • Date/Time: Thursday, Mar. 10 at 4:45pm
  • Location: Denver II
  • Audio file: No audio file available.

People and corporations are of interest to reporters, but data about them are often messy. Fundamentally, natural language lacks structure, and the same thing can be represented in many different ways. In many cases, simple deterministic approaches (e.g., regex) won't get you very far. We'll show you some of the powerful tools that DataMade uses to efficiently clean and link the worst data, including dedupe, usaddress, and probablepeople.
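To make the limits of deterministic cleanup concrete, here is a minimal sketch using only the Python standard library. It is not how dedupe, usaddress, or probablepeople work: those tools learn from labeled examples, while this stand-in uses `difflib.SequenceMatcher` with a hand-picked threshold. The sample records, the helper names (`normalize`, `similarity`, `link`), and the 0.7 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical records: one organization written three ways, plus a different one.
records = [
    "DataMade LLC",
    "Datamade, L.L.C.",
    "DATA MADE",
    "Denver Post",
]

def normalize(name):
    """Deterministic cleanup: lowercase, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0, computed after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link(names, threshold=0.7):
    """Greedy linking: attach each name to the first cluster whose first
    member is similar enough, otherwise start a new cluster.
    The threshold is a hand-picked assumption, not a learned value."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

if __name__ == "__main__":
    # Normalization alone still treats "DATA MADE" as a separate entity.
    print(sorted({normalize(r) for r in records}))
    # Fuzzy comparison groups all three spellings together.
    for cluster in link(records):
        print(cluster)
```

Regex-style normalization merges the first two spellings but leaves "DATA MADE" apart, while the fuzzy comparison groups all three. A fixed threshold like this is brittle on real data, which is the gap that trained approaches are meant to fill.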

Prereqs: Comfort with Python

Speaker Bios

  • Forest Gregg is a partner at DataMade, a Chicago civic technology consulting firm that works to democratize access to information and power. DataMade works with governments, nonprofits, journalists, and grassroots organizations to gather, organize, analyze, and use data. @forestgregg

     
