Finding needles in haystacks with fuzzy matching
Fuzzy matching is a process for linking up names that are similar, but not quite the same. It has become increasingly important in data-led investigations as a way to identify connections between the public figures, key people and companies relevant to a story. This class will cover how fuzzy matching typically fits into the investigative process, with some story examples. Max Harlow, who developed the CSV Match command line tool, will show you how to run several types of fuzzy matching on real datasets, and explain the pros and cons of each.
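To give a flavour of the idea, here is a minimal sketch of one common approach, edit distance (Levenshtein), in plain Python. It is not taken from CSV Match and is not necessarily what the class will demonstrate; the function names, the threshold of 2 and the sample names are all made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(names_a, names_b, max_distance=2):
    """Pair up names from two lists whose lowercased edit distance
    is at most max_distance (an illustrative threshold, not a rule)."""
    matches = []
    for a in names_a:
        for b in names_b:
            distance = levenshtein(a.lower(), b.lower())
            if distance <= max_distance:
                matches.append((a, b, distance))
    return matches


# Hypothetical data: a list of donors and a list of company directors.
donors = ["Jonathan Smyth", "Maria Gonzales"]
directors = ["Jonathon Smith", "Maria Gonzalez", "Peter Jones"]

for donor, director, distance in fuzzy_matches(donors, directors):
    print(f"{donor} ~ {director} (distance {distance})")
```

A low threshold misses genuine matches with larger spelling differences, while a high one produces more false positives, which is the kind of trade-off the different matching types involve. Comparing every name against every other name also gets slow on large datasets.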
This class is good for: Anyone frustrated by the joys of matching dirty data. Familiarity with the command line helps but is not necessary.
Max Harlow is a newsroom developer at the Financial Times in London. He has previously worked on investigations at the Guardian and at the Bureau of Investigative Journalism. He co-runs Journocoders, a group for journalists who want to develop technical skills for use in their reporting. @maxharlow