How to find reporting leads and publishable facts in text data you already have
Let's discuss some published projects that have extracted useful, newsy information from big piles of text data, so you can use similar techniques. We'll walk you through real-world examples of every step of the process: gathering text data, dividing it into chunks the computer can understand, analyzing it with fancy or simple techniques, and handling the challenges you'll face in analyzing, bulletproofing and presenting what you find. This session isn't quite hands-on, but the panelists will discuss the tools, practical techniques and tricks they used to transform giant piles of text into publishable insights and reporting leads. These techniques are often called "natural language processing," but we're going to keep it practical: no obscure mathematical formulas, guaranteed!
Jeff Ernsthausen is a data reporter at ProPublica. He joined ProPublica from the Atlanta Journal-Constitution, where he worked as a data reporter on the investigative team. Prior to his time in journalism, he worked as an economic analyst and researcher at the Federal Reserve.
Jeremy B. Merrill is a machine learning journalist with Qz's AI Studio. Previously, at ProPublica, he reported using a vast crowd-sourced database of political Facebook ads. @jeremybmerrill
Youyou Zhou is a Things reporter at Quartz, reporting stories using visual and data tools with a keen interest in immigration, global issues and algorithmic accountability. She has recently shifted to work on Quartz membership offerings. Youyou previously worked for The Associated Press and graduated from the University of Missouri. Find her on Twitter @zhoyoyo