How to find reporting leads and publishable facts in text data you already have

  • Event: 2018 CAR Conference
  • Speakers: Jeff Ernsthausen of ProPublica; Jeremy Merrill of Quartz; Youyou Zhou of Quartz
  • Date/Time: Friday, Mar. 9 at 9:00am
  • Location: Grand Ballroom III

Let's discuss some published projects that have extracted useful, newsy information from big piles of text data, so you can use similar techniques. We'll walk you through real-world examples of every step of the process: gathering text data, dividing it into chunks the computer can understand, and analyzing it with fancy or simple techniques. We'll also cover the challenges you'll face in analyzing, bulletproofing and presenting what you find. This session isn't quite hands-on, but the panelists will discuss the tools, practical techniques and tricks they used to transform giant piles of text into publishable insights and reporting leads. These techniques are often called "natural language processing," but we're going to keep it practical: no obscure mathematical formulas, guaranteed!

Speaker Bios

  • Jeff Ernsthausen is a data reporter at ProPublica. He joined ProPublica from the Atlanta Journal-Constitution, where he worked as a data reporter on the investigative team. Prior to his time in journalism, he worked as an economic analyst and researcher at the Federal Reserve.

  • Jeremy B. Merrill is a machine learning journalist with Quartz's AI Studio. Previously, at ProPublica, he reported using a vast crowdsourced database of political Facebook ads. @jeremybmerrill

  • Youyou Zhou is a Things reporter at Quartz, reporting stories using visual and data tools, with a keen interest in immigration, global issues and algorithmic accountability. She has recently shifted to work on Quartz membership offerings. Youyou previously worked for The Associated Press and graduated from the University of Missouri. Find her on Twitter @zhoyoyo
