Getting started with machine learning for reporting

  • Event: 2018 CAR Conference
  • Speakers: Peter Aldhous of BuzzFeed News; Chase Davis of Star Tribune; Anthony Pesce of Los Angeles Times; Rachel Shorey of The New York Times
  • Date/Time: Thursday, Mar. 8 at 4:45pm
  • Location: Grand Ballroom I

Many tasks in investigative journalism boil down to classification problems. Is my police department cooking its crime stats by assigning incident reports to the wrong categories? Of the thousands of planes in the air each day, which ones might be involved in government surveillance? How can we identify political ads on Facebook? 

Drawing on examples including the LA Times' investigation into the misclassification of violent crimes by the LAPD, BuzzFeed News' identification of spy planes operating in U.S. airspace, and ProPublica's tracking of political ads on Facebook, we'll consider practical questions like: I'm not a data scientist, I'm a reporter. What's in it for me? What type of story or reporting task can machine learning help with? When is machine learning *not* the answer? Which algorithm should I choose? How can I structure my data to give the algorithm more to work with?

 

Speaker Bios

  • Peter Aldhous is a reporter on the science desk at BuzzFeed News. His data projects include an analysis of the text of a year of tweets by Donald Trump and all members of Congress in the first year of Trump's Twitter-led presidency, maps showing projected future coastal flooding under climate change and the risk of wildfires, and the use of machine learning to identify surveillance aircraft from flight-tracking data. @paldhous

  • Chase Davis is a senior digital editor at the Star Tribune in his hometown of Minneapolis. Previously he ran the Interactive News desk at The New York Times and worked as a reporter and editor in Texas, Iowa and California. He also teaches a class in advanced data journalism at his alma mater, Mizzou. @chasedavis

  • Anthony Pesce is a data journalist and reporter on the Los Angeles Times Data Desk. He builds news applications, develops data visualizations and conducts data analysis for reporting projects. @anthonyjpesce

  • Rachel Shorey (@rachel_shorey) is a software engineer on the Interactive News team at The New York Times, where she writes software to handle campaign finance data, voter data, and whatever other data she manages to get her hands on. Want to start a conversation with her? Tell her your favorite prime number.
