Getting started with machine learning for reporting
Many tasks in investigative journalism boil down to classification problems. Is my police department cooking its crime stats by assigning incident reports to the wrong categories? Of the thousands of planes in the air each day, which ones might be involved in government surveillance? How can we identify political ads on Facebook?
Drawing on examples including the LA Times' investigation into the misclassification of violent crimes by the LAPD, BuzzFeed News' identification of spy planes operating in U.S. airspace, and ProPublica's tracking of political ads on Facebook, we'll consider practical questions like: I'm not a data scientist, I'm a reporter. What's in it for me? What type of story or reporting task can machine learning help with? When is machine learning *not* the answer? Which algorithm should I choose? How can I structure my data to give the algorithm more to work with?
Peter Aldhous is a reporter on the science desk at BuzzFeed News. His data projects include an analysis of the text of tweets by Donald Trump and all members of Congress during the first year of Trump's Twitter-led presidency, maps showing projected future coastal flooding under climate change and the risk of wildfires, and the use of machine learning to identify surveillance aircraft from flight-tracking data. @paldhous
Chase Davis is a senior digital editor at the Star Tribune in his hometown of Minneapolis. Previously he ran the Interactive News desk at The New York Times and worked as a reporter and editor in Texas, Iowa and California. He also teaches a class in advanced data journalism at his alma mater, Mizzou. @chasedavis
Anthony Pesce is a data journalist and reporter on the Los Angeles Times Data Desk. He builds news applications, develops data visualizations and conducts data analysis for reporting projects. @anthonyjpesce
Rachel Shorey (@rachel_shorey) is a Software Engineer on the Interactive News team at The New York Times where she writes software to handle campaign finance data, voter data, and whatever other data she manages to get her hands on. Want to start a conversation with her? Tell her your favorite prime number.