If you're plugged into the tech scene these days -- or even, really, if you're not -- it's almost impossible to escape the exuberance surrounding the ad hoc field known as data science. The field combines math, data munging, visualization and computer programming, and its practitioners are among the most in-demand hires in the tech industry, responsible for everything from tuning search engines and analyzing billions of Facebook's social connections to programming self-driving cars.
A couple of years ago, a New York data scientist named Drew Conway put together a Venn diagram outlining the key skills that make up the ideal data scientist: hacking skills, substantive expertise, and math and statistics knowledge. To the data journalists in the crowd, most of those skills will look pretty familiar. Problem is, where some of us are heavy on hacking skills and substantive expertise, most of us come up a little short on the math and statistics component. And that puts us, according to the diagram, in the "danger zone."
In other words, we still have a lot to learn. That's why The Center for Investigative Reporting and IRE have teamed up with the San Francisco data science company Kaggle to help bridge that gap. Kaggle, a company that crafts and hosts competitive data challenges for its community of more than 40,000 data scientists, has helped us put together a competition that will answer a single fascinating question: How would data scientists, armed with tools and perspectives rarely seen in journalism, approach a dataset that journalists have looked at a million times over? Namely: federal campaign contributions.
Journalists are good at spotting basic trends: who raised the most; who raised the least; how certain donors and candidates are changing their patterns and why. But in many ways, what we've done with this data is limited by our perspective.
What if the same algorithms that are used to guide self-driving cars could be used to find interesting campaign contributions? What if we could use anomaly detection systems to notify us when a donor makes, or a candidate receives, a noteworthy contribution? How can these tools be used to advance Ben Welsh's well-articulated idea of human-assisted reporting?
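To make the anomaly detection question a little more concrete, here is a minimal sketch of one way a contribution might get flagged as noteworthy. It is only an illustration, not a method the contest prescribes: the function name `flag_unusual`, the dollar amounts and the 3.5 cutoff are all assumptions made up for this example. The technique is a modified z-score, which measures how far each amount sits from the median relative to the median absolute deviation.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return the positions of amounts that sit far from the median.

    Uses a modified z-score based on the median absolute deviation (MAD),
    which is less distorted by extreme values than a mean-based score.
    The 3.5 cutoff is a common rule of thumb, assumed here for illustration.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # At least half the amounts equal the median; the score is undefined.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical contribution amounts, in dollars, to a single candidate.
contributions = [250, 500, 200, 300, 250, 400, 35000, 350]
print(flag_unusual(contributions))  # prints [6], the $35,000 contribution
```

A flag like this is a starting point for a reporter, not a finding. Deciding whether the flagged contribution is actually newsworthy is still a human job.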
The winner of this contest will receive a trip to next year's NICAR conference to help spread their knowledge. And it's our hope that we can use this opportunity to pair up interested data scientists with journalists who can help their work make an impact.
Finally, we owe a great debt of thanks to Kaggle, without whose support this contest would not have happened. Like many of the data scientists I've met in Silicon Valley, the team there has a genuine interest in the public service potential of using data science as a force for good. We're all looking forward to seeing the interesting ideas this contest generates.
And a note to the data journalists in the room: Don't be shy about submitting some ideas yourselves. Just be sure to think outside the box!
Chase Davis is the director of technology at the Center for Investigative Reporting, where he supervises a team of 10 data analysts, news applications developers and engineers.