NewsCamp: Data science master class
Thursday, February 28, 2013
Level up your data skills using R and data science! This day-long, hands-on class is taught by Hadley Wickham, author of the ggplot2 R package, Assistant Professor of Statistics at Rice University, and developer at RStudio.
You’ll learn intermediate to advanced R and gain an understanding of data science techniques and methodologies that you can bring back to your newsroom and apply to your work right away. You’ll need a good foundation in statistics. Before the class, we’ll provide videos and exercises that cover the basics of R so we can hit the ground running.
Participants are expected to have experience with a programming language, an understanding of statistics up to linear regression, and a solid foundation of work with spreadsheets, database managers, data cleaning, etc. Preregistration is required for this session. Only 24 seats are available, and there is an additional $70 registration fee. Preregistrations are accepted on a first-come, first-served basis. Prospective attendees must register and fill out an application for this day-long track.
Registration for NewsCamp is full. We apologize for the inconvenience. We’ll be offering a panel session, NewsCamp: Data science for the perplexed, on Friday, March 1, for those who are interested (no preregistration required).
Here is an outline of what the day will entail:
Tidy data is a way of storing data that makes it easy to visualize, manipulate and model. By starting the day with the principles of tidy data, we’ll build a strong foundation for learning the tools of data analysis. Tidy data enables a lean, efficient workflow that avoids the distractions of computer programming so you can focus on the actual data analysis. You’ll learn:
- The principles of tidy data
- How to tidy common forms of messy data
- The basics of data input in R
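As a rough illustration of tidying a common kind of messy data (this sketch is not taken from the class materials), the example below reshapes a small made-up table in which one variable is spread across two columns. The data frame `messy`, its column names and its values are invented for the example; `reshape()` is part of base R.

```r
# A "messy" table: the treatment variable is spread across two columns.
# All names and values here are made up for illustration.
messy <- data.frame(
  name        = c("John", "Jane", "Mary"),
  treatment_a = c(12, 4, 7),
  treatment_b = c(2, 11, 5)
)

# Reshape to one row per observation, with one column per variable:
# name, treatment and result.
tidy <- reshape(
  messy,
  direction = "long",
  varying   = c("treatment_a", "treatment_b"),
  v.names   = "result",
  timevar   = "treatment",
  times     = c("a", "b"),
  idvar     = "name"
)
rownames(tidy) <- NULL

print(tidy)
```

In the reshaped table each row holds a single observation (one person under one treatment), which is the layout that plotting and modelling functions generally expect.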
R is well known for its beautiful graphics, and in this session you’ll learn the basics of using ggplot2, an R package that makes elegant graphics with a rigorous underlying theory. You’ll learn:
- How to create scatterplots, histograms and bar charts
- How to add extra variables through facetting and aesthetics
- Some good practices for coding exploratory data analysis
- Where to go for more help
R’s built-in subsetting is powerful, but can be verbose. The plyr package is a powerful alternative for expressing most data manipulations, using a consistent family of functions. You’ll learn:
- How to use the subset, mutate, summarize and arrange functions
- About the split-apply-combine strategy to perform groupwise operations
- How to use ddply, an implementation of the split-apply-combine strategy
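To make the split-apply-combine strategy concrete, here is a minimal sketch that spells out the three steps in base R, so it runs without any extra packages. The `sales` data frame and its columns are hypothetical; the comment at the end shows what the corresponding ddply call would look like, assuming plyr is installed.

```r
# Hypothetical data: one row per sale.
sales <- data.frame(
  region = c("north", "south", "north", "south", "east"),
  amount = c(10, 20, 30, 40, 50)
)

# Split: one data frame per region.
pieces <- split(sales, sales$region)

# Apply: summarise each piece independently.
summaries <- lapply(pieces, function(piece) {
  data.frame(region = piece$region[1], total = sum(piece$amount))
})

# Combine: stack the per-group results back into one data frame.
totals <- do.call(rbind, summaries)
rownames(totals) <- NULL

print(totals)

# With plyr loaded, the same groupwise summary can be written in one call:
#   ddply(sales, "region", summarize, total = sum(amount))
```

Writing the steps out by hand shows what ddply takes care of: it splits a data frame by the grouping variables, applies the function to each piece, and combines the results into a single data frame.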
In a short amount of time, you’ll learn the absolute basics of R’s modelling tools. We’ll keep a razor-sharp focus on how to use models (and what they can do for you), avoiding any details about how models work. We’ll spend most of our time on lm, the function that fits linear models, but you’ll also see how mastery of lm translates to using many other models. You’ll learn:
- The basics of R’s formula syntax and how to fit models
- How to compute residuals and make predictions for new data
- How to use resampling methods to better understand model variability
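The three points above can be sketched with lm and R’s built-in mtcars data set. The choice of data set, the variables in the formula and the use of a bootstrap as the resampling method are assumptions made for this illustration, not a description of what the class will use.

```r
# Fit a linear model with the formula syntax: response ~ predictor.
# mtcars ships with R; mpg ~ wt is an arbitrary choice for illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals: observed values minus fitted values, one per row of the data.
res <- residuals(fit)

# Predictions for new data: the column names must match the predictors.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit, newdata = new_cars)

# Resampling (here, a simple bootstrap): refit the model on rows sampled
# with replacement to see how much the slope varies between resamples.
set.seed(1)
boot_slopes <- replicate(200, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))[["wt"]]
})

print(pred)
print(quantile(boot_slopes, c(0.025, 0.975)))
```

The same pattern of a formula plus a data argument, followed by residuals() and predict(), carries over to many other model-fitting functions in R, such as glm.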