NewsCamp: Data science master class

Thursday, February 28, 2013 

Level up your data skills using R and data science! This day-long, hands-on class is taught by Hadley Wickham, author of the ggplot2 R package, Assistant Professor of Statistics at Rice University, and developer at RStudio.

You’ll learn intermediate to advanced R and gain an understanding of the techniques and methodologies of data science that you can bring back to your newsroom and apply to your work right away. You’ll need a good foundation in stats. Before the class we’ll provide videos and exercises that cover the basics of R so we can hit the ground running.

Participants are expected to have experience with a programming language, an understanding of statistics up to linear regression, and a solid foundation of work in spreadsheets, database managers, data cleaning, etc. Preregistration is required for this session. Only 24 seats are available, and there is an additional $70 registration fee. Preregistrations are accepted on a first-come, first-served basis. Prospective attendees must register and fill out an application for this day-long track.

Registration for NewsCamp is full. We apologize for the inconvenience. We’ll be offering a panel session, “NewsCamp: Data science for the perplexed,” on Friday, March 1, for those who are interested (no preregistration required).

Here is an outline of what the day will entail:

Tidy data
Tidy data is a way of storing data that makes it easy to visualize, manipulate and model. By starting the day with the principles of tidy data, we’ll build a strong foundation for learning the tools of data analysis. Tidy data enables a lean, efficient workflow that avoids the distractions of computer programming so you can focus on the actual data analysis. You’ll learn:

  • The principles of tidy data
  • How to tidy common forms of messy data
  • The basics of data input in R
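
As a rough illustration of the tidying idea (not taken from the class materials), the sketch below reads a small made-up table with read.csv and reshapes it so that each row is one observation. The cities, years and counts are invented for the example, and only base R is used.

```r
# Hypothetical "messy" input: one row per city, one column per year.
messy <- read.csv(
  text = "city,2011,2012\nHouston,10,12\nAmes,4,5",
  check.names = FALSE,       # keep "2011" and "2012" as column names
  stringsAsFactors = FALSE
)

# Tidy form: one row per (city, year) observation, one column per variable.
year_cols <- setdiff(names(messy), "city")
tidy <- data.frame(
  city  = rep(messy$city, times = length(year_cols)),
  year  = rep(as.integer(year_cols), each = nrow(messy)),
  count = unlist(messy[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(tidy)
```

In the tidy version, year is an ordinary variable rather than a set of column headers, so it can be plotted, filtered or modelled like any other column.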

 

Visualizing data
R is well known for its beautiful graphics, and in this session you’ll learn the basics of using ggplot2, an R package that makes elegant graphics with a rigorous underlying theory. You’ll learn:

  • How to create scatterplots, histograms and bar charts
  • How to add extra variables through facetting and aesthetics
  • Some good practices for coding exploratory data analysis
  • Where to go for more help
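
For a sense of what these plots look like in code, here is a minimal sketch, assuming the ggplot2 package is installed. It uses mpg, a sample dataset that ships with ggplot2; the choice of variables is only for illustration and is not drawn from the class.

```r
library(ggplot2)

# Scatterplot: engine displacement vs. highway mileage.
# Colour is an aesthetic that adds a third variable (vehicle class);
# facet_wrap adds a fourth (drive type) by splitting into panels.
scatter <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  facet_wrap(~ drv)

# Histogram of one continuous variable.
histogram <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Bar chart counting rows in each category.
bars <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# In a script, call print() on a plot object to draw it, e.g. print(scatter).
```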

 

Data manipulation
R’s built-in subsetting is powerful, but can be verbose. The plyr package is a powerful alternative for expressing most data manipulations, using a consistent family of functions. You’ll learn:

  • How to use the subset, mutate, summarize and arrange functions
  • About the split-apply-combine strategy to perform groupwise operations
  • How to use ddply, an implementation of the split-apply-combine strategy
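
The functions named above can be sketched on a tiny made-up data frame, assuming the plyr package is installed. The sales data and column names are hypothetical and exist only for this example.

```r
library(plyr)

# Hypothetical data: units sold per order, by region.
sales <- data.frame(
  region = c("east", "east", "west", "west"),
  units  = c(3, 5, 2, 8),
  stringsAsFactors = FALSE
)

# subset: keep rows matching a condition.
big_orders <- subset(sales, units > 2)

# mutate: add a column computed from existing ones.
sales <- mutate(sales, double_units = units * 2)

# arrange: sort rows by a column.
sorted <- arrange(sales, units)

# ddply: split by region, apply summarise to each piece,
# and combine the results into one data frame.
by_region <- ddply(sales, "region", summarise,
                   total      = sum(units),
                   mean_units = mean(units))

print(by_region)
```

The ddply call is the split-apply-combine strategy in one line: the second argument says how to split, and the remaining arguments say what to compute for each piece.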

 

Modelling
In a short amount of time, you’ll learn the absolute basics of R’s modelling tools. We’ll keep a razor-sharp focus on how to use models (and what they can do for you), avoiding any details about how models work. We’ll spend most of our time on lm, the function that fits linear models, but you’ll also see how mastery of lm translates to using many other models. You’ll learn:

  • The basics of R’s formula syntax and how to fit models
  • How to compute residuals and make predictions for new data
  • How to use resampling methods to better understand model variability
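
The three points above can be sketched with lm and the mtcars dataset that ships with R. The model (fuel economy as a function of weight) and the bootstrap of 200 resamples are assumptions chosen for illustration, not the examples used in the class.

```r
set.seed(1)  # make the resampling reproducible

# Formula syntax: response ~ predictor. Fit mpg as a linear function of weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals: observed minus fitted values, one per row of the data.
res <- resid(fit)

# Predictions for new data: a data frame with the same predictor name.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit, newdata = new_cars)

# Resampling (a simple bootstrap): refit the model on rows sampled
# with replacement and record the slope each time.
boot_slopes <- replicate(200, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))[["wt"]]
})

# The spread of the bootstrap slopes indicates how variable the estimate is.
print(quantile(boot_slopes, c(0.025, 0.975)))
```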

 

Speaker: Hadley Wickham received his Ph.D. from Iowa State University and is currently both an Assistant Professor at Rice University and a developer at RStudio. He is interested in tools (computational and cognitive) for making data preparation, visualization and analysis easier, as well as figuring out how to teach those tools in the most effective way possible. He is the author of the ggplot2, reshape, plyr, stringr and lubridate R packages (as well as over 20 others), and of the book ggplot2: Elegant Graphics for Data Analysis (Use R!), published by Springer.