R for data science

The best place to start learning the tidyverse is R for Data Science (R4DS for short), an O’Reilly book written by Hadley Wickham and Garrett Grolemund. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. You can read it online for free, or buy a physical copy.

We highly recommend pairing R4DS with the RStudio cheatsheets. These cheatsheets have been carefully designed to pack a lot of information into a small amount of space. You can keep them handy at your desk and quickly jog your memory when you get stuck. Most of the cheatsheets have been translated into multiple languages.


Online courses

  • Writing functions in R by Hadley and Charlotte Wickham, hosted on DataCamp. This course teaches the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.

  • Data visualisation with ggplot2 by Rick Scavetta, hosted on DataCamp. Covers the basics of ggplot2, and is followed by part 2, which covers more advanced topics.

  • Exploratory data analysis in R: case study by David Robinson, hosted on DataCamp. This course puts ggplot2 and dplyr into action on a real dataset. It also introduces broom for tidying model output and shows how to tidy data to help you explore your dataset.

University courses

  • Data Challenge Lab. Stanford University; Hadley Wickham and Bill Behrman. This is a 5-unit course using a flipped classroom. The curriculum is designed to cover each main thread of R4DS multiple times, diving a little deeper at each pass.

  • M.Sc. Industrial Analysis: An International Perspective. HEC Montreal; Thierry Warin. Graduate program in Data Science for International Business (DS4IB), where students learn how to use RStudio, R Markdown, the tidyverse, and open data in a reproducible research workflow. Hosted at Dr.HECtoR.

  • Better Living with Data Science. Duke University; Mine Cetinkaya-Rundel. Data science course for first-year undergraduates with little to no computing background. Combines techniques from statistics, math, computer science, and the social sciences to teach students how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Topics include data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results, with discussions of reproducibility, data sharing, and data privacy.

  • Statistical Computing. Duke University; Colin Rundel. MS-level statistical computing course focusing on best practices and software development for reproducible results, selecting topics from: use of markup languages, understanding data structures, design of graphics, object-oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control, all in the context of specific applied statistical analyses.


  • Stat545. UBC; Jenny Bryan. Data wrangling, exploration, and analysis with R.


  • Stat405. Rice University; Hadley Wickham. Mainly included for historical interest: you can see some of the work that led up to the creation of the tidyverse.
