R for data science
The best place to start learning the tidyverse is R for Data Science (R4DS for short), an O’Reilly book written by Hadley Wickham and Garrett Grolemund. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. You can read it online for free, or buy a physical copy.
We highly recommend pairing R4DS with the RStudio cheatsheets. These cheatsheets have been carefully designed to pack a lot of information into a small amount of space. You can keep them handy at your desk and quickly jog your memory when you get stuck. Most of the cheatsheets have been translated into multiple languages.
(Do you have a book you’d like to see listed here? Please submit a pull request!)
ModernDive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim. “Help! I’m new to R and RStudio and I need to learn them! What do I do?” If you’re asking yourself this, this book is for you.
ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham. Goes into greater depth on the ggplot2 visualisation system.
Solutions and notes for R4DS by Jeffrey B. Arnold. Work in progress.
Data Manipulation in R by Steph Locke. Covers data manipulation in a tidyverse way.
DataCamp is an excellent way to improve your R skills, including the tidyverse. Take a look at all tidyverse courses or see selected favourites below:
Writing functions in R by Hadley and Charlotte Wickham, hosted on DataCamp. This course will teach you the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.
Introduction to the tidyverse by David Robinson, hosted on DataCamp. This is an introduction to the dplyr and ggplot2 packages through exploration and visualization of country data over time. This is a suitable course for people who have no or limited experience in R and are interested in learning to perform data analysis.
Exploratory data analysis in R: case study by David Robinson, hosted on DataCamp. This course brings ggplot2 and dplyr into action in an in-depth analysis of United Nations voting data. The course also introduces broom for tidying model output and the tidyr package for wrangling data into an explorable shape.
- Mastering the Tidyverse by Jumping Rivers. A one-day crash course covering tidyverse fundamentals. The course is a mixture of lectures, short exercises and longer tutorial questions. During the day, we’ll cover dplyr, tidy data, tibbles, dates/times and string manipulation.
- Introduction to R by Locke Data. A two-day course covering data manipulation and reporting fundamentals using the tidyverse, rmarkdown, and shiny. The course blends lectures, exercises, and practicals to cover the 80% of work that almost everyone needs to do.
Data Challenge Lab. Stanford University; Hadley Wickham and Bill Behrman. This is a 5-unit course using a flipped classroom. The curriculum is designed to cover each main thread of R4DS multiple times, diving a little deeper at each pass.
M.Sc. Industrial Analysis: An International Perspective. HEC Montreal; Thierry Warin. Graduate program in Data Science for International Business (DS4IB), where students learn how to use RStudio, RMarkdown, the tidyverse and open data in a reproducible research workflow. Hosted at Dr.HECtoR.
Better Living with Data Science. Duke University; Mine Cetinkaya-Rundel. Data science course for first-year undergraduates with little to no computing background. Combines techniques from statistics, math, computer science, and the social sciences to teach students how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Covers data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Includes discussions of reproducibility, data sharing, and data privacy.
Statistical Computing. Duke University; Colin Rundel. MS-level statistical computing course focusing on best practices and software development for reproducible results, selecting topics from: use of markup languages, understanding data structures, design of graphics, object-oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control, all in the context of specific applied statistical analyses.
FE8828 Programming Web Applications in Finance. Nanyang Technological University; Dr. Yang Ye, Master for Financial Engineering. An intermediate-to-advanced programming course in R for data analytics and interactive web content. It teaches R Markdown, Shiny, and the tidyverse (dplyr/tidyr/ggplot2/lubridate).
Computing for the Social Sciences. University of Chicago; Benjamin Soltoff. This is an applied course for social scientists with little-to-no programming experience who wish to harness growing digital and computational resources. The focus of the course is on generating reproducible research through the use of programming languages and version control software. Major emphasis is placed on a pragmatic understanding of core principles of programming and packaged implementations of methods. Students will leave the course with basic computational skills implemented through many computational methods and approaches to social science. While students will not become expert programmers, they will gain the knowledge of how to adapt and expand these skills as they are presented with new questions, methods, and data.
Applied Media Analytics. Elon University; Brian Walsh. An undergraduate introduction to R programming for Media Analytics majors. Students learn ggplot2, dplyr, and lubridate, as well as basic sentiment analysis, Twitter insights, and Google Analytics.
- Stat545; UBC; Jenny Bryan. Data wrangling, exploration, and analysis with R.
- Stat405; Hadley Wickham, Rice University. Mainly included for historical interest - you can see some of the work that led up to the creation of the tidyverse.