R for Data Science
The best place to start learning the tidyverse is R for Data Science (R4DS for short), an O’Reilly book written by Hadley Wickham and Garrett Grolemund. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. You can read it online for free, or buy a physical copy.
We highly recommend pairing R4DS with the RStudio cheatsheets. These cheatsheets have been carefully designed to pack a lot of information into a small amount of space. You can keep them handy at your desk and quickly jog your memory when you get stuck. Most of the cheatsheets have been translated into multiple languages.
Statistical Inference via Data Science: A ModernDive into R and the tidyverse by Chester Ismay and Albert Y. Kim. “Help! I’m new to R and RStudio and I need to learn them! What do I do?” If you’re asking yourself this, this book is for you.
ggplot2: elegant graphics for data analysis by Hadley Wickham. Goes into greater depth on the ggplot2 visualisation system.
Solutions and notes for R4DS by Jeffrey B. Arnold. Work in progress.
Data Manipulation in R by Steph Locke. Covers data manipulation in a tidyverse way.
- Mastering the Tidyverse by Jumping Rivers. A one-day crash course covering tidyverse fundamentals. The course is a mixture of lectures, short exercises and longer tutorial questions. During the day, we’ll cover dplyr, tidy data, tibbles, dates/times and string manipulation.
- Introduction to R by Locke Data. A two-day course covering data manipulation and reporting fundamentals using the tidyverse, rmarkdown, and shiny. The course blends lectures, exercises, and practicals to cover the 80% of work that almost everyone needs to do.
Data Challenge Lab. Stanford University; Hadley Wickham and Bill Behrman. This is a 5-unit course using a flipped classroom. The curriculum is designed to cover each main thread of R4DS multiple times, diving a little deeper at each pass.
M.Sc. Industrial Analysis: An International Perspective. HEC Montreal; Thierry Warin. Graduate program in Data Science for International Business (DS4IB), where students learn how to use RStudio, RMarkdown, the tidyverse and open data in a reproducible research workflow. Hosted at Dr.HECtoR.
Better Living with Data Science. Duke University; Mine Cetinkaya-Rundel. Data science course for first-year undergraduates with little to no computing background. It combines techniques from statistics, math, computer science, and the social sciences to teach students how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Topics include data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results, along with discussions of reproducibility, data sharing, and data privacy.
Statistical Computing. Duke University; Colin Rundel. MS-level statistical computing course focusing on best practices and software development for reproducible results. Topics are selected from: use of markup languages, understanding data structures, design of graphics, object-oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control, all in the context of specific applied statistical analyses.
FE8828 Programming Web Applications in Finance. Nanyang Technological University; Dr. Yang Ye; Master of Financial Engineering. An intermediate-to-advanced programming course in R for data analytics and interactive web content. It teaches R Markdown, Shiny, and the tidyverse (dplyr, tidyr, ggplot2, lubridate).
Computing for the Social Sciences. University of Chicago; Benjamin Soltoff. This is an applied course for social scientists with little-to-no programming experience who wish to harness growing digital and computational resources. The focus of the course is on generating reproducible research through the use of programming languages and version control software. Major emphasis is placed on a pragmatic understanding of core principles of programming and packaged implementations of methods. Students will leave the course with basic computational skills, applied through a range of computational methods and approaches to social science. While students will not become expert programmers, they will know how to adapt and expand these skills as they encounter new questions, methods, and data.
Applied Media Analytics. Elon University; Brian Walsh. An undergraduate introduction to R programming for Media Analytics majors. Students learn ggplot2, dplyr, and lubridate, as well as basic sentiment analysis, Twitter insights, and Google Analytics.
- Stat545. UBC; Jenny Bryan. Data wrangling, exploration, and analysis with R.
- Stat405. Rice University; Hadley Wickham. Included mainly for historical interest: you can see some of the work that led up to the creation of the tidyverse.