R for Data Science, 2nd edition

We’re thrilled to announce the publication of the 2nd edition of R for Data Science.

The second edition is a major reworking of the first edition, removing material we no longer think is useful, adding material we wish we had included in the first edition, and generally updating the text and code to reflect changes in best practices.

You can read the book online for free at https://r4ds.hadley.nz, or buy a physical copy.

Read below to find out what’s new and what’s gone compared to the first edition.

What’s new?

We have renamed the first part of the book to “Whole game”. Its goal is to give you a rough outline of the “whole game” of data science, including data visualization, transformation, tidying, and import, before we dive into the details. The data visualization chapter has gained a new section written with the “cake first” approach, which starts with the final visualization you will learn to make and then builds up to it layer by layer. The data tidying chapter introduces the basics of lengthening and widening data, and the data import chapter introduces reading tabular data.

The second part of the book is "Visualize", which covers data visualization tools and best practices more thoroughly than the first edition did.

The third part of the book is now called "Transform" and gains new chapters on numbers, logical vectors, and missing values. Much of this content was previously part of the data transformation chapter. In this edition we have expanded these topics to cover all the details.

The fourth part of the book is called "Import". It's a new set of chapters that goes beyond reading flat text files to working with spreadsheets (Excel and Google Sheets), databases, and big data (with Arrow), as well as rectangling hierarchical data and scraping data from websites.

The "Program" part has been rewritten from scratch to focus on the most important parts of function writing and iteration. Function writing now includes details on how to wrap tidyverse functions (dealing with the challenges of tidy evaluation), since this has become much easier and more important over the last few years. We have also added a new chapter on important base R functions that you're likely to see in wild-caught R code.

Finally, the "Communicate" part remains, but has been thoroughly updated to feature Quarto instead of R Markdown. This edition of the book has been written in Quarto, which is clearly the tool of the future.

What’s gone?

The first edition of the book featured a part on modeling, which has now been removed. We never had enough room to do modeling full justice, and there are now much better resources available. We generally recommend using the tidymodels packages and reading Tidy Modeling with R by Max Kuhn and Julia Silge.

Acknowledgements

This book isn't just the product of Hadley, Mine, and Garrett, but is the result of many conversations (in person and online) that we've had with many people in the R community. Huge thanks to all contributors for the conversations, issues, and pull requests. And, as always, feedback and suggestions are welcome on the book repository.