The summer of ggplot2 - wooooo!

This summer I had the great fortune to accept the ggplot2 internship baton from Thomas Lin Pedersen and spend ten weeks developing new features and fixing bugs in ggplot2. My internship was a great experience, and I learned a ton from working with Hadley and digging into the ggplot2 codebase.

Daily life as a ggplot2 intern

My ggplot2 work consisted of a few different things: issue and pull request triage, bug fixes, and developing new features. The tidyverse gets a lot of new issues opened on GitHub, so to help keep things organized I tagged each ggplot2 issue with labels describing the type of issue and which part of ggplot2 it related to (scales, themes, layers, etc.). In some cases I helped the authors create reprexes so that we could diagnose the problems and determine when they were fixed. I triaged new pull requests as well, merging anything that was very straightforward, helping authors conform to the tidyverse style guide, and requesting reviews from Hadley for bigger changes.

Fixing bugs was fun detective work and helped me build an understanding of how ggplot2 works internally. Along the way I got pretty good at using R’s debugging tools and have come to really appreciate the value of following a consistent coding style. It is vastly easier to understand what a piece of code is doing when it is written in a readable and consistent way. Enforcing a style guide is time very well spent.

I implemented some significant new features during this summer as well. The first was an overhaul of certain scale types in ggplot2. Scales are what control how data gets mapped to visual elements on a plot, and they vary based on the type of data (for instance, you can’t map a continuous variable to discrete shapes). As part of this project I improved support for datetime scales and ordered factors. I also added built-in support for viridis color palettes, a move that has garnered me some significant Twitter popularity. 😎
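As a small illustration of the scale idea, here is a minimal sketch that assumes a ggplot2 release shipping the viridis scales under the names `scale_colour_viridis_c()` (continuous) and `scale_fill_viridis_d()` (discrete). Those function names are not spelled out above, and the datasets are just the `mtcars` and `mpg` examples bundled with R and ggplot2.

```r
library(ggplot2)

# A continuous variable (hp) mapped to colour needs a continuous scale.
p_cont <- ggplot(mtcars, aes(wt, mpg, colour = hp)) +
  geom_point() +
  scale_colour_viridis_c()

# A discrete variable (drv) mapped to fill needs a discrete scale.
p_disc <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d()
```

Printing `p_cont` or `p_disc` draws the plot. Swapping the two scales should produce an error, which is the continuous-versus-discrete distinction described above.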

I created a new position function, position_dodge2(), for placing box plots, bars, and rectangles. This project started out as a fix for a bug where boxes of differing widths could not be dodged from one another using the existing position_dodge() function, but it quickly grew into a larger project that fixed not only the original issue, but three other issues as well.
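A rough sketch of the kind of plot this is meant for, assuming a ggplot2 version that includes `position_dodge2()` and its `preserve` argument. The choice of the bundled `mpg` data and of `varwidth = TRUE` to get boxes of differing widths is my own illustration, not a case taken from the original bug report.

```r
library(ggplot2)

# varwidth = TRUE scales each box's width by its group size, so the boxes
# within one class have differing widths; position_dodge2() places them
# side by side.
p_box <- ggplot(mpg, aes(class, hwy, fill = drv)) +
  geom_boxplot(
    varwidth = TRUE,
    position = position_dodge2(preserve = "single")
  )
```

Here `preserve = "single"` asks for each box to keep the width of a single element rather than stretching to fill the space left by missing groups.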

The last big effort was a mostly behind-the-scenes refactor of the way text gets placed on plots. This was a fun and challenging project that required me to get familiar with the grid graphics system for the first time. ggplot2 users should notice only very minor changes as a result of this project—primarily to the way facet strip labels get customized—but internally we’ve really cleaned up the code, reduced duplication, and documented a number of internal functions to make them easier to work with in the future.

Some reflections

I’m not usually prone to impostor syndrome, but before I started my work on ggplot2 I worried about whether I would be up to the task of contributing to a complex and unfamiliar package. Before this summer I had used ggplot2 extensively, but didn’t know all that much about how it works under the hood. I’d never submitted a package to CRAN, and I’d never worked on any piece of software that had more than a couple of users, let alone one as popular as ggplot2. I’ve learned a lot in the last ten weeks, and with Hadley’s mentorship it’s been a great experience.

Working remotely adds its own challenges, and I think that a key to being successful in this kind of position is a willingness to dive into a codebase, tinker around, and try lots of things (and break lots of things in the process) until you figure out how the pieces fit together. Being able to work independently helped me a lot, but I never lacked guidance from Hadley; our regular calls and chats on Slack kept me on track and got me out of a lot of pickles.

I am very grateful to Hadley for this opportunity, and sad that my summer of ggplot2 has come to an end. It has been really rewarding to contribute back to a package that I’ve used so much, and though I’ll no longer be the intern I hope to keep contributing to ggplot2 in the future.
