ggplot2 4.0.0

We’re tickled pink to announce the release of ggplot2 4.0.0. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

The new version can be installed from CRAN using:

install.packages("ggplot2")

This is a substantial release that merits a new major version. It contains a range of changes, from a rewrite of the object-oriented system from S3 to S7 and large new features to smaller quality-of-life improvements and bug fixes. It is also the 18th anniversary of ggplot2, which is cause for celebration! In this blog post, we will highlight the most salient new features that come with this release. You can see a full list of changes in the release notes.

library(ggplot2)
library(patchwork)

Adopting S7

In ggplot2, we use major version increments to indicate that something at the core of the package has changed. In this release, we have replaced many of ggplot2’s S3 objects with S7 objects. Like S3 and S4, S7 is also an object oriented system that uses classes, generics and methods. S7 is a newer system that aims to strike a good balance between the flexibility of S3 and formality of S4.

Mostly, this change shouldn’t be very noticeable when you’re just using ggplot2 for building regular plots. At most, you may notice that we’re more strictly enforcing types for certain arguments. For example, most ludicrous input is now rejected right away. This is due to how properties work in S7: they are validated when a new object is instantiated.

element_text(hjust = "foo")
#> Error: <ggplot2::element_text> object properties are invalid:
#> - @hjust must be <NULL>, <integer>, or <double>, not <character>

However, it may require some adaptation on your end if you use ggplot2’s innards in unusual ways. For extension builders, a major benefit of using S7 is that one can now use double dispatch. This is most important for the update_ggplot() function (the successor of ggplot_add()), which determines what happens when you + an object to a plot. Now with S7, you can control what happens not only for right-hand side objects (which is how it used to work in S3), but also for the left-hand side objects.
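
To illustrate what double dispatch means in S7, here is a minimal sketch that does not use ggplot2 at all. The classes canvas, sketchbook and sticker and the generic combine() are hypothetical names invented for this example; they are not ggplot2’s own classes, nor the actual signature of update_ggplot().

```r
library(S7)

# Hypothetical classes standing in for 'a plot' and 'something you add to it'
canvas     <- new_class("canvas", properties = list(items = class_character))
sketchbook <- new_class("sketchbook", parent = canvas)
sticker    <- new_class("sticker", properties = list(name = class_character))

# A generic that dispatches on BOTH arguments (double dispatch)
combine <- new_generic("combine", c("object", "target"))

# Method for adding a sticker to any canvas
method(combine, list(sticker, canvas)) <- function(object, target, ...) {
  target@items <- c(target@items, paste("sticker:", object@name))
  target
}

# More specific method: same left-hand class, different right-hand class
method(combine, list(sticker, sketchbook)) <- function(object, target, ...) {
  target@items <- c(target@items, paste("taped sticker:", object@name))
  target
}

on_canvas     <- combine(sticker(name = "star"), canvas())
on_sketchbook <- combine(sticker(name = "star"), sketchbook())

on_canvas@items
on_sketchbook@items
```

The method that runs depends on the classes of both arguments, which is what lets an extension specialise behaviour for a particular combination of classes.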

We have put various pieces of backwards compatibility in place so as not to break the many packages that assumed the S3 structures of ggplot2. For example, we still return the data property with ggplot()$data, whereas the S7 way of accessing this should be ggplot()@data. Expect these to be phased out over time in favour of S7. We are preparing another blog post to help with migrating ggplot2-related packages from S3 to S7.
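
As a small sketch of the two access styles, assuming ggplot2 4.0.0 or later is installed, both forms below should return the same data for now:

```r
library(ggplot2)

p_access <- ggplot(mpg, aes(displ, hwy))

# S7-style property access
head(p_access@data)

# S3-style access, kept for backwards compatibility
head(p_access$data)

same_data <- identical(p_access@data, p_access$data)
same_data
```

Code that needs to be future-proof would prefer the @ form, since the $ form is the one expected to be phased out.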

Theme improvements

Themes in ggplot2 have long served the role of capturing any non-data aspects of styling plots. We have come to realise that the default look of layers, from the default shape of points to the default colour palette, is also not truly a data-driven choice. The idea to put these defaults into themes has been around for a while, and Dana Paige Seidel did pioneering work implementing this as early as 2018. Now, years of waiting have come to fruition and we’re proud to announce this new functionality.

Ink and paper

The way layer defaults are now implemented differs slightly from the typical aesthetics you know and love. Whereas layer aesthetics distinguish colour and fill, the theme defaults distinguish ink (foreground) and paper (background). A boxplot is unreadable without colour, but is perfectly interpretable without fill. In the boxplot case, the ink is thus clearly the colour whereas paper is the fill. In bar charts or histograms, the proportional ink principle prescribes that the fill aesthetic is considered foreground, and thus counts as ink. To accommodate special cases, like lines in geom_smooth() or geom_contour(), we also added a third accent option. In short, the theme defaults have role-oriented settings that differ from the property-oriented settings in layers.

We’ve added these three options to all built-in complete themes. Not only do these propagate automatically to the layer defaults, they are also used to style additional theme components. You may notice that the panel background colour is a blend between paper and ink, which is now how many elements are parametrised in complete themes.

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_gray(paper = "cornsilk", ink = "navy", accent = "tomato")

If you’re customising a theme, you can use the theme(geom) argument to set a collection of defaults. The new function element_geom() can be used to set these properties. Additionally, if you want a layer to read the property from this theme element, you can use the from_theme() function in the mapping to access these variables¹.

ggplot(mpg, aes(class, displ)) +
  geom_boxplot(aes(colour = from_theme(accent))) +
  theme(
    geom = element_geom(accent = "tomato", paper = "cornsilk")
  )

A second conceptual difference in element_geom() pertains to the use of lines. In one role, like in a line graph, the line represents the data directly. In a second role, a line serves as separation between two units. For example, you can display countries as polygons, and the lines connecting the vertices separate places that are inside a country from places that are outside that country. These two roles are captured in a linewidth and linetype pair and a borderwidth and bordertype pair.

ggplot(faithful, aes(waiting)) +
  geom_histogram(bins = 30, colour = "black") +
  geom_freqpoly(bins = 30) +
  theme(geom = element_geom(
    bordertype = "dashed",
    borderwidth = 0.2,
    linewidth = 2,
    linetype = "solid"
  ))

Scales and palettes

In addition to the defaults for layers, default palettes are now also encapsulated in the theme. The relevant theme settings have the pattern palette.{aesthetic}.{type}, where type can be either discrete or continuous. This allows you to coordinate your colour palettes with the rest of the theme.

ggplot(mpg, aes(displ, hwy, shape = drv, colour = cty)) +
  geom_point() +
  theme(
    palette.colour.continuous = c("chartreuse", "forestgreen"),
    palette.shape.discrete = c("triangle", "triangle open", "triangle down open")
  )

The way this works is that all default scales now have palette = NULL as their default. During plot building, any NULL palettes are replaced by those declared in the theme.

Shortcuts

We’d like to introduce a new family of shortcuts. Looking at code in the wild, we’ve come to realise that theme declarations are very often chaotic. The theme() function has lots of arguments, long argument names (hello there, axis.minor.ticks.length.x.bottom!) and very little structure. To make themes a little bit more digestible, we’ve created a family of theme_sub_*() helper functions, such as theme_sub_axis_x(), theme_sub_legend() and theme_sub_panel().

These helper functions pass on their arguments to theme() after they’ve prepended a relevant prefix. For example, using theme_sub_legend(justification) will translate to theme(legend.justification). When you have more than one theme element to change in a cluster of settings, it quickly becomes less typing to enlist the relevant shortcut. As a bonus, your theme code will tend to self-organise and become somewhat more readable.

# Tired, verbose, chaotic
theme(
  panel.widths = unit(5, "cm"),
  axis.ticks.x = element_line(colour = "red"),
  axis.ticks.length.x = unit(5, "mm"),
  panel.background = element_rect(fill = NA),
  panel.spacing.x = unit(5, "mm")
)

# Wired, terse, orderly
theme_sub_axis_x(
  ticks = element_line(colour = "red"),
  ticks.length = unit(5, "mm")
) +
theme_sub_panel(
  widths = unit(5, "cm"),
  spacing.x = unit(5, "mm"),
  background = element_rect(fill = NA)
)

In addition to shortcuts for clusters of theme elements, we’ve also added a few variants to declare margins.

margin_auto() sets the margins in a CSS-like fashion similar to the margin and padding property.
- margin_auto(1) sets all four sides at once. It expands to margin(t = 1, r = 1, b = 1, l = 1).
- margin_auto(1, 2) sets horizontal and vertical sides. It expands to margin(t = 1, r = 2, b = 1, l = 2).
- margin_auto(1, 2, 3) expands to margin(t = 1, r = 2, b = 3, l = 2).
margin_part() has NA units as default, which will get replaced when the theme gets resolved. It roughly equates to ‘set some of the sides, keep others as they are’.

merge_element(
  margin_part(r = 20), # child
  margin_auto(10) # parent
)
#> [1] 10points 20points 10points 10points
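
The expansion rules for margin_auto() described above can be checked directly. This is a small sketch assuming ggplot2 4.0.0 or later; the object names are only for illustration.

```r
library(ggplot2)

# One value: all four sides
all_sides <- margin_auto(1)

# Two values: top/bottom, then right/left
two_values <- margin_auto(1, 2)

# Three values: top, right/left, bottom
three_values <- margin_auto(1, 2, 3)

# The fully spelled-out equivalent of the three-value form
spelled_out <- margin(t = 1, r = 2, b = 3, l = 2)

as.numeric(three_values)
as.numeric(spelled_out)
```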

New settings

To coordinate (non-text) margins and spacings in a theme, we’ve introduced spacing and margins as new root elements in the theme. Other spacings and margins at the leaf elements inherit from (scale with) these root elements. To facilitate the different spacings in ggplot2, unit elements can now use rel() to modify the inherited value. For example the default axis.ticks.length is now rel(0.5), making the y-axis ticks 0.5 cm in the plot below. If we set the axis.ticks.length.x to rel(2), it will double the value coming from axis.ticks.length, not double the value of spacing.

p <- ggplot(penguins, aes(bill_dep, bill_len, colour = species)) +
  geom_point(na.rm = TRUE)

p + theme(
  spacing = unit(1, "cm"), 
  margins = margin_auto(1, unit = "cm"),
  axis.ticks.length.x = rel(2)
)

We also made it easier to set plot sizes. Using the panel.widths and panel.heights arguments, you can control the sizes of the panels. This mechanism is distinct from using ggsave(width, height), where the whole plot, including annotations such as axes and titles is included. There are two ways to use these arguments:

Give a vector of units: each one will be applied to a panel separately, and the vector will be recycled to fit the number of panels.
Give a single unit: this sets the total panel area (including panel spacings and inner axes) to that size.

Naturally, if you only have a single panel, these approaches are identical. If you have multiple panels and you want to set individual panels all to the same size (as opposed to the total size), you can take advantage of the recycling and use a length 2 unit vector.

In the plots below, you can notice that the panels span a different width despite the units adding up to the same amount (9 cm). This is because the ‘single unit’ approach also includes the panel spacings, but not the ‘separate units’ approach.

p1 <- p + facet_grid(~ island) +
  labs(title = "Separate units (per panel)") +
  # Using the new shortcut for panels
  theme_sub_panel(
    widths = unit(c(2, 3, 4), "cm"),
    heights = unit(3, "cm")
  )

p2 <- p + facet_grid(~ island) +
  labs(title = "Single unit (all panels)") +
  theme_sub_panel(
    widths = unit(9, "cm"),
    heights = unit(3, "cm")
  )

p1 / p2
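
The recycling trick for equally sized panels mentioned earlier can be sketched as follows. The example uses the mpg data rather than penguins so that it does not depend on R 4.5, and the name p_equal is only illustrative.

```r
library(ggplot2)

# A length-2 unit vector is treated as 'separate units' and recycled
# over the three panels, so every panel becomes 3 cm wide.
p_equal <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(~ drv) +
  theme(
    panel.widths  = unit(c(3, 3), "cm"),
    panel.heights = unit(3, "cm")
  )

p_equal
```

Using unit(3, "cm") instead would have set the total width of all three panels, spacings included, to 3 cm.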

Labels

We have added new ways that a plot retrieves labels for your variables. It is an informal convention in several packages including gt, Hmisc, labelled and others to use the ‘label’ attribute to store human readable labels for vectors. Now ggplot2 joins this convention and uses the ‘label’ attribute as the default label for a variable if present.

# The penguins dataset was incorporated into base R 4.5
df <- penguins

# Manually set label attributes.
# Other packages may offer better tooling than this.
attr(df$species, "label") <- "Penguin Species"
attr(df$bill_dep, "label") <- "Bill depth (mm)"
attr(df$bill_len, "label") <- "Bill length (mm)"
attr(df$body_mass, "label") <- "Body mass (g)"

ggplot(df, aes(bill_dep, bill_len, colour = sqrt(body_mass))) +
  geom_point(na.rm = TRUE)

It has also been entrenched in some workflows to use a ‘data dictionary’ or codebook. For labelling purposes these dictionaries often contain column metadata that include labels or descriptions for variables (columns) in the dataset. To make it easier to work with column labels, we added the labs(dictionary) argument. It takes a named vector of labels, that can easily be generated from a data dictionary by setNames() or dplyr::pull().

dict <- tibble::tribble(
  ~var,    ~label,
  "species",  "Penguin Species",
  "bill_dep", "Bill depth (mm)",
  "bill_len", "Bill length (mm)",
  "body_mass", "Body mass (g)"
)

ggplot(penguins, aes(bill_dep, bill_len, colour = body_mass)) +
  geom_point(na.rm = TRUE) +
  # Or:
  # labs(dictionary = dplyr::pull(dict, label, name = var))
  labs(dictionary = setNames(dict$label, dict$var))

One benefit of the label attribute and data dictionary approaches is that the labels are linked to your variables, not to aesthetics. This means you can easily rearrange your aesthetics for a different plot, without having to painstakingly reorient the labels towards the correct aesthetics.

last_plot() +
  aes(body_mass, bill_len, colour = species)

There are a few caveats to these label attributes and data dictionary approaches though:

If the aesthetic is not a pure variable name the label is not used. You can see this in the sqrt(body_mass) in the first example, which does not use the ‘Body mass (g)’ label. We assume when a variable is adjusted in this way, this would need to be reflected in the label itself. It would therefore be inappropriate to use the label of the unadjusted variable. Use of the .data-pronoun counts as a pure variable name for labelling purposes.
Some attributes are more stable than others, and it is not ggplot2’s responsibility to babysit attributes. For example using head(<data.frame>) will typically drop attributes from atomic columns, whereas head(<tibble>) will not.
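
The second caveat can be reproduced in base R alone. The data frame and label below are made up for illustration:

```r
# A toy data frame with a label attribute on a numeric column
toy <- data.frame(x = c(1.5, 2.5, 3.5))
attr(toy$x, "label") <- "Length (mm)"

# The attribute is present on the original column...
label_before <- attr(toy$x, "label")

# ...but row subsetting via head() drops it from the atomic column
label_after <- attr(head(toy, 2)$x, "label")

label_before
label_after
```

If a label goes missing in a plot, checking the attribute right before the ggplot() call is a quick way to see whether an earlier data manipulation step removed it.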

In addition, we now also allow functions in all the places where you can declare labels. The labs() function, scale names and guide titles now accept functions that take in the labels generated by the lower hierarchies and return amended labels. It should be spelled out that the hierarchy from lowest priority to highest priority is the following:

The expression given in aes().
The entry in labs(dictionary).
The label attribute of the column.
The entry in labs(<aesthetic> = <label>).
The scale_*(name) argument.
The guide_*(title) argument.

We can see this hierarchy in action in the plot below: the function in the axis guide transforms the input from the labs() function.

ggplot(penguins, aes(bill_dep, bill_len, colour = species)) +
  geom_point(na.rm = TRUE) +
  scale_colour_discrete(name = toupper) +
  guides(x = guide_axis(title = tools::toTitleCase)) +
  labs(
    y = \(x) paste0(x, " variable"),
    x = "the label for the x variable"
  )

In addition to the labs()-labels, we also made labelling the levels of discrete scales easier. When the scale’s breaks are named, the scale’s labels will adopt the break’s names by default. This already was the case in continuous scales but now discrete scales have parity. A nice benefit of specifying labels this way is that they are directly linked to the breaks, which prevents the common mistake of specifying the labels argument without also setting the breaks argument, which may accidentally mismatch labels.

ggplot(penguins, aes(bill_dep, bill_len, colour = species)) +
  geom_point(na.rm = TRUE) +
  scale_colour_discrete(
    breaks = c(
      "Pygoscelis adeliae"     = "Adelie",
      "Pygoscelis papua"       = "Gentoo",
      "Pygoscelis antarcticus" = "Chinstrap"
    )
  )

Discrete scales

In this release we have tried to improve the ‘freedom’ afforded by discrete position scales. Previously, discrete values were always mapped to an integer sequence starting at 1 going up to the number of levels. Instead, we wanted to allow for different mappings that deviated from that pattern. While it is a bit foreign for position scales, ggplot2 already had a mechanism to assign alternate values to the levels of a scale: palettes! You can now use the palette argument like you would for non-position scales. It makes it easier to indicate any grouping structure along the axis, like separating the orange juice (OJ) groups from the vitamin C (VC) groups in the plot below.

ggplot(ToothGrowth, aes(interaction(dose, supp, sep = "\n"), len)) +
  geom_boxplot() +
  scale_x_discrete(
    palette = scales::pal_manual(c(1:3, 5:7))
  )

A second improvement we made to the placement of discrete levels is that we give greater control over the continuous limits. The continuous limits of a discrete scale used to be an implementation detail that kept track of any ‘additional space’ layers were taking up, for example because they use a width parameter. Now, these can be declared directly, making it easier to synchronise limits across plots or even facets. In the plot below, we’re using the continuous.limits argument to ensure that all the bars have the same width, regardless of how many levels the x scale has to accommodate.

p1 <- ggplot(mpg, aes(class)) +
  geom_bar() +
  facet_wrap(~ drv, ncol = 1, scales = "free_x")

p2 <- p1 + scale_x_discrete(continuous.limits = c(1, 5))

(p1 + labs(title = "Free limits")) | 
(p2 + labs(title = "Fixed limits"))

Also absent from discrete scales was the ability to set minor breaks. Admittedly, they are less useful than minor breaks in continuous scales. In contrast to discrete (major) breaks, minor_breaks uses numeric input instead, allowing you to fine-tune placement without being bound by the scale’s levels. With a few tweaks of the theme, you can conceivably use minor breaks to visually separate levels as an alternative to the centre-lines for major breaks.

p1 <- ggplot(mpg, aes(drv, hwy, colour = factor(year))) +
  geom_point(position = "jitterdodge") +
  guides(colour = "none") +
  scale_x_discrete(
    minor_breaks = scales::breaks_width(1, offset = 0.5),
    # To show that the minor axis ticks take on these values
    guide = guide_axis(minor.ticks = TRUE)
  )

p2 <- p1 + 
  theme(panel.grid.major.x = element_blank()) +
  theme_sub_axis_bottom(ticks = element_blank(), minor.ticks = element_line())

p1 | p2

Discrete position scales now also have access to secondary axes. In contrast to continuous scales, discrete scales don’t support transformations. So instead of sec_axis(), it is recommended to use dup_axis(). To allow for arbitrary positions for dup_axis(breaks), these can take numeric values or one of the discrete levels. They are not truly useful for showing two aligned datasets of different scales, but they can serve as annotations. For example, they can display some summary statistics about the groups.

ggplot(mpg, aes(class, cty)) +
  geom_boxplot() +
  scale_x_discrete(
    sec.axis = dup_axis(
      name = "Counts",
      # You can use numeric input for breaks
      breaks = seq_len(length(unique(mpg$class))),
      # Watch out for the order of `table()` and your levels!
      labels = paste0("n = ", table(mpg$class))
    )
  )

Position aesthetics

Layers consist of three components: stats, geoms and positions. While stats and geoms have their own aesthetics, like weight or linewidth, the position adjustments did not. In this release, positions can also declare their own aesthetics. You can map data to these aesthetics like you would for geom or stat aesthetics.

In position_nudge() for example, we now have the nudge_x and nudge_y parameters as aesthetics². Two benefits are that we can now use expressions in aes() to declare these and they are vectorised. We use that advantage in the plot below where we use sign() in a divergent bar chart to determine the left-right direction of the nudge.

# Taken from:
# https://ourworldindata.org/grapher/share-electricity-coal?tab=table&tableFilter=continents
coal <- tibble::tribble(
  ~continent,  ~pct_1985, ~pct_2024,
  "Africa",        53.87, 24.68,
  "Asia",          32.60, 51.19,
  "Europe",        32.84, 12.91,
  "North America", 48.93, 13.79,
  "South America",  2.91,  3.31,
  "Oceania",       58.75, 39.26
) |>
  dplyr::mutate(pp_difference = pct_2024 - pct_1985)

ggplot(coal, aes(pp_difference, continent)) +
  geom_col() +
  geom_text(
    aes(nudge_x = sign(pp_difference) * 3, 
        label = pp_difference)
  ) +
  labs(x = "Change in electricity generated by coal (pp)")

A second position adjustment that has gotten its own aesthetic is position_dodge(). In the plot below, sports that lack records for either ‘sex = “f”’ or ‘sex = “m”’ have only one box, drawn just beneath the centre line. This is true for ‘water polo’, where we have no records for ‘f’, but also for netball and gymnastics, where there are no records for ‘m’. For sports where there are records for both sexes, the “f” is depicted beneath the centre line and “m” is depicted above the centre line. Depending on your aesthetic sensibilities, this inconsistency can be a major pain.

sports <- c("water polo", "swimming", "gymnastics", "field", "netball")
p <- ggridges::Aus_athletes |>
  dplyr::filter(sport %in% sports) |>
  ggplot(aes(height, sport, fill = sex)) +
  geom_boxplot(position = position_dodge(preserve = "single"))

p

The origin of this inconsistency is that ggplot2 doesn’t have an understanding of groups other than that they exist. It doesn’t know that groups are formed by fill and what levels populate this aesthetic. To break ggplot2’s ignorance, we now have the order aesthetic for position_dodge().

p + aes(order = sex)

Supplying that aesthetic to the position adjustment soothes the soul by putting all the right groups in the right places.

Wrapping directions

The facet_wrap() function has had two arguments controlling the layout: dir, which can be "h" or "v", and as.table, which can be TRUE or FALSE. Together, these gave a total of 4 layout options. Arguably there are 8 sensible options in total though, so we were missing out on half of the layouts. To avoid having to juggle two arguments for 4 options, we’re now just using one argument (dir) for 8 options. The new options are all two-letter codes using combinations of t (top), r (right), b (bottom) and l (left). The combination tells you where the first facet level will be: both br and rb will start in the bottom-right with the first facet. The order of the letters then tells you the filling direction, where starting with b will fill bottom-to-top and starting with r will fill right-to-left.

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

p1 <- p + 
  facet_wrap(~ vctrs::vec_group_id(class), dir = "br") +
  labs(title = "dir = 'br'")

p2 <- p +
  facet_wrap(~ vctrs::vec_group_id(class), dir = "rb") +
  labs(title = "dir = 'rb'")

p1 | p2

To cover all 8 options, we list them here:

"lt": start in the top-left, start filling left-to-right.
"tl": start in the top-left, start filling top-to-bottom.
"lb": start in the bottom-left, start filling left-to-right.
"bl": start in the bottom-left, start filling bottom-to-top.
"rt": start in the top-right, start filling right-to-left.
"tr": start in the top-right, start filling top-to-bottom.
"rb": start in the bottom-right, start filling right-to-left.
"br": start in the bottom-right, start filling bottom-to-top.

Free space in wrapping

The facet_grid(space) argument can ensure that panels are allocated space in proportion to their data range. This works because all data within a row share a y-axis, and data within a column share an x-axis. Historically, this argument did not have an equivalent in facet_wrap() because axes aren’t shared. We realised that there is a narrow circumstance in which each column has a consistent axis, and this is when there is only one row. The inverse also holds true for rows when there is only one column. In this release, we’ve added facet_wrap(space) that sets the panel sizes in these circumstances.

ggplot(penguins, aes(bill_dep, bill_len, colour = species)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(~ island, scales = "free_x", space = "free_x")

We can note that the Dream and Torgersen islands have narrower panels because they don’t have the Gentoo penguin with low bill depths.

Layer layout

We’ve added the argument layer(layout), which instructs facets on how to handle the data. Generally speaking, facets or custom layouts are free to interpret instructions as they see fit, so it is not set in stone. Nonetheless, we’ve come up with the following interpretations for facet_wrap() and facet_grid().

layout = NULL (the default) uses the faceting variables to assign data to a panel.
layout = "fixed" repeats the data for every panel and ignores faceting variables.
layout = <integer> assigns data to a specific panel.

In addition, specifically for facet_grid() the following options also apply:

layout = "fixed_cols" pools data for every column and repeats it within the column’s panels.
layout = "fixed_rows" pools data for every row and repeats it within the row’s panels.

ggplot(penguins, aes(bill_dep, bill_len)) +
  # Repeat within every row
  geom_point(na.rm = TRUE, colour = "grey", layout = "fixed_rows") +
  # Use faceting variables (default)
  geom_point(na.rm = TRUE, layout = NULL) +
  # Pick particular panel
  annotate(
    "text", x = I(0.5), y = I(0.5),
    label = "Panel 6", layout = 6
  ) +
  facet_grid(island ~ species)

In previous incarnations of ggplot2, people went through some acrobatics to get data to repeat across panels. With these new options, this should be a walk in the park.

Styling updates

Boxplots

In geom_boxplot(), you may have become accustomed to all the different options for styling outliers like outlier.colour or outlier.shape. Now, we’re also enabling styling the different parts of the boxplot: the median line, the box, the whiskers and the staples. You can assign different colours, line type or line width to these parts of the boxplot.

ggplot(mpg, aes(class, hwy, colour = class)) +
  geom_boxplot(
    whisker.linetype = "dashed",
    box.colour = "black",
    median.linewidth = 2,
    staplewidth = 0.5, # show staple
    staple.colour = "grey50"
  ) +
  guides(colour = "none")

For consistency, geom_crossbar() has been given the same treatment, but uses the middle.* prefix where geom_boxplot() uses the median.* prefix. Because middle.linewidth and median.linewidth have taken over the role of fatten and are aligned with other graphical properties, the fatten argument is now deprecated.
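
As a sketch of what this looks like for geom_crossbar(), the example below styles the middle line and the box separately. The summary values are made up for illustration, and the argument names are assumed to follow the middle.* and box.* pattern described above.

```r
library(ggplot2)

# Made-up summary values, for illustration only
summary_df <- data.frame(
  group = c("a", "b", "c"),
  mid   = c(2, 3, 5),
  lower = c(1, 2, 4),
  upper = c(3, 5, 6)
)

p_crossbar <- ggplot(summary_df, aes(group, mid, ymin = lower, ymax = upper)) +
  geom_crossbar(
    middle.colour    = "tomato", # styles only the middle line
    middle.linewidth = 1.5,      # takes over the role of `fatten`
    box.linetype     = "dashed",
    width = 0.5
  )

p_crossbar
```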

Violin & quantiles

It has been an inconvenience for some time that the quantile computation in violin layers was computed on the density data rather than the input data. To make the quantile computation more faithful to the real data, we had to properly delegate the responsibilities to the correct parts of the layer. The stat part of the layer is now in charge of calculating quantiles of the input data via the stat_ydensity(quantiles) argument. By default, the quantiles are the 25th, 50th and 75th percentiles and are always computed. Whether these quantiles are also displayed is under the purview of the geom part of the layer. We’ve taken a similar approach to the boxplots shown above, in that we now have quantile.colour, quantile.linetype and quantile.linewidth arguments to style the quantile lines. Previously, quantiles were not displayed by default. To mirror that behaviour, we’ve set quantile.linetype = 0 (blank, no line) by default. This means that to turn on the display of quantiles, you have to set a non-blank line type.

ggplot(mpg, aes(class, hwy, fill = class)) +
  geom_violin(
    quantiles = c(0.1, 0.9),
    quantile.linetype = 1
  ) +
  guides(fill = "none")

Labels

geom_label() also has new styling options. It now has a linetype and linewidth aesthetic, which can be mapped from the data. The linewidth aesthetic replaces the label.size argument, which used to determine the line width of the label border. In addition to the new aesthetics, geom_label() has two new arguments: border.colour and text.colour which set the colour for the border and text respectively. When these are set, it overrules the colour aesthetic for a part of the label. In the plot below, we fix the text.colour to black, so the colour aesthetic applies to the border, not the text.

ggplot(mtcars) +
  aes(
    wt, mpg,
    label = rownames(mtcars),
    colour = factor(cyl),
    linetype = factor(vs),
    linewidth = factor(am)
  ) +
  geom_label(text.colour = "black") +
  scale_linewidth_manual(values = c(0.3, 0.6))

Area and ribbons

Both geom_area() and geom_ribbon() now allow a varying fill aesthetic within a group. Such a fill is displayed as a gradient, and therefore requires R 4.1.0+ and a compatible graphics device.

ggplot(economics, aes(date, unemploy)) +
  geom_area(aes(fill = uempmed))

New stats

Manual

Many ggplot extensions are based on stats, which allow you to perform arbitrary computations on data before handing it off to the drawing functions. The new stat_manual() aims to give you the same extension powers, but without the formal ritual of defining a class and constructor. You can provide it any function that both ingests and returns a data frame. It can create new aesthetics or modify pre-existing aesthetics, as long as the geom part of the layer eventually has its required aesthetics. In the example below, we use stat_manual() with a geom and a function, but also show you how to use a geom with stat = "manual".

make_centroids <- function(df) {
  transform(
    df,
    xend = mean(x, na.rm = TRUE),
    yend = mean(y, na.rm = TRUE)
  )
}

make_hull <- function(df) {
  df <- df[complete.cases(df), , drop = FALSE]
  hull <- chull(df$x, df$y)
  df[hull, , drop = FALSE]
}

ggplot(penguins, aes(bill_len, bill_dep, colour = species)) + 
  geom_point(na.rm = TRUE) + 
  # As a stat, provide a geom
  stat_manual(
    geom = "segment", # function creates new xend/yend for segment
    fun = make_centroids,
    linewidth = 0.2,
    na.rm = TRUE
  ) +
  # As a geom, provide the stat
  geom_polygon(
    stat = "manual",
    fun = make_hull,
    fill = NA,
    linetype = "dotted"
  )

Connection

It has come to our attention that generalisations of geom_step() have become commonplace in several extensions. Stairstep ribbons are used in Kaplan-Meier curves to indicate uncertainty. Stairstep area plots make for some great histograms. To this end, we’re introducing stat_connect(), which can connect observations in a stairstep fashion without constraining a geom choice. In the plot, you can see it work on the y, ymin and ymax aesthetics indiscriminately with distinct geoms.

eco <- economics |>
  dplyr::mutate(year = lubridate::year(date)) |>
  dplyr::summarise(
    min = min(unemploy),
    max = max(unemploy),
    mid = median(unemploy),
    .by = year
  )

ggplot(eco, aes(year, y = mid, ymin = min, ymax = max)) +
  geom_line(stat = "connect") +
  geom_ribbon(stat = "connect", alpha = 0.4)

However, we aren’t necessarily limited to stairstep connections. We can use a 2-column numeric matrix to sketch out other types of connections. For example if we use plogis() to create a logistic transition, we can make ‘bump chart’-like connections. Or you can use a zigzag pattern if silliness is your cup of tea.

x <- seq(0, 1, length.out = 20)[-1]
smooth <- cbind(x, scales::rescale(plogis(x, location = 0.5, scale = 0.1)))
zigzag <- cbind(c(0.4, 0.6, 1), c(0.75, 0.25, 1))

ggplot(head(eco, 10), aes(year, y = mid)) +
  geom_point() +
  stat_connect(aes(colour = "smooth"), connection = smooth) +
  stat_connect(aes(colour = "zigzag"), connection = zigzag)

Coord reversal

It has been possible to use reverse transformations for scales to flip a plot direction. You could even use scales::transform_compose() to do, for example, a reversed log₁₀ transformation to highlight the smallest p-values. However, the transformation approach has a few limitations, notably that discrete scales do not support transformations and that not all coords obey transformed scales. You can’t really combine coord_sf() + scale_x_log10() for example. To remedy this limitation, coords now have a reverse argument, which can typically be "none", "x", "y" or "xy" and reverses the corresponding directions. If you are from the lands down under, you can now plot a map in your native orientation.

world <- sf::st_as_sf(maps::map('world', plot = FALSE, fill = TRUE))

ggplot(world) +
  geom_sf() +
  coord_sf(reverse = "y")

In coord_radial(), the reverse argument replaces the direction argument, which only worked for the theta-direction. Unlike most coords, coord_radial(reverse) takes "none", "theta", "r" and "thetar" instead of the x/y directions.
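As a brief sketch of the two cases described above, the first plot reverses a discrete x-axis, which the scale transformation approach could not do, and the second reverses the theta direction of a radial plot. The built-in mpg dataset is used purely for illustration.

```r
library(ggplot2)

# Discrete x scale, flipped via the coord rather than a scale transformation
p_discrete <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  coord_cartesian(reverse = "x")

# Radial coord: reverse the theta direction
p_radial <- ggplot(mpg, aes(class)) +
  geom_bar() +
  coord_radial(reverse = "theta")

p_discrete
p_radial
```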

Goodies for extensions

Layers

If you’ve ever written a Geom class, chances are that you’ve danced with grid::gpar() and frowned at the use of .pt and .stroke and whatnot. We’ve made a wrapper for grid::gpar() that applies the ggplot2 interpretation of settings and translates them to grid settings. For example, linewidth (or the lwd grid setting) is interpreted in millimetres in ggplot2, whereas grid expects it in points. The gg_par() function helps with these translations, protects against NAs in line types and strokes, removes 0-length vectors, and has additional logic for point strokes.

gg_par(lwd = 5)
#> $lwd
#> [1] 14.22638
grid::convertUnit(unit(5, "mm"), "pt")
#> [1] 14.2263779527559points
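In a draw method, the result of gg_par() can be passed wherever a grob expects a gp argument. The sketch below draws a single dashed line outside of any Geom, just to show the hand-off. The specific colour, width and line type are arbitrary choices for illustration.

```r
library(ggplot2)

# Line width given in millimetres, as ggplot2 users expect;
# gg_par() converts it to the points that grid expects
gp <- gg_par(col = "steelblue", lwd = 2, lty = "dashed")

# Use the translated settings directly in a grid grob
line <- grid::linesGrob(x = c(0.1, 0.9), y = c(0.5, 0.5), gp = gp)

grid::grid.newpage()
grid::grid.draw(line)
```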

For geom and stat extensions, the magic usually happens in the Geom* or Stat* classes and the constructor is simply boilerplate code used to populate a layer. To reduce the amount of boilerplate code, you can now use make_constructor() on Geom* and Stat* classes. It produces a typical constructor function that adheres to several conventions, like exposing arguments to compute/drawing methods. To illustrate, notice how the following constructor for GeomPath includes arguments for lineend and linejoin automatically because they are arguments to the GeomPath$draw_panel() method.

geom_foo <- make_constructor(GeomPath, position = "stack")
print(geom_foo)
#> function (mapping = NULL, data = NULL, stat = "identity", position = "stack", 
#>     ..., arrow = NULL, arrow.fill = NULL, lineend = "butt", linejoin = "round", 
#>     linemitre = 10, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) 
#> {
#>     layer(mapping = mapping, data = data, geom = "path", stat = stat, 
#>         position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
#>         params = list2(na.rm = na.rm, arrow = arrow, arrow.fill = arrow.fill, 
#>             lineend = lineend, linejoin = linejoin, linemitre = linemitre, 
#>             ...))
#> }
#> <environment: 0x00000223e1277e08>
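The generated function should behave like any hand-written layer constructor. As a small sketch, we build a constructor with the default position, confirm that the draw-method arguments were picked up, and add it to a plot. The name geom_path2 is made up for this example.

```r
library(ggplot2)

# geom_path2 is a made-up name for illustration
geom_path2 <- make_constructor(GeomPath)

# Arguments of GeomPath$draw_panel() are exposed as formal arguments
c("lineend", "linejoin") %in% names(formals(geom_path2))

# Use it like any other layer function
p <- ggplot(economics, aes(date, unemploy)) +
  geom_path2(linejoin = "mitre", linewidth = 1)

p
```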

In addition, you can now use the #' @aesthetics <Geom/Stat/Position> roxygen tag to automatically populate an ‘Aesthetics’ section of your documentation. The code below:

GeomDummy <- ggproto("GeomDummy", Geom, default_aes = aes(foo = "bar"))

#' <rest of roxygen comments>
#' @aesthetics GeomDummy
geom_foo <- make_constructor(GeomDummy)

will generate the following .Rd code:

\section{Aesthetics}{

\code{geom_dummy()} understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
\tabular{rll}{
 • \tab \code{foo} \tab → \code{"bar"} \cr
 • \tab \code{\link[ggplot2:aes_group_order]{group}} \tab → inferred \cr
}

Learn more about setting these aesthetics in \code{vignette("ggplot2-specs")}.
}

Themes

To replicate how themes are handled internally, you can now use complete_theme(). It fills in all missing elements and performs typical checks.

my_theme <- theme(plot.background = element_rect(fill = NA))
length(my_theme)
#> [1] 1

completed <- complete_theme(my_theme)
length(completed)
#> [1] 144

# You shouldn't give rect elements to text settings
completed <- theme(legend.text = element_rect()) |>
  complete_theme()
#> Error in `plot_theme()`:
#> ! Can't merge the `legend.text` theme element.
#> Caused by error in `method(merge_element, list(ggplot2::element, class_any))`:
#> ! Only elements of the same class can be merged.

# Unknown elements
completed <- theme(foobar = 12) |>
  complete_theme()
#> Warning in plot_theme(list(theme = theme), default = default): The `foobar` theme element is not defined in the element hierarchy.
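Once a theme is complete, every element can be resolved with calc_element(), which is handy for extensions that need concrete values. The sketch below completes a partial theme against theme_minimal() rather than the currently active theme. We are assuming here that the default argument seen in the warning above accepts any complete theme.

```r
library(ggplot2)

# A partial theme that only touches one element
partial <- theme(axis.title = element_text(colour = "red"))

# Fill in everything else from theme_minimal()
# (assumption: `default` accepts any complete theme)
completed <- complete_theme(partial, default = theme_minimal())

# Child elements can now be resolved, including inherited settings
x_title <- calc_element("axis.title.x", completed)
x_title
```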

We’re also introducing point and polygon theme elements. These aren’t used in any of the base ggplot2 theme settings, but you can use them in extensions. The example below demonstrates registering new theme settings and that points and polygons follow inheritance and can be rendered.

# Let's say your package 'my_pkg' registers custom point/polygon elements
register_theme_elements(
  my_pkg_point = element_point(colour = "red"),
  my_pkg_polygon = element_polygon(fill = NA),
  element_tree = list(
    my_pkg_point = el_def(element_point, inherit = "point"),
    my_pkg_polygon = el_def(element_polygon, inherit = "polygon")
  )
)

# Which should inherit from the root point/polygon theme elements
my_theme <- theme(
  point = element_point(shape = 17),
  polygon = element_polygon(linetype = "dotted")
) |>
  complete_theme()

# Rendering your elements
pts <- calc_element("my_pkg_point", my_theme) |>
  element_grob(
    x = c(0.2, 0.5, 0.8),
    y = c(0.8, 0.2, 0.5)
  )
poly <- calc_element("my_pkg_polygon", my_theme) |>
  element_grob(
    x = c(0.1, 0.5, 0.9),
    y = c(0.9, 0.1, 0.5)
  )

# Drawing the elements
grid::grid.newpage()
grid::grid.draw(pts)
grid::grid.draw(poly)

Acknowledgements

Thank you to all the people who contributed their issues, code and comments to this release: @83221n4ndr34, @Abiologist, @acebulsk, @adisarid, @agila5, @agmurray, @agneeshbarua, @aijordan, @amarjitsinghchandhial, @amkilpatrick, @amongoodtx, @Andtise, @andybeet, @antoine4ucsd, @aphalo, @aravind-j, @arcresu, @arnaudgallou, @assaron, @baderstine, @BajczA475, @bakaburg1, @BegoniaCampos, @benjaminhlina, @billdenney, @binkleym, @bkohrn, @bnprks, @botanize, @Breeze-Hu, @brianmsm, @brunomioto, @btupper, @bwu62, @carljpearson, @catalamarti, @cbrnr, @ccani007, @ccsarapas, @cgoo4, @clauswilke, @Close-your-eyes, @collinberke, @const-ae, @dafxy, @DanChaltiel, @danli349, @dansmith01, @daorui, @david-romano, @davidhodge931, @dinosquash, @dominicroye, @dsconnell, @EA-Ammar, @EBukin, @elgabbas, @eliocamp, @elipousson, @erinnacland, @etiennebacher, @EvaMaeRey, @evanmascitti, @eyayaw, @fabian-s, @fkohrt, @FloLecorvaisier, @fmarotta, @Fugwaaaa, @fwunschel, @g-pacheco, @gaborcsardi, @gregorp, @guqicun, @hadley, @heinonmatti, @heor-robyoung, @herry23xet, @HMU-WH, @HRodenhizer, @hsiaoyi0504, @Hy4m, @IndrajeetPatil, @jack-davison, @JacobBumgarner, @JakubKomarek, @jansim, @japhir, @jbengler, @jdonland, @jeraldnoble, @Jigyasu4indp, @jiw181, @jmbuhr, @JMeyer31, @jmgirard, @jnolis, @joaopedrosusselbertogna, @johow, @jonocarroll, @jpquast, @JStorey42, @JThomasWatson, @julianbarg, @julou, @junjunlab, @JWiley, @kauedesousa, @kdarras, @kevinushey, @kevinwolz, @kieran-mace, @kkellysci, @kobetst, @koheiw, @krlmlr, @KryeKuzhinieri, @kylebutts, @laurabrianna, @lbenz730, @lcpmgh, @lgaborini, @lgibson7, @LGraz, @llrs, @louis-heraut, @ltierney, @Lucielclr, @luhann, @m-muecke, @marcelglueck, @margaret-colville, @markus-schaffer, @Maschette, @MathiasAmbuehl, @MathieuYeche, @mattansb, @MauricioCely, @MaxAtoms, @mcol, @mfoos, @MichaelChirico, @MichelineCampbell, @mikmart, @misea, @mjskay, @mkoohafkan, @mlaparie, @MLopez-Ibanez, @mluerig, @mohammad-numan, @MoREpro, @mtrsl, @muschellij2, @mzavattaro, 
@nicholasdavies, @njspix, @nmercadeb, @noejn2, @npearlmu, @Olivia-Box-Power, @olivroy, @oracle5th, @oskard95, @palderman, @PanfengZhang, @paulfajour, @PCEBrunaLab, @petrbouchal, @pgmj, @phispu, @PietrH, @pn317, @ppoyk, @pradosj, @psoldath, @py9mrg, @qli84, @randyzwitch, @raphaludwig, @RaynorJim, @rdboyes, @reechawong, @rempsyc, @rfgoldberg, @rikivillalba, @rishabh-mp3, @RodDalBen, @rogerssam, @rsh52, @rwilson8, @salim-b, @sambtalcott, @samuel-marsh, @schloerke, @schmittrjp, @sierrajohnson, @smouksassi, @stitam, @stragu, @strengejacke, @sunta3iouxos, @szkabel, @taozhou2020, @tdhock, @telenskyt, @teunbrand, @the-Hull, @thgsponer, @thomasp85, @ThomasSoeiro, @Tiggax, @tikkss, @TimTaylor, @tombishop1, @tommmmi, @totajuliusd, @trafficfan, @tungttnguyen, @tvatter, @twest820, @ujtwr, @venpopov, @vgregoire1, @victorcat4, @victorfeagins, @vivekJax, @wbvguo, @willgearty, @williamlai2, @withr, @wvictor14, @XdahaX, @yjunechoe, @yoshidk6, @YUCHENG-ZHAO, @Yunuuuu, @yutannihilation, @yzz32, @zhengxiaoUVic, and @zjwinn.

Normally, aes() is strictly used to map data instead of setting a fixed property. We diverge from this API for pragmatic reasons, not theoretical ones.
Aesthetics of the position adjustment are not to be confused with position aesthetics. Position aesthetics like x and y are transformed by a scale, whereas aesthetics of the position adjustment like nudge_x and nudge_y are not (akin to width and height).
