Survival analysis for time-to-event data with tidymodels

  censored, tidymodels, workflows, workflowsets, tune, parsnip, yardstick

  Hannah Frick

We’re tickled pink to announce support for survival analysis of time-to-event data across tidymodels. The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This new support makes survival analysis a first-class citizen in tidymodels and gives censored regression modeling the same flexibility and ease as classification or regression.

The functionality resides in multiple tidymodels packages. The easiest way to install them all is to install the tidymodels meta-package:

install.packages("tidymodels")

This blog post will highlight why this is useful, explain which additions we’ve made to the framework, and point to several places to learn more.

You can see a full list of changes in the release notes of the packages involved.

Increasing usefulness: Two perspectives

We’d like to situate the changes from two different perspectives: how they are useful for people already familiar with survival analysis and for people already familiar with tidymodels.

If you are already familiar with both: Excellent, this is very much for you! Read on for more details on how these two things come together.

Adding tidymodels to your tool kit

If you are already familiar with survival analysis but maybe not tidymodels, these changes now unlock a whole framework for predictive modeling for you. It applies tidyverse principles to modeling, meaning it strives to be consistent, composable, and human-centered. The framework covers the modeling process from the initial test/train split of the data all the way to tuning various models. Along the way it offers a rich selection of preprocessing techniques, resampling schemes, and performance metrics, along with safeguards against accidental overfitting. We make the full case for tidymodels at tidymodels.org.
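
To give you a flavor of how those pieces fit together, here is a minimal sketch of that skeleton for an ordinary regression model; the mtcars data and the model choice are just stand-ins for illustration.

library(tidymodels)

# Initial test/train split and resamples for assessing models without
# touching the test set
set.seed(123)
car_split <- initial_split(mtcars)
car_train <- training(car_split)
car_folds <- vfold_cv(car_train, v = 5)

# Bundle preprocessing (here just a formula) and a model specification
wflow <- workflow(mpg ~ ., linear_reg())

# Fit and assess the model across the resamples
res <- fit_resamples(wflow, resamples = car_folds)
collect_metrics(res)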

Adding survival analysis to your tool kit

If you are already familiar with tidymodels but maybe not survival analysis, these changes let you leverage the familiar framework for an additional type of modeling problem. Survival analysis offers methods for modeling time-to-event data. While it has its roots in medical research, it has broad applications as that event of interest can be so much more than a medical outcome. Take customer churn as an example: We are interested in how long someone is a customer for and when they churn. For customers who churned, we have the complete time for which they were customers. For existing customers, we only know how long they’ve been customers for so far. Such observations are called censored. So what are our modeling choices here?

We could look at the time and model that as a regression problem. We could look at the event status and model that as a classification problem. Both options might get us somewhere close to an answer to our original modeling question but not quite there. Censored regression models let us model an outcome that includes both aspects, the time and the event status. With that, they can deal with both censored and uncensored observations appropriately. With this type of model, we can predict the survival time, or in more applied terms, how long someone will stay a customer. We can also predict the probability of survival at a given time point. This lets us answer questions like “How likely is it that this customer will churn after 3 months?”. See which prediction types are available for which models at censored.tidymodels.org.
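
In code, both pieces of information are combined into a single survival outcome via survival::Surv(). Here is a minimal sketch with made-up churn data; the customers data frame and its columns are purely for illustration.

library(survival)

# churned = 1: the customer left, so we observed the event.
# churned = 0: still a customer, so the observation is censored.
customers <- data.frame(
  months  = c(3, 12, 7, 24),
  churned = c(1, 1, 0, 0)
)

# Time and event status together form the outcome; censored observations
# are printed with a trailing "+".
Surv(customers$months, customers$churned)
#> [1]  3  12   7+ 24+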

Ch-ch-changes: What’s new for censored regression?

The main components needed for this full-fledged integration of survival analysis into tidymodels were

  • Survival analysis models that can take censoring into account
  • Survival analysis performance metrics that can take censoring into account
  • Integration of these models and metrics into the rest of the framework

For the models, parsnip gained a new mode, "censored regression", for existing models as well as new model types such as proportional_hazards(). Engines for these reside in censored, the parsnip extension package for survival models. The "censored regression" mode has been around for a while and we’ve previously shared posts on our initial thoughts and the release of censored.
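
As a quick illustration, specifying and fitting such a model looks just like any other parsnip model; the formula and the lung data from the survival package are placeholders here, not part of the new release.

library(censored)  # loads parsnip and survival as well

# A proportional hazards model in the new censored regression mode,
# using the "survival" engine provided by censored
ph_spec <- proportional_hazards() %>%
  set_engine("survival") %>%
  set_mode("censored regression")

# The outcome is a Surv() object combining time and event status
ph_fit <- fit(ph_spec, Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Predicted survival time for the first few observations
predict(ph_fit, new_data = lung[1:3, ], type = "time")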

Now we’ve added the metrics: yardstick v1.3.0 includes new metrics for assessing censored regression models. Somewhat similar to how metrics for classification models can take class predictions or probability predictions as input, these survival metrics can take either predicted survival times or predicted survival probabilities.

The new metrics are

  • Concordance index on the survival time via concordance_survival()
  • Brier score on the survival probability and its integrated version via brier_survival() and brier_survival_integrated()
  • ROC curve and the area under the ROC curve on the survival probabilities via roc_curve_survival() and roc_auc_survival(), respectively
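
Like other yardstick metrics, these can be bundled into a metric set. A minimal sketch:

library(yardstick)

# Concordance works on predicted survival times; the Brier score and the
# ROC AUC are dynamic metrics that work on predicted survival probabilities.
survival_metrics <- metric_set(
  concordance_survival,
  brier_survival,
  brier_survival_integrated,
  roc_auc_survival
)

Such a metric set can then be passed to the tune_*() functions via their metrics argument.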

The probability of survival is always defined at a certain point in time. We call that time point the evaluation time because it is also the time point at which we want to evaluate model performance. Metrics that work on the survival probabilities are also called dynamic metrics; you can read more about them on tidymodels.org.

The evaluation time is also the best example to illustrate the changes necessary to the framework. Most of them were under the hood but the evaluation time is user-facing. Let’s take a look at that.

While the need for evaluation times depends on the type of metric, the evaluation time is not actually specified as an argument to the metric functions. Like yardstick’s other metrics, these take pre-made predictions as input. So where do you specify it then?

  • You need to specify it to directly predict survival probabilities, via predict() or augment(). We introduced the corresponding eval_time argument first for fitted models in parsnip and censored and have now added it for workflows (see the sketch after this list).
  • You also need to specify it for the tuning functions tune_*() from tune and finetune as they will predict survival probabilities as part of the tuning process.
  • Lastly, the eval_time argument now shows up when working with tuning/resampling results such as in show_best() or autoplot(). Those changes span the packages generating and working with resampling results: tune, finetune, and workflowsets.
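
Here is a minimal sketch of where eval_time goes, again using the lung data and a proportional hazards model purely for illustration; fit_resamples() stands in for the tune_*() functions, which take eval_time in the same way.

library(tidymodels)
library(censored)

# Create the survival outcome as a column up front
lung_surv <- lung %>%
  mutate(surv = Surv(time, status), .keep = "unused")

wflow <- workflow(surv ~ age + sex, proportional_hazards())

# eval_time when predicting survival probabilities from a fitted workflow
surv_fit <- fit(wflow, data = lung_surv)
predict(surv_fit, new_data = lung_surv[1:3, ],
        type = "survival", eval_time = c(100, 200, 365))

# eval_time when resampling or tuning
set.seed(1)
folds <- vfold_cv(lung_surv, v = 5)
res <- fit_resamples(
  wflow,
  resamples = folds,
  metrics = metric_set(brier_survival, concordance_survival),
  eval_time = c(100, 200, 365)
)

# eval_time again when working with the results
collect_metrics(res)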

As we said, there were plenty of changes under the hood, but you shouldn’t need to notice them. Everything else should work “as usual,” allowing the same ease and flexibility in combining tidymodels functionality for censored regression as for classification and regression.

The pieces come together: A case study

To see it all in action, check out the case study “How long until building complaints are dispositioned?” on the tidymodels website!

The city of New York publishes data on complaints received by the Department of Buildings that include how long it takes for a complaint to be dealt with (“dispositioned”) as well as several characteristics of the complaint. The case study covers a full analysis. We start with splitting the data into test and training sets, explore different preprocessing strategies and model types via tuning, and predict with a final model. It should give you a good first impression of how to use tidymodels for predictive survival analysis.

We hope you’ll find this new capability of tidymodels useful!

Acknowledgements

Many thanks to the people who contributed to our packages since their last release:

parsnip: @AlbanOtt2, @birbritto, @christophscheuch, @EmilHvitfeldt, @Freestyleyang, @gmcmacran, @hfrick, @jmunyoon, @joscani, @jxu, @marcelglueck, @mattheaphy, @mesdi, @millermc38, @nipnipj, @pgg1309, @rdavis120, @seb-mueller, @SHo-JANG, @simonpcouch, @topepo, @vidarsumo, and @wzbillings.

censored: @bcjaeger, @brunocarlin, @EmilHvitfeldt, @hfrick, @noahtsao, and @tripartio.

yardstick: @aecoleman, @asb2111, @atsyplenkov, @bgreenwell, @Dpananos, @EduMinsky, @EmilHvitfeldt, @heidekrueger, @hfrick, @iacrowe, @jarbet, @jxu, @mattwarkentin, @maxwell-geospatial, @moloscripts, @rdavis120, @ruddnr, @SimonCoulombe, @simonpcouch, @tbrittoborges, @tonyelhabr, @tripartio, @TSI-PTG, @vnijs, @wbuchanan, and @zkrog.

workflows: @Milardkh, @simonpcouch, and @topepo.

tune: @AlbertoImg, @dramanica, @EmilHvitfeldt, @epiheather, @hfrick, @joranE, @jrosell, @jxu, @kbodwin, @kenraywilliams, @KJT-Habitat, @lionel-, @marcozanotti, @MasterLuke84, @mikemahoney218, @PathosEthosLogos, @Peter4801, @simonpcouch, @topepo, and @walkerjameschris.

finetune: @EmilHvitfeldt, @hfrick, @jdberson, @jrosell, @mfansler, @ruddnr, @simonpcouch, and @topepo.

workflowsets: @dchiu911, @hfrick, @jkylearmstrong, @PathosEthosLogos, and @simonpcouch.