Visualizing Survival Data with the {ggsurvfit} R Package

Daniel D. Sjoberg & Mark Baillie

{ggsurvfit}

Licensing


This work is licensed under Creative Commons Zero v1.0 Universal.

A whistle-stop tour of {ggsurvfit}

  • Motivation and why {ggsurvfit}?

  • Overview of package functionality

  • Additional examples

Motivation

A widely used statistical summary

Time-to-event (a.k.a. survival) summaries, typically Kaplan-Meier (KM) estimates, are used widely, from oncology to cardiovascular medicine.

A KM plot displays the estimated proportion of patients who have survived (remained event-free) over time or, equivalently, its complement: the cumulative proportion who have had the event.
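For concreteness, the product-limit (KM) estimate multiplies, at each distinct event time, the running survival probability by one minus the number of events divided by the number at risk. The minimal sketch below uses a tiny made-up data set (the values are purely hypothetical), computes this by hand, and compares the result with `survival::survfit()`.

```r
library(survival)

# Hypothetical toy data: follow-up times and status (1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 7, 8, 10)
status <- c(1, 1, 0, 1, 0, 1, 0)

# Product-limit estimate by hand: at each distinct event time t_i,
# multiply the running survival by (1 - d_i / n_i)
event_times <- sort(unique(time[status == 1]))
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
km_manual <- cumprod(1 - n_event / n_risk)

# The same estimate from survival::survfit(), read off at the event times
fit <- survfit(Surv(time, status) ~ 1)
km_survfit <- summary(fit, times = event_times)$surv

all.equal(km_manual, km_survfit)  # TRUE
```

Censored observations never contribute a factor of their own; they only shrink the number at risk for later event times, which is why the patient censored at time 3 still counts in the risk set at time 3 but not at time 5.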

However, it was not such a simple start:

Edward L. Kaplan and Paul Meier submitted independent papers to JASA; John Tukey insisted they combine their efforts, and it took 4 more years for the authors of the “KM” estimator to publish

Afterwards, “Paul Meier himself is said to have spoken with bemusement about people plotting KM estimates over time and was not convinced he actually liked it” (Morris et al., 2018)

library(survival)
library(ggsurvfit)  # provides the df_colon example data
# base-graphics KM plot from {survival}
fit <- survfit(Surv(time, status) ~ surg, data = df_colon)
plot(fit)

Can we do better?

Even a simple KM plot involves many design and analytical choices, and these choices can lead to wide variation in presentation quality.

library(ggsurvfit)
fit2 <- survfit2(Surv(time, status) ~ surg, data = df_colon) 
ggsurvfit(fit2) + scale_ggsurvfit()

{ggsurvfit} overview

  • Ease the creation of time-to-event summary figures with ggplot2

  • Concise and modular code

  • Figures ready for publication or sharing

  • Sensible defaults

  • Supports the CDISC ADaM time-to-event (ADTTE) data model

  • Also supports competing risks cumulative incidence summaries
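Competing-risks summaries go through `ggcuminc()` rather than `ggsurvfit()`. The sketch below assumes the {tidycmprsk} package is installed and uses its `cuminc()` function and example `trial` data (columns `ttdeath`, `death_cr`, `trt`); it illustrates the interface rather than a recommended analysis.

```r
library(survival)
library(ggsurvfit)
library(tidycmprsk)  # assumed installed; provides cuminc() and the `trial` data

# Cumulative incidence of "death from cancer", with death from other causes
# as a competing event; `death_cr` is a factor whose first level is censoring
cif <- cuminc(Surv(ttdeath, death_cr) ~ trt, data = trial)

# The same add-on functions used for KM figures also work here
gg_cif <-
  ggcuminc(cif, outcome = "death from cancer") +
  add_confidence_interval() +
  add_risktable()
gg_cif
```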

Why {ggsurvfit}?

Use ggplot2 functions

  • Each ggsurvfit add-on function is written as a proper ggplot2 geom
  • Package functions can be woven seamlessly with ggplot2 functions
  • No need to learn a separate styling system for ggsurvfit
  • Use your existing ggplot2 knowledge to customize
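Because `ggsurvfit()` returns a ggplot object, ordinary ggplot2 layers, themes, and labels are added with `+` like on any other plot. A minimal sketch; the dotted reference line at time 5 is an arbitrary choice for illustration.

```r
library(survival)
library(ggplot2)
library(ggsurvfit)

fit <- survfit2(Surv(time, status) ~ surg, data = df_colon)

# Standard ggplot2 components mixed directly with the ggsurvfit figure
p <-
  ggsurvfit(fit) +
  geom_vline(xintercept = 5, linetype = "dotted") +  # plain ggplot2 layer
  theme_bw() +                                       # plain ggplot2 theme
  labs(caption = "Dotted line: time = 5")
p
```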

Limitless customization

  • Modify the x-axis scale or any other plot feature, and the risk table will still align with the plot
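As a sketch of the alignment point: by default `add_risktable()` reports counts at the times shown on the x-axis, so changing the breaks with a plain `scale_x_continuous()` also moves the risk-table columns. The break values below are arbitrary.

```r
library(survival)
library(ggplot2)
library(ggsurvfit)

# The risk table follows the x-axis breaks set by the ggplot2 scale,
# so the counts stay aligned with the tick marks of the KM plot
p <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_risktable() +
  scale_x_continuous(breaks = seq(0, 8, by = 2))
p
```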

Simple saving and export through ggplot2::ggsave()
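A minimal sketch of saving a figure with `ggplot2::ggsave()`; the temporary PDF path and the 7 x 5 inch size are arbitrary choices for illustration.

```r
library(survival)
library(ggplot2)
library(ggsurvfit)

p <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  scale_ggsurvfit()

# Save to a temporary PDF; width and height are in inches by default
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 7, height = 5)
```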

Publication-ready legends

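  • Raw variable names do not appear in the figure

fit <- survfit2(Surv(time, status) ~ surg, data = df_colon)
# {survminer}: legend labels include the variable name (e.g., "surg=...")
survminer::ggsurvplot(fit)
# {ggsurvfit}: only the group levels appear in the legend
ggsurvfit(fit) + scale_ggsurvfit()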

Basic Example

survfit2(Surv(time, status) ~ surg, data = df_colon) |> 
  ggsurvfit() +
  add_risktable()

  • The Good
    • Simple code, and the figure is nearly publishable
    • A risk table with both the number at risk and the number of events is easily added
    • The x-axis label is taken from the time column's label
    • Uses the familiar ggplot2 + notation
  • The Could-Be-Better
    • The y-axis label is incorrect, and the axis is best shown on a 0-100% scale
    • Axis padding is a bit more than I prefer for a KM figure
    • The x-axis of a KM figure typically has more tick marks

Basic Example

survfit2(Surv(time, status) ~ surg, data = df_colon) |> 
  ggsurvfit() +
  add_risktable() +
  scale_ggsurvfit() +
  labs(y = "Recurrence-free Progression")

  • Padding has been reduced, and the curves begin in the upper-left corner of the plot

  • The x-axis reports additional time points (and, as a result, so does the risk table)

  • We updated the y-axis label by weaving in a standard ggplot2 function
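The defaults of `scale_ggsurvfit()` can themselves be adjusted: its `x_scales` and `y_scales` arguments take lists of arguments that are passed on to the continuous x- and y-axis scales. A minimal sketch, where placing a break at every value from 0 to 9 is an arbitrary illustration.

```r
library(survival)
library(ggplot2)
library(ggsurvfit)

# Override the default x-axis breaks chosen by scale_ggsurvfit();
# the risk table follows the new breaks
p <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_risktable() +
  scale_ggsurvfit(x_scales = list(breaks = 0:9)) +
  labs(y = "Recurrence-free Progression")
p
```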

Basic Example

survfit2(Surv(time, status) ~ surg, data = df_colon) |> 
  ggsurvfit() +
  add_risktable() +
  scale_ggsurvfit() +
  labs(y = "Recurrence-free Progression") +
  ggeasy::easy_move_legend("top")

  • We can even use functions from ggplot2 extension packages, such as {ggeasy}

Additional examples

Default and styled modifications with ggplot2

{ggsurvfit} defaults

gg_default <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_confidence_interval() +
  scale_ggsurvfit() +
  labs(title = "Default")

gg_default

{ggplot2} styled

gg_styled <-
  gg_default +
  # zoom to x = 0-8 without removing observations
  coord_cartesian(xlim = c(0, 8)) +
  # these scales replace the ones set by scale_ggsurvfit()
  scale_y_continuous(
    limits = c(0, 1),
    labels = scales::percent, 
    expand = c(0.01, 0)
  ) +
  scale_x_continuous(breaks = 0:9, expand = c(0.02, 0)) +
  # matching colours for the curves (color) and confidence ribbons (fill)
  scale_color_manual(values = c('#54738E', '#82AC7C')) +
  scale_fill_manual(values = c('#54738E', '#82AC7C')) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(ncol = 1)) +
  labs(
    title = "{ggplot2} styled",
    y = "Percentage Survival"
  )

gg_styled

Side-by-side

library(patchwork)
gg_default + gg_styled

Risk tables

{ggsurvfit} defaults

survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_risktable()

Group by statistic or strata

ggrisktable <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  scale_ggsurvfit() +
  add_risktable(risktable_group = "risktable_stats") 
ggrisktable

Colour encoding of strata

ggrisktable +
  add_risktable_strata_symbol(symbol = "\U25CF", size = 10)

Customizing the risk table statistics

survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_risktable(risktable_stats = "{n.risk} ({cum.event})") 
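Beyond the statistics themselves, `add_risktable()` can also report counts at chosen times rather than at the x-axis breaks (`times`) and relabel the rows (`stats_label`). A minimal sketch; the chosen times and label text are arbitrary illustrations.

```r
library(survival)
library(ggplot2)
library(ggsurvfit)

# Counts at fixed times with custom row labels for each statistic
p <-
  survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit() +
  add_risktable(
    times = c(0, 2, 4, 6, 8),
    risktable_stats = c("n.risk", "cum.event"),
    stats_label = list(n.risk = "Number at Risk", cum.event = "Events")
  ) +
  scale_ggsurvfit()
p
```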

Quantiles

Median summary

survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit(linewidth = 0.8) +
  add_censor_mark() +
  add_quantile(y_value = 0.5) +
  scale_ggsurvfit()

At a given time point

survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit(linewidth = 0.8) +
  add_censor_mark() +
  add_quantile(x_value = 5, linetype = "solid", 
               linewidth = 1.0, alpha = 0.3) +
  scale_ggsurvfit()
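The guide lines drawn by `add_quantile()` correspond to numbers that can also be read directly from the fit. A minimal sketch using the {survival} methods `quantile()` and `summary()` on the same model; the time point 5 simply mirrors the `x_value` above.

```r
library(survival)
library(ggsurvfit)  # for survfit2() and the df_colon data

fit <- survfit2(Surv(time, status) ~ surg, data = df_colon)

# Median time to event per group, i.e. where each curve crosses 50%
# (NA for a group whose curve never drops to 50%)
quantile(fit, probs = 0.5)$quantile

# Estimated event-free probability per group at time 5
surv_5 <- summary(fit, times = 5)$surv
surv_5
```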

KMunicate and themes

What are the elements of an effective and publishable KM plot?

There are many options to consider, and plenty of guidance is available:

  • Morris et al. (2018) provide useful guidance for publication figures

  • To produce figures that align with KMunicate, use the theme_ggsurvfit_KMunicate() theme together with the function options in the example that follows.

A note of caution on standards:

  • Design for your purpose: one size does not fit all

  • Good design requires thinking carefully about your audience and aims

KMunicate

survfit2(Surv(time, status) ~ surg, data = df_colon) |>
  ggsurvfit(linetype_aes = TRUE) +
  add_confidence_interval() +
  add_risktable(
    risktable_stats = c("n.risk", "cum.censor", "cum.event")
  ) +
  theme_ggsurvfit_KMunicate() +
  scale_ggsurvfit() +
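  # legend inside the panel (ggplot2 >= 3.5.0 syntax; for older
  # versions use theme(legend.position = c(0.85, 0.85)))
  theme(legend.position = "inside", legend.position.inside = c(0.85, 0.85)) +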
  labs(y = "Recurrence-free Progression") 

{ggsurvfit} wrap up

  • Ease the creation of time-to-event summary figures with ggplot2

  • Concise and modular code

  • Figures ready for publication or sharing

  • Sensible defaults

  • Also supports competing risks cumulative incidence summaries

# Install ggsurvfit from CRAN with:
install.packages("ggsurvfit")

# Or install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("pharmaverse/ggsurvfit")