Adopting gtsummary at Scale

How Roche Built a Companion to gtsummary to Standardise and Simplify ARD-Based Reporting

Daniel D. Sjoberg

PHUSE US Connect 2026

What is {crane} ?

  • {crane} is an R package extending {gtsummary} for Roche’s reporting requirements

  • {crane} exports a {gtsummary} theme

  • {crane} exports functions to build bespoke summary tables

But First, What is {gtsummary} ?

How it started

  • Began to address reproducibility issues while working in academia

  • Goal: build a package that summarizes study results with code that is both simple and customizable

How it’s going

  • The stats

    • 2,000,000 installations from CRAN
    • 1,200 GitHub stars
    • 1,200 citations in peer-reviewed articles
    • 50 code contributors
  • Won the 2025 Brian Bole Award of Excellence from R in Pharma

  • Won the 2021 American Statistical Association (ASA) Innovation in Programming Award

  • Won the 2024 Posit Pharma Table Contest

Monthly {gtsummary} CRAN Downloads

{gtsummary} + LLMs

  • Since {gtsummary} is widely adopted, our LLM besties work wonderfully out of the box. No additional training needed!

  • Recently added an AI assistant to the package site and it’s AMAZING! (Thanks kapa.ai)

This Talk is Not about {gtsummary}

But, I want to touch on two items

  1. {gtsummary} creates beautiful tables that are easy to customize

  2. {gtsummary} supports themes that allow users to change defaults and other details of summary tables
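As a minimal sketch of the second point, a theme is switched on once and then applies to every table built afterwards. The example assumes the built-in `trial` dataset that ships with {gtsummary} rather than the ADaM data used in the rest of this talk.

```r
library(gtsummary)

# Turn on a built-in theme; it stays active until it is reset
theme_gtsummary_compact()

tbl_compact <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

# Restore the package defaults so later tables are unaffected
reset_gtsummary_theme()

tbl_default <- trial |>
  tbl_summary(by = trt, include = c(age, grade))
```

The table-building code is identical in both cases; only the active theme differs.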

A Little Data Preparation

library(gtsummary)
library(tidyverse)

adsl <- pharmaverseadam::adsl |> 
  filter(SAFFL == "Y") |> 
  mutate(ARM2 = word(ARM), FEMALE = SEX == "F") |> 
  labelled::set_variable_labels(FEMALE = "Female")

adae <- pharmaverseadam::adae |> 
  filter(
    USUBJID %in% adsl$USUBJID,
    AESOC %in% c("CARDIAC DISORDERS", "EYE DISORDERS"),
    AEDECOD %in% c("ATRIAL FLUTTER", "MYOCARDIAL INFARCTION", "EYE ALLERGY", "EYE SWELLING")
  ) |> 
  mutate(ARM2 = word(ARM))

adtte <- pharmaverseadam::adtte_onco |> 
  dplyr::filter(PARAM == "Progression Free Survival") |> 
  mutate(ARM2 = word(ARM))

{gtsummary} Tables

We will briefly review just one summary table function.

  • tbl_summary()

Other helpful functions we’re not covering:

  • tbl_hierarchical(): Summarize AE, Con Meds, and other similar rates

  • tbl_hierarchical_count(): similar to tbl_hierarchical() for counts instead of rates

  • tbl_cross(): cross tabulations

  • tbl_continuous(): summarizing continuous variables by 2 categorical variables

  • tbl_wide_summary(): similar to tbl_summary() but statistics are presented in separate columns

  • many more!
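For orientation only, here is a rough sketch of two of the functions listed above. It assumes the example datasets bundled with {gtsummary} (`trial`) and {cards} (`ADSL`, `ADAE`) instead of the data prepared earlier; the `*_demo` object names are made up for this illustration.

```r
library(gtsummary)

# Cross tabulation of tumor stage by treatment
tbl_stage_by_trt <- trial |>
  tbl_cross(row = stage, col = trt, percent = "cell")

# AE rates by system organ class and term.
# The denominator data must carry the same `by` column as the AE data.
adsl_demo <- cards::ADSL |>
  dplyr::mutate(TRTA = ARM)
adae_demo <- cards::ADAE |>
  dplyr::filter(AESOC %in% unique(AESOC)[1:2])

tbl_ae <- tbl_hierarchical(
  data = adae_demo,
  variables = c(AESOC, AETERM),
  by = TRTA,
  denominator = adsl_demo,
  id = USUBJID
)
```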

Basic tbl_summary()

library(gtsummary)

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE)
  )
Characteristic                  N = 254¹
Age                             77 (70, 81)
Ethnicity
    HISPANIC OR LATINO          12 (4.7%)
    NOT HISPANIC OR LATINO      242 (95%)
Female                          143 (56%)
¹ Median (Q1, Q3); n (%)

  • Four types of summaries: continuous, continuous2, categorical, and dichotomous

  • Default statistics are median (Q1, Q3) for continuous, n (%) for categorical/dichotomous

  • Variables coded 0/1, TRUE/FALSE, Yes/No treated as dichotomous by default

  • Label attributes are printed automatically
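The automatic type detection can be overridden when the default is not what you want. The sketch below assumes the built-in `trial` dataset, where `response` is coded 0/1 and `grade` is a three-level factor.

```r
library(gtsummary)

# `response` is coded 0/1, so it is summarized on a single row by default
tbl_default <- trial |>
  tbl_summary(include = response)

# Override the type to show a row for every level
tbl_all_levels <- trial |>
  tbl_summary(include = response, type = response ~ "categorical")

# Treat a multi-level variable as dichotomous and report one level only
tbl_one_level <- trial |>
  tbl_summary(
    include = grade,
    type = grade ~ "dichotomous",
    value = grade ~ "III"
  )
```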

Customize tbl_summary() output

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE),
    by = ARM2,
  )
Characteristic                  Placebo        Xanomeline
                                N = 86¹        N = 168¹
Age                             76 (69, 82)    77 (71, 81)
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)
Female                          53 (62%)       90 (54%)
¹ Median (Q1, Q3); n (%)

  • by: specify a column variable for cross-tabulation

Customize tbl_summary() output

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE),
    by = ARM2,
    type = AGE ~ "continuous2",
  )
Characteristic                  Placebo        Xanomeline
                                N = 86¹        N = 168¹
Age
    Median (Q1, Q3)             76 (69, 82)    77 (71, 81)
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)
Female                          53 (62%)       90 (54%)
¹ n (%)

  • by: specify a column variable for cross-tabulation

  • type: specify the summary type

Customize tbl_summary() output

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE),
    by = ARM2,
    type = AGE ~ "continuous2",
    statistic = 
      list(
        AGE ~ c("{mean} ({sd})", 
                "{min}, {max}"), 
        FEMALE ~ "{n} / {N} ({p}%)"
      ),
  )
Characteristic                  Placebo        Xanomeline
                                N = 86¹        N = 168¹
Age
    Mean (SD)                   75 (9)         75 (8)
    Min, Max                    52, 89         51, 88
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)
Female                          53 / 86 (62%)  90 / 168 (54%)
¹ n (%); n / N (%)

  • by: specify a column variable for cross-tabulation

  • type: specify the summary type

  • statistic: customize the reported statistics

Customize tbl_summary() output

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE),
    by = ARM2,
    type = AGE ~ "continuous2",
    statistic = 
      list(
        AGE ~ c("{mean} ({sd})", 
                "{min}, {max}"), 
        FEMALE ~ "{n} / {N} ({p}%)"
      ),
    label = 
      AGE ~ "Age, years",
  )
Characteristic                  Placebo        Xanomeline
                                N = 86¹        N = 168¹
Age, years
    Mean (SD)                   75 (9)         75 (8)
    Min, Max                    52, 89         51, 88
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)
Female                          53 / 86 (62%)  90 / 168 (54%)
¹ n (%); n / N (%)

  • by: specify a column variable for cross-tabulation

  • type: specify the summary type

  • statistic: customize the reported statistics

  • label: change or customize variable labels

Customize tbl_summary() output

adsl |> 
  tbl_summary(
    include = c(AGE, ETHNIC, FEMALE),
    by = ARM2,
    type = AGE ~ "continuous2",
    statistic = 
      list(
        AGE ~ c("{mean} ({sd})", 
                "{min}, {max}"), 
        FEMALE ~ "{n} / {N} ({p}%)"
      ),
    label = 
      AGE ~ "Age, years",
    digits = AGE ~ list(sd = 1) # report SD(age) to one decimal place
  )
Characteristic                  Placebo        Xanomeline
                                N = 86¹        N = 168¹
Age, years
    Mean (SD)                   75 (8.6)       75 (8.1)
    Min, Max                    52, 89         51, 88
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)
Female                          53 / 86 (62%)  90 / 168 (54%)
¹ n (%); n / N (%)

  • by: specify a column variable for cross-tabulation

  • type: specify the summary type

  • statistic: customize the reported statistics

  • label: change or customize variable labels

  • digits: specify the number of decimal places for rounding

Add-on functions in {gtsummary}

tbl_summary() objects can also be updated using related functions.

  • add_*() add additional columns of statistics or information, e.g. p-values, q-values, overall statistics, treatment differences, N obs., and more

  • modify_*() modify table headers, spanning headers, footnotes, and more

Update tbl_summary() with add_*()

adsl |>
  tbl_summary(
    by = ARM2,
    include = c(AGE, ETHNIC, FEMALE)
  ) |> 
  add_overall(last = TRUE)
Characteristic                  Placebo        Xanomeline     Overall
                                N = 86¹        N = 168¹       N = 254¹
Age                             76 (69, 82)    77 (71, 81)    77 (70, 81)
Ethnicity
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)       12 (4.7%)
    NOT HISPANIC OR LATINO      83 (97%)       159 (95%)      242 (95%)
Female                          53 (62%)       90 (54%)       143 (56%)
¹ Median (Q1, Q3); n (%)

  • add_overall(): adds a column of overall statistics
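The modify_*() family works the same way, piped onto an existing table. A small sketch, again assuming the built-in `trial` data; the header and caption text are placeholders.

```r
library(gtsummary)

tbl_modified <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  # Rename the first column header
  modify_header(label = "**Variable**") |>
  # Add a header spanning all statistic columns
  modify_spanning_header(all_stat_cols() ~ "**Treatment Received**") |>
  # Add a table caption
  modify_caption("Patient characteristics")
```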

Add-on functions in {gtsummary}

And many more!

See the documentation at http://www.danieldsjoberg.com/gtsummary/reference/index.html

And a detailed tbl_summary() vignette at http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html

Cobbling Tables Together with {gtsummary}

Two or more {gtsummary} tables can be combined by either merging or stacking.

  • tbl_merge() for horizontal combining

  • tbl_stack() for vertical combining



But more on this later in the {crane} section
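In the meantime, a minimal sketch of both operations, assuming the built-in `trial` data and two small tables built from subsets of it:

```r
library(gtsummary)

tbl_drug_a <- trial |>
  dplyr::filter(trt == "Drug A") |>
  tbl_summary(include = c(age, grade))

tbl_drug_b <- trial |>
  dplyr::filter(trt == "Drug B") |>
  tbl_summary(include = c(age, grade))

# Side by side: rows are matched, each table gets its own spanning header
tbl_side_by_side <- tbl_merge(
  tbls = list(tbl_drug_a, tbl_drug_b),
  tab_spanner = c("**Drug A**", "**Drug B**")
)

# One on top of the other: each table gets a group header row
tbl_stacked <- tbl_stack(
  tbls = list(tbl_drug_a, tbl_drug_b),
  group_header = c("Drug A", "Drug B")
)
```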

Finally, Introducing {crane}

Wrapping Functions

The first function we added to {crane} was tbl_roche_summary(): a very thin wrapper for gtsummary::tbl_summary().

  • Continuous variables default to continuous2.

  • tbl_summary(missing*) arguments have been changed to tbl_roche_summary(nonmissing*).

    • We highlight non-missing counts rather than the missing counts that {gtsummary} reports by default
  • Zero counts print as 0 rather than 0 (0%).

library(crane)

adsl |> 
  dplyr::mutate(ETHNIC = forcats::fct_expand(ETHNIC, "REFUSED")) |> 
  tbl_roche_summary(
    by = ARM2, 
    include = c(AGE, ETHNIC),
    nonmissing = "always"
  )

Wrapping Functions

Table 1
                                Placebo        Xanomeline
                                (N = 86)       (N = 168)
Age
    n                           86             168
    Mean (SD)                   75.2 (8.6)     75.0 (8.1)
    Median                      76.0           77.0
    Min - Max                   52 - 89        51 - 88
ETHNIC
    n                           86             168
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (96.5%)     159 (94.6%)
    REFUSED                     0              0
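
To give a feel for how thin such a wrapper can be, here is a rough sketch. It is not the {crane} implementation: `tbl_company_summary()` is a hypothetical name, the chosen statistics are placeholders, and the example uses the built-in `trial` data.

```r
library(gtsummary)

# Hypothetical wrapper (NOT crane::tbl_roche_summary()): it only changes defaults
tbl_company_summary <- function(data, by = NULL, include = everything(), ...) {
  tbl_summary(
    data = data,
    by = {{ by }},
    include = {{ include }},
    # Continuous variables default to multi-row summaries
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median}", "{min} - {max}"),
    # Do not print the missing-count row
    missing = "no",
    ...
  )
}

tbl_wrapped <- tbl_company_summary(trial, by = trt, include = c(age, grade))
```

Because the wrapper returns an ordinary {gtsummary} object, the add_*() and modify_*() functions still work on the result.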

Extending with New Functions

Lab values are summarized by visit and include the change from baseline.

This is a simple table that is just a tbl_merge() of the AVAL summary and the CHG summary.

But the general structure appears often enough in our catalog that we made it simple for our programmers to create.

library(crane)

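# NOTE: `adlb` is not created in the data preparation above; it is assumed to be
# a lab ADaM dataset (for example, pharmaverseadam::adlb)
adlb |> 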
  dplyr::filter(PARAM == "Albumin (g/L)") |> 
  tbl_baseline_chg(
    by = "ARM",
    baseline_level = "Baseline",
    denominator = adsl
  )

Create a Company Theme

Our theme is implemented in crane::theme_gtsummary_roche()

Primary changes include:

  • Sets a custom function for rounding percentages.

  • Rounds all p-values to four decimal places.

  • Headers default to include the N in parentheses without bold, e.g. 'Placebo \n (N = 184)'.

  • All tables are printed with {flextable} and we add Roche-specific styling to the table.

    • Update the default font, font size, table borders, cell padding, etc. to meet our guidelines.
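
Under the hood, a {gtsummary} theme is a named list of settings. The sketch below is illustrative only and is not the {crane} theme: the theme name and the chosen values are assumptions, and only a handful of theme elements are shown.

```r
library(gtsummary)

# Illustrative theme (NOT crane::theme_gtsummary_roche())
my_company_theme <- list(
  "pkgwide-str:theme_name" = "My Company (illustrative)",
  # Format p-values with four decimal places
  "pkgwide-fn:pvalue_fun" = function(x) style_number(x, digits = 4),
  # Print tables with {flextable}
  "pkgwide-str:print_engine" = "flextable",
  # Change the default statistics shown by tbl_summary()
  "tbl_summary-arg:statistic" = list(
    all_continuous() ~ "{mean} ({sd})",
    all_categorical() ~ "{n} ({p}%)"
  )
)

set_gtsummary_theme(my_company_theme)

# ... build tables here ...

reset_gtsummary_theme()
```

Wrapping the `set_gtsummary_theme()` call in a package function is one way to give every programmer the same one-line setup.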

Create a Company Theme

theme_gtsummary_roche()

adsl |> 
  dplyr::mutate(ETHNIC = forcats::fct_expand(ETHNIC, "REFUSED")) |> 
  tbl_roche_summary(
    by = ARM2, 
    include = c(AGE, ETHNIC),
    nonmissing = "always"
  )

                                Placebo        Xanomeline
                                (N = 86)       (N = 168)
Age
    n                           86             168
    Mean (SD)                   75.2 (8.6)     75.0 (8.1)
    Median                      76.0           77.0
    Min - Max                   52 - 89        51 - 88
ETHNIC
    n                           86             168
    HISPANIC OR LATINO          3 (3.5%)       9 (5.4%)
    NOT HISPANIC OR LATINO      83 (96.5%)     159 (94.6%)
    REFUSED                     0              0

Extend with ARD-first Functionality

  • We don’t have time to cover this in detail, but there is another wonderful way to create bespoke tables and functions.

  • The {gtsummary} package supports creating tables using ARDs (Analysis Results Datasets).

    • Data ➡️ ARD ➡️ Table
  • This method is particularly useful for efficacy tables, as they contain statistics beyond our standard rates, counts, and univariate descriptive statistics.

  • Review the ARD-first Vignette for a detailed walk through.
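
The Data ➡️ ARD ➡️ Table flow can be sketched as follows, assuming the example `ADSL` data bundled with {cards}. Newer {cards} releases may name the two calculation helpers `ard_summary()` and `ard_tabulate()`; the older names are used here.

```r
library(gtsummary)
library(cards)

# Data -> ARD: calculate the statistics once, stored as a data frame
ard <- ard_stack(
  data = ADSL,
  .by = ARM,
  ard_continuous(variables = AGE),
  ard_categorical(variables = AGEGR1),
  .attributes = TRUE,
  .missing = TRUE,
  .total_n = TRUE
)

# ARD -> Table: render the pre-calculated statistics
tbl_from_ard <- tbl_ard_summary(ard, by = ARM)
```

The ARD can be saved or reviewed on its own, and the table step never touches the patient-level data.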

Extend with ARD-first Functionality

tbl_survfit_times(
  data = adtte, 
  times = 12, 
  by = "ARM2", 
  label = "Month {time}"
)
                                    Placebo          Xanomeline
                                    (N = 86)         (N = 168)
Month 12
    Patients remaining at risk      5                6
    Event Free Rate (%)             80.0             100.0
    95% CI                          (51.6, 100.0)    (100.0, 100.0)



A “coding partner” that strictly adheres to your domain-specific standards without constant reminders.

SKILLS.md

# Skills: Clinical Trial Summary Table Development

Standardized workflow for R-based clinical reporting using `crane`, `gtsummary`, and `cards`.

## Configuration & Principles
* **Setup:** Always run `crane::theme_gtsummary_roche()` after library imports.
* **Data Prep:** Use **tidyverse** for data preparation before calling summary functions.

## Table Workflows

### Path A: Data-First (Primary)
Use for standard summaries.
1.  **Function:** `crane::tbl_*()` (if applicable), else `gtsummary::tbl_*()`.
2.  **Traceability:** Extract ARD immediately using `gtsummary::gather_ard()`.

### Path B: ARD-First (Fallback)
Use for complex/custom nesting.
1.  **Calculate:** Generate ARD via `cards::ard_stack()` or `cards::ard_*()` functions.
2.  **Render:** Pass ARD to `gtsummary::tbl_ard_*()` functions.

## Core Libraries
`library(tidyverse)`, `library(crane)`, `library(gtsummary)`, `library(cards)`

When it comes time to build your custom tables, use the {crane} package as a blueprint.
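
As a closing illustration, Path A from the SKILLS.md file (build the table first, then extract the ARD for traceability) can be sketched like this, assuming the built-in `trial` data in place of study data:

```r
library(gtsummary)

# Data-first: build the table directly from the data
tbl_path_a <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

# Traceability: pull out the ARD(s) that back the table
ard_list <- gather_ard(tbl_path_a)

# Inspect which ARDs were collected
names(ard_list)
```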