ARD-based Reporting in R with {cards} and {gtsummary} packages

CDISC COSA Spotlight Q2 2025

Daniel D. Sjoberg

Today’s Plan

  • Briefest Introduction to Analysis Results Standard (ARS) and Analysis Results Data (ARD)

  • ARDs with the {cards}+{cardx} packages

  • How {gtsummary} and {cards} work together

  • ARDs for QC

  • ARDs + LLMs

CDISC’s Analysis Results Standard (ARS)

  • The ARS provides a metadata-driven infrastructure for analysis

  • {cards} serves as the engine for the analysis

  • {gtsummary} is the engine for summary tables

Analysis Results Data (ARD)

  • Encodes statistical analysis outcomes in a machine-readable format.

  • The ARD model specifies how statistical results are saved into a structured format.

  • The ARD can subsequently be used to create tables and figures.

  • The ARD does not describe the layout of the results
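As a rough illustration of the layout point above, the sketch below builds a small ARD-like data frame by hand in base R and renders it two ways. The values are illustrative, the column names only mimic the ARD structure, and `get_stat()` is a hypothetical helper, not a {cards} function.

```r
# A hand-built, ARD-like long data frame: one row per statistic.
# Illustrative values only; this is not output from {cards}.
ard_like <- data.frame(
  variable  = "AGE",
  stat_name = c("N", "mean", "sd", "median"),
  stat      = c(254, 75.087, 8.246, 77)
)

# Hypothetical helper: look up one statistic by name
get_stat <- function(ard, name) ard$stat[ard$stat_name == name]

# Layout 1: a "Mean (SD)" table cell
cell_mean_sd <- sprintf(
  "%.1f (%.1f)",
  get_stat(ard_like, "mean"), get_stat(ard_like, "sd")
)

# Layout 2: a sentence for a report, built from the same stored results
sentence <- sprintf(
  "The median age was %.0f years (N = %.0f).",
  get_stat(ard_like, "median"), get_stat(ard_like, "N")
)

print(cell_mean_sd)
print(sentence)
```

The same stored statistics feed both displays; nothing about either layout is recorded in the data frame itself.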

Analysis Results Data (ARD)

  • After the initial creation of an ARD, the results can later be re-used again and again for subsequent reporting needs.

ARDs using {cards}

cards website

{cards}: Introduction

  • Part of the Pharmaverse

  • Contains a variety of utilities for creating ARDs

  • Can be used within the ARS workflow and separately

  • 52K downloads per month 🤯

What does an ARD look like?

library(cards)

# create ARD with default summary statistics
ADSL |> 
  ard_continuous(
    variables = AGE
  )
{cards} data frame: 8 x 8
  variable   context stat_name stat_label   stat fmt_fn
1      AGE continuo…         N          N    254      0
2      AGE continuo…      mean       Mean 75.087      1
3      AGE continuo…        sd         SD  8.246      1
4      AGE continuo…    median     Median     77      1
5      AGE continuo…       p25         Q1     70      1
6      AGE continuo…       p75         Q3     81      1
7      AGE continuo…       min        Min     51      1
8      AGE continuo…       max        Max     89      1
ℹ 2 more variables: warning, error

What does an ARD look like?

  • We just saw the default statistics returned in the previous example

  • It’s simple to pass any function to ard_continuous() (base R functions, functions from other packages, user-defined functions, etc.)

ADSL |> 
  ard_continuous(
    by = ARM,
    variables = AGE,
    statistic = ~list(cv = \(x) sd(x) / mean(x))
  )
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label  stat
1    ARM      Placebo      AGE        cv         cv 0.114
2    ARM    Xanomeli…      AGE        cv         cv 0.106
3    ARM    Xanomeli…      AGE        cv         cv  0.11
ℹ 4 more variables: context, fmt_fn, warning, error
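
A further sketch of the same idea, passing two user-defined functions at once. It assumes the `ADSL` data shipped with {cards} and the function names used in these slides; `range_width` is an illustrative statistic chosen for this example.

```r
library(cards)

# Two custom statistics in one call: each named list element becomes
# one row of the ARD, with the element name used as `stat_name`.
ard_custom <- ADSL |>
  ard_continuous(
    variables = AGE,
    statistic = ~ list(
      cv = \(x) sd(x) / mean(x),          # coefficient of variation
      range_width = \(x) diff(range(x))   # max minus min
    )
  )

ard_custom
```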

{cards}: ard_categorical()

ADSL |> 
  ard_categorical(
    by = ARM,
    variables = AGEGR1
  ) 
{cards} data frame: 27 x 11
   group1 group1_level variable variable_level stat_name stat_label  stat
1     ARM      Placebo   AGEGR1            <65         n          n    14
2     ARM      Placebo   AGEGR1            <65         N          N    86
3     ARM      Placebo   AGEGR1            <65         p          % 0.163
4     ARM      Placebo   AGEGR1            >80         n          n    30
5     ARM      Placebo   AGEGR1            >80         N          N    86
6     ARM      Placebo   AGEGR1            >80         p          % 0.349
7     ARM      Placebo   AGEGR1          65-80         n          n    42
8     ARM      Placebo   AGEGR1          65-80         N          N    86
9     ARM      Placebo   AGEGR1          65-80         p          % 0.488
10    ARM    Xanomeli…   AGEGR1            <65         n          n    11
ℹ 17 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 4 more variables: context, fmt_fn, warning, error

Any unobserved levels of the variables appear in the ARD.
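
A minimal sketch of that behavior on toy data (the data frame `toy` is made up for illustration): the factor declares level "c", but no row uses it.

```r
library(cards)

# Toy data: level "c" is declared in the factor but never observed
toy <- data.frame(x = factor(c("a", "a", "b"), levels = c("a", "b", "c")))

ard_toy <- ard_categorical(toy, variables = x)

# Collect the count (n) for each level into a named vector
is_n <- ard_toy$stat_name == "n"
counts <- setNames(
  unlist(ard_toy$stat[is_n]),
  vapply(ard_toy$variable_level[is_n], as.character, character(1))
)
counts
```

The unobserved level should be reported with a count of zero rather than being dropped.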

{cards}: Other Summary Functions

  • ard_dichotomous(): similar to ard_categorical(), but for dichotomous summaries

  • ard_hierarchical(): similar to ard_categorical(), but built for nested tabulations, e.g. AE terms within SOC

  • ard_complex(): similar to ard_continuous(), but the summary functions can be more complex and can accept additional arguments, such as the full data set and the subset within each by group.

  • ard_missing(): tabulates rates of missingness

The results from all these functions are entirely compatible with one another, and can be stacked into a single data frame. 🥞
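
One way the stacking might look, sketched with `cards::bind_ard()` and the `ADSL` data shipped with {cards}. The function names follow the version of {cards} used in these slides.

```r
library(cards)

# Three different summaries of the same data, combined into one ARD
ard_all <- bind_ard(
  ard_continuous(ADSL, by = ARM, variables = AGE),
  ard_categorical(ADSL, by = ARM, variables = AGEGR1),
  ard_missing(ADSL, by = ARM, variables = AGE)
)

# The `context` column records which kind of summary produced each row
table(ard_all$context)
```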

{cardx} (read: extra cards)

{cardx}

  • Extension of the {cards} package, providing additional functions to create Analysis Results Datasets (ARDs)

  • The {cardx} package exports many ard_*() functions for statistical methods.

{cardx}

  • Exports ARD frameworks for statistical analyses from many packages
  - {stats}
  - {car}
  - {effectsize}
  - {emmeans}
  - {geepack}
  - {lme4}
  - {parameters}
  - {smd}
  - {survey}
  - {survival}
  • This list is growing (rather quickly) 🌱

  • Supports 40+ regression modeling types including {mmrm}

{cards} provides a general framework to easily implement any method. {cardx} provides a nicer user experience for common methods.

Other {cards} Resources

2025 PHUSE US Connect: Analysis Results Datasets Using Open-Source Tools from the {pharmaverse}
2025 PHUSE US Connect: Harnessing Analysis Results Datasets (ARD) for Clinical Reporting in R: Our First ARD-Based Filing Experience with {cards}+{gtsummary}
2024 R/Pharma: Harnessing CDISC’s Emerging Analysis Results Datasets Standard
2024 R/Pharma Workshop: Unlocking Analysis Results Datasets
posit::conf(2024): {pharmaverse} workshop
2024 China Pharma R User Conference: Keynote Address

Listing available on the {cards} website: https://insightsengineering.github.io/cards/latest-tag/#other-resources

Tables with {gtsummary}

How it started

  • Began to address reproducibility issues while working in academia

  • Goal was to build a package to summarize study results with code that was both simple and customizable

  • First release in May 2019

How it’s going

  • The stats

    • 1,500,000+ installations from CRAN
    • 1100+ GitHub stars
    • 300+ contributors
    • 50+ code contributors

  • Won the 2021 American Statistical Association (ASA) Innovation in Programming Award

  • Agustin Calatroni and I won the 2024 Posit Pharma Table Contest by re-creating an entire CSR with the {gtsummary} package

{gtsummary} runs on ARDs!

Demographics Example

library(gtsummary)

tbl <- dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  tbl_summary(
    by = TRT01P,
    include = c(AGE, AGEGR1),
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})")
  ) |> 
  add_overall() |> 
  add_stat_label()
tbl
Characteristic               Overall      Placebo      Xanomeline High Dose  Xanomeline Low Dose
                             N = 254      N = 86       N = 84                N = 84
Age
    Mean (SD)                75 (8)       75 (9)       74 (8)                76 (8)
    Median (Q1, Q3)          77 (70, 81)  76 (69, 82)  76 (71, 80)           78 (71, 82)
Pooled Age Group 1, n (%)
    >64                      221 (87%)    72 (84%)     73 (87%)              76 (90%)
    18-64                    33 (13%)     14 (16%)     11 (13%)              8 (9.5%)

Demographics Example

  • Extract the ARD from the table object
gather_ard(tbl) |> purrr::pluck("tbl_summary")
{cards} data frame: 79 x 12
   group1 group1_level variable variable_level stat_name stat_label  stat
1  TRT01P      Placebo   AGEGR1            >64         n          n    72
2  TRT01P      Placebo   AGEGR1            >64         N          N    86
3  TRT01P      Placebo   AGEGR1            >64         p          % 0.837
4  TRT01P      Placebo   AGEGR1          18-64         n          n    14
5  TRT01P      Placebo   AGEGR1          18-64         N          N    86
6  TRT01P      Placebo   AGEGR1          18-64         p          % 0.163
7  TRT01P    Xanomeli…   AGEGR1            >64         n          n    73
8  TRT01P    Xanomeli…   AGEGR1            >64         N          N    84
9  TRT01P    Xanomeli…   AGEGR1            >64         p          % 0.869
10 TRT01P    Xanomeli…   AGEGR1          18-64         n          n    11
ℹ 69 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 5 more variables: context, fmt_fn, warning, error, gts_column

Demographics Example: ARD-first

dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  cards::ard_stack(
    .by = TRT01A, .overall = TRUE, .attributes = TRUE,
    ard_continuous(variables = AGE),
    ard_categorical(variables = AGEGR1)
  ) |> 
  tbl_ard_summary(
    by = TRT01A,
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
    overall = TRUE
  ) |> 
  add_stat_label()
Characteristic               Overall            Placebo            Xanomeline High Dose  Xanomeline Low Dose
Age
    Mean (SD)                75.1 (8.2)         75.2 (8.6)         73.8 (7.9)            76.0 (8.1)
    Median (Q1, Q3)          77.0 (70.0, 81.0)  76.0 (69.0, 82.0)  75.5 (70.0, 79.0)     78.0 (71.0, 82.0)
Pooled Age Group 1, n (%)
    >64                      221 (87.0%)        72 (83.7%)         61 (84.7%)            88 (91.7%)
    18-64                    33 (13.0%)         14 (16.3%)         11 (15.3%)            8 (8.3%)

Other {gtsummary} structures

  • Hierarchical summary tables, e.g. AE or Con Med summaries

  • Cross tabulations

  • “Continuous” summaries, i.e. summarize statistics by two categorical variables

  • General summaries; great for highly bespoke tables and listings
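
A brief sketch of two of these structures, using `tbl_cross()` and `tbl_continuous()` on the `ADSL` data shipped with {cards}. The choice of variables is illustrative only.

```r
library(gtsummary)

# Cross tabulation: age group (rows) by treatment arm (columns)
tbl_x <- tbl_cross(cards::ADSL, row = AGEGR1, col = ARM)

# "Continuous" summary: AGE summarized within each level of SEX,
# split into columns by treatment arm
tbl_c <- tbl_continuous(cards::ADSL, variable = AGE, by = ARM, include = SEX)

tbl_x
tbl_c
```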

Other {gtsummary} resources

Best resource is the {gtsummary} website with ~10 vignettes

posit::conf(2024): Tables for Research and Regulatory Submissions
2025 George Washington University Talk: Clinical Reporting with {gtsummary}
R/Medicine 2022 Workshop: Reporting with {gtsummary}
Weill Cornell Biostatistics/Computing Dept: Presentation-ready Summary Tables with {gtsummary}
The R Journal Article: Reproducible Summary Tables with the gtsummary Package

{gtsummary} + Pharma Reporting 👷🏗️

  • {crane} R package

    • An extension to {gtsummary} specifically for pharma reporting.

    • Includes functions that make the most common reporting at Roche a breeze.

    • There is a lot there already, and we anticipate its first official release this Fall. https://insightsengineering.github.io/crane/main/

{gtsummary} + Pharma Reporting 👷🏗️

  • Cardinal Initiative

    • Industry collaborative effort under pharmaverse that brings together pharmaceutical companies with the aspiration of building and open-sourcing a catalog of harmonized tables, listings, and graphs (TLGs) in clinical study reporting.

    • The project focusses on ARD solutions (hence the name cARDinal) using {cards}+{gtsummary}

    • Check out the progress at https://pharmaverse.github.io/cardinal/

ARDs for QC

How to use ARDs for QC

  • The ARD framework makes QC a breeze 🌬️

  • The ARD contains every statistic in a summary table or figure AND how that statistic is formatted.

  • You can fully QC a TLG by recreating the ARD and comparing the ARD to the one extracted from the TLG.

Example: ARDs for QC

Begin by building a TLG (we saw this code earlier), then extract the ARD from the table.

tbl_ard <- 
  pharmaverseadam::adsl |> 
  dplyr::filter(SAFFL == "Y") |> 
  tbl_summary(
    include = c(AGE, AGEGR1),
    type = AGE ~ "continuous2",
    statistic = 
      AGE ~ c("{mean} ({sd})", 
              "{median} ({p25}, {p75})"),
    digits = AGEGR1 ~ list(p = 1)
  ) |> 
  gather_ard() |> 
  purrr::pluck("tbl_summary") |> 
  apply_fmt_fn()
tbl_ard
   variable variable_level stat_name stat_label   stat stat_fmt
1    AGEGR1            >64         n          n    221      221
2    AGEGR1            >64         N          N    254      254
3    AGEGR1            >64         p          %   0.87     87.0
4    AGEGR1          18-64         n          n     33       33
5    AGEGR1          18-64         N          N    254      254
6    AGEGR1          18-64         p          %   0.13     13.0
7       AGE                     mean       Mean 75.087       75
8       AGE                       sd         SD  8.246        8
9       AGE                   median     Median     77       77
10      AGE                      p25         Q1     70       70

Example: ARDs for QC

Now, create the ARD for the table

library(cards)
qc_ard <-
  pharmaverseadam::adsl |> 
  dplyr::filter(SAFFL == "Y") |> 
  ard_stack(
    ard_continuous(
      variables = AGE, 
      statistic = 
        ~continuous_summary_fns(
          c("mean", "sd", "median", 
            "p25", "p75"))
    ),
    ard_categorical(
      variables = AGEGR1
    )
  ) |> 
  apply_fmt_fn()
qc_ard
   variable variable_level stat_name stat_label   stat stat_fmt
1       AGE                     mean       Mean 75.087     75.1
2       AGE                       sd         SD  8.246      8.2
3       AGE                   median     Median     77     77.0
4       AGE                      p25         Q1     70     70.0
5       AGE                      p75         Q3     81     81.0
6    AGEGR1            >64         n          n    221      221
7    AGEGR1            >64         N          N    254      254
8    AGEGR1            >64         p          %   0.87     87.0
9    AGEGR1          18-64         n          n     33       33
10   AGEGR1          18-64         N          N    254      254
11   AGEGR1          18-64         p          %   0.13     13.0

Example: ARDs for QC

Let’s compare 🕺

waldo::compare(
  tbl_ard |> 
    dplyr::filter(variable %in% c("AGE", "AGEGR1"), context %in% c("continuous", "categorical")) |> 
    dplyr::select(all_ard_groups(), all_ard_variables(), contains("stat"), -stat_label) |> 
    cards::unlist_ard_columns() |> 
    dplyr::arrange(dplyr::pick(c(all_ard_groups(), all_ard_variables(), contains("stat")))),
  qc_ard |> 
    dplyr::filter(variable %in% c("AGE", "AGEGR1"), context %in% c("continuous", "categorical")) |> 
    dplyr::select(all_ard_groups(), all_ard_variables(), contains("stat"), -stat_label) |> 
    cards::unlist_ard_columns() |> 
    dplyr::arrange(dplyr::pick(c(all_ard_groups(), all_ard_variables(), contains("stat")))),
  ignore_attr = TRUE
)
✔ No differences

We have plans to make these comparisons with less code.
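
Until then, the repeated pipeline can be wrapped in a small function. The sketch below is one possible shape, not the planned {cards} feature: `prep_ard_for_qc()` is a hypothetical helper, and the two ARDs are rebuilt here from the `ADSL` data shipped with {cards} so that the example stands alone.

```r
library(cards)

# Hypothetical helper (not part of {cards}): keep only the columns worth
# comparing, flatten list columns, and sort rows into a stable order.
prep_ard_for_qc <- function(ard) {
  ard |>
    dplyr::select(all_ard_groups(), all_ard_variables(), "stat_name", "stat") |>
    unlist_ard_columns() |>
    dplyr::arrange(dplyr::pick(dplyr::everything())) |>
    as.data.frame()
}

# "Production" ARD: statistics from the built-in summary functions
prod_ard <- ard_continuous(
  ADSL,
  variables = AGE,
  statistic = ~ continuous_summary_fns(c("mean", "sd"))
)

# QC ARD: the same statistics, programmed independently
qc_ard_demo <- ard_continuous(
  ADSL,
  variables = AGE,
  statistic = ~ list(
    mean = \(x) sum(x) / length(x),
    sd = \(x) sqrt(var(x))
  )
)

# Compare with a small numeric tolerance for floating-point noise
waldo::compare(
  prep_ard_for_qc(prod_ard),
  prep_ard_for_qc(qc_ard_demo),
  tolerance = 1e-8
)
```

Dropping `stat_label` and the formatting columns before comparing keeps the check focused on the computed values; add them back to the `select()` call if labels and formats are also in scope for QC.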

ARDs + LLMs

How can LLMs help us within the {cards}+{gtsummary} framework?

  • {gtsummary} is widely used, so publicly available LLMs can write {gtsummary} code out of the box: no additional training needed.

Act as a statistical programmer within the pharmaceutical industry, and provide me code to create tables in the R language using the gtsummary package.
Using the ADSL data set as an input, please provide the code for creating a demographics table that is split by treatment arm. Include the variables AGE and AGEGR1. AGE is a continuous summary, and include the mean, standard deviation, median and IQR. AGEGR1 is a categorical variable and include the count and percent in the summary.

How can LLMs help us within the ARD framework?

The result! (using Gemini)

# Load necessary packages
# If you don't have gtsummary installed, uncomment and run:
# install.packages("gtsummary")
# install.packages("dplyr") # Often useful with gtsummary for data manipulation

library(gtsummary)
library(dplyr)

# --- Simulate ADSL Data ---
# In a real-world scenario, you would load your actual ADSL data.
# For demonstration purposes, we'll create a dummy ADSL dataset.
set.seed(123) # for reproducibility

adsl_data <- tibble(
  USUBJID = paste0("SUBJ", 1:100),
  TRT01A = sample(c("Placebo", "Treatment A", "Treatment B"), 100, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
  AGE = round(rnorm(100, mean = 55, sd = 10), 0),
  AGEGR1 = cut(AGE,
               breaks = c(0, 18, 65, Inf),
               labels = c("<18", "18-65", ">65"),
               right = FALSE)
) %>%
  # Ensure factors for proper ordering and handling by gtsummary
  mutate(
    TRT01A = factor(TRT01A, levels = c("Placebo", "Treatment A", "Treatment B")),
    AGEGR1 = factor(AGEGR1, levels = c("<18", "18-65", ">65"))
  )

# --- Create the Demographics Table ---
# Use tbl_summary to create the table, specifying:
# - data: The ADSL dataset
# - by: The variable to split the table by (treatment arm)
# - include: The variables to include in the table
# - type: How to summarize each variable (continuous vs. categorical)
# - statistic: The specific statistics to display for each variable type

demographics_table <-
  adsl_data %>%
  tbl_summary(
    by = TRT01A, # Split the table by treatment arm
    include = c(AGE, AGEGR1), # Include Age and Age Group variables
    type = list(
      AGE ~ "continuous",      # Specify AGE as continuous
      AGEGR1 ~ "categorical"   # Specify AGEGR1 as categorical
    ),
    statistic = list(
      # For continuous AGE, include mean (SD) and median (IQR)
      AGE ~ "{mean} ({sd})",      # Mean (Standard Deviation)
      AGE ~ "{median} ({p25}, {p75})", # Median (25th percentile, 75th percentile) - IQR
      # For categorical AGEGR1, include count (percent)
      AGEGR1 ~ "{n} ({p}%)"
    ),
    # Add labels for better readability
    label = list(
      AGE ~ "Age, years",
      AGEGR1 ~ "Age Group, years"
    ),
    # Add a note explaining the statistics
    missing = "no", # Do not show missing values row for this table
    # Specify options for displaying missing data, formatting, etc.
    # We are explicitly specifying all statistics for AGE, so we might not need "all_continuous()" or "all_categorical()",
    # but they are good for default behavior if not all statistics are overridden.
    # To combine mean (SD) and median (IQR) into a single row, you would list them separated by a comma
    # in the statistic argument.
    digits = all_continuous() ~ 1, # One decimal place for continuous variables
    sort = list(AGEGR1 = ~ rtf_order_stat_level) # Ensure age groups are sorted correctly
  ) %>%
  add_n() %>% # Add column for total N
  add_p() %>% # Add p-value for group comparison (optional, but good for demographics)
  modify_header(
    list(
      label ~ "**Characteristic**",
      stat_by ~ "**{level}**\nN = {n}" # Customize column headers to show N
    )
  ) %>%
  modify_footnote(
    update = AGE ~ "Mean (SD); Median (IQR)"
  ) %>%
  # Add table title
  modify_caption("**Table 1. Demographics by Treatment Arm**")

# Print the table to see the output
demographics_table

# To export the table (e.g., to Word, PDF, HTML):
# library(flextable) # for Word/PDF output
# library(gt) # for HTML output (often used with gtsummary)

# Example for Word:
# demographics_table %>%
#   as_flex_table() %>%
#   flextable::save_as_docx(path = "demographics_table.docx")

# Example for HTML:
# demographics_table %>%
#   as_gt() %>%
#   gt::gtsave(filename = "demographics_table.html")

# Example for RMarkdown/Quarto knitting (table will render directly):
# You would simply include the 'demographics_table' object in your RMarkdown/Quarto chunk.

How can LLMs help us within the ARD framework?

ARDs can be represented in language-agnostic formats like JSON and YAML.

A summary table has a representation that is readily digestible by your favorite LLM.

tbl_ard |> 
  cards::as_nested_list() |> 
  jsonlite::toJSON(pretty = TRUE)
{
  "variable": {
    "AGEGR1": {
      "variable_level": {
        ">64": {
          "stat_name": {
            "n": {
              "stat": [221],
              "stat_fmt": ["221"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            },
            "N": {
              "stat": [254],
              "stat_fmt": ["254"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            },
            "p": {
              "stat": [0.8701],
              "stat_fmt": ["87.0"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            }
          }
        },
        "18-64": {
          "stat_name": {
            "n": {
              "stat": [33],
              "stat_fmt": ["33"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            },
            "N": {
              "stat": [254],
              "stat_fmt": ["254"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            },
            "p": {
              "stat": [0.1299],
              "stat_fmt": ["13.0"],
              "warning": {},
              "error": {},
              "context": ["categorical"]
            }
          }
        }
      },
      "stat_name": {
        "label": {
          "stat": ["Pooled Age Group 1"],
          "stat_fmt": ["Pooled Age Group 1"],
          "warning": {},
          "error": {},
          "context": ["attributes"]
        },
        "class": {
          "stat": ["character"],
          "stat_fmt": {},
          "warning": {},
          "error": {},
          "context": ["attributes"]
        },
        "N_obs": {
          "stat": [254],
          "stat_fmt": ["254"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "N_miss": {
          "stat": [0],
          "stat_fmt": ["0"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "N_nonmiss": {
          "stat": [254],
          "stat_fmt": ["254"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "p_miss": {
          "stat": [0],
          "stat_fmt": ["0"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "p_nonmiss": {
          "stat": [1],
          "stat_fmt": ["100"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        }
      }
    },
    "AGE": {
      "stat_name": {
        "mean": {
          "stat": [75.0866],
          "stat_fmt": ["75"],
          "warning": {},
          "error": {},
          "context": ["continuous"]
        },
        "sd": {
          "stat": [8.2462],
          "stat_fmt": ["8"],
          "warning": {},
          "error": {},
          "context": ["continuous"]
        },
        "median": {
          "stat": [77],
          "stat_fmt": ["77"],
          "warning": {},
          "error": {},
          "context": ["continuous"]
        },
        "p25": {
          "stat": [70],
          "stat_fmt": ["70"],
          "warning": {},
          "error": {},
          "context": ["continuous"]
        },
        "p75": {
          "stat": [81],
          "stat_fmt": ["81"],
          "warning": {},
          "error": {},
          "context": ["continuous"]
        },
        "label": {
          "stat": ["Age"],
          "stat_fmt": ["Age"],
          "warning": {},
          "error": {},
          "context": ["attributes"]
        },
        "class": {
          "stat": ["numeric"],
          "stat_fmt": {},
          "warning": {},
          "error": {},
          "context": ["attributes"]
        },
        "N_obs": {
          "stat": [254],
          "stat_fmt": ["254"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "N_miss": {
          "stat": [0],
          "stat_fmt": ["0"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "N_nonmiss": {
          "stat": [254],
          "stat_fmt": ["254"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "p_miss": {
          "stat": [0],
          "stat_fmt": ["0"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        },
        "p_nonmiss": {
          "stat": [1],
          "stat_fmt": ["100"],
          "warning": {},
          "error": {},
          "context": ["missing"]
        }
      }
    },
    "..ard_total_n..": {
      "stat_name": {
        "N": {
          "stat": [254],
          "stat_fmt": ["254"],
          "warning": {},
          "error": {},
          "context": ["total_n"]
        }
      }
    }
  }
} 

How can LLMs help us within the ARD framework?

Ask the LLM to write a draft summary for you.

Below you will find a JSON representation of the results in the table. Act as a medical writer working at a pharmaceutical company and write a brief summary of the results.

A total of 254 participants were enrolled in the study, distributed across three treatment arms: Placebo (N=86), Xanomeline High Dose (N=84), and Xanomeline Low Dose (N=84).

Baseline demographics indicate that participants were generally similar across all treatment groups with respect to age. The mean age was approximately 75 years across the groups, with standard deviations ranging from 7.9 to 8.6 years. Specifically, the mean age (standard deviation) was 75.2 (8.6) years in the Placebo group, 74.4 (7.9) years in the Xanomeline High Dose group, and 75.7 (8.3) years in the Xanomeline Low Dose group. The median ages were also comparable: 76 years (IQR: 69, 82) for Placebo, 76 years (IQR: 70.5, 80) for Xanomeline High Dose, and 77.5 years (IQR: 71, 82) for Xanomeline Low Dose.

Regarding age groups, the majority of participants in all treatment arms were over 64 years of age. Specifically, the proportion of participants aged >64 years was 83.7% (n=72) in the Placebo group, 86.9% (n=73) in the Xanomeline High Dose group, and 90.5% (n=76) in the Xanomeline Low Dose group. The proportion of participants aged 18-64 years was 16.3% (n=14) in the Placebo group, 13.1% (n=11) in the Xanomeline High Dose group, and 9.5% (n=8) in the Xanomeline Low Dose group. There were no participants under 18 years of age.

Overall, the demographic characteristics at baseline appear well-balanced across the treatment groups, suggesting that any observed differences in outcomes are less likely to be attributable to baseline age disparities.

ARD Collaboration

Continue for Extra Slides

Our Pilot

Our ARD-based Pilot

  • We wanted to dip our toes into CDISC’s Analysis Results Standard (ARS)

    • We did not try to implement the full model, just getting a taste

    • Using some metadata to drive the creation of TLGs

    • Using an ARD-first approach to create our TLGs, e.g. using the {cards} R package to create ARDs, and the {gtsummary} package for tables

  • Report on our experience! What would we change if we were to do it again?

What we liked!

  • Using {cards}+{cardx}+{gtsummary}, we created every summary for a trial read-out. 🕺🕺🕺

  • These tools are general enough to create any table we needed, whether it was a table of simple counts or complex statistical results.

  • Nolan was the primary statistical programmer for this study.

    • We did not have time for proper training on the new {cards} and {cardx} packages, nor for the {gtsummary} package.

    • Despite no training, he was able to pick up the packages quickly and begin using them immediately, which we attribute to the intuitive user interface. 🚀

What we liked!

  • We loved the ARD-based results, which made QC simple and straightforward.

    • A highly structured data frame of results is straightforward to QC compared to a summary table or figure.
  • The ARDs make re-use of the results simple as well.

  • As long as we have both the ARD and the table/figure, we are happy with whichever order they are created in.

What we would do differently?

  • As {gtsummary} runs on ARDs, we found the requirement of creating the ARD first, then using it to create the summary table more complex than the process needed to be.

    • ARDs can be extracted directly from any {gtsummary} table, and in many cases it is sufficient to create the table directly and take the ARD from the table object.

What we would do differently?

  • The full ARS model is metadata driven. This also means the metadata dictate the precise layout of the tables.

  • We preferred to write an R script for the persnickety formatting, rather than making those specifications in a metadata format.

    • If a trial only reported “standard” tables, it would be much less of an issue to specify detailed metadata for layouts. But every trial (and in particular this pilot) has many bespoke tables that are not part of a standard library.
  • We were still able to take advantage of the ARS attributes, such as traceability, re-use, and automation.

🕺🕺 ARD Team 🕺🕺

ARD uses outside of the ARS

  • Rethinking QC

    • Highly structured data frame of results is much simpler to QC compared to statistics in a summary table or figure.
  • Flexible data file types

    • ARD can be saved as a dataset (rds, xpt, parquet, etc.), YAML, or JSON file
  • ARDs integrate with the {gtsummary} package to create summary tables
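
A minimal sketch of two of those file types, assuming the `ADSL` data shipped with {cards}. The temporary file paths are placeholders, and the JSON step reuses the `as_nested_list()` plus {jsonlite} approach shown earlier in the deck.

```r
library(cards)

ard <- ard_continuous(ADSL, variables = AGE)

# 1. Native R format: the ARD round-trips as an R object
rds_path <- tempfile(fileext = ".rds")
saveRDS(ard, rds_path)
ard_back <- readRDS(rds_path)

# 2. JSON: a language-agnostic text representation
json_path <- tempfile(fileext = ".json")
ard |>
  as_nested_list() |>
  jsonlite::toJSON(pretty = TRUE) |>
  writeLines(json_path)

# Peek at the first few lines of the JSON file
head(readLines(json_path))
```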

{cardx} t-test Example

  • We see the results like the mean difference, the confidence interval, and p-value as expected.

  • And we also see the function’s inputs, which is incredibly useful for re-use, e.g. we know that we did not assume equal variances.

pharmaverseadam::adsl |> 
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)
{cards} data frame: 14 x 9
   group1 variable   context   stat_name stat_label      stat
1     ARM      AGE stats_t_…    estimate  Mean Dif…    -1.286
2     ARM      AGE stats_t_…   estimate1  Group 1 …    74.381
3     ARM      AGE stats_t_…   estimate2  Group 2 …    75.667
4     ARM      AGE stats_t_…   statistic  t Statis…     -1.03
5     ARM      AGE stats_t_…     p.value    p-value     0.304
6     ARM      AGE stats_t_…   parameter  Degrees …   165.595
7     ARM      AGE stats_t_…    conf.low  CI Lower…     -3.75
8     ARM      AGE stats_t_…   conf.high  CI Upper…     1.179
9     ARM      AGE stats_t_…      method     method Welch Tw…
10    ARM      AGE stats_t_… alternative  alternat… two.sided
11    ARM      AGE stats_t_…          mu    H0 Mean         0
12    ARM      AGE stats_t_…      paired  Paired t…     FALSE
13    ARM      AGE stats_t_…   var.equal  Equal Va…     FALSE
14    ARM      AGE stats_t_…  conf.level  CI Confi…      0.95
ℹ 3 more variables: fmt_fn, warning, error
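
Because results and inputs sit in the same data frame, both can be retrieved the same way. The sketch below runs the test on the `ADSL` copy shipped with {cards} (an assumption made to keep the example self-contained, so values may differ from the output above); `pull_stat()` is a hypothetical helper, not a {cards} or {cardx} function.

```r
library(cards)

# Same test as above, run on the ADSL data shipped with {cards}
ard_t <- ADSL |>
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)

# Hypothetical helper: pull one statistic out of the `stat` list column
pull_stat <- function(ard, name) ard$stat[ard$stat_name == name][[1]]

p_value   <- pull_stat(ard_t, "p.value")    # a result
var_equal <- pull_stat(ard_t, "var.equal")  # an input to the test

p_value
var_equal
```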

{cardx} Regression

  • Includes functionality to summarize nearly every type of regression model in the R ecosystem:

betareg::betareg(), biglm::bigglm(), brms::brm(), cmprsk::crr(), fixest::feglm(), fixest::femlm(), fixest::feNmlm(), fixest::feols(), gam::gam(), geepack::geeglm(), glmmTMB::glmmTMB(), glmtoolbox::glmgee(), lavaan::lavaan(), lfe::felm(), lme4::glmer.nb(), lme4::glmer(), lme4::lmer(), logitr::logitr(), MASS::glm.nb(), MASS::polr(), mgcv::gam(), mice::mira, mmrm::mmrm(), multgee::nomLORgee(), multgee::ordLORgee(), nnet::multinom(), ordinal::clm(), ordinal::clmm(), parsnip::model_fit, plm::plm(), pscl::hurdle(), pscl::zeroinfl(), rstanarm::stan_glm(), stats::aov(), stats::glm(), stats::lm(), stats::nls(), survey::svycoxph(), survey::svyglm(), survey::svyolr(), survival::cch(), survival::clogit(), survival::coxph(), survival::survreg(), svyVGAM::svy_vglm(), tidycmprsk::crr(), VGAM::vgam(), VGAM::vglm() (and more)

{cardx} Regression Example

library(survival); library(ggsurvfit)

# build model
mod <- pharmaverseadam::adtte_onco |> 
  dplyr::filter(PARAM %in% "Progression Free Survival") |>
  coxph(Surv_CNSR() ~ ARM, data = _)

# put model in a summary table
tbl <- gtsummary::tbl_regression(mod, exponentiate = TRUE) |> 
  gtsummary::add_n(location = c('label', 'level')) |> 
  gtsummary::add_nevent(location = c('label', 'level'))


Characteristic                N    Event N  HR    95% CI      p-value
Description of Planned Arm    254  6
    Placebo                   86   3
    Xanomeline High Dose      84   2        3.00  0.39, 22.9  0.3
    Xanomeline Low Dose       84   1        1.27  0.11, 14.3  0.8
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

When things go wrong 😱

What happens when statistics are un-calculable?

ard_gone_wrong <- 
  cards::ADSL |> 
  cards::ard_continuous(
    by = ARM,
    variables = AGEGR1,
    statistic = ~list(kurtosis = \(x) e1071::kurtosis(x))
  )
ard_gone_wrong
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label stat   warning     error
1    ARM      Placebo   AGEGR1  kurtosis   kurtosis      argument… non-nume…
2    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
3    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
ℹ 2 more variables: context, fmt_fn
cards::print_ard_conditions(ard_gone_wrong)
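
The captured conditions can also be checked programmatically. This is a sketch under a few assumptions: it uses the `ADSL` data shipped with {cards}, and it swaps the kurtosis call for a made-up function, `strict_mean`, that fails on purpose so that no extra package is needed.

```r
library(cards)

# A summary function that fails on non-numeric input (AGEGR1 is not numeric)
ard_err <- ADSL |>
  ard_continuous(
    by = ARM,
    variables = AGEGR1,
    statistic = ~ list(strict_mean = \(x) {
      if (!is.numeric(x)) stop("non-numeric input")
      mean(x)
    })
  )

# Rows whose calculation raised an error carry a non-NULL `error` entry;
# the pipeline itself keeps running instead of stopping at the first failure.
has_error <- !vapply(ard_err$error, is.null, logical(1))
sum(has_error)
```

A check like `stopifnot(!any(has_error))` at the end of a production script would be one way to make sure failed calculations never reach a table unnoticed.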

{gtsummary} extras

  • {gtsummary} tables are composable, meaning complex tables can be cobbled together one piece at a time and combined.

    • Many other functions create common structures, such as tbl_continuous(), tbl_hierarchical(), tbl_cross(), tbl_wide_summary(), and many more.

    • add_*() functions will add additional columns/summary statistics to an existing table.

    • tbl_merge() and tbl_stack() combine two or more tables

    • and many more functions available for creating beautiful tables!🤩

  • Check out the PHUSE US Connect Workshop (later today) for more information!
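
A small sketch of composing tables with `tbl_stack()` and `tbl_merge()`, using the `ADSL` data shipped with {cards}. The split by SEX is only an illustrative choice.

```r
library(gtsummary)

adsl <- cards::ADSL

# Stack: two tables with the same columns, placed one above the other
tbl_age   <- tbl_summary(adsl, by = ARM, include = AGE)
tbl_agegr <- tbl_summary(adsl, by = ARM, include = AGEGR1)
tbl_stacked <- tbl_stack(list(tbl_age, tbl_agegr))

# Merge: the same table built on two subgroups, placed side by side
tbl_f <- tbl_summary(dplyr::filter(adsl, SEX == "F"), by = ARM, include = AGE)
tbl_m <- tbl_summary(dplyr::filter(adsl, SEX == "M"), by = ARM, include = AGE)
tbl_merged <- tbl_merge(
  list(tbl_f, tbl_m),
  tab_spanner = c("**Female**", "**Male**")
)

tbl_stacked
tbl_merged
```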

{gtsummary} extras

  • If the structured tbl_*() and tbl_ard_*() functions don’t exactly meet your needs, use as_gtsummary()!

  • The as_gtsummary() function ingests a data frame and adds the {gtsummary} framework around it: great for listings and highly bespoke tables.

cards::ADAE[1:7, c("USUBJID", "AESOC", "AETERM", "AESEV")] |> 
  as_gtsummary() |> 
  modify_column_alignment(everything(), "left") |> 
  as_gt(groupname_col = "USUBJID")
Primary System Organ Class                             Reported Term for the Adverse Event     Severity/Intensity
01-701-1015
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS   APPLICATION SITE ERYTHEMA               MILD
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS   APPLICATION SITE PRURITUS               MILD
GASTROINTESTINAL DISORDERS                             DIARRHOEA                               MILD
01-701-1023
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                 ERYTHEMA                                MILD
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                 ERYTHEMA                                MODERATE
CARDIAC DISORDERS                                      ATRIOVENTRICULAR BLOCK SECOND DEGREE    MILD
SKIN AND SUBCUTANEOUS TISSUE DISORDERS                 ERYTHEMA                                MILD

{gtsummary} extras