Analysis Results Datasets Standard with the {cards} and {gtsummary} Packages

Daniel D. Sjoberg, Genentech

What are we doing here today?

  • A brief introduction to CDISC’s Analysis Results Standard (ARS)

  • Where Analysis Results Datasets (ARDs) fit under the umbrella of the ARS

  • How to utilize ARDs to facilitate reporting with and without the ARS

    • Using {cards} to build ARDs and {gtsummary} for reporting

CDISC’s Analysis Results Standard (ARS)

  • The ARS provides a metadata-driven infrastructure for analysis

  • {cards} serves as the engine for the analysis

Analysis Results Data (ARD)

  • Encodes statistical analysis outcomes in a machine-readable format.

  • The ARD model specifies how statistical results are saved into a structured format.

  • The ARD can subsequently be used to create tables and figures.

  • The ARD does not describe the layout of the results
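To make the "results without layout" idea concrete, here is a minimal base R sketch. It is a hand-rolled, hypothetical ARD-like structure (one row per statistic), not the CDISC schema and not the {cards} implementation; the ages and the object name ard_like are made up for illustration.

```r
# Hypothetical toy data: five ages
age <- c(63, 71, 58, 80, 75)

# One row per statistic: machine-readable, with no layout information
ard_like <- data.frame(
  variable  = "AGE",
  stat_name = c("N", "mean", "sd"),
  stat      = c(length(age), mean(age), sd(age))
)
ard_like

# Layout is decided later, by whoever consumes the results
stats <- setNames(ard_like$stat, ard_like$stat_name)
label <- sprintf("%.1f (%.1f)", stats[["mean"]], stats[["sd"]])
label
```

The same three rows could just as easily feed a figure annotation or a different table cell format, which is the point of keeping results and presentation separate.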

Analysis Results Data (ARD)

  • After the initial creation of an ARD, the results can later be re-used again and again for subsequent reporting needs.

ARD uses outside of the ARS

  • Rethinking QC

    • A highly structured data frame of results is much simpler to QC than statistics embedded in a summary table or figure.
  • Flexible data file types

    • An ARD can be saved as a dataset (rds, xpt, parquet, etc.) or as a YAML or JSON file
  • ARDs integrate with the {gtsummary} package to create summary tables
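One way the QC point could look in practice: compare the statistics stored in an ARD against an independent calculation, programmatically rather than by eye. This is only a sketch; it assumes the {cards} package and its bundled ADSL data, and the object names (qc_means, ard_means, qc_ok) are illustrative.

```r
library(cards)

# "Production" ARD: default summary statistics for AGE within each ARM
ard <- ard_continuous(ADSL, by = ARM, variables = AGE)

# Independent QC calculation in base R
qc_means <- tapply(ADSL$AGE, ADSL$ARM, mean)

# Pull the means out of the ARD, named by treatment arm
is_mean <- ard$stat_name == "mean"
ard_means <- setNames(
  unlist(ard$stat[is_mean]),
  vapply(ard$group1_level[is_mean], as.character, character(1))
)

# A single comparison replaces a visual check of a rendered table
qc_ok <- isTRUE(all.equal(ard_means[names(qc_means)], c(qc_means)))
qc_ok
```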

ARDs using {cards}

{cards}: Introduction

  • Part of the Pharmaverse

  • Contains a variety of utilities for creating ARDs

  • Can be used within the ARS workflow and separately

  • 45k downloads per month 🤯

What does an ARD look like?

library(cards)

# create ARD with default summary statistics
ADSL |> 
  ard_continuous(
    variables = AGE
  )
{cards} data frame: 8 x 8
  variable   context stat_name stat_label   stat fmt_fn
1      AGE continuo…         N          N    254      0
2      AGE continuo…      mean       Mean 75.087      1
3      AGE continuo…        sd         SD  8.246      1
4      AGE continuo…    median     Median     77      1
5      AGE continuo…       p25         Q1     70      1
6      AGE continuo…       p75         Q3     81      1
7      AGE continuo…       min        Min     51      1
8      AGE continuo…       max        Max     89      1
ℹ 2 more variables: warning, error
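Because the ARD is an ordinary data frame, individual statistics can be pulled back out for re-use. The sketch below assumes {cards} is installed and uses get_ard_statistics() as I understand it (it returns the statistics as a named list); plain base R subsetting on stat_name works as well.

```r
library(cards)

ard <- ard_continuous(ADSL, variables = AGE)

# `stat` is a list column, so each statistic keeps its native type
is.list(ard$stat)

# Named list of statistics, keyed by stat_name
stats <- get_ard_statistics(ard)
stats$N
stats$mean

# Equivalent base R subsetting
ard$stat[ard$stat_name == "median"][[1]]
```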

What does an ARD look like?

  • We just saw the default statistics returned in the previous example

  • It’s simple to pass any function to ard_continuous() (base R functions, functions from other packages, user-defined functions, etc.)

ADSL |> 
  ard_continuous(
    by = ARM,
    variables = AGE,
    statistic = ~list(cv = \(x) sd(x) / mean(x))
  )
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label  stat
1    ARM      Placebo      AGE        cv         cv 0.114
2    ARM    Xanomeli…      AGE        cv         cv 0.106
3    ARM    Xanomeli…      AGE        cv         cv  0.11
ℹ 4 more variables: context, fmt_fn, warning, error
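The same pattern extends to several statistics at once. A small sketch, assuming {cards} and its ADSL data; the helper cv() and the statistic names iqr and range_width are user-defined here for illustration and are not part of the package.

```r
library(cards)

# User-defined helper (hypothetical name)
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

ard_custom <- ADSL |>
  ard_continuous(
    by = ARM,
    variables = AGE,
    statistic = ~ list(
      cv = cv,                                         # named function
      iqr = \(x) IQR(x, na.rm = TRUE),                 # base R function in a lambda
      range_width = \(x) diff(range(x, na.rm = TRUE))  # anonymous function
    )
  )

# One row per arm per statistic
ard_custom[, c("group1_level", "stat_name", "stat")]
```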

{cards}: ard_categorical()

ADSL |> 
  ard_categorical(
    by = ARM,
    variables = AGEGR1
  ) 
{cards} data frame: 27 x 11
   group1 group1_level variable variable_level stat_name stat_label  stat
1     ARM      Placebo   AGEGR1            <65         n          n    14
2     ARM      Placebo   AGEGR1            <65         N          N    86
3     ARM      Placebo   AGEGR1            <65         p          % 0.163
4     ARM    Xanomeli…   AGEGR1            <65         n          n    11
5     ARM    Xanomeli…   AGEGR1            <65         N          N    84
6     ARM    Xanomeli…   AGEGR1            <65         p          % 0.131
7     ARM    Xanomeli…   AGEGR1            <65         n          n     8
8     ARM    Xanomeli…   AGEGR1            <65         N          N    84
9     ARM    Xanomeli…   AGEGR1            <65         p          % 0.095
10    ARM      Placebo   AGEGR1            >80         n          n    30
ℹ 17 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 4 more variables: context, fmt_fn, warning, error

Any unobserved levels of the variables appear in the ARD.
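A quick way to see this behavior is with a factor that declares a level that never occurs. The data frame below is a made-up toy example, and the sketch assumes {cards} is installed.

```r
library(cards)

# Toy data (hypothetical): grade "III" is a declared level but never observed
df <- data.frame(
  grade = factor(c("I", "II", "I", "II", "I"), levels = c("I", "II", "III"))
)

ard <- ard_categorical(df, variables = grade)

# Keep the count rows; variable_level and stat are list columns
n_rows <- ard[ard$stat_name == "n", ]
counts <- setNames(
  unlist(n_rows$stat),
  vapply(n_rows$variable_level, as.character, character(1))
)

# The unobserved level is reported with a count of zero rather than dropped
counts
```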

{cards}: Other Summary Functions

  • ard_dichotomous(): similar to ard_categorical(), but for dichotomous summaries

  • ard_hierarchical(): similar to ard_categorical(), but built for nested tabulations, e.g. AE terms within SOC

  • ard_complex(): similar to ard_continuous(), but the summary functions can be more complex and accept other arguments, such as the full and subsetted (within the by groups) data sets.

  • ard_missing(): tabulates rates of missingness

The results from all these functions are entirely compatible with one another, and can be stacked into a single data frame. 🥞
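As a sketch of the stacking idea, the results of several ard_*() calls can be combined with bind_ard(). This assumes {cards} and its ADSL data; the intermediate object names are illustrative.

```r
library(cards)

ard_age     <- ard_continuous(ADSL, by = ARM, variables = AGE)
ard_agegr   <- ard_categorical(ADSL, by = ARM, variables = AGEGR1)
ard_missing_age <- ard_missing(ADSL, by = ARM, variables = AGE)

# Stack the pieces into a single ARD
ard_all <- bind_ard(ard_age, ard_agegr, ard_missing_age)

# Still one tidy data frame, one row per statistic
nrow(ard_all)
unique(ard_all$variable)
```

When every piece shares the same data and by variable, ard_stack() (used later in the ARD-first demographics example) is a more compact way to express the same thing.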

{cardx} (read: extra cards)

{cardx}

  • Extension of the {cards} package, providing additional functions to create Analysis Results Datasets (ARDs)

  • The {cardx} package exports many ard_*() functions for statistical methods.

{cardx}

  • Exports ARD frameworks for statistical analyses from many packages

    • {stats}
    • {car}
    • {effectsize}
    • {emmeans}
    • {geepack}
    • {lme4}
    • {parameters}
    • {smd}
    • {survey}
    • {survival}

  • This list is growing (rather quickly) 🌱

{cardx} t-test Example

  • We see results such as the mean difference, the confidence interval, and the p-value, as expected.

  • We also see the function’s inputs, which is incredibly useful for re-use, e.g. we know that we did not assume equal variances.

pharmaverseadam::adsl |> 
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)
{cards} data frame: 14 x 9
   group1 variable   context   stat_name stat_label      stat
1     ARM      AGE stats_t_…    estimate  Mean Dif…    -1.286
2     ARM      AGE stats_t_…   estimate1  Group 1 …    74.381
3     ARM      AGE stats_t_…   estimate2  Group 2 …    75.667
4     ARM      AGE stats_t_…   statistic  t Statis…     -1.03
5     ARM      AGE stats_t_…     p.value    p-value     0.304
6     ARM      AGE stats_t_…   parameter  Degrees …   165.595
7     ARM      AGE stats_t_…    conf.low  CI Lower…     -3.75
8     ARM      AGE stats_t_…   conf.high  CI Upper…     1.179
9     ARM      AGE stats_t_…      method     method Welch Tw…
10    ARM      AGE stats_t_… alternative  alternat… two.sided
11    ARM      AGE stats_t_…          mu    H0 Mean         0
12    ARM      AGE stats_t_…      paired  Paired t…     FALSE
13    ARM      AGE stats_t_…   var.equal  Equal Va…     FALSE
14    ARM      AGE stats_t_…  conf.level  CI Confi…      0.95
ℹ 3 more variables: fmt_fn, warning, error
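Because the inputs are stored next to the results, they can be checked in code. The sketch below assumes {cards}, {cardx}, and {broom} are installed; it uses cards::ADSL instead of pharmaverseadam::adsl to keep dependencies small, and it relies on my understanding that extra arguments to ard_stats_t_test() are passed on to stats::t.test().

```r
library(cards)

two_arms <- subset(
  ADSL,
  ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")
)

# Default call: Welch t-test, as recorded in the ARD itself
ard_t <- cardx::ard_stats_t_test(two_arms, by = ARM, variables = AGE)
stats <- get_ard_statistics(ard_t)
stats$var.equal
stats$p.value

# Pooled-variance variant, requested through the pass-through argument
ard_pooled <- cardx::ard_stats_t_test(
  two_arms, by = ARM, variables = AGE, var.equal = TRUE
)
get_ard_statistics(ard_pooled)$p.value
```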

{cardx} Regression

  • Includes functionality to summarize nearly every type of regression model in the R ecosystem:

betareg::betareg(), biglm::bigglm(), brms::brm(), cmprsk::crr(), fixest::feglm(), fixest::femlm(), fixest::feNmlm(), fixest::feols(), gam::gam(), geepack::geeglm(), glmmTMB::glmmTMB(), lavaan::lavaan(), lfe::felm(), lme4::glmer.nb(), lme4::glmer(), lme4::lmer(), logitr::logitr(), MASS::glm.nb(), MASS::polr(), mgcv::gam(), mice::mira, mmrm::mmrm(), multgee::nomLORgee(), multgee::ordLORgee(), nnet::multinom(), ordinal::clm(), ordinal::clmm(), parsnip::model_fit, plm::plm(), pscl::hurdle(), pscl::zeroinfl(), rstanarm::stan_glm(), stats::aov(), stats::glm(), stats::lm(), stats::nls(), survey::svycoxph(), survey::svyglm(), survey::svyolr(), survival::cch(), survival::clogit(), survival::coxph(), survival::survreg(), tidycmprsk::crr(), VGAM::vglm() (and more)
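For a sense of what a regression ARD contains, here is a minimal sketch with a simple linear model on the built-in mtcars data. It assumes {cardx} and its suggested dependencies (such as {broom.helpers}) are installed; the model itself is arbitrary and only for illustration.

```r
# Arbitrary illustrative model on a built-in dataset
mod <- lm(mpg ~ wt + hp, data = mtcars)

# One row per model term per statistic (estimate, std.error, p.value, ...)
ard_reg <- cardx::ard_regression(mod)

# Pull the coefficient for `wt` back out of the ARD
wt_est <- unlist(
  ard_reg$stat[ard_reg$variable == "wt" & ard_reg$stat_name == "estimate"]
)
wt_est
```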

{cardx} Regression Example

library(survival)

# build model
mod <- pharmaverseadam::adtte_onco |> 
  dplyr::filter(PARAM %in% "Progression Free Survival") |>
  coxph(ggsurvfit::Surv_CNSR() ~ ARM, data = _)

# put model in a summary table
tbl <- gtsummary::tbl_regression(mod, exponentiate = TRUE) |> 
  gtsummary::add_n(location = c('label', 'level')) |> 
  gtsummary::add_nevent(location = c('label', 'level'))


Characteristic                  N   Event N   HR¹    95% CI¹      p-value
Description of Planned Arm    254         6
    Placebo                    86         3
    Xanomeline High Dose       84         2   3.00   0.39, 22.9   0.3
    Xanomeline Low Dose        84         1   1.27   0.11, 14.3   0.8
¹ HR = Hazard Ratio, CI = Confidence Interval

When things go wrong 😱

What happens when statistics cannot be calculated?

ard_gone_wrong <- 
  cards::ADSL |> 
  cards::ard_continuous(
    by = ARM,
    variables = AGEGR1,
    statistic = ~list(kurtosis = \(x) e1071::kurtosis(x))
  )
ard_gone_wrong
{cards} data frame: 3 x 10
  group1 group1_level variable stat_name stat_label stat   warning     error
1    ARM      Placebo   AGEGR1  kurtosis   kurtosis      argument… non-nume…
2    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
3    ARM    Xanomeli…   AGEGR1  kurtosis   kurtosis      argument… non-nume…
ℹ 2 more variables: context, fmt_fn
cards::print_ard_conditions(ard_gone_wrong)
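The captured conditions can also be inspected in code, for example to fail a pipeline when any statistic errored. This is a sketch assuming {cards}; the statistics noisy and broken are deliberately contrived so that one warns and one errors.

```r
library(cards)

ard_conditions <- ard_continuous(
  ADSL,
  variables = AGE,
  statistic = ~ list(
    mean = \(x) mean(x),
    noisy = \(x) {
      warning("check the input")   # contrived warning
      NA_real_
    },
    broken = \(x) stop("cannot compute")  # contrived error
  )
)

# warning and error are list columns; empty entries mean "no condition"
has_warning <- lengths(ard_conditions$warning) > 0
has_error   <- lengths(ard_conditions$error) > 0

# Rows that need attention
ard_conditions[has_warning | has_error, c("stat_name", "warning", "error")]

# Human-readable summary of the same information
print_ard_conditions(ard_conditions)
```

The ARD is still returned in full, so the well-behaved statistics remain usable while the problematic ones are flagged.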

Tables with {gtsummary}

How it started

  • Began as a way to address reproducibility issues while working in academia

  • Goal was to build a package to summarize study results with code that was both simple and customizable

  • First release in May 2019

How it’s going

  • The stats

    • 1,000,000+ installations from CRAN
    • 1000+ GitHub stars
    • 300+ contributors
    • ~50 code contributors

  • Won the 2021 American Statistical Association (ASA) Innovation in Programming Award

  • Agustin Calatroni and I won the 2024 Posit Pharma Table Contest by re-creating an entire CSR with the {gtsummary} package

{gtsummary} runs on ARDs!

Demographics Example

library(gtsummary)

tbl <- dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  tbl_summary(
    by = TRT01A,
    include = c(AGE, AGEGR1),
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})")
  ) |> 
  add_overall() |> 
  add_stat_label()
tbl
Characteristic               Overall       Placebo       Xanomeline High Dose   Xanomeline Low Dose
                             N = 254       N = 86        N = 72                 N = 96
Age
    Mean (SD)                75 (8)        75 (9)        74 (8)                 76 (8)
    Median (Q1, Q3)          77 (70, 81)   76 (69, 82)   76 (70, 79)            78 (71, 82)
Pooled Age Group 1, n (%)
    >64                      221 (87%)     72 (84%)      61 (85%)               88 (92%)
    18-64                    33 (13%)      14 (16%)      11 (15%)               8 (8.3%)

Demographics Example

  • Extract the ARD from the table object
gather_ard(tbl) |> purrr::pluck("tbl_summary") |> dplyr::select(-gts_column)
{cards} data frame: 79 x 11
   group1 group1_level variable variable_level stat_name stat_label  stat
1  TRT01A      Placebo   AGEGR1            >64         n          n    72
2  TRT01A      Placebo   AGEGR1            >64         N          N    86
3  TRT01A      Placebo   AGEGR1            >64         p          % 0.837
4  TRT01A    Xanomeli…   AGEGR1            >64         n          n    61
5  TRT01A    Xanomeli…   AGEGR1            >64         N          N    72
6  TRT01A    Xanomeli…   AGEGR1            >64         p          % 0.847
7  TRT01A    Xanomeli…   AGEGR1            >64         n          n    88
8  TRT01A    Xanomeli…   AGEGR1            >64         N          N    96
9  TRT01A    Xanomeli…   AGEGR1            >64         p          % 0.917
10 TRT01A      Placebo   AGEGR1          18-64         n          n    14
ℹ 69 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 4 more variables: context, fmt_fn, warning, error
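gather_ard() returns a named list, with one ARD per function that contributed to the table. A small sketch using the trial data that ships with {gtsummary}; the exact set of list names may vary by package version, so only the "tbl_summary" element seen above is relied on here.

```r
library(gtsummary)

tbl_trial <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  add_overall()

ards <- gather_ard(tbl_trial)

# One element per contributing function
names(ards)

# The ARD behind the main summary
ards$tbl_summary
```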

Demographics Example: ARD-first

dplyr::filter(pharmaverseadam::adsl, SAFFL == "Y") |> 
  cards::ard_stack(
    .by = TRT01A, .overall = TRUE, .attributes = TRUE,
    ard_continuous(variables = AGE),
    ard_categorical(variables = AGEGR1)
  ) |> 
  tbl_ard_summary(
    by = TRT01A,
    type = AGE ~ "continuous2",
    statistic = AGE ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
    overall = TRUE
  ) |> 
  add_stat_label()
Characteristic               Overall             Placebo             Xanomeline High Dose   Xanomeline Low Dose
Age
    Mean (SD)                75.1 (8.2)          75.2 (8.6)          73.8 (7.9)             76.0 (8.1)
    Median (Q1, Q3)          77.0 (70.0, 81.0)   76.0 (69.0, 82.0)   75.5 (70.0, 79.0)      78.0 (71.0, 82.0)
Pooled Age Group 1, n (%)
    >64                      221 (87.0%)         72 (83.7%)          61 (84.7%)             88 (91.7%)
    18-64                    33 (13.0%)          14 (16.3%)          11 (15.3%)             8 (8.3%)
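An ARD-first workflow also makes it possible to compute results once, store them, and build the table later from the stored file. The sketch below assumes {cards} and {gtsummary}; it uses cards::ADSL with ARM, a temporary .rds file as a stand-in for wherever results would really be stored, and default table statistics.

```r
library(cards)
library(gtsummary)

# Step 1: compute the results once
ard <- ard_stack(
  data = ADSL,
  .by = ARM, .overall = TRUE, .attributes = TRUE,
  ard_continuous(variables = AGE),
  ard_categorical(variables = AGEGR1)
)

# Step 2: store them (a temporary file stands in for a real location)
path <- tempfile(fileext = ".rds")
saveRDS(ard, path)

# Step 3: later, build the table from the stored results alone
tbl_from_file <- readRDS(path) |>
  tbl_ard_summary(by = ARM, overall = TRUE)
tbl_from_file
```

No patient-level data is needed in step 3, only the ARD.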

{gtsummary} extras

  • {gtsummary} tables are composable, meaning complex tables can be cobbled together one piece at a time and combined.

    • many other functions create common structures, such as tbl_continuous(), tbl_hierarchical(), tbl_cross(), tbl_wide_summary(), and more

    • add_*() functions will add additional columns/summary statistics to an existing table.

    • tbl_merge() and tbl_stack() combine two or more tables

    • and many more functions available for creating beautiful tables!🤩

  • Check out the R/Pharma Webinar for more information on {gtsummary} and {cards} too!
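A short sketch of the composability described above, using the trial data bundled with {gtsummary}. The particular variables, the early/late stage split, and the spanner labels are arbitrary choices for illustration.

```r
library(gtsummary)

# Two small tables built independently
tbl_demo <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  add_overall()   # add_*() appends a column to an existing table

tbl_resp <- trial |>
  tbl_summary(by = trt, include = response) |>
  add_overall()

# Stack: one table on top of the other
tbl_stacked <- tbl_stack(list(tbl_demo, tbl_resp))

# Merge: tables side by side under spanning headers
tbl_early <- trial |>
  subset(stage %in% c("T1", "T2")) |>
  tbl_summary(by = trt, include = c(age, grade))
tbl_late <- trial |>
  subset(stage %in% c("T3", "T4")) |>
  tbl_summary(by = trt, include = c(age, grade))

tbl_merged <- tbl_merge(
  list(tbl_early, tbl_late),
  tab_spanner = c("**Stage T1/T2**", "**Stage T3/T4**")
)
```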

🕺🕺 ARD Team 🕺🕺