CDISC Analysis Results Data with {cards} + {gtsummary}

蘇丹杰 (Daniel D. Sjoberg)

Introduction

Acknowledgements

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

蘇丹杰 (Daniel D. Sjoberg)

  • Senior Principal Data Scientist at Genentech/Roche

  • 13+ years at Memorial Sloan Kettering Cancer Center as a Biostatistician

  • Published 250+ papers in peer-reviewed journals and served on Editorial Board of European Urology

  • Written and contributed to many R packages available on CRAN, R Universe, and GitHub.

Outline


  • Analysis Results Standard (ARS)

  • Analysis Results Data (ARD)

  • {cards} + {cardx} R packages

  • {gtsummary} R Package

Analysis Results Standard

Analysis Results Standard (ARS)

  • Emerging standard for prospectively encoding the statistical analysis reporting pipeline in a machine-readable format.

  • Primary objectives are to leverage analysis results metadata to drive the automation of results and support storage, access, processing, traceability and reproducibility of results.

  • Logical model that describes analysis results and associated metadata.

  • Focus on concepts, not layout, e.g. the summary statistics, not how the results are shown in a table.

  • Learn more at https://www.cdisc.org/events/webinar/analysis-results-standard-public-review

Analysis Results Standard (ARS)

Example ARS Flow

Analysis Results Data

Analysis Results Data (ARD)

  • Encodes statistical analysis outcomes in a machine-readable format.

  • Primary objective is to streamline the processes of automation, ensuring reproducibility, promoting reusability, and enhancing traceability.

  • The ARD model specifies how statistical results are saved into a structured format.

  • The ARD can subsequently be used to create tables and figures.

  • After the initial creation of an ARD, the results can later be re-used again and again for subsequent reporting needs.
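
To make "structured format" and "re-used again and again" concrete, here is a minimal base R sketch of an ARD-like data frame. The column names mirror the {cards} output shown in this deck, but the object is built by hand for illustration and is not a {cards} object. The values are the AGE statistics reported in that output.

```r
# A hand-built, ARD-like data frame: one row per statistic.
# Illustrative only; this is NOT created by {cards}.
ard <- data.frame(
  variable   = "AGE",
  stat_name  = c("N", "mean", "sd"),
  stat_label = c("N", "Mean", "SD")
)
# A list column lets each statistic keep its own type (integer, double, ...)
ard$stat <- list(254L, 75.087, 8.246)

# Look statistics up by name so they can be reused anywhere
stat <- setNames(ard$stat, ard$stat_name)

# Reuse 1: a table cell
table_cell <- sprintf("%.1f (%.1f)", stat$mean, stat$sd)

# Reuse 2: a sentence for a report, built from the same stored results
sentence <- sprintf("Mean age was %.0f years (N = %d).", stat$mean, stat$N)

print(table_cell)
print(sentence)
```

Both outputs come from the same stored statistics, so nothing is recomputed from the source data.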

{cards}

{cards} R Package (cards website)

Let’s check out a simple example

library(cards)

# create ARD with default summary statistics
ard_continuous(ADSL, variables = AGE)
{cards} data frame: 8 x 8
  variable   context stat_name stat_label   stat fmt_fn
1      AGE continuo…         N          N    254      0
2      AGE continuo…      mean       Mean 75.087      1
3      AGE continuo…        sd         SD  8.246      1
4      AGE continuo…    median     Median     77      1
5      AGE continuo…       p25  25th Per…     70      1
6      AGE continuo…       p75  75th Per…     81      1
7      AGE continuo…       min        Min     51      1
8      AGE continuo…       max        Max     89      1
ℹ 2 more variables: warning, error

{cards}: ard_continuous() arguments

  • by: summary statistics are calculated by all combinations of the by variables, including unobserved factor levels

  • statistic: specify univariate summary statistics. Accepts any function, base R, from a package, or user-defined.

  • fmt_fn: override the default formatting functions, e.g. an integer sets the number of decimal places to round to.

ADSL |> 
  ard_continuous(
    variables = AGE,
    by = ARM,                               # stats by treatment arm
    statistic = ~list(mean = \(x) mean(x)), # return the mean
    fmt_fn = ~list(mean = 0)                # format the result
  ) |> 
  apply_fmt_fn() # add a character column of rounded results
{cards} data frame: 3 x 11
  group1 group1_level variable stat_name stat_label   stat stat_fmt
1    ARM      Placebo      AGE      mean       Mean 75.209       75
2    ARM    Xanomeli…      AGE      mean       Mean 74.381       74
3    ARM    Xanomeli…      AGE      mean       Mean 75.667       76
ℹ 4 more variables: context, fmt_fn, warning, error
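
Because statistic accepts user-defined functions, a custom summary can be added in the same way. The sketch below assumes {cards} is installed and reuses the ADSL data and the argument names from the example above (newer {cards} releases may use different names). cv() is a hypothetical helper defined here, not part of {cards}.

```r
library(cards)

# Hypothetical helper: coefficient of variation
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# One row per treatment arm, each holding the user-defined statistic
ard_cv <- ard_continuous(
  ADSL,
  variables = AGE,
  by = ARM,
  statistic = ~ list(cv = cv)
)

ard_cv

# The stat column is a list column; unlist() returns the numeric values
unlist(ard_cv$stat)
```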

{cards}: ard_categorical()

ADSL |> 
  ard_categorical(
    by = ARM,
    variables = AGEGR1
  ) |> 
  dplyr::filter(stat_name %in% c("n", "p")) |> # keep most common stats 
  print(n = 8)
{cards} data frame: 18 x 11
  group1 group1_level variable variable_level stat_name stat_label  stat
1    ARM      Placebo   AGEGR1            <65         n          n    14
2    ARM      Placebo   AGEGR1            <65         p          % 0.163
3    ARM    Xanomeli…   AGEGR1            <65         n          n    11
4    ARM    Xanomeli…   AGEGR1            <65         p          % 0.131
5    ARM    Xanomeli…   AGEGR1            <65         n          n     8
6    ARM    Xanomeli…   AGEGR1            <65         p          % 0.095
7    ARM      Placebo   AGEGR1            >80         n          n    30
8    ARM      Placebo   AGEGR1            >80         p          % 0.349
ℹ 10 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 4 more variables: context, fmt_fn, warning, error

Any unobserved levels of the variables will be present in the resulting ARD.
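
A small sketch of that behaviour, assuming {cards} and {dplyr} are installed. The extra ">100" level is hypothetical and added only so the factor has a level with no observations.

```r
library(cards)

# Copy ADSL and turn AGEGR1 into a factor with one level that never occurs
adsl <- ADSL
adsl$AGEGR1 <- factor(
  adsl$AGEGR1,
  levels = c(unique(ADSL$AGEGR1), ">100") # ">100" is a made-up, empty level
)

ard <- ard_categorical(adsl, variables = AGEGR1)

# Keep the counts and label them with their level
n_rows <- dplyr::filter(ard, stat_name == "n")
counts <- setNames(
  unlist(n_rows$stat),
  vapply(n_rows$variable_level, as.character, character(1))
)

counts
```

The empty level is expected to appear with a count of zero rather than being dropped.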

{cards}: Other Summary Functions

  • ard_hierarchical(): similar to ard_categorical(), but built for nested tabulations, e.g. AE terms within SOC

  • ard_dichotomous(): similar to ard_categorical(), and tabulates a single value of the variable

  • ard_complex(): similar to ard_continuous(), but the summary functions can be more complex and accept other arguments, such as the full and subsetted (within the by groups) data sets.

  • ard_missing(): tabulates rates of missingness

The results from all these functions are entirely compatible with one another, and can be stacked into a single data frame.
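
As a sketch of stacking, assuming {cards} is installed and using the function names from this deck (newer {cards} releases may use different names), a continuous and a categorical summary can be combined with bind_ard():

```r
library(cards)

# Two ARDs built with different summary functions
ard_age   <- ard_continuous(ADSL, by = ARM, variables = AGE)
ard_agegr <- ard_categorical(ADSL, by = ARM, variables = AGEGR1)

# Stack them into a single ARD
ard_all <- bind_ard(ard_age, ard_agegr)

nrow(ard_age)
nrow(ard_agegr)
nrow(ard_all)
```

Columns that exist in only one input, such as variable_level, are kept in the stacked result.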

{cards}: Other Functions

In addition to exporting functions to prepare summaries, {cards} exports many utilities for wrangling ARDs and creating new ARDs.

Constructing: bind_ard(), tidy_as_ard(), nest_for_ard(), check_ard_structure(), and many more

Wrangling: shuffle_ard(), get_ard_statistics(), replace_null_statistic(), etc.
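
A brief sketch of one wrangling helper, assuming {cards} is installed: get_ard_statistics() filters an ARD and returns the matching statistics as a named list, which is convenient for inline reporting.

```r
library(cards)

ard <- ard_continuous(ADSL, variables = AGE)

# Filter the ARD rows and return the statistics as a named list
stats <- get_ard_statistics(ard, stat_name %in% c("mean", "sd"))

names(stats)
stats$mean
```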

{cardx}

{cardx} R Package (cardx website)

  • While {cards} performs basic (and very common) summaries, {cardx} exports ard_*() functions for more complex analytic results.

  • The list is growing, but we have functions for t-tests, Wilcoxon tests, standardized mean differences, ANOVA (including repeated measures), survey methods, confidence intervals for proportions and centrality estimates, and more.

  • Elegant solution for nearly every regression model type in the R ecosystem, where we can identify regression types (linear, logistic, Cox, etc.), identify the underlying variable names for categorical variables (different from the model terms), identify reference groups for categorical variables, and much much more.

{cardx}

Supported regression model types include:

betareg::betareg(), biglm::bigglm(), biglmm::bigglm(), brms::brm(), cmprsk::crr(), fixest::feglm(), fixest::femlm(), fixest::feNmlm(), fixest::feols(), gam::gam(), geepack::geeglm(), glmmTMB::glmmTMB(), lavaan::lavaan(), lfe::felm(), lme4::glmer.nb(), lme4::glmer(), lme4::lmer(), logitr::logitr(), MASS::glm.nb(), MASS::polr(), mgcv::gam(), mice::mira, mmrm::mmrm(), multgee::nomLORgee(), multgee::ordLORgee(), nnet::multinom(), ordinal::clm(), ordinal::clmm(), parsnip::model_fit, plm::plm(), pscl::hurdle(), pscl::zeroinfl(), rstanarm::stan_glm(), stats::aov(), stats::glm(), stats::lm(), stats::nls(), survey::svycoxph(), survey::svyglm(), survey::svyolr(), survival::cch(), survival::clogit(), survival::coxph(), survival::survreg(), tidycmprsk::crr(), VGAM::vglm().

{cardx} in Practice

In the example below, we’re adding a confidence interval around the rate of subjects who completed the study.

We’re using the default computation method (Wald) and confidence level (95%).

cards::ADSL |> 
  dplyr::mutate(COMPLETED = DCDECOD == "COMPLETED") |> 
  cardx::ard_proportion_ci(variables = COMPLETED) # using default CI method 'Wald'
{cards} data frame: 6 x 8
   variable   context  stat_name stat_label      stat fmt_fn
1 COMPLETED proporti…          N          N       254      0
2 COMPLETED proporti…   estimate   estimate     0.433      1
3 COMPLETED proporti…   conf.low   conf.low      0.37      1
4 COMPLETED proporti…  conf.high  conf.high     0.496      1
5 COMPLETED proporti… conf.level  conf.lev…      0.95      1
6 COMPLETED proporti…     method     method Wald Con…   <fn>
ℹ 2 more variables: warning, error
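
To see where the limits come from, the interval can be reproduced by hand in base R. This is a sketch, not the {cardx} implementation. The count of 110 completers is inferred from the printed N = 254 and estimate = 0.433. The printed limits (0.37, 0.496) are consistent with a Wald interval that includes a continuity correction of 1 / (2n); that reading is an inference from the numbers, not a statement from the {cardx} documentation.

```r
# Counts inferred from the output above: 110 / 254 = 0.433
x <- 110
n <- 254
conf_level <- 0.95

p_hat <- x / n
z  <- qnorm(1 - (1 - conf_level) / 2)
se <- sqrt(p_hat * (1 - p_hat) / n)

# Plain Wald interval
wald <- p_hat + c(-1, 1) * z * se

# Wald interval with a continuity correction of 1 / (2n)
wald_cc <- p_hat + c(-1, 1) * (z * se + 1 / (2 * n))

round(wald, 3)    # 0.372 0.494
round(wald_cc, 3) # 0.370 0.496
```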

{cardx} in Practice: Gone Wrong 😱

What happens when a statistic cannot be calculated?

Warnings and errors are captured in the ARD instead of halting the computation. It’s one of my favorite features of {cards} and {cardx}.

ard_gone_wrong <- 
  cards::ADSL |> 
  cards::ard_continuous(
    variables = AGEGR1,
    statistic = ~list(kurtosis = \(x) e1071::kurtosis(x))
  )
ard_gone_wrong
{cards} data frame: 1 x 8
  variable stat_name stat_label stat   warning     error
1   AGEGR1  kurtosis   kurtosis      argument… non-nume…
ℹ 2 more variables: context, fmt_fn
print_ard_conditions(ard_gone_wrong)
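
The general idea of recording conditions instead of stopping can be sketched in base R. This is not how {cards} implements it. capture_conditions() and the hand-written kurtosis() are hypothetical helpers for illustration, with kurtosis() standing in for e1071::kurtosis().

```r
# Evaluate f(x), recording warnings and errors instead of stopping.
# capture_conditions() is a hypothetical helper, not a {cards} function.
capture_conditions <- function(f, x) {
  warns <- character()
  error <- NULL
  result <- withCallingHandlers(
    tryCatch(
      f(x),
      error = function(e) {
        error <<- conditionMessage(e) # keep the message, return NULL as the result
        NULL
      }
    ),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w)) # record, then continue evaluating
      invokeRestart("muffleWarning")
    }
  )
  list(result = result, warning = warns, error = error)
}

# A hand-written excess kurtosis, standing in for e1071::kurtosis()
kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

ok  <- capture_conditions(kurtosis, c(1, 2, 3, 4, 10))
bad <- capture_conditions(kurtosis, c("<65", "65-80", ">80"))

str(ok)
str(bad)
```

With character input, mean() warns and returns NA, then the subtraction fails; both conditions are recorded and the result is left empty, much like the warning and error columns in the ARD above.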

{cards} + {cardx}

  • Together, these packages provide a robust set of tools for preparing ARDs.

  • Incredibly easy to extend them for your own specific cases.

  • But what’s next? I want a pretty table!

{gtsummary}

{gtsummary} Background (gtsummary website)

  • Born from my time as a biostatistician at Memorial Sloan Kettering Cancer Center.

  • I led a team of other statisticians and I wanted us to transition to R.

  • Began writing a package that eventually became {gtsummary} to help make this transition smooth.

  • Since then the package has grown and grown, and is now the most downloaded summary table package on CRAN.

  • The package won the American Statistical Association’s (ASA) 2021 award for Innovation in Statistical Programming and Analytics.

{gtsummary} R Package

{cards} + {gtsummary}

How is this related to ARD and the {cards} package? 🤔🤔

  • The {cards} package does not present results and this is where the {gtsummary} package shines.

  • The {gtsummary} package is currently being refactored with a {cards} backend.

  • Beyond the refactor, we are adding new features that make it easier to create common pharma tables.

{cards} + {gtsummary}

Stay Tuned!

  • Expect new releases of {cards}, {cardx}, and {gtsummary} soon!

  • These packages working together will serve as a strong combination for ARD-first TLG creation in the pharmaceutical space.

Mock Tables

  • With {cards}+{gtsummary} it’s easy to create bespoke table shells.

cards::ADSL |> 
  cards::ard_continuous(
    variables = AGE, 
    fmt_fn = ~list(everything() ~ \(x) "xx")
  ) |> 
  cards::apply_fmt_fn() |> 
  print(n = 3)
  variable   context stat_name stat_label   stat stat_fmt
1      AGE continuo…         N          N    254       xx
2      AGE continuo…      mean       Mean 75.087       xx
3      AGE continuo…        sd         SD  8.246       xx

  • Pass this ARD to card_summary() and the table will be populated with "xx" placeholders.

Mock Tables

Characteristic        Placebo        Xanomeline Low Dose   Xanomeline High Dose
                      N = xx         N = xx                N = xx
Age
    Median (IQR)      xx (xx, xx)    xx (xx, xx)           xx (xx, xx)
    Mean (SD)         xx (xx)        xx (xx)               xx (xx)
    Range             xx - xx        xx - xx               xx - xx
Age Group, n (%)
    <65               xx (xx.x%)     xx (xx.x%)            xx (xx.x%)
    65-80             xx (xx.x%)     xx (xx.x%)            xx (xx.x%)
    >80               xx (xx.x%)     xx (xx.x%)            xx (xx.x%)