ARD-first Tables • gtsummary

library(gtsummary)
library(cards)
theme_gtsummary_compact()
#> Setting theme "Compact"

Introduction

Analysis Results Datasets (ARDs) are part of the CDISC Analysis Results Standard, which aims to facilitate automation, reproducibility, reusability, and traceability of analysis results data (ARD). ARDs are highly structured and generalized data frame objects that store the results of both simple and complex statistical analyses. The {gtsummary} package utilizes ARDs (via the {cards} and {cardx} packages) to perform all calculations.

In this tutorial, we will review how to use the native {gtsummary} functions to build standard and highly customized tables. There are two basic approaches to creating summary tables utilizing ARDs:

Because ARDs power every calculation and tabulation in {gtsummary}, an ARD can be extracted from any table created with the tbl_*() functions (and their add-ons).
One can also create ARDs first, then pass the ARD to a tbl_ard_*() function that will convert the ARD to a summary table.

Both methods will be covered in this tutorial.

Extract an ARD from a summary table

Many standard tables, such as demographics tables and adverse event summary tables, are simple to create with tbl_*() functions. After the table is created, the ARD can be extracted using the gather_ard() function.

Demographics Summary

Begin by building the summary table with tbl_summary().

tbl_demo <-
  ADSL |> 
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |> 
  # building summary table by treatment arm
  tbl_summary(
    by = ARM, 
    include = c(AGE, AGEGR1, SEX),
    type = list(AGE = "continuous2"),
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |> 
  # add an overall column with all treatments combined
  add_overall() |> 
  add_stat_label()
tbl_demo

Characteristic	Overall N = 254	Placebo N = 86	Xanomeline High Dose N = 84	Xanomeline Low Dose N = 84
Age
Mean (SD)	75 (8)	75 (9)	74 (8)	76 (8)
Median (Q1, Q3)	77 (70, 81)	76 (69, 82)	76 (71, 80)	78 (71, 82)
Min, Max	51, 89	52, 89	56, 88	51, 88
Age Group, n (%)
<65	33 (13%)	14 (16%)	11 (13%)	8 (9.5%)
65-80	144 (57%)	42 (49%)	55 (65%)	47 (56%)
>80	77 (30%)	30 (35%)	18 (21%)	29 (35%)
Sex, n (%)
F	143 (56%)	53 (62%)	40 (48%)	50 (60%)
M	111 (44%)	33 (38%)	44 (52%)	34 (40%)

Now that we have a summary table, we can extract and save the ARD.

gather_ard(tbl_demo) |> bind_ard()
#> ℹ 8 rows with duplicated statistic values have been removed.
#> • See cards::bind_ard(.distinct) (`?cards::bind_ard()`) for details.
#> {cards} data frame: 167 x 12
#>    group1 group1_level variable variable_level stat_name stat_label  stat
#> 1     ARM      Placebo   AGEGR1            <65         n          n    14
#> 2     ARM      Placebo   AGEGR1            <65         N          N    86
#> 3     ARM      Placebo   AGEGR1            <65         p          % 0.163
#> 4     ARM      Placebo   AGEGR1          65-80         n          n    42
#> 5     ARM      Placebo   AGEGR1          65-80         N          N    86
#> 6     ARM      Placebo   AGEGR1          65-80         p          % 0.488
#> 7     ARM      Placebo   AGEGR1            >80         n          n    30
#> 8     ARM      Placebo   AGEGR1            >80         N          N    86
#> 9     ARM      Placebo   AGEGR1            >80         p          % 0.349
#> 10    ARM      Placebo      SEX              F         n          n    53
#> ℹ 157 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 5 more variables: context, fmt_fun, warning, error, gts_column
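Because the extracted ARD is a data frame, it can be assigned to an object and handled with ordinary data frame tools. The sketch below is a minimal illustration rather than part of the original example: the names tbl_small, ard_small, and ard_age_medians are our own, and the table uses the default tbl_summary() statistics (so the median is available for AGE).

```r
library(gtsummary)
library(cards)

# a small summary table to extract results from (default statistics)
tbl_small <- tbl_summary(ADSL, by = ARM, include = c(AGE, SEX))

# gather_ard() returns the ARDs behind the table; bind_ard() stacks them
ard_small <- gather_ard(tbl_small) |> bind_ard()

# keep only the median age results, one row per treatment arm
ard_age_medians <-
  ard_small |>
  dplyr::filter(variable == "AGE", stat_name == "median") |>
  dplyr::select(group1_level, stat_name, stat)
ard_age_medians
```

Saving the object (for example with saveRDS()) is one way to keep the results alongside the table for later re-use.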

Adverse Event Summary

The adverse event example is similar to the example above; instead of using tbl_summary() we use tbl_hierarchical().

tbl_ae <-
  ADAE |>
  # filter the data frame to print fewer AEs
  dplyr::filter(
    AESOC %in% unique(cards::ADAE$AESOC)[1:3],
    AETERM %in% unique(cards::ADAE$AETERM)[1:3]
  ) |> 
  # create AE summary table
  tbl_hierarchical(
    variables = c(AESOC, AETERM),
    by = TRTA,
    denominator = cards::ADSL |> dplyr::mutate(TRTA = ARM),
    id = USUBJID,
    overall_row = TRUE,
    label = list(..ard_hierarchical_overall.. = "Any Adverse Event")
  ) |> 
  # add a column with overall estimates
  add_overall()
tbl_ae

Primary System Organ Class Reported Term for the Adverse Event	Overall N = 254¹	Placebo N = 86¹	Xanomeline High Dose N = 84¹	Xanomeline Low Dose N = 84¹
Any Adverse Event	70 (28%)	16 (19%)	27 (32%)	27 (32%)
GASTROINTESTINAL DISORDERS	18 (7.1%)	9 (10%)	4 (4.8%)	5 (6.0%)
DIARRHOEA	18 (7.1%)	9 (10%)	4 (4.8%)	5 (6.0%)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	57 (22%)	8 (9.3%)	25 (30%)	24 (29%)
APPLICATION SITE ERYTHEMA	30 (12%)	3 (3.5%)	15 (18%)	12 (14%)
APPLICATION SITE PRURITUS	50 (20%)	6 (7.0%)	22 (26%)	22 (26%)
¹ n (%)


# return ARDs
gather_ard(tbl_ae) |> bind_ard()
#> {cards} data frame: 82 x 15
#>    group1 group1_level group2 group2_level                     variable
#> 1    <NA>                <NA>                                      TRTA
#> 2    <NA>                <NA>                                      TRTA
#> 3    <NA>                <NA>                                      TRTA
#> 4    <NA>                <NA>                                      TRTA
#> 5    <NA>                <NA>                                      TRTA
#> 6    <NA>                <NA>                                      TRTA
#> 7    <NA>                <NA>                                      TRTA
#> 8    <NA>                <NA>                                      TRTA
#> 9    <NA>                <NA>                                      TRTA
#> 10   TRTA      Placebo   <NA>              ..ard_hierarchical_overall..
#>    variable_level stat_name stat_label  stat stat_fmt
#> 1         Placebo         n          n    86       86
#> 2         Placebo         N          N   254      254
#> 3         Placebo         p          % 0.339     33.9
#> 4       Xanomeli…         n          n    84       84
#> 5       Xanomeli…         N          N   254      254
#> 6       Xanomeli…         p          % 0.331     33.1
#> 7       Xanomeli…         n          n    84       84
#> 8       Xanomeli…         N          N   254      254
#> 9       Xanomeli…         p          % 0.331     33.1
#> 10           TRUE         n          n    16       16
#> ℹ 72 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 5 more variables: context, fmt_fun, warning, error, gts_column

Other summaries

Other summary functions available include

tbl_cross() for cross tabulations
tbl_continuous() for summaries of continuous variables stratified by two other categorical variables
tbl_wide_summary() for statistics represented in a wide table format, that is, with statistics in separate columns
tbl_survfit() for survival endpoint summaries
tbl_regression() for regression model summaries
tbl_likert() for Likert-scale summaries
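The same extraction step applies to these functions. As a hedged sketch, the code below builds a tbl_continuous() table from the trial data set that ships with {gtsummary} and gathers its ARDs. The object names are illustrative, and the exact names of the returned list elements may differ between package versions.

```r
library(gtsummary)

# mean-type summaries of age, by tumor grade (rows) and treatment (columns)
tbl_age_by_grade <-
  tbl_continuous(
    data = trial,
    variable = age,
    by = trt,
    include = grade
  )

# gather_ard() returns a named list of the ARDs used to build the table
ards_age_by_grade <- gather_ard(tbl_age_by_grade)
names(ards_age_by_grade)
```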

ARD-first summary tables

While the above examples are simple, there are cases when we must use a two-step process: build the ARD first, then convert the ARD to a summary table. Two common cases where one would want to create a table from an ARD are (1) tables that include more complex statistical results and (2) re-use (e.g., extracting an ARD from a previously built table and modifying it for another purpose). For this ARD-first approach, {gtsummary} has tbl_ard_*() functions to generate summary tables:

tbl_ard_summary() for ARDs with descriptive statistics for continuous, categorical and dichotomous variables
tbl_ard_continuous() for ARDs summarizing continuous variables
tbl_ard_wide_summary() for ARD statistics represented in a wide table format, that is, in separate columns
tbl_ard_hierarchical() for ARDs containing nested or hierarchical data structures
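As a brief illustration of one of the functions not demonstrated later in this tutorial, the sketch below passes a small ARD to tbl_ard_wide_summary() using its default statistics. It assumes the trial data set from {gtsummary}; the object names are our own.

```r
library(gtsummary)
library(cards)

# build an ARD for a single continuous variable, with ancillary results
ard_age_wide <-
  ard_stack(
    trial,
    ard_continuous(variables = age),
    .missing = TRUE,
    .attributes = TRUE,
    .total_n = TRUE
  )

# convert the ARD to a table with one column per statistic
tbl_age_wide <- tbl_ard_wide_summary(ard_age_wide)
tbl_age_wide
```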

Demographics Summary

In this example, we will build a simple demographics and baseline characteristics table as outlined in the FDA Standard Safety Tables Guidelines. The table summarizes three variables: a continuous summary for AGE and categorical summaries for AGEGR1 and SEX.

Data ➡ ARD

The {cards} package can be utilized to create the ARD from a data frame. The package includes functions ard_continuous() for continuous summaries, ard_categorical() for categorical summaries, and ard_dichotomous() for dichotomous variables (and more).
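Before turning to the stacked version, it may help to see the individual functions on their own. The following is a minimal sketch using the ADSL data set from {cards}; the object names are illustrative.

```r
library(cards)

# continuous summary of AGE within each treatment arm
ard_age <- ard_continuous(ADSL, by = ARM, variables = AGE)

# categorical summary (n, N, p) of SEX within each treatment arm
ard_sex <- ard_categorical(ADSL, by = ARM, variables = SEX)

# combine the two results into a single ARD
ard_age_sex <- bind_ard(ard_age, ard_sex)
ard_age_sex
```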

The package also exports a helper function, ard_stack(), that builds these summaries in a single call, along with optional ancillary results that give a nicer display.

ard_demo <-
  ADSL |> 
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |> 
  ard_stack(
    # stratify all results by ARM
    .by = ARM, 
    # these are the results that will be calculated
    ard_continuous(variables = "AGE"),
    ard_categorical(variables = c("AGEGR1","SEX")),
    # optional arguments for additional results
    .attributes = TRUE,
    .total_n = TRUE,
    .overall = TRUE
  )
ard_demo
#> {cards} data frame: 111 x 11
#>    group1 group1_level variable variable_level stat_name stat_label   stat
#> 1     ARM      Placebo      AGE                        N          N     86
#> 2     ARM      Placebo      AGE                     mean       Mean 75.209
#> 3     ARM      Placebo      AGE                       sd         SD   8.59
#> 4     ARM      Placebo      AGE                   median     Median     76
#> 5     ARM      Placebo      AGE                      p25         Q1     69
#> 6     ARM      Placebo      AGE                      p75         Q3     82
#> 7     ARM      Placebo      AGE                      min        Min     52
#> 8     ARM      Placebo      AGE                      max        Max     89
#> 9     ARM      Placebo   AGEGR1            <65         n          n     14
#> 10    ARM      Placebo   AGEGR1            <65         N          N     86
#> ℹ 101 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 4 more variables: context, fmt_fun, warning, error

The following optional arguments can be specified to improve the appearance of the table:

.attributes: the summary table will utilize the column label attributes, if available
.total_n: the total N is saved internally and will be used in the printed table
.overall: the operations will be repeated without the .by variable
.missing: results on missing values are included, so that missing counts or rates can be shown for the variables
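The .missing argument is not used in the example above. As a hedged sketch of what it adds, the code below uses the trial data set from {gtsummary} (whose age column contains missing values) and keeps only the rows produced for the missing-value summary. The object names are our own.

```r
library(cards)

# request missing-value results alongside the continuous summary
ard_age_missing <-
  ard_stack(
    gtsummary::trial,
    .by = trt,
    ard_continuous(variables = age),
    .missing = TRUE
  )

# rows from the missing-value summary are labelled with context "missing"
ard_missing_rows <- dplyr::filter(ard_age_missing, context == "missing")
unique(ard_missing_rows$stat_name)
```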

ARD ➡ Table

After the ARD has been created, we can now create the summary table with tbl_ard_summary().

ard_demo |> 
  tbl_ard_summary(
    by = ARM, 
    overall = TRUE,
    type = AGE ~ "continuous2",
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |> 
  add_stat_label() |> 
  modify_header(all_stat_cols() ~ "**{level}**  \nN= {n}")

Characteristic	Overall N= 254	Placebo N= 86	Xanomeline High Dose N= 84	Xanomeline Low Dose N= 84
Age
Mean (SD)	75.1 (8.2)	75.2 (8.6)	74.4 (7.9)	75.7 (8.3)
Median (Q1, Q3)	77.0 (70.0, 81.0)	76.0 (69.0, 82.0)	76.0 (70.5, 80.0)	77.5 (71.0, 82.0)
Min, Max	51.0, 89.0	52.0, 89.0	56.0, 88.0	51.0, 88.0
Age Group, n (%)
<65	33 (13.0%)	14 (16.3%)	11 (13.1%)	8 (9.5%)
65-80	144 (56.7%)	42 (48.8%)	55 (65.5%)	47 (56.0%)
>80	77 (30.3%)	30 (34.9%)	18 (21.4%)	29 (34.5%)
Sex, n (%)
F	143 (56.3%)	53 (61.6%)	40 (47.6%)	50 (59.5%)
M	111 (43.7%)	33 (38.4%)	44 (52.4%)	34 (40.5%)

Complex Summaries

The ARD-to-table pipeline is most convenient when consolidating multiple analysis steps into a single ARD, so that only the relevant statistics are passed to the table-building machinery. In the example below, we create a table that mixes three types of analysis for assessing outcomes after treatment: Kaplan-Meier estimates of survival, mean marker levels with confidence intervals, and the rate of tumor response with confidence intervals.

First, we will create an ARD for each of these analyses, then combine them with cards::bind_ard().

# ARD with the Kaplan-Meier survival estimates
ard_survival <-
  trial |> 
  cardx::ard_survival_survfit(
    y = survival::Surv(ttdeath, death),
    variables = "trt",
    times = c(12, 24)
  ) |> 
  # retain survival time statistics
  dplyr::filter(variable == "time") |> 
  update_ard_fmt_fun(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fun = "xx%")

# ARD with the mean post-treatment marker level with 95%CI
ard_marker_level <-
  cardx::ard_stats_t_test_onesample(trial, variables = marker, by = trt) |> 
  update_ard_fmt_fun(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fun = label_style_sigfig(digits = 2))

# ARD with the post-treatment response rate with 95%CI
ard_tumor_response <-
  cardx::ard_categorical_ci(trial, by = trt, variables = response, method = "wilson") |> 
  update_ard_fmt_fun(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fun = "xx%")


# combine all the ARDs into a single ARD for the outcomes
ard_outcomes <- 
  cards::bind_ard(
    ard_survival,
    ard_marker_level, 
    ard_tumor_response
  )

If you inspect the ARDs, you’ll see that these analytic results have a similar structure to the simple ARDs we extracted from the tbl_summary() results above.

The cardx::ard_survival_survfit() ARD looks like the ard_categorical() result.
The cardx::ard_stats_t_test_onesample() ARD looks like the ard_continuous() result.
The cardx::ard_categorical_ci() ARD looks like the ard_dichotomous() result.

With the created ARD, we can now build a summary table.

ard_outcomes |> 
  tbl_ard_summary(
    by = trt, 
    type = response ~ "dichotomous",
    statistic =
      list(
        c(time, response) ~ "{estimate}% (95% CI {conf.low}%, {conf.high}%)",
        marker ~ "{estimate} (95% CI {conf.low}, {conf.high})"
      ),
    label = 
      list(time = "Overall Survival, months",
           marker = "Tumor Marker",
           response = "Tumor Response")
  ) |> 
  remove_footnote_header(columns = everything()) |> 
  modify_abbreviation(abbreviation = "CI = Confidence Interval") |> 
  modify_footnote_body(
    footnote = "Kaplan-Meier estimate", 
    columns = label, 
    rows = variable == "time" & row_type == "label"
  ) |> 
  modify_footnote_body("t-distribution based mean and CI", columns = "label", rows = variable == "marker") |> 
  modify_footnote_body("Wilson CI", columns = "label", rows = variable == "response")

Characteristic	Drug A	Drug B
Overall Survival, months¹
12	91% (95% CI 85%, 97%)	86% (95% CI 80%, 93%)
24	47% (95% CI 38%, 58%)	41% (95% CI 33%, 52%)
Tumor Marker²	1.0 (95% CI 0.83, 1.2)	0.82 (95% CI 0.65, 0.99)
Tumor Response³	29% (95% CI 21%, 39%)	34% (95% CI 25%, 43%)
Abbreviation: CI = Confidence Interval
¹ Kaplan-Meier estimate
² t-distribution based mean and CI
³ Wilson CI

Final Thoughts

When creating a custom summary table, you will want to utilize the functions with the tbl_ard_*() prefix. It is important to familiarize yourself with the table structure that each of these functions produces, so that you know which one to use to build your table.

If your table is a combination or mix of table types or structures, you can build each part of the table separately and use tbl_stack() and tbl_merge() to cobble together your final table.
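A minimal sketch of both functions follows, using the trial data set from {gtsummary}. The split by tumor stage and the object names are our own choices for illustration only.

```r
library(gtsummary)

# tbl_stack(): place tables on top of one another (same columns)
tbl_age <- tbl_summary(trial, by = trt, include = age)
tbl_grade <- tbl_summary(trial, by = trt, include = grade)
tbl_stacked <- tbl_stack(tbls = list(tbl_age, tbl_grade))

# tbl_merge(): place tables side by side (same rows)
tbl_early <-
  trial |>
  dplyr::filter(stage %in% c("T1", "T2")) |>
  tbl_summary(by = trt, include = c(age, grade))
tbl_late <-
  trial |>
  dplyr::filter(stage %in% c("T3", "T4")) |>
  tbl_summary(by = trt, include = c(age, grade))
tbl_merged <-
  tbl_merge(
    tbls = list(tbl_early, tbl_late),
    tab_spanner = c("**Stage T1-T2**", "**Stage T3-T4**")
  )
```

The same two functions accept tables built with the tbl_ard_*() functions, since those are also gtsummary tables.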

Finally, some tables are entirely unique and would be difficult to create under any framework. In these cases, it’s often much easier to build a data frame and then convert it to a gtsummary table with as_gtsummary(). Once converted, you can take advantage of styling that is available for all gtsummary tables.
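As a rough sketch of this last approach, the code below computes a small data frame by hand (mean age by treatment in the trial data set) and converts it with as_gtsummary(). The summary chosen and the object names are illustrative only.

```r
library(gtsummary)

# build the results as an ordinary data frame
df_mean_age <- aggregate(age ~ trt, data = trial, FUN = mean)
df_mean_age$age <- round(df_mean_age$age, 1)

# convert to a gtsummary table, then style it like any other
tbl_custom <-
  df_mean_age |>
  as_gtsummary() |>
  modify_header(trt = "**Treatment**", age = "**Mean Age**")
tbl_custom
```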