library(gtsummary)
library(cards)
theme_gtsummary_compact()
#> Setting theme "Compact"

Introduction

Analysis Results Datasets (ARDs) are part of the CDISC Analysis Results Standard, which aims to facilitate automation, reproducibility, reusability, and traceability of analysis results data (ARD). ARDs are highly structured and generalized data frame objects that store the results of both simple and complex statistical analyses. The {gtsummary} package utilizes ARDs (via the {cards} and {cardx} packages) to perform all calculations.

In this tutorial, we will review how to use the native {gtsummary} functions to build standard and highly customized tables. There are two basic approaches to creating summary tables utilizing ARDs:

  1. As ARDs power every calculation and tabulation in {gtsummary}, ARDs can be extracted from any table created with the tbl_*() functions (and their add-ons).
  2. One can also create ARDs first, then pass the ARD to a tbl_ard_*() function that will convert the ARD to a summary table.

Both methods will be covered in this tutorial.

Extract ARD summary table

Many standard tables, such as demographics tables and adverse event summary tables, are simple to create with tbl_*() functions. After the table is created, the ARD can be extracted using the gather_ard() function.

Demographics Summary

Begin by building the summary table with tbl_summary().

tbl_demo <-
  ADSL |> 
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |> 
  # building summary table by treatment arm
  tbl_summary(
    by = ARM, 
    include = c(AGE, AGEGR1, SEX),
    type = list(AGE = "continuous2"),
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |> 
  # add an overall column with all treatments combined
  add_overall() |> 
  add_stat_label()
tbl_demo
Characteristic      | Overall (N = 254) | Placebo (N = 86) | Xanomeline High Dose (N = 84) | Xanomeline Low Dose (N = 84)
Age                 |                   |                  |                               |
    Mean (SD)       | 75 (8)            | 75 (9)           | 74 (8)                        | 76 (8)
    Median (Q1, Q3) | 77 (70, 81)       | 76 (69, 82)      | 76 (71, 80)                   | 78 (71, 82)
    Min, Max        | 51, 89            | 52, 89           | 56, 88                        | 51, 88
Age Group, n (%)    |                   |                  |                               |
    <65             | 33 (13%)          | 14 (16%)         | 11 (13%)                      | 8 (9.5%)
    65-80           | 144 (57%)         | 42 (49%)         | 55 (65%)                      | 47 (56%)
    >80             | 77 (30%)          | 30 (35%)         | 18 (21%)                      | 29 (35%)
Sex, n (%)          |                   |                  |                               |
    F               | 143 (56%)         | 53 (62%)         | 40 (48%)                      | 50 (60%)
    M               | 111 (44%)         | 33 (38%)         | 44 (52%)                      | 34 (40%)

Now that we have a summary table, we can extract and save the ARD.

gather_ard(tbl_demo) |> bind_ard()
#>  8 rows with duplicated statistic values have been removed.
#>  See cards::bind_ard(.distinct) (`?cards::bind_ard()`) for details.
#> {cards} data frame: 167 x 12
#>    group1 group1_level variable variable_level stat_name stat_label  stat
#> 1     ARM      Placebo   AGEGR1            <65         n          n    14
#> 2     ARM      Placebo   AGEGR1            <65         N          N    86
#> 3     ARM      Placebo   AGEGR1            <65         p          % 0.163
#> 4     ARM      Placebo   AGEGR1          65-80         n          n    42
#> 5     ARM      Placebo   AGEGR1          65-80         N          N    86
#> 6     ARM      Placebo   AGEGR1          65-80         p          % 0.488
#> 7     ARM      Placebo   AGEGR1            >80         n          n    30
#> 8     ARM      Placebo   AGEGR1            >80         N          N    86
#> 9     ARM      Placebo   AGEGR1            >80         p          % 0.349
#> 10    ARM      Placebo      SEX              F         n          n    53
#>  157 more rows
#>  Use `print(n = ...)` to see more rows
#>  5 more variables: context, fmt_fn, warning, error, gts_column
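
Once extracted, the ARD is an ordinary data frame and can be filtered like one. The sketch below is a minimal illustration and not part of the original tutorial: it builds a smaller table than tbl_demo, stacks the extracted ARDs, and pulls out the mean age per arm. The object names (tbl_small, ard_mean_age, mean_age) are made up for this example, and it assumes that group1_level and stat are list columns, as they are in the {cards} version used here.

```r
library(gtsummary)
library(cards)

# a small table: mean (SD) of AGE and counts of SEX by treatment arm
tbl_small <-
  tbl_summary(
    ADSL,
    by = ARM,
    include = c(AGE, SEX),
    statistic = AGE ~ "{mean} ({sd})"
  )

# gather_ard() returns a list of ARDs; bind_ard() stacks them into one
ard_mean_age <-
  gather_ard(tbl_small) |>
  bind_ard() |>
  dplyr::filter(variable == "AGE", stat_name == "mean")

# `group1_level` and `stat` are list columns, so unlist them
# to obtain a plain data frame with one row per arm
mean_age <- data.frame(
  arm = as.character(unlist(ard_mean_age$group1_level)),
  mean_age = unlist(ard_mean_age$stat)
)
mean_age
```

Because the statistics are stored unformatted, the values returned here carry full precision, unlike the rounded values printed in the table.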

Adverse Event Summary

The adverse event example is similar to the example above; instead of using tbl_summary() we use tbl_hierarchical().

tbl_ae <-
  ADAE |>
  # filter the data frame to print fewer AEs
  dplyr::filter(
    AESOC %in% unique(cards::ADAE$AESOC)[1:3],
    AETERM %in% unique(cards::ADAE$AETERM)[1:3]
  ) |> 
  # create AE summary table
  tbl_hierarchical(
    variables = c(AESOC, AETERM),
    by = TRTA,
    denominator = cards::ADSL |> dplyr::mutate(TRTA = ARM),
    id = USUBJID,
    overall_row = TRUE,
    label = list(..ard_hierarchical_overall.. = "Any Adverse Event")
  ) |> 
  # add a column with overall estimates
  add_overall()
tbl_ae
Primary System Organ Class                           | Overall   | Placebo  | Xanomeline High Dose | Xanomeline Low Dose
    Reported Term for the Adverse Event              | N = 254 ¹ | N = 86 ¹ | N = 84 ¹             | N = 84 ¹
Any Adverse Event                                    | 70 (28%)  | 16 (19%) | 27 (32%)             | 27 (32%)
GASTROINTESTINAL DISORDERS                           | 18 (7.1%) | 9 (10%)  | 4 (4.8%)             | 5 (6.0%)
    DIARRHOEA                                        | 18 (7.1%) | 9 (10%)  | 4 (4.8%)             | 5 (6.0%)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 57 (22%)  | 8 (9.3%) | 25 (30%)             | 24 (29%)
    APPLICATION SITE ERYTHEMA                        | 30 (12%)  | 3 (3.5%) | 15 (18%)             | 12 (14%)
    APPLICATION SITE PRURITUS                        | 50 (20%)  | 6 (7.0%) | 22 (26%)             | 22 (26%)
¹ n (%)

# return ARDs
gather_ard(tbl_ae) |> bind_ard()
#> {cards} data frame: 82 x 15
#>    group1 group1_level group2 group2_level                     variable
#> 1    <NA>                <NA>                                      TRTA
#> 2    <NA>                <NA>                                      TRTA
#> 3    <NA>                <NA>                                      TRTA
#> 4    <NA>                <NA>                                      TRTA
#> 5    <NA>                <NA>                                      TRTA
#> 6    <NA>                <NA>                                      TRTA
#> 7    <NA>                <NA>                                      TRTA
#> 8    <NA>                <NA>                                      TRTA
#> 9    <NA>                <NA>                                      TRTA
#> 10   TRTA      Placebo   <NA>              ..ard_hierarchical_overall..
#>    variable_level stat_name stat_label  stat stat_fmt
#> 1         Placebo         n          n    86       86
#> 2         Placebo         N          N   254      254
#> 3         Placebo         p          % 0.339     33.9
#> 4       Xanomeli…         n          n    84       84
#> 5       Xanomeli…         N          N   254      254
#> 6       Xanomeli…         p          % 0.331     33.1
#> 7       Xanomeli…         n          n    84       84
#> 8       Xanomeli…         N          N   254      254
#> 9       Xanomeli…         p          % 0.331     33.1
#> 10           TRUE         n          n    16       16
#>  72 more rows
#>  Use `print(n = ...)` to see more rows
#>  5 more variables: context, fmt_fn, warning, error, gts_column

Other summaries

Other summary functions available include

ARD-first summary tables

While the above examples are simple, there are cases when we must use a two-step process: first build the ARD, then convert the ARD to a summary table. Two common reasons to create a table from an ARD are (1) the table includes more complex statistical results, and (2) re-use (e.g. extracting an ARD from a previously built table and modifying it for another purpose). For this ARD-first approach, {gtsummary} has tbl_ard_*() functions to generate summary tables.

Demographics Summary

In this example, we will build a simple demographics and baseline characteristics table as outlined in the FDA Standard Safety Tables Guidelines. This table has three variables: a continuous summary for AGE and categorical summaries for AGEGR1 and SEX.

Data ➡ ARD

The {cards} package can be utilized to create the ARD from a data frame. The package includes functions ard_continuous() for continuous summaries, ard_categorical() for categorical summaries, and ard_dichotomous() for dichotomous variables (and more).
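
As a minimal sketch of these building blocks used on their own (not part of the original tutorial), the example below calls ard_continuous() and ard_categorical() separately and stacks the results with bind_ard(). The object names ard_age, ard_sex, and ard_manual are made up for illustration, and the function names assume the {cards} version used in this tutorial.

```r
library(cards)

# continuous summary of AGE within each treatment arm
ard_age <- ard_continuous(ADSL, by = ARM, variables = AGE)

# categorical summary (n, N, p) of SEX within each treatment arm
ard_sex <- ard_categorical(ADSL, by = ARM, variables = SEX)

# stack the two ARDs into a single ARD
ard_manual <- bind_ard(ard_age, ard_sex)
ard_manual
```

Each call returns an ARD with the same column structure, which is why the pieces can be stacked without any reshaping.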

The package also exports a helper function, ard_stack(), to simultaneously build these summaries along with optional ancillary results for a nicer display.

ard_demo <-
  ADSL |> 
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |> 
  ard_stack(
    # stratify all results by ARM
    .by = ARM, 
    # these are the results that will be calculated
    ard_continuous(variables = "AGE"),
    ard_categorical(variables = c("AGEGR1","SEX")),
    # optional arguments for additional results
    .attributes = TRUE,
    .total_n = TRUE,
    .overall = TRUE
  )
ard_demo
#> {cards} data frame: 111 x 11
#>    group1 group1_level variable variable_level stat_name stat_label   stat
#> 1     ARM      Placebo      AGE                        N          N     86
#> 2     ARM      Placebo      AGE                     mean       Mean 75.209
#> 3     ARM      Placebo      AGE                       sd         SD   8.59
#> 4     ARM      Placebo      AGE                   median     Median     76
#> 5     ARM      Placebo      AGE                      p25         Q1     69
#> 6     ARM      Placebo      AGE                      p75         Q3     82
#> 7     ARM      Placebo      AGE                      min        Min     52
#> 8     ARM      Placebo      AGE                      max        Max     89
#> 9     ARM      Placebo   AGEGR1            <65         n          n     14
#> 10    ARM      Placebo   AGEGR1            <65         N          N     86
#>  101 more rows
#>  Use `print(n = ...)` to see more rows
#>  4 more variables: context, fmt_fn, warning, error

The following optional arguments can be specified to improve the appearance of the table:

  - .attributes: the summary table will utilize the column label attributes, if available.
  - .total_n: the total N is saved internally and will be used in the printed table.
  - .overall: the operations are repeated without the .by variable.
  - .missing: when missing results are included, users can display missing counts or rates for the variables.
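
The call above does not set .missing, so here is a minimal sketch (not from the original tutorial) that turns it on and inspects what is added. The object name ard_with_missing is made up for illustration. The example assumes that the missing-data rows are labelled "missing" in the context column and include statistics such as N_obs and N_miss, as they are in the {cards} version used here.

```r
library(cards)

ard_with_missing <-
  ADSL |>
  ard_stack(
    .by = ARM,
    ard_continuous(variables = AGE),
    ard_categorical(variables = SEX),
    # additionally summarize missingness for AGE and SEX
    .missing = TRUE
  )

# the `context` column records which kind of calculation produced each row
unique(ard_with_missing$context)

# rows from the missing-data summary for AGE
ard_with_missing |>
  dplyr::filter(variable == "AGE", context == "missing")
```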

ARD ➡ Table

After the ARD has been created, we can now create the summary table with tbl_ard_summary().

ard_demo |> 
  tbl_ard_summary(
    by = ARM, 
    overall = TRUE,
    type = AGE ~ "continuous2",
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |> 
  add_stat_label() |> 
  modify_header(all_stat_cols() ~ "**{level}**  \nN= {n}")
Characteristic      | Overall (N= 254)  | Placebo (N= 86)   | Xanomeline High Dose (N= 84) | Xanomeline Low Dose (N= 84)
Age                 |                   |                   |                              |
    Mean (SD)       | 75.1 (8.2)        | 75.2 (8.6)        | 74.4 (7.9)                   | 75.7 (8.3)
    Median (Q1, Q3) | 77.0 (70.0, 81.0) | 76.0 (69.0, 82.0) | 76.0 (70.5, 80.0)            | 77.5 (71.0, 82.0)
    Min, Max        | 51.0, 89.0        | 52.0, 89.0        | 56.0, 88.0                   | 51.0, 88.0
Age Group, n (%)    |                   |                   |                              |
    <65             | 33 (13.0%)        | 14 (16.3%)        | 11 (13.1%)                   | 8 (9.5%)
    65-80           | 144 (56.7%)       | 42 (48.8%)        | 55 (65.5%)                   | 47 (56.0%)
    >80             | 77 (30.3%)        | 30 (34.9%)        | 18 (21.4%)                   | 29 (34.5%)
Sex, n (%)          |                   |                   |                              |
    F               | 143 (56.3%)       | 53 (61.6%)        | 40 (47.6%)                   | 50 (59.5%)
    M               | 111 (43.7%)       | 33 (38.4%)        | 44 (52.4%)                   | 34 (40.5%)

Complex Summaries

The ARD-to-table pipeline is most convenient when consolidating multiple analysis steps into one ARD, so that only the relevant statistics are fed to the table-building machinery. In the example below, we create a table that mixes three types of analysis for assessing outcomes after treatment: Kaplan-Meier estimates of survival, mean marker levels with confidence intervals, and rate of tumor response with confidence intervals.

First, we will create an ARD for each of these analyses, then combine them with cards::bind_ard().

# ARD with the Kaplan-Meier survival estimates
ard_survival <-
  trial |> 
  cardx::ard_survival_survfit(
    y = survival::Surv(ttdeath, death),
    variables = "trt",
    times = c(12, 24)
  ) |> 
  # retain survival time statistics
  dplyr::filter(variable == "time") |> 
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = "xx%")

# ARD with the mean post-treatment marker level with 95%CI
ard_marker_level <-
  cardx::ard_stats_t_test_onesample(trial, variables = marker, by = trt) |> 
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = label_style_sigfig(digits = 2))

# ARD with the post-treatment response rate with 95%CI
ard_tumor_response <-
  cardx::ard_categorical_ci(trial, by = trt, variables = response, method = "wilson") |> 
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = "xx%")


# combine all the ARDs into a single ARD for the outcomes
ard_outcomes <- 
  cards::bind_ard(
    ard_survival,
    ard_marker_level, 
    ard_tumor_response
  )

If you inspect the ARDs, you’ll see that these analytic results have a similar structure to the simple ARDs we extracted from the tbl_summary() results above.
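
As a minimal sketch of such an inspection (not part of the original tutorial), the example below rebuilds the tumor response ARD without the formatting step and keeps only the statistics that the table will use. The object name ard_response_check is made up for illustration, and the example assumes {cardx} is installed.

```r
library(gtsummary)

# same call as above, without the formatting update
ard_response_check <-
  cardx::ard_categorical_ci(trial, by = trt, variables = response, method = "wilson")

# keep the estimate and confidence limits, and only the core ARD columns
ard_response_check |>
  dplyr::filter(stat_name %in% c("estimate", "conf.low", "conf.high")) |>
  dplyr::select(group1, group1_level, variable, stat_name, stat)
```

The grouping, variable, and statistic columns are the same ones seen in the demographics ARD, which is what lets tbl_ard_summary() treat both kinds of result alike.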

With the created ARD, we can now build a summary table.

ard_outcomes |> 
  tbl_ard_summary(
    by = trt, 
    type = response ~ "dichotomous",
    statistic =
      list(
        c(time, response) ~ "{estimate}% (95% CI {conf.low}%, {conf.high}%)",
        marker ~ "{estimate} (95% CI {conf.low}, {conf.high})"
      ),
    label = 
      list(time = "Overall Survival, months",
           marker = "Tumor Marker",
           response = "Tumor Response")
  ) |> 
  remove_footnote_header(columns = everything()) |> 
  modify_abbreviation(abbreviation = "CI = Confidence Interval") |> 
  modify_footnote_body(
    footnote = "Kaplan-Meier estimate", 
    columns = label, 
    rows = variable == "time" & row_type == "label"
  ) |> 
  modify_footnote_body("t-distribution based mean and CI", columns = "label", rows = variable == "marker") |> 
  modify_footnote_body("Wilson CI", columns = "label", rows = variable == "response")
Characteristic             | Drug A                 | Drug B
Overall Survival, months ¹ |                        |
    12                     | 91% (95% CI 85%, 97%)  | 86% (95% CI 80%, 93%)
    24                     | 47% (95% CI 38%, 58%)  | 41% (95% CI 33%, 52%)
Tumor Marker ²             | 1.0 (95% CI 0.83, 1.2) | 0.82 (95% CI 0.65, 0.99)
Tumor Response ³           | 29% (95% CI 21%, 39%)  | 34% (95% CI 25%, 43%)
Abbreviation: CI = Confidence Interval
¹ Kaplan-Meier estimate
² t-distribution based mean and CI
³ Wilson CI

Final Thoughts

When creating a custom summary table, you will want to utilize the functions with the tbl_ard_*() prefix. It is important to familiarize yourself with the table structure that each of these functions produces, so you know which to use to build your table.

If your table is a mix of table structures, you can build each part of your table separately and use tbl_stack() and tbl_merge() to cobble together your final table.
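
The following is a minimal sketch of both functions using the trial data set that ships with {gtsummary}. The tables being combined are deliberately simple, and the object names are made up for illustration.

```r
library(gtsummary)

# tbl_stack(): place tables with the same columns on top of one another
tbl_age <- tbl_summary(trial, by = trt, include = age)
tbl_grade <- tbl_summary(trial, by = trt, include = grade)
tbl_stacked <- tbl_stack(list(tbl_age, tbl_grade))

# tbl_merge(): place tables with the same rows side by side
tbl_drug_a <-
  trial |>
  dplyr::filter(trt == "Drug A") |>
  tbl_summary(include = c(age, grade))
tbl_drug_b <-
  trial |>
  dplyr::filter(trt == "Drug B") |>
  tbl_summary(include = c(age, grade))
tbl_merged <-
  tbl_merge(
    list(tbl_drug_a, tbl_drug_b),
    tab_spanner = c("**Drug A**", "**Drug B**")
  )
```

As a rule of thumb, stacking suits parts that share columns but differ in rows, while merging suits parts that share rows but differ in columns.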

Finally, some tables are entirely unique and would be difficult to create under any framework. In these cases, it’s often much easier to build a data frame and then convert it to a gtsummary table with as_gtsummary(). Once converted, you can take advantage of styling that is available for all gtsummary tables.
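
As a minimal sketch of this last route (not part of the original tutorial), the example below builds a small data frame of patient counts per arm from the trial data set, converts it with as_gtsummary(), and then applies ordinary {gtsummary} styling. The object names df_counts and tbl_custom, the column headers, and the caption text are made up for illustration.

```r
library(gtsummary)

# any data frame can serve as the body of the table;
# here, the number of patients in each treatment arm
df_counts <- dplyr::count(trial, trt, name = "n")

tbl_custom <-
  df_counts |>
  as_gtsummary() |>
  # headers are addressed by the data frame's column names
  modify_header(trt = "**Treatment**", n = "**N**") |>
  modify_caption("**Patients per treatment arm**")
tbl_custom
```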