library(gtsummary)
library(cards)
theme_gtsummary_compact()
#> Setting theme "Compact"
Introduction
Analysis Results Datasets (ARDs) are part of the CDISC Analysis Results Standard, which aims to facilitate automation, reproducibility, reusability, and traceability of analysis results data. ARDs are highly structured and generalized data frame objects that store the results of both simple and complex statistical analyses. The {gtsummary} package utilizes ARDs (via the {cards} and {cardx} packages) to perform all calculations.
In this tutorial, we will review how to use the native {gtsummary} functions to build standard and highly customized tables. There are two basic approaches to creating summary tables utilizing ARDs:
- As ARDs power every calculation and tabulation in {gtsummary}, ARDs can be extracted from any table created with the tbl_*() functions (and their add-ons).
- One can also create ARDs first, then pass the ARD to a tbl_ard_*() function that will convert the ARD to a summary table.
Both methods will be covered in this tutorial.
Extract ARD summary table
Many standard tables, such as demographics tables, adverse event summary tables, and more, are simple to create with tbl_*() functions. After the table is created, the ARD can be extracted using the gather_ard() function.
Demographics Summary
Begin by building the summary table with tbl_summary().
tbl_demo <-
  ADSL |>
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |>
  # building summary table by treatment arm
  tbl_summary(
    by = ARM,
    include = c(AGE, AGEGR1, SEX),
    type = list(AGE = "continuous2"),
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |>
  # add an overall column with all treatments combined
  add_overall() |>
  add_stat_label()
tbl_demo
Characteristic | Overall N = 254 | Placebo N = 86 | Xanomeline High Dose N = 84 | Xanomeline Low Dose N = 84 |
---|---|---|---|---|
Age | ||||
Mean (SD) | 75 (8) | 75 (9) | 74 (8) | 76 (8) |
Median (Q1, Q3) | 77 (70, 81) | 76 (69, 82) | 76 (71, 80) | 78 (71, 82) |
Min, Max | 51, 89 | 52, 89 | 56, 88 | 51, 88 |
Age Group, n (%) | ||||
<65 | 33 (13%) | 14 (16%) | 11 (13%) | 8 (9.5%) |
65-80 | 144 (57%) | 42 (49%) | 55 (65%) | 47 (56%) |
>80 | 77 (30%) | 30 (35%) | 18 (21%) | 29 (35%) |
Sex, n (%) | ||||
F | 143 (56%) | 53 (62%) | 40 (48%) | 50 (60%) |
M | 111 (44%) | 33 (38%) | 44 (52%) | 34 (40%) |
Now that we have a summary table, we can extract and save the ARD.
gather_ard(tbl_demo) |> bind_ard()
#> ℹ 8 rows with duplicated statistic values have been removed.
#> • See cards::bind_ard(.distinct) (`?cards::bind_ard()`) for details.
#> {cards} data frame: 167 x 12
#> group1 group1_level variable variable_level stat_name stat_label stat
#> 1 ARM Placebo AGEGR1 <65 n n 14
#> 2 ARM Placebo AGEGR1 <65 N N 86
#> 3 ARM Placebo AGEGR1 <65 p % 0.163
#> 4 ARM Placebo AGEGR1 65-80 n n 42
#> 5 ARM Placebo AGEGR1 65-80 N N 86
#> 6 ARM Placebo AGEGR1 65-80 p % 0.488
#> 7 ARM Placebo AGEGR1 >80 n n 30
#> 8 ARM Placebo AGEGR1 >80 N N 86
#> 9 ARM Placebo AGEGR1 >80 p % 0.349
#> 10 ARM Placebo SEX F n n 53
#> ℹ 157 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 5 more variables: context, fmt_fn, warning, error, gts_column
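The reason for the extra bind_ard() step is that gather_ard() returns a list of ARDs rather than a single data frame. A minimal sketch on the trial data shipped with {gtsummary} (the object names tbl_small, ards, and ard_all are illustrative, and the exact list names may vary by package version):

```r
library(gtsummary)

# a small table: one variable, split by treatment, plus an overall column
tbl_small <-
  trial |>
  tbl_summary(by = trt, include = age) |>
  add_overall()

# gather_ard() returns a list of ARDs
ards <- gather_ard(tbl_small)
names(ards)

# stack the list elements into a single ARD
ard_all <- cards::bind_ard(ards)
ard_all
```

Keeping the list form can be useful when only the ARD from one step (for example, the overall column) is needed.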
Adverse Event Summary
The adverse event example is similar to the example above; instead of using tbl_summary(), we use tbl_hierarchical().
tbl_ae <-
  ADAE |>
  # filter the data frame to print fewer AEs
  dplyr::filter(
    AESOC %in% unique(cards::ADAE$AESOC)[1:3],
    AETERM %in% unique(cards::ADAE$AETERM)[1:3]
  ) |>
  # create AE summary table
  tbl_hierarchical(
    variables = c(AESOC, AETERM),
    by = TRTA,
    # the denominator data needs a TRTA column to match the `by` variable
    denominator = cards::ADSL |> dplyr::mutate(TRTA = ARM),
    id = USUBJID,
    overall_row = TRUE,
    label = list(..ard_hierarchical_overall.. = "Any Adverse Event")
  ) |>
  # add a column with overall estimates
  add_overall()
tbl_ae
Primary System Organ Class / Reported Term for the Adverse Event | Overall N = 254¹ | Placebo N = 86¹ | Xanomeline High Dose N = 84¹ | Xanomeline Low Dose N = 84¹ |
---|---|---|---|---|
Any Adverse Event | 70 (28%) | 16 (19%) | 27 (32%) | 27 (32%) |
GASTROINTESTINAL DISORDERS | 18 (7.1%) | 9 (10%) | 4 (4.8%) | 5 (6.0%) |
DIARRHOEA | 18 (7.1%) | 9 (10%) | 4 (4.8%) | 5 (6.0%) |
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 57 (22%) | 8 (9.3%) | 25 (30%) | 24 (29%) |
APPLICATION SITE ERYTHEMA | 30 (12%) | 3 (3.5%) | 15 (18%) | 12 (14%) |
APPLICATION SITE PRURITUS | 50 (20%) | 6 (7.0%) | 22 (26%) | 22 (26%) |
¹ n (%) |
# return ARDs
gather_ard(tbl_ae) |> bind_ard()
#> {cards} data frame: 82 x 15
#> group1 group1_level group2 group2_level variable
#> 1 <NA> <NA> TRTA
#> 2 <NA> <NA> TRTA
#> 3 <NA> <NA> TRTA
#> 4 <NA> <NA> TRTA
#> 5 <NA> <NA> TRTA
#> 6 <NA> <NA> TRTA
#> 7 <NA> <NA> TRTA
#> 8 <NA> <NA> TRTA
#> 9 <NA> <NA> TRTA
#> 10 TRTA Placebo <NA> ..ard_hierarchical_overall..
#> variable_level stat_name stat_label stat stat_fmt
#> 1 Placebo n n 86 86
#> 2 Placebo N N 254 254
#> 3 Placebo p % 0.339 33.9
#> 4 Xanomeli… n n 84 84
#> 5 Xanomeli… N N 254 254
#> 6 Xanomeli… p % 0.331 33.1
#> 7 Xanomeli… n n 84 84
#> 8 Xanomeli… N N 254 254
#> 9 Xanomeli… p % 0.331 33.1
#> 10 TRUE n n 16 16
#> ℹ 72 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 5 more variables: context, fmt_fn, warning, error, gts_column
Other summaries
Other summary functions available include:
- tbl_cross() for cross tabulations
- tbl_continuous() for summaries of continuous variables stratified by two other categorical variables
- tbl_wide_summary() for statistics represented in a wide table format, that is, statistics in separate columns
- tbl_survfit() for survival endpoint summaries
- tbl_regression() for regression model summaries
- tbl_likert() for Likert-scale summaries
ARD-first summary tables
While the above examples are simple, there are cases when we must use a two-step process of building our ARD, then converting the ARD to a summary table. Two common instances where one would want to create a table from an ARD are (1) tables that include more complex statistical results, and (2) re-use (e.g. extracting an ARD from a previously built table and modifying it for another purpose). For this ARD-first approach, {gtsummary} has tbl_ard_*() functions to generate summary tables.
- tbl_ard_summary() for ARDs with descriptive statistics for continuous, categorical and dichotomous variables
- tbl_ard_continuous() for ARDs summarizing continuous variables
- tbl_ard_wide_summary() for ARD statistics represented in a wide table format, that is, in separate columns
- tbl_ard_hierarchical() for ARDs containing nested or hierarchical data structures
Demographics Summary
In this example, we will build a simple demographics and baseline characteristics table as outlined in the FDA Standard Safety Tables Guidelines. This table has three variables: a continuous variable summary for AGE and categorical variable summaries for AGEGR1 and SEX.
Data ➡ ARD
The {cards} package can be utilized to create the ARD from a data frame. The package includes functions ard_continuous() for continuous summaries, ard_categorical() for categorical summaries, and ard_dichotomous() for dichotomous variables (and more).
The package also exports a helper function, ard_stack(), to simultaneously build these summaries along with optional ancillary results for a nicer display.
ard_demo <-
  ADSL |>
  dplyr::mutate(AGEGR1 = factor(AGEGR1, levels = c("<65", "65-80", ">80"))) |>
  ard_stack(
    # stratify all results by ARM
    .by = ARM,
    # these are the results that will be calculated
    ard_continuous(variables = "AGE"),
    ard_categorical(variables = c("AGEGR1", "SEX")),
    # optional arguments for additional results
    .attributes = TRUE,
    .total_n = TRUE,
    .overall = TRUE
  )
ard_demo
#> {cards} data frame: 111 x 11
#> group1 group1_level variable variable_level stat_name stat_label stat
#> 1 ARM Placebo AGE N N 86
#> 2 ARM Placebo AGE mean Mean 75.209
#> 3 ARM Placebo AGE sd SD 8.59
#> 4 ARM Placebo AGE median Median 76
#> 5 ARM Placebo AGE p25 Q1 69
#> 6 ARM Placebo AGE p75 Q3 82
#> 7 ARM Placebo AGE min Min 52
#> 8 ARM Placebo AGE max Max 89
#> 9 ARM Placebo AGEGR1 <65 n n 14
#> 10 ARM Placebo AGEGR1 <65 N N 86
#> ℹ 101 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 4 more variables: context, fmt_fn, warning, error
The following optional arguments can be specified to improve the appearance of the table:
- .attributes: the summary table will utilize the column label attributes, if available
- .total_n: the total N is saved internally and will be used in the printed table
- .overall: the operations will be repeated without the .by variable
- .missing: when missing results are included, users can include missing counts or rates for the variables
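To see what ard_stack() is assembling, the core results can also be built one call at a time and stacked by hand. This is a minimal sketch under that assumption, not the internal implementation of ard_stack(); it omits the ancillary attribute, total N, and overall results, uses the function names from the {cards} version shown in this tutorial, and ard_manual is an illustrative name:

```r
library(cards)

# build each ARD separately, stratified by ARM, then stack them
ard_manual <-
  bind_ard(
    ard_continuous(ADSL, by = ARM, variables = AGE),
    ard_categorical(ADSL, by = ARM, variables = SEX)
  )

# the stacked result contains the statistics for both variables
unique(ard_manual$variable)
```

The helper is mainly a convenience: it applies the same .by variable to every call and adds the optional ancillary results in one step.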
ARD ➡ Table
After the ARD has been created, we can now create the summary table with tbl_ard_summary().
ard_demo |>
  tbl_ard_summary(
    by = ARM,
    overall = TRUE,
    type = AGE ~ "continuous2",
    statistic = all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}"),
    label = list(AGEGR1 = "Age Group")
  ) |>
  add_stat_label() |>
  modify_header(all_stat_cols() ~ "**{level}** \nN= {n}")
Characteristic | Overall N= 254 | Placebo N= 86 | Xanomeline High Dose N= 84 | Xanomeline Low Dose N= 84 |
---|---|---|---|---|
Age | ||||
Mean (SD) | 75.1 (8.2) | 75.2 (8.6) | 74.4 (7.9) | 75.7 (8.3) |
Median (Q1, Q3) | 77.0 (70.0, 81.0) | 76.0 (69.0, 82.0) | 76.0 (70.5, 80.0) | 77.5 (71.0, 82.0) |
Min, Max | 51.0, 89.0 | 52.0, 89.0 | 56.0, 88.0 | 51.0, 88.0 |
Age Group, n (%) | ||||
<65 | 33 (13.0%) | 14 (16.3%) | 11 (13.1%) | 8 (9.5%) |
65-80 | 144 (56.7%) | 42 (48.8%) | 55 (65.5%) | 47 (56.0%) |
>80 | 77 (30.3%) | 30 (34.9%) | 18 (21.4%) | 29 (34.5%) |
Sex, n (%) | ||||
F | 143 (56.3%) | 53 (61.6%) | 40 (47.6%) | 50 (59.5%) |
M | 111 (43.7%) | 33 (38.4%) | 44 (52.4%) | 34 (40.5%) |
Complex Summaries
The ARD-to-table pipeline is most convenient when consolidating multiple analysis steps into a single ARD, so that only the relevant statistics are fed to the table-building machinery. In the example below, we create a table that mixes three types of analysis for assessing outcomes after treatment: Kaplan-Meier estimates of survival, mean marker levels with confidence intervals, and rates of tumor response with confidence intervals.
First, we will create an ARD for each of these analyses, then combine them with cards::bind_ard().
# ARD with the Kaplan-Meier survival estimates
ard_survival <-
  trial |>
  cardx::ard_survival_survfit(
    y = survival::Surv(ttdeath, death),
    variables = "trt",
    times = c(12, 24)
  ) |>
  # retain survival time statistics
  dplyr::filter(variable == "time") |>
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = "xx%")
# ARD with the mean post-treatment marker level with 95%CI
ard_marker_level <-
  cardx::ard_stats_t_test_onesample(trial, variables = marker, by = trt) |>
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = label_style_sigfig(digits = 2))
# ARD with the post-treatment response rate with 95%CI
ard_tumor_response <-
  cardx::ard_categorical_ci(trial, by = trt, variables = response, method = "wilson") |>
  update_ard_fmt_fn(stat_names = c("estimate", "conf.low", "conf.high"), fmt_fn = "xx%")
# combine all the ARDs into a single ARD for the outcomes
ard_outcomes <-
  cards::bind_ard(
    ard_survival,
    ard_marker_level,
    ard_tumor_response
  )
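The update_ard_fmt_fn() calls above change how selected statistics will be formatted when the table is built. As a small illustration using only {cards}, the sketch below sets a one-decimal format on two statistics and then applies it with apply_fmt_fn(), which stores the formatted values in a stat_fmt column. The function names follow the {cards} version shown in this tutorial (newer releases may use different names), and ard_age is an illustrative object:

```r
library(cards)

ard_age <-
  ard_continuous(ADSL, variables = AGE) |>
  # format the mean and SD to one decimal place
  update_ard_fmt_fn(stat_names = c("mean", "sd"), fmt_fn = "xx.x") |>
  # apply the stored formatting functions; results land in `stat_fmt`
  apply_fmt_fn()

# compare the raw and formatted statistics
ard_age |>
  dplyr::filter(stat_name %in% c("mean", "sd")) |>
  dplyr::select(stat_name, stat, stat_fmt)
```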
If you inspect the ARDs, you’ll see that these analytic results have a similar structure to the simple ARDs we extracted from the tbl_summary() results above.
- The cardx::ard_survival_survfit() ARD looks like the ard_categorical() result.
- The cardx::ard_stats_t_test_onesample() ARD looks like the ard_continuous() result.
- The cardx::ard_categorical_ci() ARD looks like the ard_dichotomous() result.
With the created ARD, we can now build a summary table.
ard_outcomes |>
  tbl_ard_summary(
    by = trt,
    type = response ~ "dichotomous",
    statistic =
      list(
        c(time, response) ~ "{estimate}% (95% CI {conf.low}%, {conf.high}%)",
        marker ~ "{estimate} (95% CI {conf.low}, {conf.high})"
      ),
    label =
      list(
        time = "Overall Survival, months",
        marker = "Tumor Marker",
        response = "Tumor Response"
      )
  ) |>
  remove_footnote_header(columns = everything()) |>
  modify_abbreviation(abbreviation = "CI = Confidence Interval") |>
  modify_footnote_body(
    footnote = "Kaplan-Meier estimate",
    columns = label,
    rows = variable == "time" & row_type == "label"
  ) |>
  modify_footnote_body("t-distribution based mean and CI", columns = "label", rows = variable == "marker") |>
  modify_footnote_body("Wilson CI", columns = "label", rows = variable == "response")
Characteristic | Drug A | Drug B |
---|---|---|
Overall Survival, months¹ | ||
12 | 91% (95% CI 85%, 97%) | 86% (95% CI 80%, 93%) |
24 | 47% (95% CI 38%, 58%) | 41% (95% CI 33%, 52%) |
Tumor Marker² | 1.0 (95% CI 0.83, 1.2) | 0.82 (95% CI 0.65, 0.99) |
Tumor Response³ | 29% (95% CI 21%, 39%) | 34% (95% CI 25%, 43%) |
Abbreviation: CI = Confidence Interval | ||
¹ Kaplan-Meier estimate | ||
² t-distribution based mean and CI | ||
³ Wilson CI |
Final Thoughts
When creating a custom summary table, you will want to utilize the functions with the tbl_ard_*() prefix. It will be important to familiarize yourself with the table structure that each of these functions produces, so you know which to use to build your table.
If your table is a combination or mix of table types or structures, you can build each part of your table separately and use tbl_stack() and tbl_merge() to cobble together your final table.
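As a rough sketch of the two combining functions, using the trial data shipped with {gtsummary} (the object names and spanner labels are illustrative): tbl_stack() appends tables vertically, while tbl_merge() places them side by side.

```r
library(gtsummary)

# two small tables built separately
tbl_age <- tbl_summary(trial, include = age)
tbl_grade <- tbl_summary(trial, include = grade)

# tbl_stack(): the rows of the second table are placed below the first
tbl_stacked <- tbl_stack(list(tbl_age, tbl_grade))

# tbl_merge(): tables with the same rows are placed side by side
tbl_a <- tbl_summary(dplyr::filter(trial, trt == "Drug A"), include = c(age, grade))
tbl_b <- tbl_summary(dplyr::filter(trial, trt == "Drug B"), include = c(age, grade))
tbl_merged <- tbl_merge(list(tbl_a, tbl_b), tab_spanner = c("**Drug A**", "**Drug B**"))
```

The same pattern should apply to tables created with the tbl_ard_*() functions, since they are also gtsummary tables.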
Finally, some tables are entirely unique and would be difficult to create under any framework. In these cases, it’s often much easier to build a data frame and then convert it to a gtsummary table with as_gtsummary(). Once converted, you can take advantage of styling that is available for all gtsummary tables.
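A minimal sketch of this last approach, assuming as_gtsummary() accepts a plain data frame as the table body. The summary itself (mean mpg by cyl from the built-in mtcars data) and the names df_custom and tbl_custom are purely illustrative:

```r
library(gtsummary)

# any data frame can serve as the table body; here, mean mpg by cylinder count
df_custom <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# convert the data frame to a gtsummary table, then style the headers
tbl_custom <-
  as_gtsummary(df_custom) |>
  modify_header(cyl = "**Cylinders**", mpg = "**Mean MPG**")
tbl_custom
```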