Release Notes for gtsummary v2.0

Author

Daniel D. Sjoberg

Published

July 23, 2024

We are excited to announce the release of gtsummary 2.0. The new release is a near re-write of every line of code in the package. The re-write brings consistency to the internals of the package and efficiency gains as well.

The goal of this post is to announce the new release, discuss some of the big changes, and encourage you to try it out.

pak::pak("gtsummary")

What’s Changed?

Visit the website for a full listing of changes. This release includes significant internal changes and fewer user-facing changes. However, the internal updates we’ve made pave the way for future updates, and I am excited about the year ahead for the package.

Structure

The major change in this release is how results are calculated and stored, a change that is not visible to the vast majority of users. The updated format provides a unified structure for all gtsummary functions, which makes it much simpler, and trivial in some cases, to build on top of the existing functions or add new ones.

For those interested, results are now stored in a format compatible with the emerging CDISC Analysis Results Standard. From the standard, we are using the Analysis Results Datasets (ARDs).

A number of package dependencies have been removed in this release, and calculations that require outside packages have been consolidated in a new package called {cardx}. The {cardx} package is a suggested dependency of the package.

Other Changes

First, let’s load the package.

library(gtsummary)
packageVersion("gtsummary")

[1] '2.0.0'

Transposed Tables

Users often requested that the results of tbl_summary() be ‘transposed’ with the results across the columns. That is now possible with the new function tbl_wide_summary() (which, by the way, was a super simple addition due to the new structure).

tbl_wide_summary(
  trial, 
  include = c(response, grade),
  statistic = c("{n}", "{p}")
)

Characteristic	n	%
Tumor Response	61	32
Grade
I	68	34
II	68	34
III	64	32

tbl_wide_summary(
  trial, 
  include = c(age, marker),
  statistic = c("{median}", "{p25}, {p75}")
)

Characteristic	Median	Q1, Q3
Age	47	38, 57
Marker Level (ng/mL)	0.64	0.22, 1.41

Default Printer

The {gt} package is now the default printer for all Quarto and R markdown output formats. Previously, when printing a gtsummary table in a Quarto or R markdown document, we would detect the output format and convert to gt, flextable, or kable to provide the best-looking table. The {gt} package has matured and provides lovely tables for nearly all output types, and we have now made {gt} the default table drawing tool for all gtsummary tables. The other output types are still supported.
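As a minimal sketch of what "still supported" can look like in practice, a table can be converted explicitly to another print engine. This assumes the optional {flextable} and {knitr} packages are installed; the choice of variables is only for illustration.

```r
library(gtsummary)

# Build a small summary table from the trial data shipped with gtsummary
tbl <- tbl_summary(trial, include = c(age, grade))

# Printing the object uses the default {gt} printer
tbl

# Explicit conversions to the other supported output types
as_flex_table(tbl) # assumes the {flextable} package is installed
as_kable(tbl)      # assumes the {knitr} package is installed
```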

Dichotomous summaries in tbl_summary()

Previously, in tbl_summary(), variables coded as c(0, 1), c("no", "yes"), c("No", "Yes"), or c("NO", "YES") would default to a dichotomous summary, with the 1 or yes level shown in the table. This occurred even when, for example, only 0 was observed. In this release, the level shown for a dichotomous variable must be observed, OR the unobserved level must be explicitly defined in a factor, or the variable must be a logical vector. This means that a character vector of all "yes" or all "no" values will default to a categorical summary instead of dichotomous.
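If you relied on the old default, one option is to request the dichotomous summary explicitly. The sketch below uses a small made-up data frame (the column names resp_chr and resp_fct are hypothetical) and the type and value arguments of tbl_summary(); treat it as an illustration, not a prescribed workflow.

```r
library(gtsummary)

# Hypothetical data: every value is "yes", stored two different ways
df <- data.frame(
  resp_chr = rep("yes", 5),
  resp_fct = factor(rep("yes", 5), levels = c("no", "yes")),
  stringsAsFactors = FALSE
)

# Let tbl_summary() choose the summary type for each column
tbl_default <- tbl_summary(df)

# Explicitly request a dichotomous summary for the character column,
# and name the level that should be shown in the table
tbl_forced <- tbl_summary(
  df,
  type = resp_chr ~ "dichotomous",
  value = resp_chr ~ "yes"
)

tbl_default
tbl_forced
```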

Easier to specify changes to a single statistic

Previously, if I wanted a single statistic to be reported to additional levels of precision, I would need to specify the precision of every summary statistic for a variable. Now, we can simply update the one statistic we’re interested in.

tbl_summary(
  trial, 
  include = age,
  type = all_continuous() ~ "continuous2",
  statistic = all_continuous() ~ c("{mean} ({sd})",
                                   "{median} ({p25}, {p75})",
                                   "{min}, {max}"),
  digits = age ~ list(sd = 2), # show the SD to 2 decimal places
  missing = "no"
)

Characteristic	N = 200
Age
Mean (SD)	47 (14.31)
Median (Q1, Q3)	47 (38, 57)
Min, Max	6, 83

Quantile Calculations

In tbl_summary(), continuous variables can be summarized with quantiles using the shortcut '{p##}', e.g. '{p25}, {p75}' for the interquartile range. Previously, these quantiles were calculated with quantile(x, type = 7), and we now use type = 2. The results are very similar; the type 2 method matches SAS’s calculation, and now I don’t need to hear complaints that the results do not match. 🤷
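To see where the two methods can disagree, here is a small base R comparison on a made-up vector (the data are hypothetical and chosen so the difference is visible; on larger samples the two types usually agree closely).

```r
# Small illustrative vector (hypothetical data, not from the trial dataset)
x <- c(1, 2, 3, 4)

# Type 7: R's default, linear interpolation between order statistics
q7 <- quantile(x, probs = c(0.25, 0.75), type = 7)

# Type 2: inverse empirical CDF, averaging at discontinuities
q2 <- quantile(x, probs = c(0.25, 0.75), type = 2)

# Show the two results side by side
print(rbind(type7 = q7, type2 = q2))
```

For this vector, type 7 gives 1.75 and 3.25, while type 2 gives 1.5 and 3.5.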

In tbl_svysummary(), we are now using the re-factored survey::svyquantile() introduced in {survey} v4.1.