This vignette is meant for those who wish to contribute to {gtsummary}, or for users who want to understand the inner workings of {gtsummary} objects so they can more easily modify them to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package's functions and for tutorials on advanced use.

Introduction

Every {gtsummary} table has a few characteristics common to all tables created with the package. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.

library(gtsummary)
library(dplyr) # provides select(), used below

tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5) 

tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)

Structure of a {gtsummary} object

Every {gtsummary} object is a list containing, at minimum, these elements:

.$table_body
.$table_styling

table_body

The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.
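A quick way to confirm that a table carries the three required columns is to check the names of .$table_body. The sketch below uses the trial data bundled with {gtsummary}; has_required_columns() is a hypothetical helper written for illustration, not a package function.

```r
library(gtsummary)

# Hypothetical helper (not part of {gtsummary}): TRUE when the table_body
# contains the three columns every {gtsummary} table relies on
has_required_columns <- function(x) {
  required <- c("variable", "row_type", "label")
  all(required %in% names(x$table_body))
}

tbl <- tbl_summary(trial, include = c(age, grade))
has_required_columns(tbl)
```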

tbl_summary_ex$table_body
#> # A tibble: 8 x 7
#>   variable var_type    var_label     row_type label        stat_1     stat_2    
#>   <chr>    <chr>       <chr>         <chr>    <chr>        <chr>      <chr>     
#> 1 age      continuous  Age           label    Age          46 (37, 5~ 48 (39, 5~
#> 2 age      continuous  Age           missing  Unknown      7          4         
#> 3 grade    categorical Grade         label    Grade        <NA>       <NA>      
#> 4 grade    categorical Grade         level    I            35 (36%)   33 (32%)  
#> 5 grade    categorical Grade         level    II           32 (33%)   36 (35%)  
#> 6 grade    categorical Grade         level    III          31 (32%)   33 (32%)  
#> 7 response dichotomous Tumor Respon~ label    Tumor Respo~ 28 (29%)   33 (34%)  
#> 8 response dichotomous Tumor Respon~ missing  Unknown      3          4

table_styling

The .$table_styling object is a list containing information about how .$table_body is printed, formatted, and styled.
It contains the data frames header, footnote, footnote_abbrev, fmt_fun, text_format, fmt_missing, and cols_merge, and the objects source_note, caption, and horizontal_line_above.
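To see what a particular table's .$table_styling holds, you can list its elements together with their classes. This is a minimal sketch assuming a {gtsummary} version whose table_styling follows the layout described in this vignette; elements that were never set (for example, an unused caption) may simply be absent from the list.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# first class of every element currently stored in table_styling
styling_classes <- vapply(
  tbl$table_styling,
  function(el) class(el)[1],
  character(1)
)
styling_classes
```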

header

The header table has one row per column in .$table_body and the columns listed below. It contains styling information that applies to an entire column or to the column's header.

Column: Description
column: column name from .$table_body
hide: logical indicating whether the column is hidden in the output
align: alignment/justification of the column, e.g. ‘center’ or ‘left’
label: label displayed in the column header (if the column is shown in the output)
interpret_label: the {gt} function used to interpret the column label, gt::md() or gt::html()
spanning_header: text printed above the column as a spanning header
interpret_spanning_header: the {gt} function used to interpret the column spanning header, gt::md() or gt::html()
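Because header has one row per .$table_body column, filtering on hide shows which columns will be printed and under which labels. A sketch, assuming the header layout described above; the exported show_header_names() function gives a related overview of a table's column names and labels.

```r
library(gtsummary)
library(dplyr)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# keep only the columns that will be printed, with their header labels
visible_cols <- tbl$table_styling$header %>%
  filter(!hide) %>%
  select(column, label)
visible_cols
```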

footnote & footnote_abbrev

Each {gtsummary} table may contain a single footnote per header and per cell. Footnotes and footnote abbreviations are stored separately, in footnote and footnote_abbrev. Updates/changes to footnotes are appended to the bottom of the relevant tibble, and a footnote of NA_character_ deletes an existing footnote.

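Column: Description
column: column name from .$table_body
rows: expression selecting rows in .$table_body; NA indicates the footnote is added to the header
text_interpret: the {gt} function used to interpret the footnote text, gt::md() or gt::html()
footnote: string containing the footnote to add to the column/row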
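Footnotes are usually added through exported helpers rather than by editing the tibble by hand. The sketch below, which assumes the footnote layout described above, uses modify_table_styling() to attach a footnote to the stat_1 header and then shows that the new instruction was appended as the last row. The footnote text is placeholder wording.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))
n_before <- nrow(tbl$table_styling$footnote)

# leaving rows at its default attaches the footnote to the column header
tbl <- modify_table_styling(
  tbl,
  columns = stat_1,
  footnote = "Illustrative footnote text"
)

# the update is appended to the bottom of the footnote tibble
tail(tbl$table_styling$footnote, 1)
```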

fmt_fun

Numeric columns/rows are styled with the functions stored in fmt_fun. Updates/changes to styling functions are appended to the bottom of the tibble.

Column: Description
column: column name from .$table_body
rows: expression selecting rows in .$table_body
fmt_fun: list of formatting/styling functions
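Changing how a numeric column is formatted adds a row to fmt_fun. A sketch, assuming the fmt_fun layout described above, that swaps the p-value formatter of a regression table for one showing three decimal places via style_pvalue().

```r
library(gtsummary)

tbl <- tbl_regression(lm(age ~ grade + marker, trial))
n_before <- nrow(tbl$table_styling$fmt_fun)

# new formatting instruction for the p.value column (all rows)
tbl <- modify_table_styling(
  tbl,
  columns = p.value,
  fmt_fun = function(x) style_pvalue(x, digits = 3)
)

# the new function is stored in the last row of fmt_fun
tail(tbl$table_styling$fmt_fun, 1)
```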

text_format

Bold, italic, and indent formatting for columns/rows is stored in text_format. Updates/changes to text formatting are appended to the bottom of the tibble.

Column: Description
column: column name from .$table_body
rows: expression selecting rows in .$table_body
format_type: one of c('bold', 'italic', 'indent')
undo_text_format: logical indicating whether the indicated formatting should be undone/removed
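A text_format instruction pairs a column with a row expression. The sketch below, assuming the text_format layout described above, bolds the label column on variable-label rows, similar to what bold_labels() does, and inspects the appended instruction.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# bold the label column, but only on rows whose row_type is "label"
tbl <- modify_table_styling(
  tbl,
  columns = label,
  rows = row_type == "label",
  text_format = "bold"
)

# the instruction is appended to the bottom of text_format
tail(tbl$table_styling$text_format, 1)
```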

fmt_missing

By default, all NA values are shown as blanks. The fmt_missing tibble instead replaces missing values in selected columns/rows with a symbol; for example, reference rows in tbl_regression() are shown with an em-dash. Updates/changes to these instructions are appended to the bottom of the tibble.

Column: Description
column: column name from .$table_body
rows: expression selecting rows in .$table_body
symbol: string that replaces missing values, e.g. an em-dash
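The default em-dash on reference rows can be replaced with another symbol. A sketch, assuming the fmt_missing layout described above and the reference_row column that tbl_regression() adds to .$table_body, which labels reference rows "Ref." in the estimate column.

```r
library(gtsummary)

tbl <- tbl_regression(lm(age ~ grade + marker, trial))

# show "Ref." instead of a blank/em-dash in the estimate column of reference rows
tbl <- modify_table_styling(
  tbl,
  columns = estimate,
  rows = reference_row %in% TRUE,
  missing_symbol = "Ref."
)

tail(tbl$table_styling$fmt_missing, 1)
```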

cols_merge

This object is experimental and may change in the future. This tibble gives instructions for merging several columns into a single column. The implementation in as_gt() will be updated after gt::cols_merge() gains a rows= argument.

Column: Description
column: column name from .$table_body
rows: expression selecting rows in .$table_body
pattern: glue pattern directing how to combine/merge columns; the merged result replaces the column named in ‘column’
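Because cols_merge is experimental, treat this as a sketch only. It assumes that modify_table_styling() accepts a cols_merge_pattern= argument (as in versions that include the cols_merge tibble) and that the regression table has estimate, conf.low, conf.high, and ci columns. The separate ci column is hidden so the interval is not printed twice.

```r
library(gtsummary)

tbl <- tbl_regression(lm(age ~ grade + marker, trial)) %>%
  # merge the estimate and its confidence limits into the estimate column
  modify_table_styling(
    columns = estimate,
    rows = !is.na(estimate),
    cols_merge_pattern = "{estimate} ({conf.low}, {conf.high})"
  ) %>%
  # hide the stand-alone CI column
  modify_table_styling(columns = ci, hide = TRUE)

tbl$table_styling$cols_merge
```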

source_note

String that is made into the table source note. Its attribute "text_interpret" is either "md" or "html".

caption

String that is made into the table caption. Its attribute "text_interpret" is either "md" or "html".
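The caption and source note are plain strings carrying a "text_interpret" attribute. In the sketch below, the caption is set with the exported modify_caption() helper (assumed to be available in the installed version). The source note is written directly into table_styling, which mirrors the description above but is an assumption rather than a documented interface. The caption and source-note wording is placeholder text.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# caption: the helper stores the string and its "text_interpret" attribute
tbl <- modify_caption(tbl, "**Patient characteristics**")

# source note: set by hand here, following the same string + attribute layout
tbl$table_styling$source_note <- "Data from the trial dataset"
attr(tbl$table_styling$source_note, "text_interpret") <- "md"

tbl$table_styling$caption
tbl$table_styling$source_note
```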

horizontal_line_above

Expression identifying the row above which a horizontal line is placed in the table.
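An expression such as this is evaluated against .$table_body to find the row it selects. The sketch below stores a hypothetical rule (a line above the first row of the grade variable) and evaluates it with rlang::eval_tidy() to show which row matches. How as_gt() consumes the stored expression internally is not shown here and may differ by version.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# hypothetical rule: draw a line above the row that starts the grade variable
tbl$table_styling$horizontal_line_above <-
  rlang::expr(variable == "grade" & row_type == "label")

# evaluate the expression in the context of table_body to get a logical index
selected <- rlang::eval_tidy(
  tbl$table_styling$horizontal_line_above,
  data = tbl$table_body
)
tbl$table_body$label[selected]
```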

Example from tbl_regression()

tbl_regression_ex$table_styling
#> $header
#> # A tibble: 24 x 7
#>    column   hide  align interpret_label label  interpret_spanni~ spanning_header
#>    <chr>    <lgl> <chr> <chr>           <chr>  <chr>             <chr>          
#>  1 variable TRUE  cent~ gt::md          varia~ gt::md            <NA>           
#>  2 var_lab~ TRUE  cent~ gt::md          var_l~ gt::md            <NA>           
#>  3 var_type TRUE  cent~ gt::md          var_t~ gt::md            <NA>           
#>  4 referen~ TRUE  cent~ gt::md          refer~ gt::md            <NA>           
#>  5 row_type TRUE  cent~ gt::md          row_t~ gt::md            <NA>           
#>  6 header_~ TRUE  cent~ gt::md          heade~ gt::md            <NA>           
#>  7 N_obs    TRUE  cent~ gt::md          N_obs  gt::md            <NA>           
#>  8 N        TRUE  cent~ gt::md          **N**  gt::md            <NA>           
#>  9 coeffic~ TRUE  cent~ gt::md          coeff~ gt::md            <NA>           
#> 10 coeffic~ TRUE  cent~ gt::md          coeff~ gt::md            <NA>           
#> # ... with 14 more rows
#> 
#> $footnote
#> # A tibble: 0 x 4
#> # ... with 4 variables: column <chr>, rows <list>, text_interpret <chr>,
#> #   footnote <chr>
#> 
#> $footnote_abbrev
#> # A tibble: 2 x 4
#>   column    rows      text_interpret footnote                
#>   <chr>     <list>    <chr>          <chr>                   
#> 1 ci        <quosure> gt::md         CI = Confidence Interval
#> 2 std.error <quosure> gt::md         SE = Standard Error     
#> 
#> $text_format
#> # A tibble: 2 x 4
#>   column  rows       format_type undo_text_format
#>   <chr>   <list>     <chr>       <lgl>           
#> 1 label   <language> indent      FALSE           
#> 2 p.value <quosure>  bold        FALSE           
#> 
#> $fmt_missing
#> # A tibble: 4 x 3
#>   column    rows      symbol
#>   <chr>     <list>    <chr> 
#> 1 estimate  <quosure> —     
#> 2 ci        <quosure> —     
#> 3 std.error <quosure> —     
#> 4 statistic <quosure> —     
#> 
#> $fmt_fun
#> # A tibble: 10 x 3
#>    column      rows      fmt_fun   
#>    <chr>       <list>    <list>    
#>  1 estimate    <quosure> <fn>      
#>  2 N           <quosure> <fn>      
#>  3 N_obs       <quosure> <fn>      
#>  4 n_obs       <quosure> <fn>      
#>  5 conf.low    <quosure> <fn>      
#>  6 conf.high   <quosure> <fn>      
#>  7 p.value     <quosure> <fn>      
#>  8 std.error   <quosure> <prrr_fn_>
#>  9 statistic   <quosure> <prrr_fn_>
#> 10 var_nlevels <quosure> <prrr_fn_>
#> 
#> $cols_merge
#> # A tibble: 0 x 3
#> # ... with 3 variables: column <chr>, rows <list>, pattern <chr>

Constructing a {gtsummary} object

table_body

When constructing a {gtsummary} object, the author begins with the .$table_body object. Recall that .$table_body must include the columns "label", "row_type", and "variable". Of these, only the "label" column is printed with the final results. The "row_type" column typically controls whether the label is indented. The "variable" column is often used by the inline_text() family of functions and when merging {gtsummary} tables with tbl_merge().

tbl_regression_ex %>%
  purrr::pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 x 3
#>   variable row_type label               
#>   <chr>    <chr>    <chr>               
#> 1 grade    label    Grade               
#> 2 grade    level    I                   
#> 3 grade    level    II                  
#> 4 grade    level    III                 
#> 5 marker   label    Marker Level (ng/mL)
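As an example of the "variable" column at work, inline_text() locates the row to report by variable name and, for categorical variables, by level. A short sketch using the regression model from the introduction; the exact text returned depends on the installed version's default pattern.

```r
library(gtsummary)

tbl_reg <- tbl_regression(lm(age ~ grade + marker, trial))

# the row is found through the "variable" column (grade) and the level ("II")
grade_ii <- inline_text(tbl_reg, variable = grade, level = "II")
grade_ii
```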

The other columns in .$table_body are created by the author and are likely printed in the output. Formatting and printing instructions for these columns are stored in .$table_styling.
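A table_body can be assembled by hand from the three required columns plus whatever statistic columns the author needs. In this sketch, the single statistic column is named stat, a hypothetical name chosen for illustration, and its values are computed from the trial data (the mean age and the number of missing ages).

```r
library(tibble)
library(gtsummary) # for the trial dataset

# one continuous variable: a label row and a missing-count row
my_body <- tibble(
  variable = c("age", "age"),
  row_type = c("label", "missing"),
  label = c("Age", "Unknown"),
  stat = c(
    format(round(mean(trial$age, na.rm = TRUE), 1), nsmall = 1),
    as.character(sum(is.na(trial$age)))
  )
)
my_body
```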

table_styling

There are a few {gtsummary} functions, some internal and some exported, to assist in constructing and modifying the .$table_styling list.

  1. .create_gtsummary_object(table_body) After creating a table_body, pass it to this function to create and return the skeleton of a gtsummary object (including the full table_styling list of tables).

  2. .update_table_styling() After columns are added to or removed from table_body, run this function to add or remove the corresponding styling instructions in .$table_styling. Note that new columns are hidden by default.

  3. modify_table_styling() This exported function modifies the printing instructions for a single column or a group of columns.

  4. modify_table_body() This exported function helps users make changes to .$table_body. It runs .update_table_styling() internally to keep the printing instructions consistent with the table.
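The steps above can be strung together as follows. This is a sketch under stated assumptions: .create_gtsummary_object() is internal, so it is reached with ::: and is assumed to return an object of class "gtsummary"; the stat and n_missing column names are hypothetical. The last step checks the behaviour described above, namely that a column added through modify_table_body() is registered in the header table and hidden by default.

```r
library(gtsummary)
library(tibble)

my_body <- tibble(
  variable = "age",
  row_type = "label",
  label = "Age",
  stat = format(round(mean(trial$age, na.rm = TRUE), 1), nsmall = 1)
)

# internal constructor (assumed to return a "gtsummary" object)
tbl <- gtsummary:::.create_gtsummary_object(my_body)

# every column starts hidden, so unhide the two columns to print
tbl <- modify_table_styling(tbl, columns = c(label, stat), hide = FALSE)

# add a column through modify_table_body() so table_styling stays in sync
tbl <- modify_table_body(
  tbl,
  fun = function(body) dplyr::mutate(body, n_missing = sum(is.na(trial$age)))
)

# the new column now has a row in the header table
tbl$table_styling$header[tbl$table_styling$header$column == "n_missing", ]
```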

Printing a {gtsummary} object

All {gtsummary} objects are printed with print.gtsummary(). Before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its input and uses the information in .$table_styling to construct a list of {gt} calls that are then executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is printed like any other {gt} object.
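The intermediate list of {gt} calls can be inspected before it is executed. A sketch, assuming the installed as_gt() supports return_calls = TRUE; the names and number of calls in the list depend on the table and the package version.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# return the list of {gt} calls instead of executing them
gt_calls <- as_gt(tbl, return_calls = TRUE)
names(gt_calls)

# executing the calls yields an ordinary {gt} table object
gt_tbl <- as_gt(tbl)
class(gt_tbl)
```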

In some cases, the package prints with other engines instead, such as flextable (as_flex_table()), huxtable (as_hux_table()), kableExtra (as_kable_extra()), and kable (as_kable()). The default print engine is set with the theme element "pkgwide-str:print_engine".
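The print engine can be switched for a session through a theme. A sketch using set_gtsummary_theme() with only the "pkgwide-str:print_engine" element, followed by reset_gtsummary_theme() to restore the defaults; the conversion functions can also be called directly, as the last lines show for as_kable().

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# make kable the print engine for the rest of the session
set_gtsummary_theme(list("pkgwide-str:print_engine" = "kable"))
print(tbl) # printed via as_kable()

# restore the default theme
reset_gtsummary_theme()

# converting explicitly, independent of the theme
kable_tbl <- as_kable(tbl)
class(kable_tbl)
```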

While the actual print function is slightly more involved, it is basically this:

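print.gtsummary <- function(x, ...) {
  # look up the print engine chosen in the active theme
  get_theme_element("pkgwide-str:print_engine") %>%
    # convert the {gtsummary} object with the matching as_*() function
    switch(
      "gt" = as_gt(x),
      "flextable" = as_flex_table(x),
      "huxtable" = as_hux_table(x),
      "kable_extra" = as_kable_extra(x),
      "kable" = as_kable(x)
    ) %>%
    # print the converted table with its own print method
    print()
}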