Introduction

This vignette walks through the fmt_table1() function and the functions available to modify and add to an existing Table 1.

To start, a quick note on the magrittr package’s pipe operator, %>%. By default, the pipe places whatever is on its left-hand side into the first argument of the function on its right-hand side. The pipe can make code that uses fmt_table1() easier to read, but it is not required. Here are a few examples of how %>% translates into typical R notation.

x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)
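
The four equivalences above can be checked directly. This is a minimal sketch using base R functions (sum(), round(), paste()) as stand-ins for f; the only package it assumes is magrittr.

```r
library(magrittr)

x <- c(1.234, 5.678)

checks <- c(
  # x %>% f() is f(x)
  identical(x %>% sum(), sum(x)),
  # x %>% f(y) is f(x, y)
  identical(x %>% round(1), round(x, 1)),
  # y %>% f(x, .) is f(x, y): the dot marks where the left-hand side goes
  identical(1 %>% round(x, .), round(x, 1)),
  # z %>% f(x, y, arg = .) is f(x, y, arg = z)
  identical("-" %>% paste("a", "b", sep = .), paste("a", "b", sep = "-"))
)

all(checks)
```

Each comparison pairs a piped call with its unpiped form, so all(checks) should be TRUE if the pipe behaves as described.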

Here’s how this translates into the use of fmt_table1().

mtcars %>% fmt_table1() is equivalent to fmt_table1(mtcars)
mtcars %>% fmt_table1(by = "am") is equivalent to fmt_table1(mtcars, by = "am")
fmt_table1(mtcars, by = "am") %>% add_comparison() is equivalent to
    t <- fmt_table1(mtcars, by = "am")
    add_comparison(t)

Basic Usage

We’ll be using the trial data set throughout this example. This set contains data from 200 patients randomized to a new adjuvant therapy or placebo. The outcome is a binary tumor response. Each variable in the data frame has been assigned a label attribute (e.g. attr(trial$trt, "label") returns "Treatment Randomization"). These labels are displayed in the output table by default. For a data frame without labels, the variable names are printed instead.

Variable  Label
trt       Treatment Randomization
age       Age, yrs
marker    Marker Level, ng/mL
stage     T Stage
grade     Grade
response  Tumor Response

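
To make the label mechanism concrete, here is a small base R sketch. It uses a toy data frame rather than the trial data, and get_label() is a hypothetical helper written for this illustration, not a gtsummary function. It sets a label attribute on one column and falls back to the column name when no label is present, which mirrors the behavior described above.

```r
# Toy data frame standing in for a labelled data set (not the trial data)
toy <- data.frame(
  age = c(23, 9, 31),
  trt = c("Drug", "Drug", "Placebo"),
  stringsAsFactors = FALSE
)

# Attach a label to one column only
attr(toy$age, "label") <- "Age, yrs"

# Hypothetical helper: return the label if set, otherwise the column name
get_label <- function(data, var) {
  lbl <- attr(data[[var]], "label")
  if (is.null(lbl)) var else lbl
}

labels <- vapply(names(toy), get_label, character(1), data = toy)
labels
```

Here age carries a label and trt does not, so the second element of labels is simply the column name.
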
library(dplyr)
library(knitr)
library(kableExtra)
library(gtsummary)

# printing trial data
head(trial) %>% kable()

trt      age  marker  stage  grade  response
Drug     23   0.160   T3     I      1
Drug     9    1.107   T4     III    1
Drug     31   0.277   T1     I      1
Placebo  46   2.067   T4     II     1
Drug     51   2.767   T2     II     0
Drug     39   0.613   T1     III    1

The default output from fmt_table1() is meant to be publication ready. Let’s start by creating a descriptive statistics table from the trial data set built into the gtsummary package. At a minimum, fmt_table1() takes a data set as its only input and returns descriptive statistics for each column in the data frame.

For brevity, we keep only a subset of the variables in the trial data set (trt, marker, and stage), stored as trial2.
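
The code that builds trial2 is not shown in this vignette. A plausible reconstruction, assuming it simply keeps the three columns that appear in the tables and in the printed inputs (trt, marker, stage), might look like the following. The use of dplyr::select() is an assumption; any equivalent subsetting would do.

```r
library(gtsummary)  # provides the trial data set
library(dplyr)

# Assumed reconstruction: keep only the three columns used in the tables
trial2 <- trial %>% dplyr::select(trt, marker, stage)

names(trial2)
```

The resulting data frame can then be passed straight to fmt_table1().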

Variable                 N = 200
Treatment Randomization
  Drug                   107 (54%)
  Placebo                93 (46%)
Marker Level, ng/mL      0.68 (0.22, 1.42)
  Unknown                8
T Stage
  T1                     51 (26%)
  T2                     49 (24%)
  T3                     42 (21%)
  T4                     58 (29%)

If your output does not appear in a formatted table, it is likely due to a known issue in the knitr::kable() function. One way around the issue is to add styling from the kableExtra package.
fmt_table1(trial2) %>% as_tibble() %>% knitr::kable() %>% kableExtra::kable_styling()

This is a great table, but for trial data the summary statistics should be split by randomization group. While reporting p-values for a randomized trial isn’t recommended, we’ll do it here as an illustration. To compare two or more groups, add add_comparison() to the function call.

fmt_table1(trial2, by = "trt") %>% add_comparison()
Variable             Drug               Placebo            p-value
                     N = 107            N = 93
Marker Level, ng/mL  0.61 (0.22, 1.20)  0.72 (0.22, 1.63)  0.4
  Unknown            4                  4
T Stage                                                    0.13
  T1                 25 (23%)           26 (28%)
  T2                 26 (24%)           23 (25%)
  T3                 29 (27%)           13 (14%)
  T4                 27 (25%)           31 (33%)

Customize Table 1 Output

It’s also possible to add information to fmt_table1() output. The code below produces the standard table of summary statistics split by treatment randomization, with the following modifications:

  • Report ‘mean (SD)’ and ‘n / N (%)’
  • Use t-test instead of Wilcoxon rank-sum
  • Do not add row for number of missing observations
  • Round large p-values to two decimal places
  • Add column of q-values (p-values adjusted using FDR)
  • Add column reporting summary statistics for the cohort overall
  • Add column reporting N not missing for each variable
  • Add column with statistic labels
  • Modify header to include percentages in each group
  • Bold variable labels
  • Italicize variable levels
Variable                          Statistic  N    All Patients    Drug            Placebo        p-value  q-value
                                                  N = 200 (100%)  N = 107 (54%)   N = 93 (46%)
Pretreatment Marker Level, ng/mL  Mean (SD)  192  0.93 (0.85)     0.90 (0.88)     0.97 (0.83)    0.58     0.58
Clinical T Stage                             200                                                 0.13     0.26
  T1                              n / N (%)       51 / 200 (26%)  25 / 107 (23%)  26 / 93 (28%)
  T2                              n / N (%)       49 / 200 (24%)  26 / 107 (24%)  23 / 93 (25%)
  T3                              n / N (%)       42 / 200 (21%)  29 / 107 (27%)  13 / 93 (14%)
  T4                              n / N (%)       58 / 200 (29%)  27 / 107 (25%)  31 / 93 (33%)

Each of the modification functions has additional options outlined in its help file.

Report Results Inline

Having a well-formatted and reproducible table is great! But we often need to report the results from a table in the text of an R Markdown report. Inline reporting has been made simple with inline_text().

Let’s first create a basic Table 1.

tab1 <- fmt_table1(trial2, by = "trt")
tab1

Variable             Drug               Placebo
                     N = 107            N = 93
Marker Level, ng/mL  0.61 (0.22, 1.20)  0.72 (0.22, 1.63)
  Unknown            4                  4
T Stage
  T1                 25 (23%)           26 (28%)
  T2                 26 (24%)           23 (25%)
  T3                 29 (27%)           13 (14%)
  T4                 27 (25%)           31 (33%)

To report the median (IQR) of the marker levels in each group, use the following commands inline.

The median (IQR) marker levels in the drug and placebo groups are `r inline_text(tab1, cell = "marker:Drug")` and `r inline_text(tab1, cell = "marker:Placebo")`, respectively.

Here’s how the line will appear in your report.

The median (IQR) marker levels in the drug and placebo groups are 0.61 (0.22, 1.20) and 0.72 (0.22, 1.63), respectively.

The cell argument tells inline_text() which statistic to display. Its components are separated by ":". The first term is the variable name and the last is the level of the by variable. For example, marker:Placebo displays the summary statistics for the variable marker among patients in the Placebo group. To display a statistic from a categorical variable, include the desired level after the variable name, e.g. stage:T1:Drug.

`r inline_text(tab1, "stage:T1:Drug")` resolves to “25 (23%)”
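
The structure of the cell string can be illustrated with a small base R sketch. parse_cell() is a hypothetical helper written for this illustration. It is not part of gtsummary and is not how inline_text() is implemented. It only covers the two forms described above, for a table built with a by variable.

```r
# Hypothetical helper: split a cell specification into its components.
# Handles "variable:by_level" and "variable:level:by_level" only.
parse_cell <- function(cell) {
  parts <- strsplit(cell, ":", fixed = TRUE)[[1]]
  n <- length(parts)
  if (!n %in% c(2L, 3L)) {
    stop("expected 'variable:by_level' or 'variable:level:by_level'")
  }
  list(
    variable = parts[1],                                # first term
    level    = if (n == 3L) parts[2] else NA_character_, # categorical level, if any
    by_level = parts[n]                                  # last term
  )
}

parse_cell("marker:Placebo")
parse_cell("stage:T1:Drug")
```

For a continuous variable the level component is missing; for a categorical variable it names the row of interest.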

gtsummary + kableExtra

Need a data frame for any reason (e.g. if you want to get extra fancy with kableExtra)? Use the generic function as_tibble() to extract an easy-to-use data frame from any fmt_table1 object.

If you want to customize anything with knitr::kable() or kableExtra, you can combine as_tibble() with the function indent_key(), which extracts the row numbers to indent when knitting your table to HTML. (NOTE: Only load library(kableExtra) and use the code below if knitting to HTML. It will not work with Word or PDF.) For more on customizing your tables with kableExtra, check out the package’s vignette on HTML output.
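
The code behind the customized table is not shown in this vignette. The sketch below is an assumption about how a table like it could be styled. It uses a hand-typed data frame as a stand-in for the as_tibble() output and hand-typed row numbers in place of what indent_key() would return. The kableExtra calls used (kable_styling(), add_indent(), add_header_above(), footnote()) are real kableExtra functions, but whether the original used exactly these is not confirmed by the text.

```r
library(knitr)
library(kableExtra)
library(magrittr)

# Hand-typed stand-in for the data frame that as_tibble() would return
tab_df <- data.frame(
  Variable = c("", "Marker Level, ng/mL", "Unknown", "T Stage",
               "T1", "T2", "T3", "T4"),
  Drug     = c("N = 107", "0.61 (0.22, 1.20)", "4", "",
               "25 (23%)", "26 (24%)", "29 (27%)", "27 (25%)"),
  Placebo  = c("N = 93", "0.72 (0.22, 1.63)", "4", "",
               "26 (28%)", "23 (25%)", "13 (14%)", "31 (33%)"),
  stringsAsFactors = FALSE
)

# Rows to indent, typed by hand here (assumed to match what indent_key() supplies)
indent_rows <- c(3, 5, 6, 7, 8)

html_tab <- kable(tab_df, format = "html",
                  caption = "Summary of Patient and Clinical Variables") %>%
  kable_styling() %>%
  add_indent(indent_rows) %>%
  add_header_above(c(" " = 1, "Treatment assignment" = 2)) %>%
  footnote(
    general = "Isn't this footnote so nice?",
    number  = c("You can also add numbered or lettered footnotes",
                "Which is great.")
  )
```

The format is set to "html" explicitly because these styling functions target HTML output.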

Table 1: Summary of Patient and Clinical Variables

                     Treatment assignment
Variable             Drug               Placebo
                     N = 107            N = 93
Marker Level, ng/mL  0.61 (0.22, 1.20)  0.72 (0.22, 1.63)
  Unknown            4                  4
T Stage
  T1                 25 (23%)           26 (28%)
  T2                 26 (24%)           23 (25%)
  T3                 29 (27%)           13 (14%)
  T4                 27 (25%)           31 (33%)

Note: Isn’t this footnote so nice?
1 You can also add numbered or lettered footnotes
2 Which is great.

Under the Hood

When you print the output from the fmt_table1() function in the R console or in an R Markdown document, default printing functions are called in the background: print.fmt_table1() and knit_print.fmt_table1(). The true output from fmt_table1() is a named list, but when you print to the R console, only the interesting portions of the .$table1 data frame are displayed.
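
The general pattern, a classed list whose print method shows only one element, can be sketched in base R. The toy_table class, its constructor, and its print method below are hypothetical and written only to illustrate S3 dispatch. They are not the package's implementation.

```r
# Constructor for a toy object: a named list with a class attribute
new_toy_table <- function(table1, by) {
  structure(list(table1 = table1, by = by), class = "toy_table")
}

# S3 print method: display only the table1 element
print.toy_table <- function(x, ...) {
  print(x$table1, ...)
  invisible(x)
}

obj <- new_toy_table(
  table1 = data.frame(label = c("T1", "T2"), stat = c("51 (26%)", "49 (24%)")),
  by = "trt"
)

obj          # auto-printing dispatches to print.toy_table()
names(obj)   # the remaining elements are still stored in the list
```

Printing shows only the data frame, while names() and $ still reach everything stored in the object.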

There is additional information stored in the fmt_table1() output list.

  • table1: data frame with summary statistics
  • meta_data: data frame with one row per variable, containing information about each variable in the object
  • by: the name of the by = variable from the function call
  • call: the fmt_table1 function call
  • call_list: named list of each function called on the fmt_table1 object. The example above would have two elements in the list: fmt_table1 and add_comparison.
  • inputs: inputs from the function call. Not only is the call stored, but the values of the inputs as well. For example, you can access the data frame passed to fmt_table1().

It is particularly useful to access .$meta_data to confirm which statistical tests were used to calculate the p-values in the table.

print.listof(t)
#> table1 :
#> # A tibble: 9 x 6
#>   .variable row_type label            stat_by1        stat_by2       pvalue
#>   <chr>     <chr>    <chr>            <chr>           <chr>          <chr> 
#> 1 <NA>      header2  Variable         Drug            Placebo        p-val~
#> 2 <NA>      header1  ""               N = 107         N = 93         ""    
#> 3 marker    label    Marker Level, n~ 0.61 (0.22, 1.~ 0.72 (0.22, 1~ 0.4   
#> 4 marker    missing  Unknown          4               4              <NA>  
#> 5 stage     label    T Stage          <NA>            <NA>           0.13  
#> 6 stage     level    T1               25 (23%)        26 (28%)       <NA>  
#> 7 stage     level    T2               26 (24%)        23 (25%)       <NA>  
#> 8 stage     level    T3               29 (27%)        13 (14%)       <NA>  
#> 9 stage     level    T4               27 (25%)        31 (33%)       <NA>  
#> 
#> by :
#> [1] "trt"
#> 
#> meta_data :
#> # A tibble: 2 x 10
#>   .variable .class .summary_type .dichotomous_va~ .var_label .stat_display
#>   <chr>     <chr>  <chr>         <list>           <chr>      <chr>        
#> 1 marker    numer~ continuous    <NULL>           Marker Le~ {median} ({q~
#> 2 stage     factor categorical   <NULL>           T Stage    {n} ({p}%)   
#> # ... with 4 more variables: .digits <dbl>, stat_test <chr>,
#> #   pvalue_exact <dbl>, pvalue <chr>
#> 
#> call :
#> fmt_table1(trial2, by = "trt")
#> 
#> inputs :
#> $data
#> # A tibble: 200 x 3
#>    trt     marker stage
#>    <chr>    <dbl> <fct>
#>  1 Drug     0.16  T3   
#>  2 Drug     1.11  T4   
#>  3 Drug     0.277 T1   
#>  4 Placebo  2.07  T4   
#>  5 Drug     2.77  T2   
#>  6 Drug     0.613 T1   
#>  7 Drug     0.354 T4   
#>  8 Drug     1.74  T4   
#>  9 Drug     0.144 T4   
#> 10 Placebo  0.205 T2   
#> # ... with 190 more rows
#> 
#> $by
#> [1] "trt"
#> 
#> $label
#> NULL
#> 
#> $type
#> NULL
#> 
#> $statistic
#> NULL
#> 
#> $digits
#> NULL
#> 
#> $id
#> NULL
#> 
#> $missing
#> [1] "ifany"
#> 
#> 
#> call_list :
#> $fmt_table1
#> fmt_table1(data = trial2, by = "trt")
#> 
#> $add_comparison
#> add_comparison(x = .)