The tbl_regression() function takes a regression model object in R and returns a formatted, publication-ready table of regression model results. It is a simple way to summarize and present your analysis results using R! Like tbl_summary(), tbl_regression() creates highly customizable analytic tables with sensible defaults.
This vignette walks the reader through the tbl_regression() function and the various functions available to modify and make additions to an existing formatted regression table.
Behind the scenes: tbl_regression() uses broom::tidy() to perform the initial model formatting, and can accommodate many different model types (e.g. lm(), glm(), survival::coxph(), survival::survreg(), and more are vetted tidy models that are known to work with our package). It is also possible to specify your own function to tidy the model results if needed.
To start, a quick note on the {magrittr} package’s pipe operator, %>%. By default, the pipe operator puts whatever is on the left-hand side of %>% into the first argument of the function on the right-hand side. The pipe can make code relating to tbl_regression() easier to read, but it is not required. Here are a few examples of how %>% translates into typical R notation.
x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)
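The equivalences above can be checked directly. The following is a minimal sketch, assuming {magrittr} is installed; the function f() and the values x, y, and z are hypothetical and exist only to show where the piped value lands.

```r
library(magrittr)

# hypothetical helper used only to illustrate argument placement
f <- function(a, b = 0, arg = 0) a - b + 10 * arg

x <- 5
y <- 2
z <- 1

# the left-hand side becomes the first argument
identical(x %>% f(), f(x))
identical(x %>% f(y), f(x, y))

# a dot places the left-hand side in a different position
identical(y %>% f(x, .), f(x, y))
identical(z %>% f(x, y, arg = .), f(x, y, arg = z))
```

Each comparison returns TRUE, since both sides evaluate the same call.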
In this vignette we’ll be using the trial data set, which is included in the {gtsummary} package.
This data set contains information from 200 patients who received one of two types of chemotherapy (Drug A or Drug B).
The outcomes are tumor response and death.
Each variable in the data frame has been assigned an attribute label (i.e. attr(trial$trt, "label") == "Chemotherapy Treatment") with the {labelled} package, which we highly recommend using. These labels are displayed in the {gtsummary} output table by default. When {gtsummary} is used on a data frame without labels, the variable names are printed instead; labels can also be added later.
trt       Chemotherapy Treatment
age       Age, yrs
marker    Marker Level, ng/mL
stage     T Stage
grade     Grade
response  Tumor Response
death     Patient Died
ttdeath   Years from Treatment to Death/Censor
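As a small illustration of how these labels are stored, the sketch below reads the label attribute from trial and attaches one to an unlabelled variable using base R. The data frame df and its bmi column are hypothetical; labelled::var_label() is an alternative way to set the same attribute.

```r
library(gtsummary)

# label attached to a variable in the trial data set
attr(trial$trt, "label")

# a small unlabelled data frame (hypothetical example data)
df <- data.frame(bmi = c(21.5, 27.3, 30.1))

# attach a label with base R; the {labelled} package offers
# labelled::var_label(df$bmi) <- "..." for the same purpose
attr(df$bmi, "label") <- "Body Mass Index, kg/m2"
attr(df$bmi, "label")
```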
The default output from tbl_regression() is meant to be publication-ready. The example below fits a logistic regression model of tumor response on age, stage, and grade using the trial data set.
library(gtsummary)
# build logistic regression model
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))
# view raw model results
summary(m1)$coefficients
#>                Estimate Std. Error    z value   Pr(>|z|)
#> (Intercept) -1.42184501 0.65711995 -2.1637526 0.03048334
#> age          0.01935700 0.01149333  1.6841945 0.09214409
#> stageT2     -0.56765609 0.44328677 -1.2805618 0.20034764
#> stageT3     -0.09619949 0.45702787 -0.2104893 0.83328578
#> stageT4     -0.26797315 0.45364355 -0.5907130 0.55471272
#> gradeII     -0.17315419 0.40255106 -0.4301422 0.66709221
#> gradeIII     0.04434059 0.38892269  0.1140087 0.90923087
# format results
tbl_regression(m1, exponentiate = TRUE)
Characteristic  OR^{1}  95% CI^{1}  p-value

Age, yrs  1.02  1.00, 1.04  0.092 
T Stage  
T1  —  —  
T2  0.57  0.23, 1.34  0.2 
T3  0.91  0.37, 2.22  0.8 
T4  0.76  0.31, 1.85  0.6 
Grade  
I  —  —  
II  0.84  0.38, 1.85  0.7 
III  1.05  0.49, 2.25  >0.9 
^{1} OR = Odds Ratio, CI = Confidence Interval

Note the sensible defaults with this basic usage (that can be customized later):
The model was recognized as logistic regression with coefficients exponentiated, so the header displayed “OR” for odds ratio.
Variable types are automatically detected and reference rows are created for categorical variables.
Model estimates and confidence intervals are rounded and nicely formatted.
P-values above 0.9 are presented as “>0.9” and below 0.001 are presented as “<0.001”. Non-significant p-values are only rounded to one decimal, while those close to or below the significance threshold (default 0.05) have additional decimal places by default.
Because the variables in the data set were labelled, the labels were carried through into the {gtsummary} output table. Had the data not been labelled, the default is to display the variable name.
Variable levels are indented and footnotes are added if printed using {gt}. (Tables can alternatively be printed using knitr::kable().)
There are four primary ways to customize the output of the regression model table.
tbl_regression() function input arguments
add_*() functions
{gtsummary} functions that modify and format the table
{gt} package functions
The tbl_regression() function includes many input options for modifying the appearance.
label            modify the variable labels printed in the table.
exponentiate     exponentiate model coefficients.
include          names of variables to include in output. Default is all variables.
show_single_row  By default, categorical variables are printed on multiple rows. If a variable is dichotomous (e.g. Yes/No) and you wish to print the regression coefficient on a single row, include the variable name(s) here.
conf.level       confidence level of confidence interval.
intercept        logical argument indicating whether to include the intercept in output.
estimate_fun     function to round and format coefficient estimates.
pvalue_fun       function to round and format p-values.
tidy_fun         function to specify/customize the tidier function.
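To make the arguments above concrete, here is a minimal sketch that combines several of them on the logistic model used in this vignette. The label text "Patient Age" and the 90% confidence level are arbitrary choices for illustration, and argument syntax may differ slightly between {gtsummary} versions.

```r
library(gtsummary)

# same logistic regression model as in the vignette
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

tbl_custom <- tbl_regression(
  m1,
  exponentiate = TRUE,                # report odds ratios
  label = list(age ~ "Patient Age"),  # override the stored variable label
  include = c("age", "grade"),        # leave stage out of the table
  conf.level = 0.90                   # 90% confidence intervals
)
tbl_custom
```

Because stage is dropped only from the display, the estimates for age and grade are still adjusted for it.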
The {gtsummary} package has built-in functions for adding to results from tbl_regression(). The following functions add columns and/or information to the regression table.
add_global_p() adds the global p-value for categorical variables
add_nevent() adds the number of observed events to the results object
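As a brief sketch of the add_*() pattern, the example below appends the number of observed events to the logistic regression table. It assumes the pipe is available after loading {gtsummary}; add_global_p() is used the same way but additionally relies on the {car} package.

```r
library(gtsummary)

# same logistic regression model as in the vignette
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

# add the number of observed events to the regression table
tbl_events <- tbl_regression(m1, exponentiate = TRUE) %>%
  add_nevent()
tbl_events
```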
The {gtsummary} package comes with functions specifically made to modify and format summary tables.
bold_labels() bold variable labels
bold_levels() bold variable levels
italicize_labels() italicize variable labels
italicize_levels() italicize variable levels
bold_p() bold significant p-values
The {gt} package is packed with many great functions for modifying table output—too many to list here. Review the package’s website for a full listing. https://gt.rstudio.com/index.html
To use the {gt} package functions with {gtsummary} tables, the regression table must first be converted into a {gt} object. To this end, use the as_gt() function after modifications have been completed with {gtsummary} functions.
m1 %>%
tbl_regression(exponentiate = TRUE) %>%
as_gt() %>%
<gt functions>
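One possible way to fill in the placeholder above is sketched here with two {gt} functions, gt::tab_header() and gt::tab_source_note(). The title and source note text are made up for illustration.

```r
library(gtsummary)

# same logistic regression model as in the vignette
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

gt_tbl <- m1 %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_gt() %>%
  # from here on the object is a {gt} table, so {gt} functions apply
  gt::tab_header(title = "Predictors of Tumor Response") %>%
  gt::tab_source_note(source_note = "Data: gtsummary::trial")
gt_tbl
```

Once as_gt() has been called, {gtsummary} functions such as bold_labels() no longer apply, so they belong before the conversion.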
There are formatting options available, such as adding bold and italics to text. In the example below,
 Variable labels are bold
 Levels of categorical variables are italicized
 Global p-values for T Stage and Grade are reported
 P-values less than 0.10 are bold
 Large p-values are rounded to two decimal places
 Coefficients are exponentiated to give odds ratios
 Odds ratios are rounded to 2 or 3 significant figures
# format results into data frame with global p-values
m1 %>%
  tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = function(x) style_pvalue(x, digits = 2),
    estimate_fun = function(x) style_ratio(x, digits = 3)
  ) %>%
  add_global_p() %>%
  bold_p(t = 0.10) %>%
  bold_labels() %>%
  italicize_levels()
Characteristic  OR^{1}  95% CI^{1}  p-value

Age, yrs  1.020  0.997, 1.043  0.092 
T Stage  0.60  
T1  —  —  
T2  0.567  0.234, 1.342  
T3  0.908  0.367, 2.220  
T4  0.765  0.310, 1.854  
Grade  0.85  
I  —  —  
II  0.841  0.379, 1.849  
III  1.045  0.486, 2.246  
^{1} OR = Odds Ratio, CI = Confidence Interval

When you print the output from the tbl_regression() function to the R console or in an R Markdown document, default printing functions are called in the background: print.tbl_regression() and knit_print.tbl_regression(). The true output from tbl_regression() is a named list, but when you print the object, a formatted version of .$table_body is displayed. All formatting and modifications are made using the {gt} package by default.
tbl_regression(m1) %>% names()
#> [1] "table_body"   "table_header" "n"            "model_obj"    "inputs"
#> [6] "call_list"
These are the additional data stored in the tbl_regression() output list.
table_body data frame with summary statistics
n N included in model
model_obj the model object passed to `tbl_regression`
call_list named list of each function called on the `tbl_regression` object
inputs inputs from the `tbl_regression()` function call
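The list elements can be inspected like those of any other R list. The sketch below looks only at table_body and inputs; the exact set of element names depends on the installed {gtsummary} version, so treat the names printed above as specific to the version used in this vignette.

```r
library(gtsummary)

# same logistic regression model as in the vignette
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

tbl <- tbl_regression(m1, exponentiate = TRUE)

# element names vary across {gtsummary} versions
names(tbl)

# the data frame that is formatted when the table is printed
head(tbl$table_body)

# the arguments recorded from the tbl_regression() call
names(tbl$inputs)
```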
When a {gtsummary} object is printed, it is first converted to a {gt} object with as_gt() via a sequence of {gt} commands executed on x$table_body. Here’s an example of the first few calls saved with tbl_regression():
tbl_regression(m1) %>%
  as_gt(return_calls = TRUE) %>%
  head(n = 3)
#> $gt
#> gt::gt(data = x$table_body)
#>
#> $fmt_missing
#> gt::fmt_missing(columns = gt::everything(), missing_text = "")
#>
#> $fmt_missing_emdash
#> $fmt_missing_emdash[[1]]
#> gt::fmt_missing(columns = gt::vars(estimate), rows = row_ref ==
#>     TRUE, missing_text = "")
#>
#> $fmt_missing_emdash[[2]]
#> gt::fmt_missing(columns = gt::vars(ci), rows = row_ref == TRUE,
#>     missing_text = "")
The {gt} functions are called in the order they appear, always beginning with the gt::gt() function.
If the user does not want a specific {gt} function to run, any {gt} call can be included or excluded in the as_gt() function. In this example, the default footnote will be excluded from the output.
tbl_regression(m1, exponentiate = TRUE) %>% as_gt(include = -tab_footnote)
Characteristic  OR  95% CI  p-value

Age, yrs  1.02  1.00, 1.04  0.092 
T Stage  
T1  —  —  
T2  0.57  0.23, 1.34  0.2 
T3  0.91  0.37, 2.22  0.8 
T4  0.76  0.31, 1.85  0.6 
Grade  
I  —  —  
II  0.84  0.38, 1.85  0.7 
III  1.05  0.49, 2.25  >0.9 
The tbl_uvregression() function produces a table of univariate regression results. The function is a wrapper for tbl_regression() and, as a result, accepts nearly identical function arguments. The function’s results can be modified in similar ways to tbl_regression(), and the results can be reported inline in the same way as for tbl_regression().
trial %>%
  # drop variables that should not appear as covariates (select() is from {dplyr})
  select(-death, -ttdeath, -stage) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = function(x) style_pvalue(x, digits = 2)
  ) %>%
  # overrides the default that shows p-values for each level
  add_global_p() %>%
  # adjusts global p-values for multiple testing (default method: FDR)
  add_q() %>%
  # bold p-values under a given threshold (default 0.05)
  bold_p() %>%
  # now bold q-values under the threshold of 0.10
  bold_p(t = 0.10, q = TRUE) %>%
  bold_labels()
#> Global p-values calculated with
#>   `car::Anova(mod = x$model_obj, type = "III")`
#> Adjusting p-values with
#>   `stats::p.adjust(x$table_body$p.value, method = "fdr")`
Characteristic  N  OR^{1}  95% CI^{1}  p-value  q-value^{2}

Chemotherapy Treatment  193  0.5  0.7  
Drug A  —  —  
Drug B  1.21  0.66, 2.24  
Age, yrs  183  1.02  1.00, 1.04  0.091  0.2 
Marker Level, ng/mL  183  1.35  0.94, 1.93  0.10  0.2 
Grade  193  >0.9  >0.9  
I  —  —  
II  0.95  0.45, 2.00  
III  1.10  0.52, 2.29  
^{1} OR = Odds Ratio, CI = Confidence Interval
^{2} False discovery rate correction for multiple testing

The {gtsummary} regression functions and their related functions have sensible defaults for rounding and formatting results. If you would like to change the defaults, there are a few options. The default options can be changed using the {gtsummary} themes function set_gtsummary_theme(). The package includes pre-specified themes, and you can also create your own. Themes can control baseline behavior, for example, how p-values and coefficients are rounded, default headers, confidence levels, etc. For details on creating a theme and setting personal defaults, visit the themes vignette.
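As a rough sketch of how a pre-specified theme is applied, the example below switches to the JAMA journal theme, builds a table, and then restores the defaults. It assumes a {gtsummary} version that provides theme_gtsummary_journal() and reset_gtsummary_theme(); the choice of "jama" is only for illustration.

```r
library(gtsummary)

# same logistic regression model as in the vignette
m1 <- glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

# apply a pre-specified journal theme to tables created from now on
theme_gtsummary_journal(journal = "jama")

tbl_jama <- tbl_regression(m1, exponentiate = TRUE)
tbl_jama

# restore the package defaults
reset_gtsummary_theme()
```

A theme stays active for the rest of the session, so resetting it keeps later tables on the package defaults.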