vignettes/tbl_regression.Rmd
The tbl_regression() function takes a regression model object in
R and returns a formatted table of regression model results that is
publication-ready. It is a simple way to summarize and present your
analysis results using R! Like tbl_summary(), tbl_regression() creates
highly customizable analytic tables with sensible defaults.
This vignette walks through the tbl_regression() function and the
various functions available to modify and add to an existing formatted
regression table.
Behind the scenes: tbl_regression() uses broom::tidy() to perform the
initial model formatting, and can accommodate many different model
types (e.g. lm(), glm(), survival::coxph(), survival::survreg() and
others are vetted models known to work with {gtsummary}). It is also
possible to specify your own function to tidy the model results if
needed.
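As a rough sketch of what supplying your own tidying function can look like, the example below passes a thin wrapper around broom::tidy() through the tidy_fun argument. The wrapper name my_tidy and the object names fit and tbl_custom are made up for illustration, and the sketch assumes the {broom} package is installed alongside {gtsummary}.

```r
library(gtsummary)

# Hypothetical custom tidier: a thin wrapper around broom::tidy().
# Arguments such as conf.int, conf.level and exponentiate arrive
# through `...`, so the wrapper forwards them unchanged.
my_tidy <- function(x, ...) {
  broom::tidy(x, ...)
}

# Logistic regression on the trial data bundled with {gtsummary}
fit <- glm(response ~ age + stage, data = trial, family = binomial)

# Ask tbl_regression() to use the custom tidier instead of the default
tbl_custom <- tbl_regression(fit, exponentiate = TRUE, tidy_fun = my_tidy)
tbl_custom
```

A wrapper like this is a convenient place to post-process the tidied data frame before the table is built, as long as the columns that broom::tidy() normally returns are kept.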
In this vignette we’ll be using the trial data set, which is included
in the {gtsummary} package. This data set contains information from 200
patients who received one of two types of chemotherapy (Drug A or Drug
B). The outcomes are tumor response and death.
Each variable in the data frame has been assigned an attribute label
(i.e. attr(trial$trt, "label") == "Chemotherapy Treatment") with the
labelled package, which we highly recommend using. These labels are
displayed in the {gtsummary} output table by default. Using {gtsummary}
on a data frame without labels will simply print variable names, or
there is an option to add labels later.
Variable  Class  Label
trt  character  Chemotherapy Treatment
age  numeric  Age
marker  numeric  Marker Level (ng/mL)
stage  factor  T Stage
grade  factor  Grade
response  integer  Tumor Response
death  integer  Patient Died
ttdeath  numeric  Months to Death/Censor
Includes mix of continuous, dichotomous, and categorical variables
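The following sketch illustrates the two labelling routes mentioned above: reading the label attribute stored on a column, and supplying labels when the table is built for data that has none. The object names trial_nolabel, fit_nolabel and tbl_labelled, and the label text "Age (years)", are invented for this example.

```r
library(gtsummary)

# Inspect the label attribute stored on one column of the bundled data
attr(trial$trt, "label")

# Build a copy of the data with the label attributes stripped
# (`trial_nolabel` is a made-up name for this illustration)
trial_nolabel <- as.data.frame(lapply(trial, function(col) {
  attr(col, "label") <- NULL
  col
}))

fit_nolabel <- glm(response ~ age + stage, data = trial_nolabel,
                   family = binomial)

# Supply labels at table-building time through the `label` argument
tbl_labelled <- tbl_regression(
  fit_nolabel,
  exponentiate = TRUE,
  label = list(age ~ "Age (years)", stage ~ "T Stage")
)
tbl_labelled
```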
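The default output from tbl_regression() is meant to be publication
ready.
The example below fits a logistic regression model predicting tumor
response from age and T stage in the trial data set.
# build logistic regression model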
m1 <- glm(response ~ age + stage, trial, family = binomial)
# view raw model results
summary(m1)$coefficients
#>                Estimate Std. Error    z value   Pr(>|z|)
#> (Intercept) -1.48622424 0.62022844 -2.3962530 0.01656365
#> age          0.01939109 0.01146813  1.6908683 0.09086195
#> stageT2     -0.54142643 0.44000267 -1.2305071 0.21850725
#> stageT3     -0.05953479 0.45042027 -0.1321761 0.89484501
#> stageT4     -0.23108633 0.44822835 -0.5155549 0.60616530
tbl_regression(m1, exponentiate = TRUE)
Characteristic  OR^{1}  95% CI^{1}  p-value

Age  1.02  1.00, 1.04  0.091 
T Stage  
T1  —  —  
T2  0.58  0.24, 1.37  0.2 
T3  0.94  0.39, 2.28  0.9 
T4  0.79  0.33, 1.90  0.6 
^{1} OR = Odds Ratio, CI = Confidence Interval 
Note the sensible defaults with this basic usage (that can be customized later):
- The model was recognized as logistic regression with coefficients exponentiated, so the header displayed “OR” for odds ratio.
- Variable types are automatically detected and reference rows are added for categorical variables.
- Model estimates and confidence intervals are rounded and formatted.
- Because the variables in the data set were labelled, the labels were carried through into the {gtsummary} output table. Had the data not been labelled, the default is to display the variable name.
- Variable levels are indented and footnotes added.
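There are four primary ways to customize the output of the regression model table.
1. Modify tbl_regression() function input arguments
2. Add additional data/information to the table with add_*() functions
3. Modify table appearance with {gtsummary} functions
4. Modify table appearance with {gt} package functions
The tbl_regression() function includes many arguments for modifying
the appearance.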
Argument  Description
label  modify variable labels in table
exponentiate  exponentiate model coefficients
include  names of variables to include in output. Default is all variables
show_single_row  By default, categorical variables are printed on multiple rows. If a variable is dichotomous and you wish to print the regression coefficient on a single row, include the variable name(s) here.
conf.level  confidence level of confidence interval
intercept  indicates whether to include the intercept
estimate_fun  function to round and format coefficient estimates
pvalue_fun  function to round and format p-values
tidy_fun  function to specify/customize tidier function
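A minimal sketch of several of these arguments used together is shown below. The argument names (label, show_single_row, conf.level, estimate_fun, pvalue_fun) come from the tbl_regression() documentation; the model, the object names fit_trt and tbl_args, and the label text "Patient Age" are choices made only for this illustration.

```r
library(gtsummary)

# Model with a dichotomous predictor (trt) so show_single_row applies
fit_trt <- glm(response ~ age + trt, data = trial, family = binomial)

tbl_args <- tbl_regression(
  fit_trt,
  exponentiate = TRUE,                     # report odds ratios
  label = list(age ~ "Patient Age"),       # override a variable label
  show_single_row = "trt",                 # one row for the two-level variable
  conf.level = 0.90,                       # 90% confidence intervals
  estimate_fun = function(x) style_ratio(x, digits = 3),
  pvalue_fun = function(x) style_pvalue(x, digits = 2)
)
tbl_args
```

Writing estimate_fun and pvalue_fun as ordinary functions keeps the rounding rules explicit; the formula shorthand used elsewhere in this vignette (for example ~style_pvalue(.x, digits = 2)) expresses the same thing.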
The {gtsummary} package has built-in functions for adding to results
from tbl_regression(). The following functions add columns
and/or information to the regression table.
Function  Description
add_global_p()  adds the global p-value for a categorical variable
add_glance_source_note()  adds statistics from `broom::glance()` as source note
add_vif()  adds column of the variance inflation factors (VIF)
add_q()  adds a column of q-values to control for multiple comparisons
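As a hedged illustration of chaining these helpers onto a regression table, the sketch below adds a global p-value and a source note of model statistics. It assumes the {car} package is installed, since add_global_p() relies on it, and the object names fit and tbl_added are invented for the example.

```r
library(gtsummary)

fit <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_added <- tbl_regression(fit, exponentiate = TRUE) %>%
  add_global_p() %>%            # one overall p-value per variable
  add_glance_source_note()      # model statistics from broom::glance()
tbl_added
```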
The {gtsummary} package comes with functions specifically made to modify and format summary tables.
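Function  Description
modify_header()  update column headers
modify_footnote()  update column footnote
modify_spanning_header()  update spanning headers
modify_caption()  update table caption/title
bold_labels()  bold variable labels
bold_levels()  bold variable levels
italicize_labels()  italicize variable labels
italicize_levels()  italicize variable levels
bold_p()  bold significant p-values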
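The sketch below shows how a few of these formatting functions might be combined. The header and caption text are placeholders chosen for this example, and fit and tbl_styled are made-up object names.

```r
library(gtsummary)

fit <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_styled <- tbl_regression(fit, exponentiate = TRUE) %>%
  modify_header(label = "**Variable**") %>%                      # rename first column
  modify_caption("**Logistic regression of tumor response**") %>% # placeholder caption
  bold_labels() %>%                                              # bold variable labels
  italicize_levels()                                             # italicize level rows
tbl_styled
```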
The {gt} package is packed with many great functions for modifying table output, too many to list here. Review the package’s website for a full listing.
To use the {gt} package functions with {gtsummary} tables, the
regression table must first be converted into a {gt} object. To this
end, use the as_gt() function after modifications have been
completed with {gtsummary} functions.
m1 %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_gt() %>%
  gt::tab_source_note(gt::md("*This data is simulated*"))
Characteristic  OR^{1}  95% CI^{1}  p-value

Age  1.02  1.00, 1.04  0.091 
T Stage  
T1  —  —  
T2  0.58  0.24, 1.37  0.2 
T3  0.94  0.39, 2.28  0.9 
T4  0.79  0.33, 1.90  0.6 
This data is simulated  
^{1} OR = Odds Ratio, CI = Confidence Interval 
There are formatting options available, such as adding bold and
italics to text. In the example below,
- Coefficients are exponentiated to give odds ratios
- Global p-values for Stage are reported
- Large p-values are rounded to two decimal places
- P-values less than 0.10 are bold
- Variable labels are bold
- Variable levels are italicized
# format results into data frame with global p-values
m1 %>%
  tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%
  bold_p(t = 0.10) %>%
  bold_labels() %>%
  italicize_levels()
Characteristic  OR^{1}  95% CI^{1}  p-value

Age  1.02  1.00, 1.04  0.087 
T Stage  0.62  
T1  —  —  
T2  0.58  0.24, 1.37  
T3  0.94  0.39, 2.28  
T4  0.79  0.33, 1.90  
^{1} OR = Odds Ratio, CI = Confidence Interval 
The tbl_uvregression() function produces a table of
univariate regression models. The function is a wrapper for
tbl_regression(), and as a result, accepts nearly identical
function arguments. The function’s results can be modified in similar
ways to tbl_regression().
trial %>%
  select(response, age, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>% # add global p-value
  add_nevent() %>% # add number of events of the outcome
  add_q() %>% # adjusts global p-values for multiple testing
  bold_p() %>% # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.10, q = TRUE) %>% # now bold q-values under the threshold of 0.10
  bold_labels()
#> add_q: Adjusting p-values with
#> `stats::p.adjust(x$table_body$p.value, method = "fdr")`
Characteristic  N  Event N  OR^{1}  95% CI^{1}  p-value  q-value^{2}

Age  183  58  1.02  1.00, 1.04  0.091  0.18 
Grade  193  61  0.93  0.93  
I  —  —  
II  0.95  0.45, 2.00  
III  1.10  0.52, 2.29  
^{1} OR = Odds Ratio, CI = Confidence Interval  
^{2} False discovery rate correction for multiple testing 
The {gtsummary} regression functions and their related functions have
sensible defaults for rounding and formatting results. If you would
like to change the defaults, however, there are a few options. The
default options can be changed using the {gtsummary} themes function
set_gtsummary_theme(). The package includes prespecified
themes, and you can also create your own. Themes can control baseline
behavior, for example, how p-values are rounded, how coefficients are
rounded, default headers, confidence levels, etc. For details on
creating a theme and setting personal defaults, visit the themes
vignette.
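As a small sketch of the theme workflow, the example below switches on one of the prespecified themes, builds a table under it, and then restores the package defaults. The choice of the "jama" journal theme and the object names fit and tbl_jama are assumptions made only for illustration; see the themes vignette for the full set of options.

```r
library(gtsummary)

# Activate a prespecified journal theme for all subsequent tables
theme_gtsummary_journal(journal = "jama")

fit <- glm(response ~ age + stage, data = trial, family = binomial)
tbl_jama <- tbl_regression(fit, exponentiate = TRUE)
tbl_jama

# Return to the package defaults so later tables are unaffected
reset_gtsummary_theme()
```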
Below is a listing of known and tested models supported by
tbl_regression(). If a model follows a standard format and
has a tidier, it’s likely to be supported as well, even if not listed
below.
Model  Details 







Limited support. It is recommended to use 


May fail with R <= 4.0. 

May fail with R <= 4.0. 

May fail with R <= 4.0. 

May fail with R <= 4.0. 





Limited support for categorical variables 








Use default tidier 

Limited support. If 


Limited support for models with nominal predictors. 

Limited support for models with nominal predictors. 
Supported as long as the type of model and the engine is supported. 





Reference rows are not relevant for such models. 

Limited support 


Limited support. It is recommended to use 
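To illustrate a model type other than glm(), the sketch below passes a Cox proportional hazards model from survival::coxph() to tbl_regression(). It assumes the {survival} package is installed and uses the ttdeath and death columns of the trial data set as the time and event variables; fit_cox and tbl_cox are made-up object names.

```r
library(gtsummary)
library(survival)

# Cox model: time to death/censoring as a function of age and T stage
fit_cox <- coxph(Surv(ttdeath, death) ~ age + stage, data = trial)

# Exponentiated coefficients from a Cox model are hazard ratios
tbl_cox <- tbl_regression(fit_cox, exponentiate = TRUE)
tbl_cox
```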