The tbl_summary() function calculates descriptive statistics for
continuous, categorical, and dichotomous variables.
Review the tbl_summary vignette for detailed examples.
Usage
tbl_summary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(
    all_continuous() ~ "{median} ({p25}, {p75})",
    all_categorical() ~ "{n} ({p}%)"
  ),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)
Arguments
- data (data.frame)
  A data frame.
- by (tidy-select)
  A single column from data. Summary statistics will be stratified by this variable. Default is NULL.
- label (formula-list-selector)
  Used to override default labels in the summary table, e.g. list(age = "Age, years"). The default for each variable is the column label attribute, attr(., 'label'). If no label has been set, the column name is used.
- statistic (formula-list-selector)
  Specifies summary statistics to display for each variable. The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)"). See below for details.
- digits (formula-list-selector)
  Specifies how summary statistics are rounded. Values may be either integer(s) or function(s). If not specified, default formatting is assigned via assign_summary_digits(). See below for details.
- type (formula-list-selector)
  Specifies the summary type. Accepted values are c("continuous", "continuous2", "categorical", "dichotomous"). If not specified, the default type is assigned via assign_summary_type(). See below for details.
- value (formula-list-selector)
  Specifies the level of a variable to display on a single row. The gtsummary type selectors, e.g. all_dichotomous(), cannot be used with this argument. Default is NULL. See below for details.
- missing, missing_text, missing_stat
  Arguments dictating how and if missing values are presented:
  missing: must be one of c("ifany", "no", "always")
  missing_text: string indicating text shown on missing row. Default is "Unknown"
  missing_stat: statistic to show on missing row. Default is "{N_miss}". Possible values are N_miss, N_obs, N_nonmiss, p_miss, p_nonmiss.
- sort (formula-list-selector)
  Specifies sorting to perform for categorical variables. Values must be one of c("alphanumeric", "frequency"). Default is all_categorical(FALSE) ~ "alphanumeric".
- percent (string)
  Indicates the type of percentage to return. Must be one of c("column", "row", "cell"). Default is "column".
- include (tidy-select)
  Variables to include in the summary table. Default is everything().
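As a rough illustration of how sort and percent combine with by, the sketch below stratifies by treatment arm, orders categorical levels by frequency, and reports row percentages. It assumes the gtsummary package is attached and uses its bundled trial dataset; the object name tbl_sorted is only illustrative.

```r
library(gtsummary)

# Stratify by treatment, sort categorical levels by frequency,
# and report row percentages instead of the default column percentages
tbl_sorted <- trial |>
  select(trt, grade, stage) |>
  tbl_summary(
    by = trt,
    sort = all_categorical(FALSE) ~ "frequency",
    percent = "row"
  )

tbl_sorted
```

With percent = "row", the percentages within each level sum to 100 across the by groups rather than within each column.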
statistic argument
The statistic argument specifies the summary statistics presented in the table.
For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and
standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})")
would report the mean and standard deviation for all continuous variables.
The values are interpreted using glue::glue() syntax: a name that appears
between curly brackets is interpreted as a function name, and the formatted
result of that function is placed in the table.
For categorical variables, the following statistics are available to display:
{n} (frequency), {N} (denominator), {p} (percent).
For continuous variables, any univariate function may be used.
The most commonly used functions are {median}, {mean}, {sd}, {min}, and {max}.
Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100.
For example, {p25} is calculated as quantile(probs = 0.25, type = 2).
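To make the percentile definition concrete, the following base R sketch computes a 25th percentile with type = 2, the quantile definition named above, and compares it with R's default (type = 7). The vector x is made-up illustration data, not part of any gtsummary dataset.

```r
# Hypothetical data for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Type 2: inverse of the empirical CDF, averaging at discontinuities
p25_type2 <- quantile(x, probs = 0.25, type = 2)

# Type 7 (the default of quantile()): linear interpolation between order statistics
p25_type7 <- quantile(x, probs = 0.25, type = 7)

unname(p25_type2)  # 3
unname(p25_type7)  # 3.25
```

The two definitions can disagree on small samples, which is worth keeping in mind when comparing table output against quantile() called with its defaults.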
When the summary type is "continuous2", pass a vector of statistics.
Each element of the vector will result in a separate row in the summary table.
For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.
- {N_obs} total number of observations
- {N_miss} number of missing observations
- {N_nonmiss} number of non-missing observations
- {p_miss} percentage of observations missing
- {p_nonmiss} percentage of observations not missing
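A minimal sketch of the missing-value statistics in use is shown below. It assumes gtsummary is attached and uses the bundled trial dataset; the label "Missing" and the combined string passed to missing_stat are illustrative choices, not defaults.

```r
library(gtsummary)

# Always show a missing row, relabel it, and report count plus percentage
tbl_missing <- trial |>
  select(age, marker) |>
  tbl_summary(
    missing = "always",
    missing_text = "Missing",
    missing_stat = "{N_miss} ({p_miss}%)"
  )

tbl_missing
```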
digits argument
The digits argument specifies the number of digits (or formatting function) statistics are rounded to.
The values passed can be a single integer, a vector of integers, a
function, or a list of functions. If a single integer or function is passed,
it is recycled to the length of the number of statistics presented.
For example, if the statistic is "{mean} ({sd})", it is equivalent to
pass 1, c(1, 1), label_style_number(digits=1), or
list(label_style_number(digits=1), label_style_number(digits=1)).
Named lists are also accepted to change the default formatting for a single
statistic, e.g. list(sd = label_style_number(digits=1)).
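The equivalence described above can be checked with a short sketch. It assumes gtsummary is attached and uses the bundled trial dataset; the object names are illustrative, and stat_0 is the column of the table_body element that holds the formatted statistics when no by variable is given.

```r
library(gtsummary)

# Four ways to request one decimal place for both the mean and the SD
tbl_int <- trial |>
  select(age) |>
  tbl_summary(statistic = age ~ "{mean} ({sd})", digits = age ~ 1, missing = "no")

tbl_vec <- trial |>
  select(age) |>
  tbl_summary(statistic = age ~ "{mean} ({sd})", digits = age ~ c(1, 1), missing = "no")

tbl_fun <- trial |>
  select(age) |>
  tbl_summary(
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ label_style_number(digits = 1),
    missing = "no"
  )

tbl_list <- trial |>
  select(age) |>
  tbl_summary(
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ list(label_style_number(digits = 1), label_style_number(digits = 1)),
    missing = "no"
  )

# Compare the formatted statistics across the four specifications
identical(tbl_int$table_body$stat_0, tbl_vec$table_body$stat_0)
identical(tbl_int$table_body$stat_0, tbl_fun$table_body$stat_0)
identical(tbl_int$table_body$stat_0, tbl_list$table_body$stat_0)
```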
type and value arguments
There are four summary types. Use the type argument to change the default summary types.
- "continuous" summaries are shown on a single row. Most numeric variables default to summary type continuous.
- "continuous2" summaries are shown on 2 or more rows.
- "categorical" multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous").
- "dichotomous" categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show").
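The sketch below shows one way type and value might be combined. It assumes gtsummary is attached and uses the bundled trial dataset: response (coded 0/1) is switched from its dichotomous default to categorical so both levels are shown, and grade is collapsed to a single row for level "III". The object name tbl_types is illustrative.

```r
library(gtsummary)

tbl_types <- trial |>
  select(response, grade) |>
  tbl_summary(
    # Show every level of the 0/1 variable instead of a single row
    type = list(
      response ~ "categorical",
      grade ~ "dichotomous"
    ),
    # Display only the "III" level of grade on its single row
    value = list(grade ~ "III")
  )

tbl_types
```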
See also
See the tbl_summary vignette for a detailed tutorial
See the table gallery for additional examples
Review the list, formula, and selector syntax used throughout gtsummary
Examples
# Example 1 ----------------------------------
trial |>
  select(age, grade, response) |>
  tbl_summary()
Characteristic | N = 200¹
¹ Median (Q1, Q3); n (%)
# Example 2 ----------------------------------
trial |>
  select(age, grade, response, trt) |>
  tbl_summary(
    by = trt,
    label = list(age = "Patient Age"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits = list(age = c(0, 1))
  )
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹
¹ Mean (SD); n (%)
# Example 3 ----------------------------------
trial |>
  select(age, marker) |>
  tbl_summary(
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}"),
    missing = "no"
  )
Characteristic | N = 200