The tbl_summary() function calculates descriptive statistics for
continuous, categorical, and dichotomous variables.
Review the tbl_summary vignette for detailed examples.
Usage
tbl_summary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(
    all_continuous() ~ "{median} ({p25}, {p75})",
    all_categorical() ~ "{n} ({p}%)"
  ),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)
Arguments
- data (data.frame)
  A data frame.
- by (tidy-select)
  A single column from data. Summary statistics will be stratified by this variable. Default is NULL.
- label (formula-list-selector)
  Used to override default labels in the summary table, e.g. list(age = "Age, years"). The default for each variable is the column label attribute, attr(., 'label'). If no label has been set, the column name is used.
- statistic (formula-list-selector)
  Specifies summary statistics to display for each variable. The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)"). See below for details.
- digits (formula-list-selector)
  Specifies how summary statistics are rounded. Values may be either integer(s) or function(s). If not specified, default formatting is assigned via assign_summary_digits(). See below for details.
- type (formula-list-selector)
  Specifies the summary type. Accepted values are c("continuous", "continuous2", "categorical", "dichotomous"). If not specified, the default type is assigned via assign_summary_type(). See below for details.
- value (formula-list-selector)
  Specifies the level of a variable to display on a single row. The gtsummary type selectors, e.g. all_dichotomous(), cannot be used with this argument. Default is NULL. See below for details.
- missing, missing_text, missing_stat
  Arguments dictating whether and how missing values are presented:
  - missing: must be one of c("ifany", "no", "always").
  - missing_text: string indicating the text shown on the missing row. Default is "Unknown".
  - missing_stat: statistic to show on the missing row. Default is "{N_miss}". Possible values are N_miss, N_obs, N_nonmiss, p_miss, p_nonmiss.
- sort (formula-list-selector)
  Specifies sorting to perform for categorical variables. Values must be one of c("alphanumeric", "frequency"). Default is all_categorical(FALSE) ~ "alphanumeric".
- percent (string)
  Indicates the type of percentage to return. Must be one of c("column", "row", "cell"). Default is "column".
  In rarer cases, you may need to define/override the typical denominators. In these cases, pass an integer or a data frame. Refer to the ?cards::ard_categorical(denominator) help file for details.
- include (tidy-select)
  Variables to include in the summary table. Default is everything().
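As a rough illustration of how several of these arguments combine, the sketch below stratifies by treatment arm, relabels one variable, always shows a missing row, sorts categorical levels by frequency, and reports row percentages. It assumes the trial dataset that ships with gtsummary (the same data used in the Examples section); the object name tbl_args is only a placeholder.

```r
library(gtsummary)

# tbl_args is a hypothetical name used only for this sketch
tbl_args <- trial |>
  tbl_summary(
    by = trt,                                     # stratify by treatment arm
    include = c(age, stage, response),            # variables to summarize
    label = list(age = "Age, years"),             # override the default label
    missing = "always",                           # show a missing row for every variable
    missing_text = "Missing",                     # text shown on the missing row
    sort = all_categorical(FALSE) ~ "frequency",  # order levels by frequency
    percent = "row"                               # percentages across the by columns
  )

tbl_args
```

With by specified, the resulting table has one statistics column per level of trt.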
statistic argument
The statistic argument specifies the summary statistics presented in the table.
For example,
statistic = list(age ~ "{mean} ({sd})") would report the mean and
standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})")
would report the mean and standard deviation for all continuous variables.
The values are interpreted using glue::glue() syntax:
a name that appears between curly brackets will be interpreted as a function
name and the formatted result of that function will be placed in the table.
For categorical variables, the following statistics are available to display:
{n} (frequency), {N} (denominator), {p} (percent).
For continuous variables, any univariate function may be used.
The most commonly used functions are {median}, {mean}, {sd}, {min},
and {max}.
Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100.
For example, {p25} is computed as quantile(probs = 0.25, type = 2).
When the summary type is "continuous2", pass a vector of statistics.
Each element of the vector will result in a separate row in the summary table.
For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.
- {N_obs}: total number of observations
- {N_miss}: number of missing observations
- {N_nonmiss}: number of non-missing observations
- {p_miss}: percentage of observations missing
- {p_nonmiss}: percentage of observations not missing
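The glue-style templates described above might be combined as in the following sketch, which again assumes the trial dataset bundled with gtsummary. The object name tbl_stats is a placeholder, and the combined missing_stat template is an assumption about how the listed missing-data statistics can be mixed in one string.

```r
library(gtsummary)

# tbl_stats is a hypothetical name used only for this sketch
tbl_stats <- trial |>
  tbl_summary(
    include = c(age, grade),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",      # mean and standard deviation
      all_categorical() ~ "{n} / {N} ({p}%)"   # frequency, denominator, percent
    ),
    missing = "always",
    # assumption: several missing-data statistics may share one template
    missing_stat = "{N_miss} ({p_miss}%)"
  )

tbl_stats
```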
digits argument
The digits argument specifies the number of digits statistics are rounded to, or the function used to format them.
The value passed can be a single integer, a vector of integers, a
function, or a list of functions. If a single integer or function is passed,
it is recycled to match the number of statistics presented.
For example, if the statistic is "{mean} ({sd})", passing
1, c(1, 1), label_style_number(digits=1), or
list(label_style_number(digits=1), label_style_number(digits=1)) is equivalent.
Named lists are also accepted to change the default formatting for a single
statistic, e.g. list(sd = label_style_number(digits=1)).
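A minimal sketch of the equivalent forms described above, assuming the trial dataset bundled with gtsummary. The object names by_integer, by_function, and by_name are placeholders for this illustration.

```r
library(gtsummary)

# A single integer is recycled across both statistics
by_integer <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ 1
  )

# The same formatting spelled out as a list of functions
by_function <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ list(
      label_style_number(digits = 1),
      label_style_number(digits = 1)
    )
  )

# A named list changes the formatting of one statistic only (here, sd)
by_name <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ list(sd = label_style_number(digits = 1))
  )
```

The first two calls should yield the same formatted statistics; the third leaves the mean at its default formatting.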
type and value arguments
There are four summary types. Use the type argument to change the default summary types.
- "continuous": summaries are shown on a single row. Most numeric variables default to summary type continuous.
- "continuous2": summaries are shown on 2 or more rows.
- "categorical": multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous").
- "dichotomous": categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show").
See also
See the tbl_summary vignette for a detailed tutorial.
See the table gallery for additional examples.
Review the list, formula, and selector syntax used throughout gtsummary.
Examples
# Example 1 ----------------------------------
trial |>
  select(age, grade, response) |>
  tbl_summary()
# Output (table body not captured): Characteristic | N = 200; footnote: Median (Q1, Q3); n (%)
# Example 2 ----------------------------------
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, response, trt),
    label = list(age = "Patient Age"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits = list(age = c(0, 1))
  )
# Output (table body not captured): Characteristic | Drug A, N = 98 | Drug B, N = 102; footnote: Mean (SD); n (%)
# Example 3 ----------------------------------
trial |>
  tbl_summary(
    include = c(age, marker),
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}"),
    missing = "no"
  )
# Output (table body not captured): Characteristic | N = 200
