Skip to contents

The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables. Review the tbl_summary vignette for detailed examples.

Usage

tbl_summary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~
    "{n} ({p}%)"),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)

Arguments

data

(data.frame)
A data frame.

by

(tidy-select)
A single column from data. Summary statistics will be stratified by this variable. Default is NULL.

label

(formula-list-selector)
Used to override default labels in summary table, e.g. list(age = "Age, years"). The default for each variable is the column label attribute, attr(., 'label'). If no label has been set, the column name is used.

statistic

(formula-list-selector)
Specifies summary statistics to display for each variable. The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)"). See below for details.

digits

(formula-list-selector)
Specifies how summary statistics are rounded. Values may be either integer(s) or function(s). If not specified, default formatting is assigned via assign_summary_digits(). See below for details.

type

(formula-list-selector)
Specifies the summary type. Accepted value are c("continuous", "continuous2", "categorical", "dichotomous"). If not specified, default type is assigned via assign_summary_type(). See below for details.

value

(formula-list-selector)
Specifies the level of a variable to display on a single row. The gtsummary type selectors, e.g. all_dichotomous(), cannot be used with this argument. Default is NULL. See below for details.

missing, missing_text, missing_stat

Arguments dictating how and if missing values are presented:

  • missing: must be one of c("ifany", "no", "always")

  • missing_text: string indicating text shown on missing row. Default is "Unknown"

  • missing_stat: statistic to show on missing row. Default is "{N_miss}". Possible values are N_miss, N_obs, N_nonmiss, p_miss, p_nonmiss.

sort

(formula-list-selector)
Specifies sorting to perform for categorical variables. Values must be one of c("alphanumeric", "frequency"). Default is all_categorical(FALSE) ~ "alphanumeric".

percent

(string)
Indicates the type of percentage to return. Must be one of c("column", "row", "cell"). Default is "column".

include

(tidy-select)
Variables to include in the summary table. Default is everything().

Value

a gtsummary table of class "tbl_summary"

A table of class c('tbl_summary', 'gtsummary')

statistic argument

The statistic argument specifies the statistics presented in the table. The input dictates the summary statistics presented in the table. For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})") would report the mean and standard deviation for all continuous variables.

The values are interpreted using glue::glue() syntax: a name that appears between curly brackets will be interpreted as a function name and the formatted result of that function will be placed in the table.

For categorical variables, the following statistics are available to display: {n} (frequency), {N} (denominator), {p} (percent).

For continuous variables, any univariate function may be used. The most commonly used functions are {median}, {mean}, {sd}, {min}, and {max}. Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100. For example, p25: quantile(probs=0.25, type=2).

When the summary type is "continuous2", pass a vector of statistics. Each element of the vector will result in a separate row in the summary table.

For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.

  • {N_obs} total number of observations

  • {N_miss} number of missing observations

  • {N_nonmiss} number of non-missing observations

  • {p_miss} percentage of observations missing

  • {p_nonmiss} percentage of observations not missing

digits argument

The digits argument specifies the the number of digits (or formatting function) statistics are rounded to.

The values passed can either be a single integer, a vector of integers, a function, or a list of functions. If a single integer or function is passed, it is recycled to the length of the number of statistics presented. For example, if the statistic is "{mean} ({sd})", it is equivalent to pass 1, c(1, 1), label_style_number(digits=1), and list(label_style_number(digits=1), label_style_number(digits=1)).

Named lists are also accepted to change the default formatting for a single statistic, e.g. list(sd = label_style_number(digits=1)).

type and value arguments

There are four summary types. Use the type argument to change the default summary types.

  • "continuous" summaries are shown on a single row. Most numeric variables default to summary type continuous.

  • "continuous2" summaries are shown on 2 or more rows

  • "categorical" multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous")

  • "dichotomous" categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show")

See also

See tbl_summary vignette for detailed tutorial

See table gallery for additional examples

Review list, formula, and selector syntax used throughout gtsummary

Author

Daniel D. Sjoberg

Examples

# Example 1 ----------------------------------
trial |>
  select(age, grade, response) |>
  tbl_summary()
Characteristic N = 2001
Age 47 (38, 57)
    Unknown 11
Grade
    I 68 (34%)
    II 68 (34%)
    III 64 (32%)
Tumor Response 61 (32%)
    Unknown 7
1 Median (Q1, Q3); n (%)
# Example 2 ---------------------------------- trial |> select(age, grade, response, trt) |> tbl_summary( by = trt, label = list(age = "Patient Age"), statistic = list(all_continuous() ~ "{mean} ({sd})"), digits = list(age = c(0, 1)) )
Characteristic Drug A
N = 98
1
Drug B
N = 102
1
Patient Age 47 (14.7) 47 (14.0)
    Unknown 7 4
Grade

    I 35 (36%) 33 (32%)
    II 32 (33%) 36 (35%)
    III 31 (32%) 33 (32%)
Tumor Response 28 (29%) 33 (34%)
    Unknown 3 4
1 Mean (SD); n (%)
# Example 3 ---------------------------------- trial |> select(age, marker) |> tbl_summary( type = all_continuous() ~ "continuous2", statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}"), missing = "no" )
Characteristic N = 200
Age
    Median (Q1, Q3) 47 (38, 57)
    Min, Max 6, 83
Marker Level (ng/mL)
    Median (Q1, Q3) 0.64 (0.22, 1.41)
    Min, Max 0.00, 3.87