Summary table

The tbl_summary() function calculates descriptive statistics for continuous, categorical, and dichotomous variables. Review the tbl_summary vignette for detailed examples.

Usage

tbl_summary(
  data,
  by = NULL,
  label = NULL,
  statistic = list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~
    "{n} ({p}%)"),
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = c("ifany", "no", "always"),
  missing_text = "Unknown",
  missing_stat = "{N_miss}",
  sort = all_categorical(FALSE) ~ "alphanumeric",
  percent = c("column", "row", "cell"),
  include = everything()
)

Arguments

data

(data.frame)
A data frame.

by

(tidy-select)
A single column from data. Summary statistics will be stratified by this variable. Default is NULL.

label

(formula-list-selector)
Used to override default labels in summary table, e.g. list(age = "Age, years"). The default for each variable is the column label attribute, attr(., 'label'). If no label has been set, the column name is used.
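As a small illustration of the label lookup described above, the sketch below reads the label attribute on a column of the trial example data and then overrides it. The object name tbl is illustrative only.

```r
library(gtsummary)

# The default label comes from the column's "label" attribute.
attr(trial$age, "label")

# Override the default label for age only.
tbl <- trial |>
  tbl_summary(
    include = age,
    label = list(age = "Age, years")
  )

tbl
```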

statistic

(formula-list-selector)
Specifies summary statistics to display for each variable. The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)"). See below for details.

digits

(formula-list-selector)
Specifies how summary statistics are rounded. Values may be either integer(s) or function(s). If not specified, default formatting is assigned via assign_summary_digits(). See below for details.

type

(formula-list-selector)
Specifies the summary type. Accepted values are c("continuous", "continuous2", "categorical", "dichotomous"). If not specified, default type is assigned via assign_summary_type(). See below for details.

value

(formula-list-selector)
Specifies the level of a variable to display on a single row. The gtsummary type selectors, e.g. all_dichotomous(), cannot be used with this argument. Default is NULL. See below for details.

missing, missing_text, missing_stat

Arguments dictating how and if missing values are presented:

missing: must be one of c("ifany", "no", "always"). Default is "ifany".
missing_text: string indicating text shown on missing row. Default is "Unknown".
missing_stat: statistic to show on missing row. Default is "{N_miss}". Possible values are N_miss, N_obs, N_nonmiss, p_miss, p_nonmiss.
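The three missing-value arguments can be combined. The following is a minimal sketch using the trial example data; the text "(Missing)" and the object name tbl are arbitrary choices made for illustration.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    include = c(age, response),
    missing = "always",                    # show the missing row for every variable
    missing_text = "(Missing)",            # label used on the missing row
    missing_stat = "{N_miss} ({p_miss}%)"  # count and percentage of missing values
  )

tbl
```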

sort

(formula-list-selector)
Specifies sorting to perform for categorical variables. Values must be one of c("alphanumeric", "frequency"). Default is all_categorical(FALSE) ~ "alphanumeric".
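For instance, a sketch that orders the levels of every categorical variable by frequency rather than alphanumerically, assuming the stage and grade columns of the trial example data:

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    include = c(stage, grade),
    # Sort the levels of each categorical variable by frequency.
    sort = all_categorical(FALSE) ~ "frequency"
  )

tbl
```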

percent

(string)
Indicates the type of percentage to return. Must be one of c("column", "row", "cell"). Default is "column".

In rarer cases, you may need to define/override the typical denominators. In these cases, pass an integer or a data frame. Refer to the ?cards::ard_categorical(denominator) help file for details.
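To illustrate the percent argument, the sketch below requests row percentages in a table stratified by treatment, so that the percentages for each grade sum across the treatment columns. It uses the trial example data; the object name tbl is illustrative.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    by = trt,
    include = grade,
    percent = "row"  # percentages are computed within each row, across the by columns
  )

tbl
```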

include

(tidy-select)
Variables to include in the summary table. Default is everything().

Value

A gtsummary table of class c('tbl_summary', 'gtsummary').

statistic argument

The statistic argument specifies the summary statistics presented in the table. For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})") would report the mean and standard deviation for all continuous variables.

The values are interpreted using glue::glue() syntax: a name that appears between curly brackets will be interpreted as a function name and the formatted result of that function will be placed in the table.
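The placeholder syntax itself can be tried directly with glue. The sketch below only illustrates how names in curly brackets are substituted; the numeric values are hypothetical stand-ins for already computed statistics, and it does not reproduce how tbl_summary() computes or formats them internally.

```r
library(glue)

# Hypothetical, pre-computed values substituted into the template.
cell <- glue("{median} ({p25}, {p75})", median = 47, p25 = 38, p75 = 57)

cell
```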

For categorical variables, the following statistics are available to display: {n} (frequency), {N} (denominator), {p} (percent).

For continuous variables, any univariate function may be used. The most commonly used functions are {median}, {mean}, {sd}, {min}, and {max}. Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100. For example, {p25} is calculated as quantile(probs=0.25, type=2).
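Because the percentiles use type = 2, they can differ from the default of stats::quantile(), which is type = 7. A small base R sketch on a made-up vector x shows the difference:

```r
# Made-up data for illustration.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Type 2 averages the two neighboring order statistics at a discontinuity.
q_type2 <- quantile(x, probs = 0.25, type = 2)

# Type 7 (the stats::quantile() default) interpolates linearly.
q_type7 <- quantile(x, probs = 0.25, type = 7)

unname(q_type2)  # 2.5
unname(q_type7)  # 2.75
```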

When the summary type is "continuous2", pass a vector of statistics. Each element of the vector will result in a separate row in the summary table.

For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.

{N_obs} total number of observations
{N_miss} number of missing observations
{N_nonmiss} number of non-missing observations
{p_miss} percentage of observations missing
{p_nonmiss} percentage of observations not missing
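Putting these pieces together, the following sketch sets a different statistic for individual variables and for all categorical variables. It uses the trial example data; the particular statistics chosen are only for illustration.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    include = c(age, marker, grade),
    statistic = list(
      age ~ "{mean} ({sd})",                    # one named variable
      marker ~ "{median} ({p10}, {p90})",       # 10th and 90th percentiles
      all_categorical() ~ "{n} / {N} ({p}%)"    # count, denominator, percent
    )
  )

tbl
```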

digits argument

The digits argument specifies the number of digits (or formatting function) statistics are rounded to.

The values passed can be a single integer, a vector of integers, a function, or a list of functions. If a single integer or function is passed, it is recycled to the number of statistics presented. For example, if the statistic is "{mean} ({sd})", the following are all equivalent: 1, c(1, 1), label_style_number(digits=1), and list(label_style_number(digits=1), label_style_number(digits=1)).

Named lists are also accepted to change the default formatting for a single statistic, e.g. list(sd = label_style_number(digits=1)).
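The equivalence described above can be checked with a short sketch on the trial example data. The object names are illustrative, and the comparison of the formatted stat_0 column assumes the table_body layout of current gtsummary releases.

```r
library(gtsummary)

# A single integer, recycled across both statistics.
tbl_int <- trial |>
  tbl_summary(include = age, statistic = age ~ "{mean} ({sd})", digits = age ~ 1)

# One integer per statistic.
tbl_vec <- trial |>
  tbl_summary(include = age, statistic = age ~ "{mean} ({sd})", digits = age ~ c(1, 1))

# A formatting function, recycled across both statistics.
tbl_fun <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ label_style_number(digits = 1)
  )

# A named list changes the formatting of the sd only.
tbl_sd <- trial |>
  tbl_summary(
    include = age,
    statistic = age ~ "{mean} ({sd})",
    digits = age ~ list(sd = 2)
  )

# The first three specifications should produce the same formatted cells.
identical(tbl_int$table_body$stat_0, tbl_vec$table_body$stat_0)
identical(tbl_int$table_body$stat_0, tbl_fun$table_body$stat_0)
```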

type and value arguments

There are four summary types. Use the type argument to change the default summary types.

"continuous": summaries are shown on a single row. Most numeric variables default to summary type continuous.
"continuous2": summaries are shown on 2 or more rows.
"categorical": multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous").
"dichotomous": categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show").
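As a sketch of how type and value interact, the example below shows age on multiple rows, expands the 0/1 variable response to one row per level, and collapses grade to a single row for level "III". It uses the trial example data; the object name tbl is illustrative.

```r
library(gtsummary)

tbl <- trial |>
  tbl_summary(
    include = c(age, grade, response),
    type = list(
      age ~ "continuous2",       # multi-row continuous summary
      response ~ "categorical"   # override the dichotomous default for 0/1 data
    ),
    value = list(grade ~ "III")  # show only the "III" level, on a single row
  )

tbl
```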

Author

Daniel D. Sjoberg

Examples

library(gtsummary) # attaches tbl_summary() and the trial example data

# Example 1 ----------------------------------
trial |>
  select(age, grade, response) |>
  tbl_summary()


Characteristic	N = 200¹
Age	47 (38, 57)
    Unknown	11
Grade
    I	68 (34%)
    II	68 (34%)
    III	64 (32%)
Tumor Response	61 (32%)
    Unknown	7
¹ Median (Q1, Q3); n (%)

# Example 2 ----------------------------------
trial |>
  tbl_summary(
    by = trt,
    include = c(age, grade, response, trt),
    label = list(age = "Patient Age"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits = list(age = c(0, 1))
  )


Characteristic	Drug A N = 98¹	Drug B N = 102¹
Patient Age	47 (14.7)	47 (14.0)
    Unknown	7	4
Grade
    I	35 (36%)	33 (32%)
    II	32 (33%)	36 (35%)
    III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
    Unknown	3	4
¹ Mean (SD); n (%)

# Example 3 ----------------------------------
trial |>
  tbl_summary(
    include = c(age, marker),
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{min}, {max}"),
    missing = "no"
  )


Characteristic	N = 200
Age
    Median (Q1, Q3)	47 (38, 57)
    Min, Max	6, 83
Marker Level (ng/mL)
    Median (Q1, Q3)	0.64 (0.22, 1.41)
    Min, Max	0.00, 3.87
