R/tbl_svysummary.R
tbl_svysummary.Rd
The tbl_svysummary function calculates descriptive statistics for continuous, categorical, and dichotomous variables, taking into account survey weights and design. It is similar to tbl_summary().
tbl_svysummary( data, by = NULL, label = NULL, statistic = NULL, digits = NULL, type = NULL, value = NULL, missing = NULL, missing_text = NULL, sort = NULL, percent = NULL, include = NULL )
data  A survey object created with survey::svydesign()

by  A column name (quoted or unquoted) in data.
label  List of formulas specifying variables labels,
e.g. 
statistic  List of formulas specifying types of summary statistics to
display for each variable. The default is

digits  List of formulas specifying the number of decimal
places to round continuous summary statistics. If not specified,

type  List of formulas specifying variable types. Accepted values
are "continuous", "continuous2", "categorical", and "dichotomous".
value  List of formulas specifying the value to display for dichotomous variables. See below for details. 
missing  Indicates whether to include counts of 
missing_text  String to display for count of missing observations.
Default is 
sort  List of formulas specifying the type of sorting to perform for
categorical data. Options are 
percent  Indicates the type of percentage to return. Must be one of

include  Variables to include in the summary table. Default is 
A tbl_svysummary object
The statistic argument specifies the statistics presented in the table. The input is a list of formulas that specify the statistics to report. For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})") would report the mean and standard deviation for all continuous variables. A statistic name that appears between curly brackets will be replaced with the numeric statistic (see glue::glue).
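As a minimal sketch of the statistic argument, the following reuses the apiclus1 survey design from this page's examples and requests the mean and standard deviation for two continuous variables. The object names des and tbl_mean_sd are illustrative only and are not part of the package.

```r
library(gtsummary)

# Cluster sample of California schools shipped with the survey package
data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# "{mean} ({sd})" is a glue-style template: each name in curly brackets
# is replaced with the corresponding weighted statistic
tbl_mean_sd <- tbl_svysummary(
  des,
  include = c(api00, api99),
  statistic = list(all_continuous() ~ "{mean} ({sd})")
)
```

Printing tbl_mean_sd should display one row per variable with the two statistics combined as written in the template.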
For categorical variables, the following statistics are available to display:
{n}  frequency
{N}  denominator, or cohort size
{p}  formatted percentage
{n_unweighted}  unweighted frequency
{N_unweighted}  unweighted denominator
{p_unweighted}  unweighted formatted percentage
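The weighted and unweighted categorical statistics can be mixed in one template. The sketch below, which assumes the apiclus1 data from the survey package and uses the illustrative names des and tbl_cat, shows the weighted percentage next to the unweighted counts for the school type variable stype.

```r
library(gtsummary)

data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weighted percentage, followed by the raw (unweighted) sample counts
tbl_cat <- tbl_svysummary(
  des,
  include = stype,
  statistic = list(
    all_categorical() ~ "{p}% (unweighted {n_unweighted} / {N_unweighted})"
  )
)
```

Reporting the unweighted counts alongside weighted percentages is one way to let readers see the actual sample size behind each estimate.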
For continuous variables, the following statistics are available to display:
{median}  median
{mean}  mean
{sd}  standard deviation
{var}  variance
{min}  minimum
{max}  maximum
{p##}  any integer percentile, where ## is an integer from 0 to 100
{sum}  sum
Unlike tbl_summary(), it is not possible to pass a custom function.
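A short sketch of the percentile syntax for continuous variables follows, again assuming the apiclus1 design; des and tbl_pct are illustrative names. Here {p10} and {p90} request the 10th and 90th weighted percentiles.

```r
library(gtsummary)

data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Median with the 10th and 90th percentiles, then the observed range
tbl_pct <- tbl_svysummary(
  des,
  include = c(api00, api99),
  statistic = list(
    all_continuous() ~ "{median} ({p10}, {p90}); range {min} to {max}"
  )
)
```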
For both categorical and continuous variables, statistics on the number of missing and nonmissing observations and their proportions are available to display:
{N_obs}  total number of observations
{N_miss}  number of missing observations
{N_nonmiss}  number of nonmissing observations
{p_miss}  percentage of observations missing
{p_nonmiss}  percentage of observations not missing
{N_obs_unweighted}  unweighted total number of observations
{N_miss_unweighted}  unweighted number of missing observations
{N_nonmiss_unweighted}  unweighted number of nonmissing observations
{p_miss_unweighted}  unweighted percentage of observations missing
{p_nonmiss_unweighted}  unweighted percentage of observations not missing
Note that for categorical variables, {N_obs}, {N_miss}, and {N_nonmiss} refer to the total number, number missing, and number nonmissing of observations in the denominator, not at each level of the categorical variable.
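The missingness statistics can be placed in the same template as the summary statistics. The following is a sketch under the assumption that the apiclus1 design is used; des and tbl_miss are illustrative names, and the wording of the template is arbitrary.

```r
library(gtsummary)

data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weighted mean, the unweighted number of nonmissing observations,
# and the weighted percentage of observations that are missing
tbl_miss <- tbl_svysummary(
  des,
  include = c(api00, api99),
  statistic = list(
    all_continuous() ~
      "{mean} (n = {N_nonmiss_unweighted}; {p_miss}% missing)"
  )
)
```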
Example 1
Example 2
The tbl_summary() function has four summary types:
"continuous"  summaries are shown on a single row. Most numeric variables default to summary type continuous.
"continuous2"  summaries are shown on 2 or more rows.
"categorical"  multiline summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous").
"dichotomous"  categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show").
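The type and value arguments can be set explicitly rather than relying on the defaults. The sketch below assumes the apiclus1 data, where both is a factor with levels "No" and "Yes" and stype is a three-level factor; des and tbl_types are illustrative names.

```r
library(gtsummary)

data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl_types <- tbl_svysummary(
  des,
  include = c(stype, both),
  # Show `both` on a single row and `stype` with one row per level
  type = list(both ~ "dichotomous", stype ~ "categorical"),
  # For the dichotomous variable, display the "Yes" level
  value = list(both ~ "Yes"),
  # Suppress the row of missing counts so the row layout is easy to inspect
  missing = "no"
)
```

With these settings the variable both occupies one row, while stype gets a header row plus one row for each of its levels.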
Select helpers from the tidyselect package and gtsummary package are available to modify default behavior for groups of variables. For example, by default continuous variables are reported with the median and IQR. To change all continuous variables to mean and standard deviation use statistic = list(all_continuous() ~ "{mean} ({sd})").
All columns with class logical are displayed as dichotomous variables showing the proportion of events that are TRUE on a single row. To show both rows (i.e. a row for TRUE and a row for FALSE) use type = list(all_logical() ~ "categorical").
The select helpers are available for use in any argument that accepts a list of formulas (e.g. statistic, type, digits, value, sort, etc.)
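Because the select helpers work in any argument that takes a list of formulas, the same helper can drive several arguments at once. A minimal sketch, assuming the apiclus1 design and the illustrative names des and tbl_helpers, uses all_continuous() in both statistic and digits:

```r
library(gtsummary)

data(api, package = "survey")
des <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

tbl_helpers <- tbl_svysummary(
  des,
  include = c(api00, api99, stype),
  # Apply the same template to every continuous variable
  statistic = list(all_continuous() ~ "{mean} ({sd})"),
  # Round the continuous statistics to one decimal place
  digits = list(all_continuous() ~ 1)
)
```

The categorical variable stype is not matched by all_continuous(), so it keeps its default statistic and rounding.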
Other tbl_svysummary tools: add_n.tbl_summary(), add_overall(), add_p.tbl_svysummary(), add_q(), add_stat_label(), modify, tbl_merge(), tbl_stack()
Joseph Larmarange
# gtsummary provides tbl_svysummary() and re-exports the %>% pipe
library(gtsummary)

# Example 1
# A simple weighted dataset
tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row")

# Example 2
# A dataset with a complex design
data(api, package = "survey")
tbl_svysummary_ex2 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(by = "both", include = c(cname, api00, api99, both))