`R/tbl_custom_summary.R`

`tbl_custom_summary.Rd`

The `tbl_custom_summary()` function calculates descriptive statistics for continuous, categorical, and dichotomous variables. This function is similar to `tbl_summary()` but allows you to provide a custom function in charge of computing the statistics (see Details).

```
tbl_custom_summary(
  data,
  by = NULL,
  label = NULL,
  stat_fns,
  statistic,
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = NULL,
  missing_text = NULL,
  include = everything(),
  overall_row = FALSE,
  overall_row_last = FALSE,
  overall_row_label = NULL
)
```

- data
A data frame

- by
A column name (quoted or unquoted) in `data`. Summary statistics will be calculated separately for each level of the `by` variable (e.g. `by = trt`). If `NULL`, summary statistics are calculated using all observations. To stratify a table by two or more variables, use `tbl_strata()`.

- label
List of formulas specifying variable labels, e.g. `list(age ~ "Age", stage ~ "Path T Stage")`. If a variable's label is not specified here, the label attribute (`attr(data$age, "label")`) is used. If the label attribute is `NULL`, the variable name will be used.

- stat_fns
Formula or list of formulas specifying the function to be used to compute the statistics (see below for details and examples). You can also use dedicated helpers such as `continuous_summary()`, `ratio_summary()` or `proportion_summary()`.

- statistic
List of formulas specifying the `glue::glue()` pattern to display the statistics for each variable. The statistics should be returned by the functions specified in `stat_fns` (see below for details and examples).

- digits
List of formulas specifying the number of decimal places to round summary statistics. If not specified, `tbl_summary` guesses an appropriate number of decimals to round statistics. When multiple statistics are displayed for a single variable, supply a vector rather than an integer. For example, if the statistic being calculated is `"{mean} ({sd})"` and you want the mean rounded to 1 decimal place and the SD to 2, use `digits = list(age ~ c(1, 2))`. You may also pass a styling function: `digits = age ~ style_sigfig`.

- type
List of formulas specifying variable types. Accepted values are `c("continuous", "continuous2", "categorical", "dichotomous")`, e.g. `type = list(age ~ "continuous", female ~ "dichotomous")`. If the type is not specified for a variable, the function will default to an appropriate summary type. See below for details.

- value
List of formulas specifying the value to display for dichotomous variables. gtsummary selectors, e.g. `all_dichotomous()`, cannot be used with this argument. See below for details.

- missing
Indicates whether to include counts of `NA` values in the table. Allowed values are `"no"` (never display NA values), `"ifany"` (only display if any NA values), and `"always"` (includes NA count row for all variables). Default is `"ifany"`.

- missing_text
String to display for count of missing observations. Default is `"Unknown"`.

- include
Variables to include in the summary table. Default is `everything()`.

- overall_row
Logical indicator to display an overall row. Default is `FALSE`. Use `add_overall()` to add an overall column.

- overall_row_last
Logical indicator to display the overall row last in the table. Default is `FALSE`, which will display the overall row first.

- overall_row_label
String indicating the overall row label. Default is `"Overall"`.
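To make the argument descriptions concrete, here is a minimal sketch combining `label`, `digits`, `missing` and `missing_text` in one call. The helper `mean_sd()` and the object name `tbl_args_sketch` are hypothetical, introduced only for illustration. The sketch assumes the `trial` dataset shipped with gtsummary.

```r
library(gtsummary)

# Hypothetical helper (not part of gtsummary): mean and SD of the
# current variable, returned as a one-row tibble
mean_sd <- function(data, variable, ...) {
  x <- data[[variable]]
  dplyr::tibble(
    mean = mean(x, na.rm = TRUE),
    sd = stats::sd(x, na.rm = TRUE)
  )
}

tbl_args_sketch <-
  trial %>%
  tbl_custom_summary(
    include = c("age", "marker"),
    by = "trt",
    label = list(age ~ "Age (years)"),
    stat_fns = everything() ~ mean_sd,
    statistic = everything() ~ "{mean} ({sd})",
    # one value per statistic: mean to 1 decimal place, SD to 2
    digits = everything() ~ c(1, 2),
    # always show a row with the count of missing values
    missing = "always",
    missing_text = "Missing"
  )
```

The two entries of the `digits` vector are matched to `{mean}` and `{sd}` in the order they appear in the `statistic` pattern.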

A `tbl_custom_summary` and `tbl_summary` object.

`tbl_summary()`

Please refer to the help file of `tbl_summary()` regarding the use of select helpers, and arguments `include`, `by`, `type`, `value`, `digits`, `missing` and `missing_text`.

`stat_fns` argument

The `stat_fns` argument specifies the custom function(s) to be used for computing the summary statistics. For example, `stat_fns = everything() ~ foo`.

Each function may take the following arguments: `foo(data, full_data, variable, by, type, ...)`

- `data=` is the input data frame passed to `tbl_custom_summary()`, subset according to the level of `by` or `variable` if any, excluding `NA` values of the current `variable`
- `full_data=` is the full input data frame passed to `tbl_custom_summary()`
- `variable=` is a string indicating the variable to perform the calculation on
- `by=` is a string indicating the by variable from `tbl_custom_summary()`, if present
- `type=` is a string indicating the type of variable (continuous, categorical, ...)
- `stat_display=` is a string indicating the statistic to display (from the `statistic` argument, for that variable)

The user-defined function does not need to use each of these inputs. The function should, however, accept `...`, because every argument *will* be passed to it even if it does not use them all, e.g. `foo(data, ...)` (see examples).

The user-defined function should return a one-row `dplyr::tibble()` with one column per summary statistic (see examples).
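As a minimal sketch of this contract, the hypothetical helper `median_iqr()` below (not part of gtsummary) names only `data` and `variable`, absorbs the remaining arguments through `...`, and returns a one-row tibble. The toy data frame and the direct call are assumptions for illustration. They imitate the arguments described above and are not the package's internal call.

```r
# Hypothetical helper (not part of gtsummary): median and quartiles
# of the current variable, returned as a one-row tibble
median_iqr <- function(data, variable, ...) {
  x <- data[[variable]]
  dplyr::tibble(
    median = stats::median(x, na.rm = TRUE),
    q25 = unname(stats::quantile(x, 0.25, na.rm = TRUE)),
    q75 = unname(stats::quantile(x, 0.75, na.rm = TRUE))
  )
}

# Toy data, made up for this sketch
toy <- data.frame(age = c(30, 40, 50, NA))

# Arguments the helper does not name (full_data, by, type, stat_display)
# are absorbed by `...` instead of raising an "unused argument" error
res <- median_iqr(
  data = toy, full_data = toy, variable = "age",
  by = NULL, type = "continuous",
  stat_display = "{median} ({q25}, {q75})"
)
res
```

The column names of the returned tibble (`median`, `q25`, `q75`) are the names that can then appear between curly brackets in the `statistic` pattern.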

The statistic argument specifies the statistics presented in the table. The input is a list of formulas that specify the statistics to report. For example, `statistic = list(age ~ "{mean} ({sd})")`. A statistic name that appears between curly brackets will be replaced with the numeric statistic (see `glue::glue()`). All the statistics indicated in the statistic argument should be returned by the functions defined in the `stat_fns` argument.

When the summary type is `"continuous2"`, pass a vector of statistics. Each element of the vector will result in a separate row in the summary table.
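A possible sketch of the `"continuous2"` case follows. The helper `mean_sd_range()` and the object name `tbl_continuous2_sketch` are hypothetical, and the example assumes the `trial` dataset shipped with gtsummary. Each element of the `statistic` vector is expected to produce its own row.

```r
library(gtsummary)

# Hypothetical helper (not part of gtsummary): returns every statistic
# referenced in the patterns below
mean_sd_range <- function(data, variable, ...) {
  x <- data[[variable]]
  dplyr::tibble(
    mean = mean(x, na.rm = TRUE),
    sd = stats::sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)
  )
}

tbl_continuous2_sketch <-
  trial %>%
  tbl_custom_summary(
    include = c("age", "marker"),
    by = "trt",
    type = all_continuous() ~ "continuous2",
    stat_fns = everything() ~ mean_sd_range,
    # two patterns, hence two rows per variable
    statistic = everything() ~ c("{mean} ({sd})", "{min}, {max}")
  )
```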

For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are also available to display.

- `{N_obs}` total number of observations
- `{N_miss}` number of missing observations
- `{N_nonmiss}` number of non-missing observations
- `{p_miss}` percentage of observations missing
- `{p_nonmiss}` percentage of observations not missing

Note that for categorical variables, `{N_obs}`, `{N_miss}` and `{N_nonmiss}` refer to the total number, the number of missing and the number of non-missing observations in the denominator, not at each level of the categorical variable.
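The sketch below shows one way these counts might be combined with a custom statistic. The helper `mean_only()` and the object name `tbl_counts_sketch` are hypothetical, and the example assumes the `trial` dataset shipped with gtsummary. Note that `{N_nonmiss}` and `{N_miss}` are not returned by the helper: the text above describes them as available in addition to the custom statistics.

```r
library(gtsummary)

# Hypothetical helper (not part of gtsummary): mean of the current variable
mean_only <- function(data, variable, ...) {
  dplyr::tibble(mean = mean(data[[variable]], na.rm = TRUE))
}

tbl_counts_sketch <-
  trial %>%
  tbl_custom_summary(
    include = c("age", "marker"),
    by = "trt",
    stat_fns = everything() ~ mean_only,
    # {mean} comes from mean_only(); the counts are the built-in statistics
    statistic = everything() ~ "{mean} (n = {N_nonmiss}; missing: {N_miss})",
    # the missing count is already in the pattern, so drop the separate row
    missing = "no"
  )
```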

It is recommended to use `modify_footnote()` to properly describe the displayed statistics (see examples).

The returned table is compatible with all `gtsummary` features applicable to a `tbl_summary` object, like `add_overall()`, `modify_footnote()` or `bold_labels()`.

However, some of these features may be inappropriate for a custom summary. In particular, `add_p()` does not take into account the type of displayed statistics and always returns the p-value of a test comparing the current variable across the `by` groups, which may be incorrect if the displayed statistics refer to a third variable.


Review list, formula, and selector syntax used throughout gtsummary

Other tbl_summary tools: `add_n.tbl_summary()`, `add_overall()`, `add_p.tbl_summary()`, `add_q()`, `add_stat_label()`, `bold_italicize_labels_levels`, `inline_text.tbl_summary()`, `inline_text.tbl_survfit()`, `modify`, `separate_p_footnotes()`, `tbl_merge()`, `tbl_split()`, `tbl_stack()`, `tbl_strata()`, `tbl_summary()`

Other tbl_custom_summary tools: `add_overall()`, `continuous_summary()`, `proportion_summary()`, `ratio_summary()`

```
library(gtsummary)

# Example 1 ----------------------------------
# Compute statistics on fixed columns (marker, age), whatever the row variable
my_stats <- function(data, ...) {
  marker_sum <- sum(data$marker, na.rm = TRUE)
  mean_age <- mean(data$age, na.rm = TRUE)
  dplyr::tibble(
    marker_sum = marker_sum,
    mean_age = mean_age
  )
}

my_stats(trial)
#> # A tibble: 1 × 2
#>   marker_sum mean_age
#>        <dbl>    <dbl>
#> 1       174.     47.2

tbl_custom_summary_ex1 <-
  trial %>%
  tbl_custom_summary(
    include = c("stage", "grade"),
    by = "trt",
    stat_fns = everything() ~ my_stats,
    statistic = everything() ~ "A: {mean_age} - S: {marker_sum}",
    digits = everything() ~ c(1, 0),
    overall_row = TRUE,
    overall_row_label = "All stages & grades"
  ) %>%
  add_overall(last = TRUE) %>%
  modify_footnote(
    update = all_stat_cols() ~ "A: mean age - S: sum of marker"
  ) %>%
  bold_labels()

# Example 2 ----------------------------------
# Use `data[[variable]]` to access the current variable
mean_ci <- function(data, variable, ...) {
  test <- t.test(data[[variable]])
  dplyr::tibble(
    mean = test$estimate,
    conf.low = test$conf.int[1],
    conf.high = test$conf.int[2]
  )
}

tbl_custom_summary_ex2 <-
  trial %>%
  tbl_custom_summary(
    include = c("marker", "ttdeath"),
    by = "trt",
    stat_fns = ~mean_ci,
    statistic = ~"{mean} [{conf.low}; {conf.high}]"
  ) %>%
  add_overall(last = TRUE) %>%
  modify_footnote(
    update = all_stat_cols() ~ "mean [95% CI]"
  )

# Example 3 ----------------------------------
# Use `full_data` to access the full dataset
# A returned statistic can also be a character string
diff_to_great_mean <- function(data, full_data, ...) {
  mean <- mean(data$marker, na.rm = TRUE)
  great_mean <- mean(full_data$marker, na.rm = TRUE)
  diff <- mean - great_mean
  dplyr::tibble(
    mean = mean,
    great_mean = great_mean,
    diff = diff,
    level = ifelse(diff > 0, "high", "low")
  )
}

tbl_custom_summary_ex3 <-
  trial %>%
  tbl_custom_summary(
    include = c("grade", "stage"),
    by = "trt",
    stat_fns = ~diff_to_great_mean,
    statistic = ~"{mean} ({level}, diff: {diff})",
    overall_row = TRUE
  ) %>%
  bold_labels()
```