class: center, middle, inverse, title-slide

# Introduction to {gt} + {gtsummary} Packages
### Daniel D. Sjoberg, Memorial Sloan Kettering Cancer Center
### June 27, 2019

---

class: center
background-image: url(images/gt_logo.png)
background-size: contain

# {gt} package

???
- package from RStudio

- you may remember it from Stefania's review of the RStudio conference

- gt is the grammar of tables

- seeks to create a unifying code base for HTML, PDF, and RTF output

- HTML and PDF are great. RTF is getting close

- {gt} is wonderful, and this is only the briefest introduction to it.

- I recommend you look into it further yourself
---
# {gt} philosophy
.large[
"We can construct a wide variety of useful tables with a cohesive set of table parts. These include the *table header*, the *stub*, the *stub head*, the *column labels*, the *table body*, and the *table footer*."
]
<img src="images/gt_parts_of_a_table.png" width=60%>

???

- package has sets of functions for modifying each piece of a table

- we'll review the most common/useful

- {gt} documentation is so organized and color coded.

- It's easy to find what you're looking for.

---
# {gt} installation

.large[
- {gt} is not on CRAN.

- Use the code below to install from GitHub.
]

```r
remotes::install_github("rstudio/gt")
```

.large[
- While you're at it, install {gtsummary} as well.
]

```r
remotes::install_github("ddsjoberg/gtsummary")
```

.large[
- There is a version of {gtsummary} on CRAN, but with limited functionality.

- Use the version on GitHub (<a href="https://github.com/ddsjoberg/gtsummary">www.github.com/ddsjoberg/gtsummary</a>).

- The full version of {gtsummary} will be released on CRAN after {gt} is released.
]
---
# {gt} examples: the data

.large[When used alone, the `gt()` function prints a data frame. But so much more is possible!]

.pull-left[

```r
library(gt)
# loading gtsummary for the data
library(gtsummary)
gt_trial_head <- head(trial) %>%
* gt()
```

<img src="images/gt_trial_head.png" width=98%>
]

.pull-right[

<img src="images/gt_trial_info.png" width=85%>
]

???
- the most common and useful is `gt()`

- `gt()` is the first function to run, required every time

- AFTER CLICK

- Data set contains different types of data

- Data is labelled! (leads to nice labels by default)
---
# {gt} examples: the viewer

.large[
- {gt} tables print to the RStudio viewer when in the global environment.
]
<img src="images/gt_in_viewer.PNG" width=64%>

.large[
- {gt} tables also print in R markdown documents (HTML, PDF, RTF), Shiny apps, etc.
]

???
- Word *.docx is NOT an output type.

- Word can, however, read RTF documents.

- RTF is how SAS creates highly customized output, for example

---
# {gt} examples: formatting columns

```r
*trial_summary <- trial %>% group_by(trt) %>% summarise_at(vars(age, marker), mean, na.rm = TRUE)
```

.pull-left[

### Raw Summary Statistics

```r
gt_print <- 
* gt(trial_summary)
```

<img src="images/gt_print.png" width=64%>
]

.pull-right[
### Formatted Summary Statistics

```r
gt_format <-
 gt(trial_summary) %>% 
* fmt_number(columns = vars(age), decimals = 0) %>%
* fmt_number(columns = "marker", decimals = 2)
```

<img src="images/gt_format.png" width=50%>
]

.bottom[.large[Each column can be formatted without creating a character version of the column!]]

???

- Remember this trial_summary object.  We're using it for the next few slides

- the raw print is not pretty

- AFTER CLICK

- we can add formatting, here we round columns with `fmt_number()`

- we don't have to create character versions of our columns!  wonderful!

- use `vars()` because we can select multiple columns.

- Input allows ALL {tidyselect} functions.

- also accepts a character vector of names
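
---
# {gt} examples: formatting columns

For contrast, here is a rough base R sketch of the manual approach that `fmt_number()` lets us skip. The `summary_df` data frame and its values are made up for illustration; they only stand in for `trial_summary`.

```r
# Hypothetical summary table, standing in for trial_summary
summary_df <- data.frame(
  trt    = c("Drug", "Placebo"),
  age    = c(47.0123, 46.1987),
  marker = c(0.9301, 0.8954)
)

# Rounding by hand with sprintf() replaces numeric columns with strings
by_hand <- summary_df
by_hand$age    <- sprintf("%.0f", by_hand$age)
by_hand$marker <- sprintf("%.2f", by_hand$marker)

class(summary_df$age)  # the original column is still numeric
class(by_hand$age)     # the hand-formatted column is character
```

Once a column is character it no longer sorts or summarizes as a number, which is why formatting at the table layer is attractive.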

---
# {gt} examples: formatting cells

```r
gt_fmt_cell <- trial_summary %>%
 gather("variable", "mean", -trt) %>%
* gt() %>%
* fmt_number(columns = vars(mean), rows = (variable == "age"), decimals = 0) %>%
* fmt_number(columns = vars(mean), rows = (variable == "marker"), decimals = 2)
```

.pull-left[
<img src="images/gt_fmt_cell.png" width=75%>
]

.pull-right[.large[
- Use the `rows = ` argument to pinpoint a cell to format.

- There are many formatting functions available: `fmt_percent()`, `fmt_currency()`, `fmt_date()`, `fmt_time()`, `fmt_missing()`, and more.

- You can write your own function and pass it to `fmt()` to format a table.
]]

???

- the `fmt_number()` function can also be used to format a single cell

- lots of formatting functions to choose from

- write your own formatting functions

---
# {gt} examples: grouping data

```r
gt_group <- trial_summary %>%
 gather("variable", "mean", -trt) %>%
* gt(groupname_col = "trt") %>%
 fmt_number(columns = vars(mean), rows = variable == "age", decimals = 0) %>% 
 fmt_number(columns = vars(mean), rows = variable == "marker", decimals = 2)
```

.pull-left[
<img src="images/gt_group.png" width=35%>
]

.pull-right[.large[
- Use the `groupname_col = ` argument to specify a column to group results.

- The grouping column is not printed and a stub row for each group is added.
]]

---
# {gt} examples: column formatting

```r
gt_cols <- trial_summary %>%
 gt() %>% 
 fmt_number(columns = vars(age), decimals = 0) %>% 
 fmt_number(columns = vars(marker), decimals = 2) %>%
* cols_label(trt = md("**Treatment**"), age = md("**Age**"), marker = md("**Marker**")) %>%
* tab_spanner(label = "Patient Characteristics", columns = vars(age, marker))
```

.pull-left[
<img src="images/gt_cols.png" width=95%>
]

.pull-right[.large[
- The `cols_label()` function modifies the column headers.

- The `tab_spanner()` function includes a spanning header row.

- The `md()` function interprets input text as Markdown (see also `html()`).
]]

---
# {gt} examples: titles & footnotes

```r
gt_title_footnote <- trial_summary %>%
 gt() %>% 
 fmt_number(columns = vars(age), decimals = 0) %>% 
 fmt_number(columns = vars(marker), decimals = 2) %>%
 cols_label(trt = md("**Treatment**"), age = md("**Age**"), marker = md("**Marker**")) %>%
* tab_header(title = "Patient Characteristics", subtitle = "Presented by treatment") %>%
* tab_footnote(footnote = "Statistic presented is the mean.",
* locations = cells_column_labels(columns = vars(age, marker)))
```

.pull-left[
<img src="images/gt_title_footnote.png" width=70%>
]

.pull-right[.large[
- It's easy to include titles, subtitles, footnotes, and source notes in {gt} tables.

- The footnotes are automatically numbered based on where they appear in the table.
]]

---
class: center
background-image: url(images/gt_functions.svg)
background-size: contain

# much much more {gt} to learn

???

- color coded documentation
---
class: center
background-image: url(images/gtsummary_logo1.png)
background-size: contain

# {gtsummary}

???

- Still working on the hex sticker!

- Font?

- Bubble placement is so important

---
# {gtsummary} introduction

.large[
- {gtsummary} will soon be a part of the biostatR-verse of packages.

- The package uses {gt} as its back end to create tables.

- Used to summarize data frames, regression models, and more.

- Has a tidy API, sensible defaults (meaning minimal code), and is highly customizable.
]

???

- some functions from biostatR v0.1 have been exported to mskR

- mskR and biostatR are both undergoing an uncoupling.

- biostatR will comprise three packages

- biostatR won't fully undergo the uncoupling until {gt} is released on CRAN (or their RTF output is improved)
---
# {gtsummary} summarize data with tbl_summary()

.large[Let's review the data once more]

.pull-left[
<img src="images/gt_trial_info.png" width=90%>
]
.pull-right[
.large[For brevity, we'll use an abbreviated version of the trial data set with fewer columns.]

```r
sm_trial <-
 trial %>%
 select(trt, age, response, grade)
```
]

---
# {gtsummary} summarize data with tbl_summary()

.pull-left[

```r
tbl_summary_1 <-
* tbl_summary(sm_trial, by = "trt")
```

.large[
- Default statistics are median (IQR) for continuous variables, and n (percent) for categorical data.

- By default, variables coded as 0/1, TRUE/FALSE, and Yes/No are presented dichotomously.
]
]
.pull-right[
<img src="images/tbl_summary_1.png" width=100%>
]

???
- Go slow here

- summarizing a data set is the MOST important analysis

- summarize data first!  you will often catch mistakes.  Data is complicated, and understanding it up front is important.

- Communicating a summary of the data ALONG with analytic results is necessary (others may catch mistakes you're not aware of)

- {gtsummary} is for presenting results; other great packages are available for summarizing data for yourself (e.g. skimr)

- just one line of code

- all functions beginning with `tbl_*` create a new table

- this is how I use the package 95% of the time...so easy

- three types of data shown here (explain them)
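
---
# {gtsummary} summarize data with tbl_summary()

To make the default statistics concrete, here is a rough base R sketch of a median (IQR) and an n (percent) computed by hand. It uses the built-in `mtcars` data as a stand-in for the trial data, and it is not how `tbl_summary()` is implemented.

```r
# Continuous variable: median with the 25th and 75th percentiles
q <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
continuous_stat <- sprintf("%.0f (%.0f, %.0f)", q[2], q[1], q[3])

# Dichotomous 0/1 variable: count and percent of the "1" level
n <- sum(mtcars$am == 1)
N <- nrow(mtcars)
categorical_stat <- sprintf("%d (%.0f%%)", n, 100 * n / N)

continuous_stat
categorical_stat
```

Every variable would need its own few lines like these, plus handling of missing values, which is the bookkeeping the single `tbl_summary()` call takes care of.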

---
# {gtsummary} summarize data with tbl_summary()

.pull-left[

```r
tbl_summary_2 <-
 tbl_summary(sm_trial, by = "trt") %>%
* add_p()
```

.large[
- To compare values across two or more groups, use the `add_p()` function.

- The default tests are the Wilcoxon rank-sum test for continuous variables, chi-square test of independence for most categorical data, and Fisher's exact test for categorical data with low expected counts.
]
]
.pull-right[
<img src="images/tbl_summary_2.png" width=100%>
]

???

- `add_p()` accounts for another 4% of how I use the package

- all functions beginning with `tbl_*` create a new table, and `add_*` functions add information to an existing table

- explain the test defaults

- 2 or more groups
    - random effects for correlated data
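
---
# {gtsummary} summarize data with tbl_summary()

The three default tests named on the previous slide can each be run directly in base R. This sketch uses the built-in `mtcars` data as a stand-in for the trial data; it shows the tests themselves, not the calls {gtsummary} makes internally.

```r
# Continuous variable across two groups: Wilcoxon rank-sum test
# (exact = FALSE requests the normal approximation, since mpg has ties)
p_wilcox <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)$p.value

# Categorical variable across two groups: chi-square test of independence
# (chisq.test() applies a continuity correction to 2x2 tables by default)
counts <- table(mtcars$am, mtcars$vs)
p_chisq <- chisq.test(counts)$p.value

# Same table with Fisher's exact test, the usual choice for low expected counts
p_fisher <- fisher.test(counts)$p.value

round(c(wilcoxon = p_wilcox, chi_square = p_chisq, fisher = p_fisher), 3)
```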

---
# {gtsummary} and the {glue} package: an aside

.large[
- {glue} is similar to paste (but I like it so much more).

- Embed R expressions in curly braces.

- They are then evaluated and inserted into the argument string.
]

```r
name <- "Daniel"
x <- 1
*glue::glue("{name} is number {x}")
```

```
## Daniel is number 1
```

.large[
- Expressions can be complex.
]

```r
*glue::glue("{name} is number {((x + 100) * 10) - 1009}")
```

```
## Daniel is number 1
```

???
- an aside

- glue is used in some {gtsummary} function arguments, also {gt}
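---
# {gtsummary} and the {glue} package: an aside

One more sketch, since {gtsummary} arguments take strings in this style: values can also be passed to `glue()` as named arguments rather than looked up in the environment. The numbers below are made up for illustration.

```r
# A glue-style pattern with two placeholders
stat_pattern <- "{mean} ({sd})"

# Named arguments supply the values for the placeholders
stat_text <- glue::glue(stat_pattern, mean = 47.2, sd = 14.3)
stat_text
```

Keeping the pattern separate from the values is what makes a string like `"{mean} ({sd})"` reusable across many variables.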
---
# {gtsummary} summarize data with tbl_summary()

.pull-left[

```r
tbl_summary_3 <- sm_trial %>%
 tbl_summary(
 by = "trt",
* statistic = list(
* all_continuous() ~ "{mean} ({sd})",
* all_categorical() ~ "{n} / {N} ({p}%)"
* ),
* label = "age" ~ "Patient Age"
 ) %>%
* add_p(test = all_continuous() ~ "t.test")
```

.large[
- Report mean and standard deviation for continuous variables.

- Specify label for age variable.

- Report p-values from the t-test.
]
]
.pull-right[
<img src="images/tbl_summary_3.png" width=100%>
]

???

- defaults are great, let's change the default behavior

- statistics can be changed to anything...literally any R function

- discuss the formula notation
    - it's like `case_when()`, condition/variable on LHS and result on RHS
    - one formula doesn't need to be in a list, but more than one must be listed

- the vignette has more examples
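
---
# {gtsummary} summarize data with tbl_summary()

For anyone new to the formula notation, this base R sketch shows that a two-sided formula is just a container with a left-hand side and a right-hand side. It only illustrates the idea of pairing a variable with a value; how {gtsummary} processes these formulas internally is not shown here.

```r
# A formula pairing a variable name (LHS) with a label (RHS)
label_spec <- "age" ~ "Patient Age"

# Element 1 of a formula is the `~` operator; 2 and 3 are the two sides
lhs <- label_spec[[2]]
rhs <- label_spec[[3]]

lhs
rhs
```

More than one pairing goes in a `list()`, one formula per entry, as in the `statistic = ` argument shown earlier.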
---
# {gtsummary} summarize data with tbl_summary()

.pull-left[

```r
tbl_summary_4 <- sm_trial %>%
 tbl_summary(
 by = "trt",
* type = "response" ~ "categorical",
 statistic = all_continuous() ~ "{mean} ({sd})",
* digits = vars(age) ~ c(0, 1)
 ) %>%
 add_p(test = all_continuous() ~ "t.test") %>%
* add_stat_label()
```

.large[
- Report levels for the response variable.

- Modify the default rounding for age.

- Add column of statistics presented.

- Footnote about statistics is gone!
]
]
.pull-right[
<img src="images/tbl_summary_4.png" width=100%>
]

???

- further discuss formula notation
    - just like {gt}, you can use either select helpers OR a character vector of names

- discuss digits and how it's used

- discuss `add_stat_label()`, and mention the footnote was omitted

---
# {gtsummary} summarize data with tbl_summary()

.large[Advanced Customization

- It's natural that a {gtsummary} user would want to customize the aesthetics of the table with one or more of the many {gt} functions available.

- Every function in {gt} is available to use with a {gtsummary} object.

1. Create a {gtsummary} table.

1. Convert the table to a {gt} object with the `as_gt()` function.

1. Continue formatting as a {gt} table with any {gt} function.
]

???

Discuss `as_gt()` and how to use it
---
# {gtsummary} summarize data with tbl_summary()

.pull-left[
.large[Advanced Customization]

```r
tbl_summary_5 <- sm_trial %>%
 tbl_summary(by = "trt") %>%
 # convert from gtsummary object to gt object
* as_gt() %>%
 # modify with gt functions
* tab_spanner(
* label = "Randomization Group",
* columns = starts_with("stat_")
* )
```

.footnote[More on this in the `tbl_summary()` <a href="http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html#advanced-customization">vignette</a>]
]
.pull-right[
<img src="images/tbl_summary_5.png" width=90%>
]
---
class: left
# {gtsummary} summarize data with tbl_summary()

.large[
Review the tbl_summary vignette for more details
http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html
]

.pull-left[.large[
- Reporting any statistic for continuous variables, including user-written functions.

- More on dichotomous variables and how to specify the level printed.

- Missing data options (e.g. report as a column rather than a row, always report N missing even when no missing data, modify missing text, etc.).
]]
.pull-right[.large[
- Sort categorical variables by frequency.

- Report row percent, rather than column percent.

- Report q-values from various methods like false discovery rate.

- Sort data by ascending p-values when comparisons have been made.
]]

???

There is more that we are not covering here

---
# {gtsummary} summarize models with tbl_regression()

### Raw Output

```r
*m1 <- glm(response ~ trt + grade + age, data = trial, family = binomial)
m1
```

```
## 
## Call:  glm(formula = response ~ trt + grade + age, family = binomial, 
##     data = trial)
## 
## Coefficients:
## (Intercept)   trtPlacebo      gradeII     gradeIII          age  
##    0.449477    -0.660514    -0.504973    -0.167732    -0.004199  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  177 Residual
##   (18 observations deleted due to missingness)
## Null Deviance:	    249.1 
## Residual Deviance: 242.8 	AIC: 252.8
```

???

- it's not pretty

- most often I want the odds ratios from a logistic regression, not the betas

- format from every type of model is different and difficult to work with
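
---
# {gtsummary} summarize models with tbl_regression()

### Odds Ratios by Hand

Getting from betas to odds ratios means exponentiating the coefficients and their confidence limits. This is a rough base R sketch using the built-in `infert` data as a stand-in for the trial data. It uses Wald intervals from `confint.default()`, which may differ slightly from the intervals other tools report.

```r
# Logistic regression on a data set that ships with R
model <- glm(case ~ education + age, data = infert, family = binomial)

# Exponentiate the log-odds coefficients and their Wald confidence limits
or_table <- exp(cbind(OR = coef(model), confint.default(model)))
round(or_table, 2)
```

This works, but the result is a bare matrix with no reference groups, labels, or p-values.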
---
# {gtsummary} summarize models with tbl_regression()

### {broom} Output

```r
*broom::tidy(m1, conf.int = TRUE, exponentiate = TRUE)
```

```
## # A tibble: 5 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1.57 0.565 0.796 0.426 0.520 4.81 
## 2 trtPlacebo 0.517 0.309 -2.14 0.0324 0.280 0.941
## 3 gradeII 0.604 0.390 -1.30 0.195 0.279 1.29 
## 4 gradeIII 0.846 0.369 -0.455 0.649 0.409 1.74 
## 5 age 0.996 0.0106 -0.395 0.693 0.975 1.02
```

???

- MUCH MUCH better!

- all models returned with consistent format

- but does not include reference groups

- still needs additional modification before it can be presented
---
# {gtsummary} summarize models with tbl_regression()

### {gtsummary} Output

```r
*tbl_regression_1 <- tbl_regression(m1, exponentiate = TRUE)
```

.pull-left[.large[
- `tbl_regression()` accepts a regression model object as input.

- Reference groups added to the table.

- Logistic regression model with odds ratio header and footnote.
]]

.pull-right[
<img src="images/tbl_regression_1.png" width=90%>
]
???
- This table is ready for publication in a single line of code!

- That is something no other package I know of can do

- The back end for the function is {broom} and {gt}, meaning that there is broad support for most regression model types, and the resulting tables are gorgeous and customizable.

- Common regression models, such as logistic regression and Cox regression, are automatically identified and the tables are created with appropriate headers.

- build the regression model on your own....we are not in the business of model estimation or checking

---
# {gtsummary} summarize models with tbl_regression()

.pull-left[

```r
tbl_regression_2 <- m1 %>%
 tbl_regression(exponentiate = TRUE) %>%
* add_global_p()
```

.large[
- Replace individual p-values for categorical variables with global p-value for the entire variable.
]
]
.pull-right[
<img src="images/tbl_regression_2.png" width=100%>
]

???

Thank you to Stefania for suggesting the improved name for `add_global_p()`!
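
---
# {gtsummary} summarize models with tbl_regression()

To show what a global p-value is, this base R sketch tests each variable as a whole with a likelihood-ratio test from `drop1()`. It uses the built-in `infert` data as a stand-in, and the likelihood-ratio test is an assumption chosen for illustration; it is not necessarily the test `add_global_p()` runs.

```r
# Logistic regression with a three-level categorical predictor
model <- glm(case ~ education + age, data = infert, family = binomial)

# Drop each term in turn and compare against the full model
global_tests <- drop1(model, test = "Chisq")

# One row, and one p-value, per variable rather than per level
global_tests[c("education", "age"), c("Df", "Pr(>Chi)")]
```

The `education` row has 2 degrees of freedom: both non-reference levels are tested together.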

---
# {gtsummary} summarize models with tbl_regression()

.pull-left[

```r
library(survival)
tbl_regression_3 <- 
 coxph(Surv(ttdeath, death) ~ trt + grade + age, 
 data = trial) %>%
 tbl_regression(exponentiate = TRUE)
tbl_regression_4 <-
* tbl_merge(
* tbls = list(tbl_regression_1, tbl_regression_3),
* tab_spanner = c("Tumor Response", "Time to Death")
* )
```

.large[
- Build Cox regression model with same predictors as previous model.

- Merge the two regression models with the same predictors and present results side-by-side.
]
]
.pull-right[
<img src="images/tbl_regression_4.png" width=100%>
]

???

- side-by-side regression results are common in cancer research (e.g. time to recurrence, then time to death)

- stacking two or more models is also possible

- easy to create custom tables that are formatted beautifully
---
# {gtsummary} summarize data with tbl_uvregression()

.pull-left[

```r
library(survival)
tbl_uvregression_1 <- 
* tbl_uvregression(
* sm_trial,
* method = glm,
* y = response,
* method.args = list(family = binomial),
* exponentiate = TRUE
* )
```

.large[
- Table of univariate regression models.

- Specify the outcome, and the remaining variables in data frame serve as predictors.
]
]
.pull-right[
<img src="images/tbl_uvregression_1.png" width=100%>
]

???

- Tables of univariate results can be good for exploratory analysis

- Code is similar to {ggplot2} `geom_smooth()` and `stat_smooth()`

- also great with time-to-event endpoint when you cannot do a `tbl_summary()` to get bivariate p-values

---
# {gtsummary} summarize data with tbl_survival()
.pull-left[

```r
*fit1 <- survfit(Surv(ttdeath, death) ~ trt,
* data = trial)

survminer::ggsurvplot(
  fit = fit1, 
  xlab = "Months",
  ylab = "Overall survival probability",
  legend.title = "Treatment Group",
  legend.labs = c("Drug", "Placebo"),
  break.x.by = 6, 
  censor = FALSE,
  risk.table = TRUE,
  risk.table.y.text = FALSE
)
```
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-44-1.png)
]

???

- You've probably seen something like this before

- It's a Kaplan-Meier curve.  It shows the probability of being free from an event (e.g. cancer recurrence after treatment)

- we can use {gtsummary} to grab estimates from curves like this
---
# {gtsummary} summarize data with tbl_survival()

.pull-left[

```r
tbl_survival_1 <- fit1 %>%
* tbl_survival(times = c(12, 24),
* label = "{time} Month")
```

.large[
- First, use `survfit()` to estimate survival probabilities.

- Create table of estimates with `tbl_survival()`.

- Can use this function to print survival quantiles as well, e.g. median survival.
]
]
.pull-right[
<img src="images/tbl_survival_1.png" width=85%>
]
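
---
# {gtsummary} summarize data with tbl_survival()

The same kind of estimates can be pulled straight from {survival} with `summary()`. This sketch uses the `lung` data that ships with {survival} as a stand-in for the trial data, with time in days rather than months.

```r
library(survival)

# Kaplan-Meier curves by sex, using a data set bundled with {survival}
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival probabilities at one and two years (times are in days)
km_summary <- summary(km_fit, times = c(365, 730))

# Collect the pieces into a data frame for easier reading
km_estimates <- data.frame(
  strata   = km_summary$strata,
  time     = km_summary$time,
  survival = round(km_summary$surv, 2),
  lower    = round(km_summary$lower, 2),
  upper    = round(km_summary$upper, 2)
)
km_estimates
```

The numbers are all there, but labels and formatting are left to you.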

---
# {gtsummary} reporting results with inline_text()
.large[
- Tables are important, but we often need to report results in-line in a report.

- Any statistic reported in a {gtsummary} table can be extracted and reported in-line in an R Markdown document with the `inline_text()` function.

```r
inline_text(tbl_regression_1, variable = "trt", level = "Placebo")
```

```
0.52 (95% CI 0.28, 0.94; p=0.032)
```

- The pattern of what is reported can be modified with the `pattern = ` argument.

- Default is `pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})"`.
]

???

- discuss importance of reproducible results

- data is constantly updating

- this functionality assures you won't miss updating a reported estimate in a document

- for me, this is one of the most powerful parts of the {gtsummary} package

- something I've never seen in another package
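---
# {gtsummary} reporting results with inline_text()

The `pattern = ` string is written in {glue} syntax, so its behavior can be sketched with `glue()` alone. The values below are typed in by hand to mirror the example on the previous slide; in practice `inline_text()` supplies them from the table.

```r
# The default pattern from the previous slide
pattern <- "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})"

# Fill the placeholders with hand-typed, already-formatted values
reported <- glue::glue(
  pattern,
  estimate   = "0.52",
  conf.level = 0.95,
  conf.low   = "0.28",
  conf.high  = "0.94",
  p.value    = "p=0.032"
)
reported
```

Rearranging or dropping placeholders in the pattern changes what is reported without touching the model.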
---
class: center
# {gtsummary}
.large[
• Every function is documented further in the help file •

• Check out the package website for vignettes including detailed examples and explanations •

<img src = "images/open-book-white.png" width="2.4%" height="2.4%"> {gtsummary} documentation <a href="http://www.danieldsjoberg.com/gtsummary/">danieldsjoberg.com/gtsummary/</a>

<img src = "images/github_icon.png" width="2.4%" height="2.4%"> {gtsummary} package <a href="https://github.com/ddsjoberg/gtsummary">github.com/ddsjoberg/gtsummary</a>

<img src = "images/slide_show_icon.png" width="2.4%" height="2.4%"> slides at <a href="http://www.danieldsjoberg.com/gt-and-gtsummary-presentation">danieldsjoberg.com/gt-and-gtsummary-presentation</a>

<img src = "images/github_icon.png" width="2.4%" height="2.4%"> source code for slides at <a href="https://github.com/ddsjoberg/gt-and-gtsummary-presentation">github.com/ddsjoberg/gt-and-gtsummary-presentation</a>

<img src = "images/github_icon.png" width="2.4%" height="2.4%"> {gt} package <a href="https://github.com/rstudio/gt">github.com/rstudio/gt</a>
]

???

Go star {gtsummary} on GitHub...we're already at 50+!

---
# {gtsummary} Advanced

.large[
{gtsummary} output is a list that prints as a {gt} table.
]

```r
names(tbl_summary_1)
```

```
## [1] "gt_calls"   "table_body" "meta_data"  "inputs"     "call_list" 
## [6] "by"         "df_by"
```

.pull-left[

```r
pluck(tbl_summary_1, "table_body") %>% head()
```

```
## # A tibble: 6 x 5
## variable row_type label stat_1 stat_2 
## <chr> <chr> <chr> <chr> <chr> 
## 1 age label Age, yrs 47 (39, 58) 45 (36, 54)
## 2 age missing Unknown 6 3 
## 3 response label Tumor Response 53 (51%) 30 (34%) 
## 4 response missing Unknown 4 5 
## 5 grade label Grade <NA> <NA> 
## 6 grade level I 38 (36%) 29 (31%)
```
]
.pull-right[

```r
pluck(tbl_summary_1, "gt_calls") %>% head(n = 4)
```

```
## $gt
## gt(data = x$table_body)
## 
## $cols_label_label
## cols_label(label = md('**Characteristic**'))
## 
## $cols_align
## cols_align(align = 'center') %>% cols_align(align = 'left', columns = vars(label))
## 
## $cols_hide
## cols_hide(columns = vars(variable, row_type))
```
]

???

If there is time, review the structure of a {gtsummary} object

Essentially, the {gt} calls on the right are applied to the table on the left whenever the object is printed.

Understanding this structure will help you modify the table if you need to.  If there is a {gt} call that formats in a way you don't like, convert your object with `as_gt()` and use the `omit =` argument to leave out the {gt} call you don't like.

You can replace it with whatever you choose.