Clinical Reporting with gtsummary: Solutions

Exercise 1

The Data

Using the National Health and Nutrition Examination Survey Data (nhefs), we will assess whether quitting smoking leads to a decrease in the risk of death. We’ll also investigate whether potential confounding factors–age, sex, blood pressure, diabetes, and exercise.

To begin, let’s pare down the data frame and assign column labels.

library(gtsummary)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
theme_gtsummary_compact()
Setting theme `Compact`
df_nhefs <-
  causaldata::nhefs %>%
  select(death, qsmk, age, sex, sbp, dbp, exercise) %>%
  drop_na() %>%
  mutate(
    qsmk =
      factor(
        qsmk,
        levels = 0:1,
        labels = c("Did not Quit", "Quit")
      ),
    sex = 
      case_when(
        sex == 0 ~ "Male",
        sex == 1 ~ "Female"
      ),
    exercise = 
      factor(
        exercise,
        levels = 0:2,
        labels = c("Much exercise", "Moderate exercise", "Little or no exercise")
      ) %>%
      fct_rev()
  ) %>%
  labelled::set_variable_labels(
    death = "Participant Passed Away",
    qsmk = "Quit Smoking",
    age = "Age",
    sex = "Sex",
    sbp = "Systolic BP",
    dbp = "Diastolic BP",
    exercise = "Exercise Level"
  )

skimr::skim(df_nhefs)
Data summary
Name df_nhefs
Number of rows 1548
Number of columns 7
_______________________
Column type frequency:
character 1
factor 2
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
sex 0 1 4 6 0 2 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
qsmk 0 1 FALSE 2 Did: 1157, Qui: 391
exercise 0 1 FALSE 3 Mod: 651, Lit: 597, Muc: 300

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
death 0 1 0.18 0.39 0 0 0 0 1 ▇▁▁▁▂
age 0 1 43.62 11.99 25 33 43 53 74 ▇▆▇▅▂
sbp 0 1 128.70 19.00 87 116 126 140 229 ▃▇▂▁▁
dbp 0 1 77.74 10.63 47 70 77 85 130 ▁▇▅▁▁

Exercise 2

Prepare a table describing the cohort by whether or not the participant quit smoking. Do not include death in the summary table. Consider using the gtsummary functions that build on a summary table.

gts_patient_characteristics <-
  df_nhefs %>%
  tbl_summary(
    by = qsmk,
    include = -death
  ) %>%
  add_p() %>%
  add_stat_label() %>%
  modify_spanning_header(
    all_stat_cols() ~ "**Smoking Status**"
  )
gts_patient_characteristics
Characteristic Smoking Status p-value1
Did not Quit, N = 1,157 Quit, N = 391
Age, Median (IQR) 42 (33, 51) 46 (35, 56) <0.001
Sex, n (%) 0.005
Female 619 (54%) 177 (45%)
Male 538 (46%) 214 (55%)
Systolic BP, Median (IQR) 125 (115, 138) 130 (118, 142) <0.001
Diastolic BP, Median (IQR) 77 (70, 84) 78 (71, 86) 0.016
Exercise Level, n (%) 0.2
Little or no exercise 438 (38%) 159 (41%)
Moderate exercise 482 (42%) 169 (43%)
Much exercise 237 (20%) 63 (16%)
1 Wilcoxon rank sum test; Pearson's Chi-squared test

Exercise 3

Is there a difference in death rates by smoking status?

Prepare an unadjusted and adjusted rate difference in the table.

# Visit https://www.danieldsjoberg.com/gtsummary/reference/tests.html for a listing of included tests.
# Tests that make use of the `add_difference(adj.vars=)` argument are adjusted analyses.

gts_death_unadjusted <-
  df_nhefs %>%
  tbl_summary(
    by = qsmk, 
    include = death,
    statistic = all_categorical() ~ "{p}%"
  ) %>%
  add_difference() %>%
  add_stat_label() 
gts_death_unadjusted
Characteristic Did not Quit, N = 1,157 Quit, N = 391 Difference1 95% CI1,2 p-value1
Participant Passed Away, % 17% 22% -4.9% -9.7%, -0.07% 0.037
1 Two sample test for equality of proportions
2 CI = Confidence Interval
gts_death_adjusted <-
  df_nhefs %>%
  tbl_summary(
    by = qsmk, 
    include = death,
    statistic = all_categorical() ~ "{p}%"
  ) %>%
  add_difference(
    test = all_categorical() ~ "emmeans",
    adj.vars = c(age, sex, sbp, dbp, exercise)
  ) %>%
  add_stat_label() 
gts_death_adjusted
Characteristic Did not Quit, N = 1,157 Quit, N = 391 Adjusted Difference1 95% CI1,2 p-value1
Participant Passed Away, % 17% 22% 1.0% -2.2%, 4.2% 0.5
1 Regression least-squares adjusted mean difference
2 CI = Confidence Interval

Exercise 4

Build a logistic regression model with death as the outcome. Include smoking and the other variables as covariates. Summarize your regression model in a table, reporting the odds ratios, confidence intervals, and p-values.

mod <- glm(death ~ qsmk + age + sex + sbp + dbp + exercise, 
           data = df_nhefs, family = binomial)

gts_mod <-
  tbl_regression(mod, exponentiate = TRUE) %>%
  add_global_p() %>%
  bold_p() %>%
  italicize_levels() %>%
  modify_caption("**Logistic regression model predicting death**") %>%
  modify_header(label = "**Variable** (N = {N})")
gts_mod
Logistic regression model predicting death
Variable (N = 1548) OR1 95% CI1 p-value
Quit Smoking 0.5
Did not Quit
Quit 0.90 0.64, 1.26
Age 1.11 1.10, 1.13 <0.001
Sex <0.001
Female
Male 1.78 1.31, 2.43
Systolic BP 1.02 1.01, 1.03 <0.001
Diastolic BP 1.00 0.98, 1.01 0.6
Exercise Level 0.3
Little or no exercise
Moderate exercise 0.78 0.56, 1.08
Much exercise 0.81 0.52, 1.25
1 OR = Odds Ratio, CI = Confidence Interval

Exercise 5

Write a brief summary of the results above using inline_text() to report values from the tables directly into the markdown report.

median_age_quit <-
  inline_text(gts_patient_characteristics, 
              variable = age, 
              column = "Quit", 
              pattern = "{median}")
median_age_no_quit <-
  inline_text(gts_patient_characteristics, 
              variable = age, 
              column = "Did not Quit", 
              pattern = "{median}")
pvalue_age <- 
  inline_text(gts_patient_characteristics, 
              variable = age,
              column = p.value)
unadj_difference <-
  inline_text(gts_death_unadjusted,
              variable = death,
              pattern = "difference {estimate}; 95% CI {ci}; {p.value}")

adj_qsmk_or <-
  inline_text(gts_mod,
              variable = qsmk,
              level = "Quit",
              pattern = "odds ratio {estimate}; 95% CI {ci}")

The analysis assessing the relationship between quitting smoking and subsequent death within the next 20 years included 1548 participants. The median age among those who quit was higher compared to those who did not (46 vs 42; p<0.001).

On univariate analysis, participants who did not quit smoking had higher rates of death (difference -4.9%; 95% CI -9.7%, -0.07%; p=0.037). However, on multivariable analysis, the relationship was not longer significant (odds ratio 0.90; 95% CI 0.64, 1.26).