{affirm} R Package Overview

Daniel D. Sjoberg

How to ensure the accuracy of derivations?

Is a derivation accurate today, when I first derive a new variable?
Will it still be accurate 1 year from now, after much more data has been collected?

What is {affirm}?

The {affirm} package makes daily affirmations against our data
- affirm that raw data is as expected
- affirm that derived variables continue to be accurate as data is updated

Why {affirm}?

There are plenty of ways to make checks against your data
- testthat
- checkmate
- assertthat
Why do we need another tool?

REPORTING!

How {affirm} works

Initialize a new affirmation session

options('affirm.id_cols' = "SUBJECT")

affirm_init(replace = TRUE)
#> ✔ We're ready to make data affirmations...

Using EDC data to derive new variables requires a different style of data validation.
When validating raw EDC data, we must report bad or inconsistent data to a data manager, who will then investigate and correct the data in the source database.
When validating derived variables based on raw EDC data, we make assumptions about the data. Validations can be used to ensure that whatever assumptions we made on the day we first derived a new variable are still met as the raw EDC data continues to be updated.

Make an affirmation

affirm_true(
  RAND,
  label = "RAND: Subject ID is not missing",
  condition = !is.na(SUBJECT)
) |> 
  invisible()
#> • RAND: Subject ID is not missing
#>   0 issues identified.

Every newly derived variable should be associated with multiple affirmations to ensure the derivation remains correct into the future.
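
As a minimal sketch of attaching several affirmations to one derived variable, the example below derives an age group and chains three affirm_true() calls onto it. The DM_DEMO data frame and the AGE and AGE_GROUP columns are hypothetical and exist only for illustration; the sketch also assumes affirm_true() passes the data frame through unchanged, as the piped examples in this deck suggest.

```r
library(affirm)
library(dplyr, warn.conflicts = FALSE)

options('affirm.id_cols' = "SUBJECT")
affirm_init(replace = TRUE)

# Hypothetical demographics data; names and values are illustrative only
DM_DEMO <- tibble(
  SUBJECT = c("001", "002", "003"),
  AGE = c(54, 65, 71)
)

# Derive an age group, then affirm the assumptions behind the derivation
DM_DERIVED <-
  DM_DEMO |>
  mutate(AGE_GROUP = ifelse(AGE < 65, "<65yr", ">=65yr")) |>
  affirm_true(
    label = "DM: Age is not missing, so every subject can be grouped",
    condition = !is.na(AGE)
  ) |>
  affirm_true(
    label = "DM: Derived age group is one of the two expected levels",
    condition = AGE_GROUP %in% c("<65yr", ">=65yr")
  ) |>
  affirm_true(
    label = "DM: Derived age group agrees with recorded age",
    condition = (AGE_GROUP == "<65yr") == (AGE < 65)
  )
```

Each affirmation guards a different assumption (input completeness, allowed output levels, agreement with the source column), so a later change in the raw data that breaks any one of them would surface as an issue.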

Merge in the subject's age from the DM data set, and check whether the reported age aligns with the age group recorded in the randomization stratification variable:

RAND |>
  left_join(
    DM |> prepend_df_name() |> select(SUBJECT, DM.AGE),
    by = "SUBJECT"
  ) |> 
  affirm_true(
    label = "RAND: Randomization strata match recorded subject age",
    condition =
      (RAND_STRATA %in% "<65yr" & DM.AGE < 65) |
      (RAND_STRATA %in% ">=65yr" & DM.AGE >= 65)
  ) |> 
  invisible()
#> • RAND: Randomization strata match recorded subject age
#>   1 issue identified.

Other affirmation functions currently available
- affirm_false()
- affirm_class()
- affirm_values()
- affirm_na()
- affirm_not_na()
- affirm_no_dupes()
- affirm_range()
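
A rough sketch of a few of these helpers, run on the built-in mtcars data, might look like the following. The argument names used here (column, values, range, boundaries) are assumptions drawn from the package reference and should be checked against the installed version; the labels and thresholds are illustrative only.

```r
library(affirm)

options('affirm.id_cols' = "car")
affirm_init(replace = TRUE)

checked <-
  dplyr::as_tibble(mtcars, rownames = "car") |>
  # assumed signature: affirm_not_na(data, label, column)
  affirm_not_na(
    label = "MPG is not missing",
    column = mpg
  ) |>
  # assumed signature: affirm_values(data, label, column, values)
  affirm_values(
    label = "No. cylinders must be 4, 6, or 8",
    column = cyl,
    values = c(4, 6, 8)
  ) |>
  # assumed signature: affirm_range(data, label, column, range, boundaries)
  # boundaries = c(FALSE, TRUE) is taken to mean lower bound exclusive,
  # upper bound inclusive
  affirm_range(
    label = "MPG is > 0 and <= 40",
    column = mpg,
    range = c(0, 40),
    boundaries = c(FALSE, TRUE)
  )
```

These helpers express common checks that could also be written with affirm_true(), but naming the column and rule explicitly tends to make the intent easier to read.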

What does the report look like?

https://pcctc.github.io/affirm/articles/getting-started.html#report
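
As a short sketch of producing the report at the end of a session, the example below records two affirmations on the built-in mtcars data and then calls affirm_report_gt(), which returns a gt table summarizing them. The data and labels are illustrative only and are not part of the EDC example in this deck.

```r
library(affirm)

options('affirm.id_cols' = "car")
affirm_init(replace = TRUE)

dplyr::as_tibble(mtcars, rownames = "car") |>
  affirm_true(
    label = "No. cylinders must be 4, 6, or 8",
    condition = cyl %in% c(4, 6, 8)
  ) |>
  affirm_true(
    label = "MPG should be less than 33",
    condition = mpg < 33
  ) |>
  invisible()

# Summarize every affirmation made in this session as a gt table
report <- affirm_report_gt()

# End the affirmation session once the report has been created
affirm_close()
```

Because the session accumulates results across calls, the report is built once, after all affirmations have been made.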

{affirm}

https://pcctc.github.io/affirm

https://github.com/pcctc/affirm