{affirm} R Package Overview

Daniel D. Sjoberg

How to ensure the accuracy of derivations?

  • Accuracy today when I derive a new variable?

  • Accuracy 1 year from now after much more data has been collected?

What is {affirm}?

{affirm}

  • The {affirm} package makes daily affirmations against our data

  • Affirm that raw data is as expected

  • Affirm that derived variables continue to be accurate as data is updated

Why {affirm}?

  • There are plenty of ways to make checks against your data

    • testthat
    • checkmate
    • assertthat
  • Why do we need another tool?

REPORTING!

How {affirm} works

  • Initialize a new affirmation session
library(affirm)
options('affirm.id_cols' = "SUBJECT")

affirm_init(replace = TRUE)
#> ✔ We're ready to make data affirmations...
  • Using EDC data to derive new variables requires a different style of data validation.

  • When validating raw EDC data, we must report bad or inconsistent data to a data manager, who will then investigate and correct the data in the source database.

  • When validating derived variables based on raw EDC data, we make assumptions about the data. Validations can be used to ensure that whatever assumptions we made on the day we first derived a new variable are still met as the raw EDC data continues to be updated.
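
To make the point about assumptions concrete, here is a minimal base R sketch (it does not use {affirm}). The data frames `dm_today` and `dm_later` and the helpers `derive_agegrp()` and `check_agegrp()` are hypothetical names invented for illustration. The derivation silently assumes that AGE is never missing; the check passes today and flags a row once new data arrives.

```r
# Hypothetical demographics data as it looks on the day of the derivation
dm_today <- data.frame(
  SUBJECT = c("001", "002", "003"),
  AGE = c(54, 67, 71)
)

# Derivation: age group. Implicit assumption: AGE is never missing.
derive_agegrp <- function(dm) {
  dm$AGEGRP <- ifelse(dm$AGE < 65, "<65yr", ">=65yr")
  dm
}

# Check the assumption behind the derivation; returns the offending rows
check_agegrp <- function(dm) {
  ok <- !is.na(dm$AGE) & !is.na(dm$AGEGRP)
  dm[!ok, , drop = FALSE]
}

# Today: the assumption holds, so no rows are returned
issues_today <- check_agegrp(derive_agegrp(dm_today))
nrow(issues_today)

# Later: a new subject is enrolled before their age has been entered
dm_later <- rbind(dm_today, data.frame(SUBJECT = "004", AGE = NA))
issues_later <- check_agegrp(derive_agegrp(dm_later))
issues_later$SUBJECT
```

Re-running the same check every time the data is refreshed is what turns a one-time sanity check into an ongoing validation.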

How {affirm} works

  • Make an affirmation
affirm_true(
  RAND,
  label = "RAND: Subject ID is not missing",
  condition = !is.na(SUBJECT)
) |> 
  invisible()
#> • RAND: Subject ID is not missing
#>   0 issues identified.
  • Every newly derived variable should be associated with multiple affirmations to ensure the derivation remains correct into the future.
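
As a hedged sketch of what "multiple affirmations per derived variable" might look like, the example below derives a hypothetical AGEGRP variable and attaches three checks to it. The DM data, the AGEGRP variable, and the 18 to 120 age limits are illustrative assumptions, not part of the original analysis. It also assumes that affirm_true() returns its input data, so that calls can be chained with the pipe.

```r
library(affirm)
library(dplyr)

options("affirm.id_cols" = "SUBJECT")
affirm_init(replace = TRUE)

# Hypothetical demographics data, for illustration only
DM <- tibble(
  SUBJECT = c("001", "002", "003"),
  AGE = c(54, 67, 71)
)

DM_DERIVED <-
  DM |>
  # the newly derived variable
  mutate(AGEGRP = if_else(AGE < 65, "<65yr", ">=65yr")) |>
  # affirmation 1: the derivation produced a value for every subject
  affirm_true(
    label = "DM: Derived AGEGRP is not missing",
    condition = !is.na(AGEGRP)
  ) |>
  # affirmation 2: only the expected categories appear
  affirm_true(
    label = "DM: Derived AGEGRP takes an expected value",
    condition = AGEGRP %in% c("<65yr", ">=65yr")
  ) |>
  # affirmation 3: the input is within the range the derivation assumed
  # (18 and 120 are made-up limits)
  affirm_true(
    label = "DM: AGE is within the range assumed by the derivation",
    condition = AGE >= 18 & AGE <= 120
  )
```

Each affirmation guards a different way the derivation could go wrong: missing inputs, unexpected categories, or implausible source values.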

How {affirm} works

  • Merge in data from the DM data set, and check whether the reported subject age aligns with the age group in the randomization stratification variable
library(dplyr)

RAND |>
  left_join(
    # prefix DM columns with the data frame name (e.g. DM.AGE)
    DM |> prepend_df_name() |> select(SUBJECT, DM.AGE),
    by = "SUBJECT"
  ) |>
  affirm_true(
    label = "RAND: Randomization strata match recorded subject age",
    condition =
      (RAND_STRATA %in% "<65yr" & DM.AGE < 65) |
      (RAND_STRATA %in% ">=65yr" & DM.AGE >= 65)
  ) |>
  invisible()
#> • RAND: Randomization strata match recorded subject age
#>   1 issue identified.

How {affirm} works

  • Other affirmation functions currently available

    • affirm_false()

    • affirm_class()

    • affirm_values()

    • affirm_na()

    • affirm_not_na()

    • affirm_no_dupes()

    • affirm_range()

How {affirm} works

  • What does the report look like?

https://pcctc.github.io/affirm/articles/getting-started.html#report
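
A minimal sketch of producing a report at the end of a session, assuming the affirm_report_gt() function described in the getting-started article linked above and an installed {gt} package. The RAND data here is made up, with one deliberately unexpected stratum value so that the report has an issue to show.

```r
library(affirm)

options("affirm.id_cols" = "SUBJECT")
affirm_init(replace = TRUE)

# Hypothetical randomization data; the third stratum value is unexpected
RAND <- data.frame(
  SUBJECT = c("001", "002", "003"),
  RAND_STRATA = c("<65yr", ">=65yr", "65+")
)

affirm_true(
  RAND,
  label = "RAND: Randomization stratum is one of the expected values",
  condition = RAND_STRATA %in% c("<65yr", ">=65yr")
) |>
  invisible()

# Summarize all affirmations made in this session as a gt table
report <- affirm_report_gt()
report
```

See the package reference for the other report formats and their arguments; none of them are assumed here.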
