Accuracy today when I derive a new variable?
Accuracy 1 year from now after much more data has been collected?
The {affirm} package makes daily affirmations against our data
affirm raw data is as expected
affirm derived variables continue to be accurate as data is updated
There are plenty of ways to make checks against your data
Why do we need another tool?
REPORTING!
Using EDC data to derive new variables requires a different style of data validation.
When validating raw EDC data, we must report bad or inconsistent data to a data manager, who will then investigate and correct the data in the source database.
When validating derived variables based on raw EDC data, we make assumptions about the data. Validations can be used to ensure that whatever assumptions we made on the day we first derived a new variable are still met as the raw EDC data continues to be updated.
RAND |>
  # bring in each subject's recorded age from DM
  left_join(
    DM |> prepend_df_name() |> select(SUBJECT, DM.AGE),
    by = "SUBJECT"
  ) |>
  # affirm that the randomization stratum agrees with the recorded age
  affirm_true(
    label = "RAND: Randomization strata match recorded subject age",
    condition =
      (RAND_STRATA %in% "<65yr" & DM.AGE < 65) |
      (RAND_STRATA %in% ">=65yr" & DM.AGE >= 65)
  ) |>
  invisible()
#> • RAND: Randomization strata match recorded subject age
#>   1 issue identified.
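An affirmation like the one above normally runs inside a session that is opened, reported on, and closed. A minimal sketch of that full cycle, assuming the workflow described in the package's getting-started article (affirm_init(), affirm_report_gt(), affirm_close()) and using the built-in mtcars data as a stand-in for EDC data; the labels and thresholds are illustrative only:

```r
library(affirm)
library(dplyr, warn.conflicts = FALSE)

# start a fresh affirmation session (discarding any earlier one)
affirm_init(replace = TRUE)

# mtcars is a stand-in for a raw EDC data set (illustrative assumption)
dat_check <-
  as_tibble(mtcars) |>
  affirm_true(
    label = "No. cylinders must be 4, 6, or 8",
    condition = cyl %in% c(4, 6, 8)
  ) |>
  affirm_true(
    label = "MPG should be less than 33",
    condition = mpg < 33
  )

# summarise every affirmation made in this session as a gt table
report <- affirm_report_gt()

# end the session
affirm_close()
```

Rerunning the same script each time the raw data is refreshed is what turns a one-off check into a daily affirmation.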
Other affirmation functions currently available
affirm_false()
affirm_class()
affirm_values()
affirm_na()
affirm_not_na()
affirm_no_dupes()
affirm_range()
https://pcctc.github.io/affirm/articles/getting-started.html#report