Correct multiple choice variables — fix_multiple_choice

In some electronic data capture (EDC) systems, when a multiple choice variable allows several answers, each selected answer is exported as a separate variable whose name carries a suffix identifying that answer, for example var1 and var2 for answers 1 and 2. This function cleans up that specific output so that the variable name stays consistent.
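The clean-up described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the `expected` vector and the rule "strip the suffix only when the stripped name is expected and the original name is not" are assumptions made for this example.

```r
# Long-format export in which the multiple choice variable MH_TRT
# arrives as MH_TRT1 and MH_TRT2 (one row per selected answer).
df <- data.frame(
  ID = "Subj1",
  var = c("Age", "MH_TRT1", "MH_TRT2"),
  item_value = c("95", "67", "58"),
  stringsAsFactors = FALSE
)

# Hypothetical list of the variable names we expect to see.
expected <- c("Age", "MH_TRT")

# Remove a trailing run of digits, but only for names that are not
# expected themselves and that become an expected name once stripped.
stripped <- sub("[[:digit:]]+$", "", df$var)
is_mc <- !(df$var %in% expected) & (stripped %in% expected)
df$var[is_mc] <- stripped[is_mc]

# Collapse the rows that now share the same key and variable name.
out <- aggregate(item_value ~ ID + var, data = df, FUN = paste, collapse = "; ")
print(out)
```

After this step the two MH_TRT rows are merged into a single row whose value joins both answers with "; ", which mirrors the role of the `collapse_with` argument.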

Usage

fix_multiple_choice_vars(
  data,
  expected_vars = metadata$items_expanded$var,
  var_column = "var",
  value_column = "item_value",
  suffix = "[[:digit:]]+$",
  key_cols = c("subject_id", "event_repeat", "event_date", "form_repeat"),
  collapse_with = "; "
)

Arguments

data: A data frame.
expected_vars: Character vector containing the expected names of the variables.
var_column: Name of the column in which the variable names are stored.
value_column: Name of the column in which the values of the variables are stored.
suffix: Regular expression matching the multiple choice suffix. Used to identify multiple choice values.
key_cols: Character vector with the key columns used to identify unique rows in the data set.
collapse_with: Character value used to collapse the multiple choice options. If this value is NULL, the rows will be left as is.
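To make the default `suffix` pattern concrete, the short base R sketch below shows which names it would alter. The variable names are invented for illustration, and how the function combines the pattern with `expected_vars` internally is not shown here.

```r
# Default suffix pattern: one or more digits at the end of the name.
suffix <- "[[:digit:]]+$"

# Hypothetical variable names, chosen to show the edge cases.
vars <- c("Age", "MH_TRT1", "MH_TRT12", "BMI2_score", "visit3")

# Only trailing digits are removed; digits elsewhere in the name stay.
stripped <- sub(suffix, "", vars)
print(stripped)

# Collapsing several answers with the default separator.
collapsed <- paste(c("67", "58"), collapse = "; ")
print(collapsed)
```

Note that a name such as "visit3" matches the pattern even if it is an ordinary variable, which is presumably why the expected variable names are supplied alongside the pattern.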

Value

A data frame with corrected multiple choice variables.

Examples

if (FALSE) { # \dontrun{
  # Long-format data in which MH_TRT is split over MH_TRT1 to MH_TRT4
  df <- data.frame(
    ID = "Subj1",
    var = c("Age", paste0("MH_TRT", 1:4)),
    item_value = as.character(c(95, 67, 58, 83, 34))
  )
  fix_multiple_choice_vars(df, key_cols = "ID")
} # }