
In some EDC systems, a multiple choice variable that allows more than one answer is split into several variables, each renamed with a suffix that identifies one of the selected answers. For example, var1 and var2 hold answers 1 and 2. This function cleans up this specific output so that the variable name remains consistent and, by default, collapses the selected answers into a single value.
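The renaming step can be illustrated with a small base R sketch. It assumes that a variable counts as multiple choice when its name is not in the expected names but becomes an expected name once the suffix is removed; `strip_mc_suffix()` is a hypothetical helper written for illustration, not part of the package, and the package's own logic may differ.

```r
# Hypothetical helper, for illustration only (not the package's implementation).
# Assumption: a name is a multiple choice variant when it is not an expected
# name itself, but matches one after the suffix regex is stripped.
strip_mc_suffix <- function(vars, expected_vars, suffix = "[[:digit:]]+$") {
  stripped <- sub(suffix, "", vars)
  is_mc <- (!(vars %in% expected_vars)) & (stripped %in% expected_vars)
  # Keep ordinary variables untouched; replace suffixed ones by the base name
  ifelse(is_mc, stripped, vars)
}

strip_mc_suffix(
  c("Age", "MH_TRT1", "MH_TRT2", "Weight"),
  expected_vars = c("Age", "MH_TRT", "Weight")
)
# returns "Age" "MH_TRT" "MH_TRT" "Weight"
```

Checking against `expected_vars` keeps a legitimate name that happens to end in a digit from being truncated.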

Usage

fix_multiple_choice_vars(
  data,
  expected_vars = metadata$items_expanded$var,
  var_column = "var",
  value_column = "item_value",
  suffix = "[[:digit:]]+$",
  common_vars = c("subject_id", "event_repeat", "event_date", "form_repeat"),
  collapse_with = "; "
)

Arguments

data

A data frame.

expected_vars

Character vector containing the expected names of the variables.

var_column

Column name in which the variable names are stored.

value_column

Column name in which the values of the variables are stored.

suffix

Regular expression matching the multiple choice suffix. Used to identify multiple choice variables.

common_vars

Variables used for identifying unique rows in the dataset.

collapse_with

Character value used to collapse the multiple choice options. If this value is NULL, the rows are left as is.
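The collapsing step controlled by `collapse_with` can be pictured with the following base R sketch. It assumes the suffixes have already been stripped and that `ID` plays the role of `common_vars`; `long_df` and `collapsed` are illustrative names, and the package itself may implement this differently.

```r
# Illustrative data after suffix stripping (assumption: MH_TRT1..4 -> MH_TRT)
long_df <- data.frame(
  ID = "Subj1",
  var = c("Age", rep("MH_TRT", 4)),
  item_value = c("95", "67", "58", "83", "34")
)

# One row per combination of identifying column(s) and variable name;
# values within each group are pasted together in their original order,
# mirroring collapse_with = "; "
collapsed <- aggregate(
  item_value ~ ID + var,
  data = long_df,
  FUN = function(x) paste(x, collapse = "; ")
)
collapsed
```

Note that `aggregate()` sorts the output by the grouping columns, so row order may differ from the input.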

Value

A data frame with corrected multiple choice variables.

Examples

df <- data.frame(
  ID = "Subj1",
  var = c("Age", paste0("MH_TRT", 1:4)),
  item_value = as.character(c(95, 67, 58, 83, 34))
)
fix_multiple_choice_vars(df, common_vars = "ID")
#> multiple choice vars that will be adjusted: 
#> MH_TRT
#>      ID    var     item_value
#> 1 Subj1    Age             95
#> 2 Subj1 MH_TRT 67; 58; 83; 34