In some EDC systems, a multiple choice variable that allows more than one answer is exported as several variables, one per selected option, each named with a numeric suffix: for example, var1 and var2 for answers 1 and 2. This function removes that suffix so that the variable keeps its expected name and, by default, collapses the selected options into a single value.
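The renaming and collapsing described above can be approximated in base R. This is a hypothetical illustration, not the package's implementation: the function name collapse_mc_sketch is invented, and it assumes that (1) a row is a multiple choice option when its variable name is not in expected_vars but the name with the suffix removed is, and (2) the common_vars columns plus the variable name identify one output row.

```r
# Hypothetical sketch for illustration only; it is NOT the package's source.
collapse_mc_sketch <- function(data, expected_vars, common_vars,
                               var_column = "var",
                               value_column = "item_value",
                               suffix = "[[:digit:]]+$",
                               collapse_with = "; ") {
  vars <- as.character(data[[var_column]])
  base_names <- sub(suffix, "", vars)
  # Assumption: a row is a multiple choice option when its own name is not
  # expected but the name without the suffix is (e.g. MH_TRT3 -> MH_TRT).
  is_mc <- !(vars %in% expected_vars) & base_names %in% expected_vars
  vars[is_mc] <- base_names[is_mc]
  data[[var_column]] <- vars
  if (is.null(collapse_with)) {
    return(data)  # renamed, but still one row per selected option
  }
  keys <- c(common_vars, var_column)
  # Paste the values within each (common_vars, var) group into one string;
  # aggregate() keeps the original row order inside each group.
  aggregate(data[value_column], by = data[keys], FUN = paste,
            collapse = collapse_with)
}

df <- data.frame(
  ID = "Subj1",
  var = c("Age", paste0("MH_TRT", 1:4)),
  item_value = as.character(c(95, 67, 58, 83, 34))
)
res <- collapse_mc_sketch(df, expected_vars = c("Age", "MH_TRT"),
                          common_vars = "ID")
print(res)
```

Because aggregate() groups every row, this sketch would also merge any other rows that share the same common_vars and variable name, and it returns the groups in sorted order; the real function may behave differently on both points.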
Usage
fix_multiple_choice_vars(
  data,
  expected_vars = metadata$items_expanded$var,
  var_column = "var",
  value_column = "item_value",
  suffix = "[[:digit:]]+$",
  common_vars = c("subject_id", "event_repeat", "event_date", "form_repeat"),
  collapse_with = "; "
)
Arguments
- data
A data frame in long format, with one row per variable value.
- expected_vars
Character vector containing the expected names of the variables.
- var_column
Name of the column that stores the variable names.
- value_column
Name of the column that stores the variable values.
- suffix
Regular expression matching the multiple choice suffix appended to variable names. The default matches one or more trailing digits.
- common_vars
Names of the columns that identify unique rows in the dataset.
- collapse_with
Character string used to separate the multiple choice options when they are collapsed into one value. If NULL, the options are not collapsed and the rows are left as is.
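Because suffix is a regular expression, its effect on variable names can be checked with base R's sub() before calling the function. This assumes the function strips the matched suffix in the same way; the underscore pattern below is a hypothetical alternative, not a package default.

```r
# The default suffix pattern: one or more digits at the end of a name.
suffix <- "[[:digit:]]+$"
sub(suffix, "", c("MH_TRT1", "MH_TRT12", "Age"))
#> [1] "MH_TRT" "MH_TRT" "Age"
# The pattern also matches names that legitimately end in digits.
sub(suffix, "", "Visit2")
#> [1] "Visit"
# Hypothetical pattern for an EDC that separates the option with "_".
sub("_[[:digit:]]+$", "", "MH_TRT_3")
#> [1] "MH_TRT"
```

Since the default pattern also matches names such as Visit2, expected_vars is presumably what keeps genuinely numbered variables from being renamed; that reading is an inference from the argument descriptions.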
Examples
df <- data.frame(
  ID = "Subj1",
  var = c("Age", paste0("MH_TRT", 1:4)),
  item_value = as.character(c(95, 67, 58, 83, 34))
)
fix_multiple_choice_vars(df, common_vars = "ID")
#> multiple choice vars that will be adjusted:
#> MH_TRT
#>      ID    var     item_value
#> 1 Subj1    Age             95
#> 2 Subj1 MH_TRT 67; 58; 83; 34