Skip to contents

Simulate and append non-time-varying categorical covariates to an existing brm_data() dataset.

Usage

brm_simulate_categorical(data, names, levels, probabilities = NULL)

Arguments

data

Classed tibble as from brm_data() or brm_simulate_outline().

names

Character vector with the names of the new covariates to simulate and append. Names must all be unique and must not already be column names of data.

levels

Character vector of unique levels of the simulated categorical covariates.

probabilities

Either NULL or a numeric vector of length length(levels) with levels between 0 and 1 where all elements sum to 1. If NULL, then all levels are equally likely to be drawn. If not NULL, then probabilities is a vector of sampling probabilities corresponding to each respective level of levels.

Value

A classed tibble, like from brm_data() or brm_simulate_outline(), but with new categorical covariate columns and with the names of the new covariates appended to the brm_covariates attribute. Each new categorical covariate column is a character vector, not the factor type in base R.

Details

Each covariate is a new column of the dataset with one independent random categorical draw for each patient, using a fixed set of levels (via base::sample() with replace = TRUE). All covariates simulated this way are independent of everything else in the data, including other covariates (to the extent that the random number generators in R work as intended).

Examples

data <- brm_simulate_outline()
brm_simulate_categorical(
  data = data,
  names = c("site", "region"),
  levels = c("area1", "area2")
)
#> # A tibble: 800 × 7
#>    patient     time   group   missing response site  region
#>    <chr>       <chr>  <chr>   <lgl>      <dbl> <chr> <chr> 
#>  1 patient_001 time_1 group_1 FALSE         NA area2 area2 
#>  2 patient_001 time_2 group_1 FALSE         NA area2 area2 
#>  3 patient_001 time_3 group_1 FALSE         NA area2 area2 
#>  4 patient_001 time_4 group_1 FALSE         NA area2 area2 
#>  5 patient_002 time_1 group_1 FALSE         NA area1 area2 
#>  6 patient_002 time_2 group_1 FALSE         NA area1 area2 
#>  7 patient_002 time_3 group_1 TRUE          NA area1 area2 
#>  8 patient_002 time_4 group_1 TRUE          NA area1 area2 
#>  9 patient_003 time_1 group_1 FALSE         NA area2 area2 
#> 10 patient_003 time_2 group_1 FALSE         NA area2 area2 
#> # ℹ 790 more rows
brm_simulate_categorical(
  data = data,
  names = c("site", "region"),
  levels = c("area1", "area2"),
  probabilities = c(0.1, 0.9)
)
#> # A tibble: 800 × 7
#>    patient     time   group   missing response site  region
#>    <chr>       <chr>  <chr>   <lgl>      <dbl> <chr> <chr> 
#>  1 patient_001 time_1 group_1 FALSE         NA area2 area2 
#>  2 patient_001 time_2 group_1 FALSE         NA area2 area2 
#>  3 patient_001 time_3 group_1 FALSE         NA area2 area2 
#>  4 patient_001 time_4 group_1 FALSE         NA area2 area2 
#>  5 patient_002 time_1 group_1 FALSE         NA area2 area2 
#>  6 patient_002 time_2 group_1 FALSE         NA area2 area2 
#>  7 patient_002 time_3 group_1 TRUE          NA area2 area2 
#>  8 patient_002 time_4 group_1 TRUE          NA area2 area2 
#>  9 patient_003 time_1 group_1 FALSE         NA area2 area2 
#> 10 patient_003 time_2 group_1 FALSE         NA area2 area2 
#> # ℹ 790 more rows