simulate study test data — sim_test_data

Evenly distributes a given number of patients across a given number of sites, then simulates the adverse event (AE) development of each patient, reducing the number of reported AEs for patients assigned to AE under-reporting sites.

Usage

sim_test_data_study(
  n_pat = 1000,
  n_sites = 20,
  frac_site_with_ur = 0,
  ur_rate = 0,
  max_visit_mean = 20,
  max_visit_sd = 4,
  ae_per_visit_mean = c(0.5),
  ae_rates = c(NULL),
  event_names = list("ae")
)

Arguments

n_pat: integer, number of patients, Default: 1000
n_sites: integer, number of sites, Default: 20
frac_site_with_ur: fraction of AE under-reporting sites, Default: 0
ur_rate: AE under-reporting rate; lowers the mean AE per visit used to simulate patients at sites flagged as AE under-reporting. Negative values simulate over-reporting. Default: 0
max_visit_mean: mean of the maximum number of visits of each patient, Default: 20
max_visit_sd: standard deviation of maximum number of visits of each patient, Default: 4
ae_per_visit_mean: mean number of events per visit per patient, Default: 0.5
ae_rates: vector with visit-specific event rates, Default: NULL
event_names: vector, contains the event names, Default: list("ae")
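
As a rough illustration of how ur_rate relates to the per-visit mean, the sketch below assumes the mean is scaled by (1 - ur_rate). That assumption agrees with the Examples output, where ur_rate = 0.5 turns a mean of 0.5 into 0.25, but it is not taken from the package source. The helper name adjusted_ae_mean is hypothetical.

```r
# Hypothetical helper (not part of the package): scales the per-visit AE mean
# by (1 - ur_rate), an assumption consistent with the example output.
adjusted_ae_mean <- function(ae_per_visit_mean, ur_rate) {
  ae_per_visit_mean * (1 - ur_rate)
}

adjusted_ae_mean(0.5, ur_rate = 0.5)   # under-reporting site
#> [1] 0.25
adjusted_ae_mean(0.5, ur_rate = 0)     # site without under-reporting
#> [1] 0.5
adjusted_ae_mean(0.5, ur_rate = -0.5)  # negative rate: over-reporting
#> [1] 0.75
```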

Value

Tibble with columns patnum, site_number, is_ur, max_visit_mean, max_visit_sd, ae_per_visit_mean, visit, n_ae

Details

The maximum visit number of each patient is sampled from a normal distribution with mean max_visit_mean and standard deviation max_visit_sd, while the number of AEs per visit is sampled from a Poisson distribution with mean ae_per_visit_mean.
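
The two sampling steps can be sketched in base R for a single patient. This is only an illustration under stated assumptions, not the package's implementation: the function name sim_patient_sketch, the column name new_ae, the rounding of the normal draw, and the lower bound of one visit are all hypothetical choices made for this sketch.

```r
# Illustrative sketch only; sim_patient_sketch is a hypothetical name and
# this is not the package's implementation.
sim_patient_sketch <- function(max_visit_mean = 20, max_visit_sd = 4,
                               ae_per_visit_mean = 0.5) {
  # Draw the patient's maximum visit number from a normal distribution.
  # Rounding and the floor of 1 visit are assumptions made for this sketch.
  max_visit <- max(1L, as.integer(round(rnorm(1, max_visit_mean, max_visit_sd))))

  # Draw the number of new AEs at each visit from a Poisson distribution.
  new_ae <- rpois(max_visit, lambda = ae_per_visit_mean)

  data.frame(visit = seq_len(max_visit), new_ae = new_ae)
}

set.seed(1)
head(sim_patient_sketch())
```

Each call returns one row per visit, so the number of rows varies from patient to patient in the same way the Examples show differing row counts for the same patient ID across runs.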

Examples

set.seed(1)
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5)
df_visit[which(df_visit$patnum == "P000001"),]
#> # A tibble: 17 × 8
#>    patnum  site_number is_ur max_visit_mean max_visit_sd ae_per_visit_mean visit
#>    <chr>   <chr>       <lgl>          <dbl>        <dbl>             <dbl> <int>
#>  1 P000001 S0001       FALSE             20            4               0.5     1
#>  2 P000001 S0001       FALSE             20            4               0.5     2
#>  3 P000001 S0001       FALSE             20            4               0.5     3
#>  4 P000001 S0001       FALSE             20            4               0.5     4
#>  5 P000001 S0001       FALSE             20            4               0.5     5
#>  6 P000001 S0001       FALSE             20            4               0.5     6
#>  7 P000001 S0001       FALSE             20            4               0.5     7
#>  8 P000001 S0001       FALSE             20            4               0.5     8
#>  9 P000001 S0001       FALSE             20            4               0.5     9
#> 10 P000001 S0001       FALSE             20            4               0.5    10
#> 11 P000001 S0001       FALSE             20            4               0.5    11
#> 12 P000001 S0001       FALSE             20            4               0.5    12
#> 13 P000001 S0001       FALSE             20            4               0.5    13
#> 14 P000001 S0001       FALSE             20            4               0.5    14
#> 15 P000001 S0001       FALSE             20            4               0.5    15
#> 16 P000001 S0001       FALSE             20            4               0.5    16
#> 17 P000001 S0001       FALSE             20            4               0.5    17
#> # ℹ 1 more variable: n_ae <int>
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5,
    frac_site_with_ur = 0.2, ur_rate = 0.5)
df_visit[which(df_visit$patnum == "P000001"),]
#> # A tibble: 23 × 8
#>    patnum  site_number is_ur max_visit_mean max_visit_sd ae_per_visit_mean visit
#>    <chr>   <chr>       <lgl>          <dbl>        <dbl>             <dbl> <int>
#>  1 P000001 S0001       TRUE              20            4              0.25     1
#>  2 P000001 S0001       TRUE              20            4              0.25     2
#>  3 P000001 S0001       TRUE              20            4              0.25     3
#>  4 P000001 S0001       TRUE              20            4              0.25     4
#>  5 P000001 S0001       TRUE              20            4              0.25     5
#>  6 P000001 S0001       TRUE              20            4              0.25     6
#>  7 P000001 S0001       TRUE              20            4              0.25     7
#>  8 P000001 S0001       TRUE              20            4              0.25     8
#>  9 P000001 S0001       TRUE              20            4              0.25     9
#> 10 P000001 S0001       TRUE              20            4              0.25    10
#> # ℹ 13 more rows
#> # ℹ 1 more variable: n_ae <int>
ae_rates <- c(0.7, rep(0.5, 8), rep(0.3, 5))
sim_test_data_study(n_pat = 100, n_sites = 5, ae_rates = ae_rates)
#> # A tibble: 1,968 × 8
#>    patnum  site_number is_ur max_visit_mean max_visit_sd ae_per_visit_mean visit
#>    <chr>   <chr>       <lgl>          <dbl>        <dbl>             <dbl> <int>
#>  1 P000001 S0001       FALSE             20            4             0.443     1
#>  2 P000001 S0001       FALSE             20            4             0.443     2
#>  3 P000001 S0001       FALSE             20            4             0.443     3
#>  4 P000001 S0001       FALSE             20            4             0.443     4
#>  5 P000001 S0001       FALSE             20            4             0.443     5
#>  6 P000001 S0001       FALSE             20            4             0.443     6
#>  7 P000001 S0001       FALSE             20            4             0.443     7
#>  8 P000001 S0001       FALSE             20            4             0.443     8
#>  9 P000001 S0001       FALSE             20            4             0.443     9
#> 10 P000001 S0001       FALSE             20            4             0.443    10
#> # ℹ 1,958 more rows
#> # ℹ 1 more variable: n_ae <dbl>
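
The ae_per_visit_mean of 0.443 in the last output equals the plain mean of the ae_rates vector passed in. The check below only confirms that arithmetic; whether the function derives the column exactly this way for every patient is an assumption.

```r
# Visit-specific rates from the example: 1 + 8 + 5 = 14 visits
ae_rates <- c(0.7, rep(0.5, 8), rep(0.3, 5))
length(ae_rates)
#> [1] 14
round(mean(ae_rates), 3)
#> [1] 0.443
```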