Use with simulated portfolio data to generate under-reporting stats for specified scenarios.
dataframe as returned by sim_test_data_portfolio
numeric, set maximum number of additional under-reporting sites, see details, Default: 3
numeric vector, set under-reporting rates for scenarios, Default: c(0.25, 0.5)
integer, denotes number of simulations, Default: 1000
logical, calculates poisson.test p-value
logical, calculates probability of getting a lower value
logical, use parallel processing, see details, Default: FALSE
logical, show progress bar, Default: TRUE
named list of parameters passed to site_aggr, Default: list()
named list of parameters passed to eval_sites, Default: list()
dataframe with the following columns:
study identification
site identification
number of patients at site
number of patients at site with visit_med75
median(max(visit)) * 0.75
mean AE at visit_med75 site level
mean AE at visit_med75 study level
number of patients with visit_med75 in the study, excluding the site
additional sites with under-reporting patients
ratio of patients in study that are under-reporting
under-reporting rate
p-value as returned by poisson.test
bootstrapped probability for having mean_ae_site_med75 or lower
adjusted p-values
adjusted bootstrapped probability for having mean_ae_site_med75 or lower
probability under-reporting as 1 - pval_adj, poisson.test (use as benchmark)
probability under-reporting as 1 - prob_low_adj, bootstrapped (use)
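The column definitions above can be illustrated with a small base R sketch. It uses made-up numbers, not package output, and it only follows the formulas as stated in the column descriptions; the package itself may round or further constrain visit_med75. The choice of the "BH" method for the adjustment is an assumption made for illustration.

```r
# Toy per-patient maximum visit numbers for one site (made-up values)
max_visit <- c(20, 18, 16, 12, 24)

# visit_med75 as described above: 75% of the median of the patients' maximum visit
visit_med75 <- median(max_visit) * 0.75

# Patients who reached at least visit_med75
n_pat_with_med75 <- sum(max_visit >= visit_med75)

# Toy bootstrapped probabilities for three sites (made-up values)
prob_low <- c(0.001, 0.04, 0.5)

# Multiplicity adjustment; "BH" is an assumption for this sketch
prob_low_adj <- p.adjust(prob_low, method = "BH")

# Under-reporting probability as 1 minus the adjusted probability
prob_low_prob_ur <- 1 - prob_low_adj
```

A site with a small adjusted probability therefore ends up with an under-reporting probability close to 1.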
The function applies under-reporting scenarios to each site: it reduces the number of AEs by a given under-reporting rate (ur_rate) for all patients at the site and adds the corresponding under-reporting statistics. Because the under-reporting probability is also affected by the number of other sites that are under-reporting, we additionally calculate under-reporting statistics for scenarios in which additional under-reporting sites are present. For these, we use the median number of patients per site in the study to calculate the final number of patients for which we lower the AEs in a given under-reporting scenario.
We use the furrr package to implement parallel processing, as these simulations can take a long time to run. For this to work, the plan for how the code should run needs to be specified, e.g. plan(multisession, workers = 18).
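The scenario logic described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: apply_ur_scenario and its arguments are hypothetical names, the AE counts are made up, and the additional under-reporting patients are simply taken as the first patients of the study vector.

```r
# Hypothetical helper for illustration only; not part of the package.
apply_ur_scenario <- function(n_ae_site, n_ae_study, ur_rate,
                              extra_ur_sites = 0, median_n_pat_site = 0) {
  # Lower the AE count of every patient at the site of interest
  n_ae_site_ur <- n_ae_site * (1 - ur_rate)

  # Patients at additional under-reporting sites: number of extra sites
  # times the median number of patients per site, capped at the study size
  n_pat_extra <- min(round(extra_ur_sites * median_n_pat_site),
                     length(n_ae_study))

  # Lower the AE count for that many patients in the rest of the study
  n_ae_study_ur <- n_ae_study
  if (n_pat_extra > 0) {
    ix <- seq_len(n_pat_extra)
    n_ae_study_ur[ix] <- n_ae_study[ix] * (1 - ur_rate)
  }

  list(
    mean_ae_site = mean(n_ae_site_ur),
    mean_ae_study = mean(n_ae_study_ur),
    # Share of all patients whose AEs were lowered in this scenario
    frac_pat_with_ur = (length(n_ae_site) + n_pat_extra) /
      (length(n_ae_site) + length(n_ae_study))
  )
}

# Made-up AE counts: 4 patients at the site, 20 patients at all other sites
n_ae_site <- c(4, 6, 8, 6)
n_ae_study <- rep(c(4, 6), 10)

scen <- apply_ur_scenario(n_ae_site, n_ae_study, ur_rate = 0.5,
                          extra_ur_sites = 1, median_n_pat_site = 4)
```

In this toy setup the site mean is halved, while the study mean drops only slightly because just one additional site's worth of patients is affected. This shows why additional under-reporting sites make a given site harder to detect.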
sim_test_data_study
get_config
sim_test_data_portfolio
sim_ur_scenarios
get_portf_perf
# \donttest{
library(simaerep) # provides the functions used below
df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 frac_site_with_ur = 0.4, ur_rate = 0.6)
df_visit1$study_id <- "A"
df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 frac_site_with_ur = 0.2, ur_rate = 0.1)
df_visit2$study_id <- "B"
df_visit <- dplyr::bind_rows(df_visit1, df_visit2)
df_site_max <- df_visit %>%
  dplyr::group_by(study_id, site_number, patnum) %>%
  dplyr::summarise(max_visit = max(visit),
                   max_ae = max(n_ae),
                   .groups = "drop")
df_config <- get_config(df_site_max)
df_config
#> # A tibble: 20 × 6
#> study_id ae_per_visit_mean site_number max_visit_sd max_visit_mean n_pat
#> <chr> <dbl> <chr> <dbl> <dbl> <int>
#> 1 0001 0.416 0001 3.39 18.2 10
#> 2 0001 0.416 0002 2.72 19.6 10
#> 3 0001 0.416 0003 2.12 19.6 10
#> 4 0001 0.416 0004 3.92 18.4 10
#> 5 0001 0.416 0005 3.63 19.9 10
#> 6 0001 0.416 0006 3.83 20 10
#> 7 0001 0.416 0007 2.69 21.1 10
#> 8 0001 0.416 0008 4.24 19 10
#> 9 0001 0.416 0009 3.59 21.3 10
#> 10 0001 0.416 0010 2.95 19.7 10
#> 11 0002 0.468 0001 4.42 20 10
#> 12 0002 0.468 0002 4.03 21.4 10
#> 13 0002 0.468 0003 4.13 20.2 10
#> 14 0002 0.468 0004 2.58 18.3 10
#> 15 0002 0.468 0005 4.64 17.8 10
#> 16 0002 0.468 0006 2.37 17.6 10
#> 17 0002 0.468 0007 4.80 19.8 10
#> 18 0002 0.468 0008 2 20 10
#> 19 0002 0.468 0009 3.17 19.5 10
#> 20 0002 0.468 0010 6.57 19.9 10
df_portf <- sim_test_data_portfolio(df_config)
df_portf
#> # A tibble: 3,815 × 8
#> study_id ae_per_visit_mean site_number max_visit…¹ max_v…² patnum visit n_ae
#> <chr> <dbl> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 0001 0.416 0001 3.39 18.2 0001 1 1
#> 2 0001 0.416 0001 3.39 18.2 0001 2 1
#> 3 0001 0.416 0001 3.39 18.2 0001 3 1
#> 4 0001 0.416 0001 3.39 18.2 0001 4 1
#> 5 0001 0.416 0001 3.39 18.2 0001 5 2
#> 6 0001 0.416 0001 3.39 18.2 0001 6 3
#> 7 0001 0.416 0001 3.39 18.2 0001 7 4
#> 8 0001 0.416 0001 3.39 18.2 0001 8 4
#> 9 0001 0.416 0001 3.39 18.2 0001 9 4
#> 10 0001 0.416 0001 3.39 18.2 0001 10 5
#> # … with 3,805 more rows, and abbreviated variable names ¹max_visit_sd,
#> # ²max_visit_mean
df_scen <- sim_ur_scenarios(df_portf,
                            extra_ur_sites = 2,
                            ur_rate = c(0.5, 1))
#> aggregating site level
#> prepping for simulation
#> generating scenarios
#> getting under-reporting stats
#> evaluating stats
df_scen
#> # A tibble: 140 × 14
#> study…¹ site_…² n_pat n_pat…³ visit…⁴ mean_…⁵ mean_…⁶ n_pat…⁷ extra…⁸ frac_…⁹
#> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 0001 0001 10 10 12 5.9 4.74 88 0 0
#> 2 0001 0001 10 10 12 2.95 4.74 88 0 0.102
#> 3 0001 0001 10 10 12 0 4.74 88 0 0.102
#> 4 0001 0001 10 10 12 2.95 4.53 88 1 0.202
#> 5 0001 0001 10 10 12 0 4.33 88 1 0.202
#> 6 0001 0001 10 10 12 2.95 4.24 88 2 0.302
#> 7 0001 0001 10 10 12 0 3.75 88 2 0.302
#> 8 0001 0002 10 9 15 4.78 6.52 79 0 0
#> 9 0001 0002 10 9 15 2.39 6.52 79 0 0.102
#> 10 0001 0002 10 9 15 0 6.52 79 0 0.102
#> # … with 130 more rows, 4 more variables: ur_rate <dbl>, prob_low <dbl>,
#> # prob_low_adj <dbl>, prob_low_prob_ur <dbl>, and abbreviated variable names
#> # ¹study_id, ²site_number, ³n_pat_with_med75, ⁴visit_med75,
#> # ⁵mean_ae_site_med75, ⁶mean_ae_study_med75, ⁷n_pat_with_med75_study,
#> # ⁸extra_ur_sites, ⁹frac_pat_with_ur
df_perf <- get_portf_perf(df_scen)
df_perf
#> # A tibble: 27 × 5
#> fpr thresh extra_ur_sites ur_rate tpr
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.001 0.990 0 0 0.05
#> 2 0.001 0.990 1 0 0.05
#> 3 0.001 0.990 2 0 0.05
#> 4 0.001 0.990 0 0.5 1
#> 5 0.001 0.990 1 0.5 0.9
#> 6 0.001 0.990 2 0.5 0.85
#> 7 0.001 0.990 0 1 1
#> 8 0.001 0.990 1 1 1
#> 9 0.001 0.990 2 1 1
#> 10 0.01 0.961 0 0 0.05
#> # … with 17 more rows
# }