Introduction

Load

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(simaerep))

Simulate adverse event reporting in clinical trials with the goal of detecting under-reporting sites.

Monitoring of Adverse Event (AE) reporting in clinical trials is important for patient safety. We use bootstrap-based simulation to assign an AE under-reporting probability to each site in a clinical trial. The method is inspired by the ‘infer’ R package and Allen Downey’s blog article: “There is only one test!”.

Adverse Events

An adverse event (AE) is any untoward medical occurrence in a patient or clinical investigation subject administered a pharmaceutical product and which does not necessarily have a causal relationship with this treatment. It is important for patient safety that AEs are reported back to the sponsor. An important part of quality monitoring is to detect clinical trial sites that are not propagating all of the AEs reported to them by their patients to the sponsor. In a clinical trial patients follow a strict visiting schedule at which times treatments are given and exams are being performed. Typically AEs get reported by a patient when they are at an on-site visit at the clinic. So the total number of AEs reported by a site depends on the number of patients enrolled at the site and the total number of visits.

Algorithm

In a nutshell we will perform the following steps.

Record visit_med75, number of patients that have reached visit_med75, mean cumulative AE count at visit_med75 for a clinical trial site.
Create patient pool with all patients in study that have reached visit_med75 as determined in 1) and their cumulative AE count at visit_med75.
Draw with replacement as many patients from 2) as determined in 1) and calculate the mean cumulative AE count of the draw (figure 1b).
Repeat 3) 1000 times.
Calculate probability of obtaining mean cumulative AE count or lower as obtained in 1) based on results of 4).
Repeat 1-5) for all sites in trial.
Adjust probabilities using the Benjamin Hochberg Procedure in order to correct alpha error.

Sample Data

Patient

Patient level AE data is characterized by the number of consecutive visits and the number of AEs that have been reported each time. For the maximum consecutive visit we sample from a normal distribution and for the AEs reported at each visit we sample from a poisson distribution.

Here we simulate the AEs generated by 3 patients

set.seed(1)

replicate(
   3,
   sim_test_data_patient(
    .f_sample_max_visit = function() rnorm(1, mean = 20, sd = 4),
    .f_sample_ae_per_visit = function(max_visit) rpois(max_visit, 0.5)
  )
)

## [[1]]
##  [1]  0  1  1  2  4  5  6  6  6  6  7  7  8  8  9 12 12
## 
## [[2]]
##  [1] 0 1 1 1 1 1 1 2 2 2 2 2 2 3 4 5 5 6 6 7 8 9 9
## 
## [[3]]
##  [1] 0 0 1 2 2 3 3 3 3 3 3 3 4 4 6 6 6 6 7 7

Study

In order to simulate patient data for an entire study we assume make the simplification that all sites have the same number of patients. Further we specify a fraction of sites that is under-reporting AEs.

df_visit <- sim_test_data_study(
  n_pat = 120,
  n_sites = 6,
  frac_site_with_ur = 0.4,
  ur_rate = 0.6,
  max_visit_mean = 20,
  max_visit_sd = 4,
  ae_per_visit_mean = 0.5
)

df_visit$study_id <- "A"

df_visit %>%
  head(10) %>%
  knitr::kable()

patnum	site_number	is_ur	max_visit_mean	max_visit_sd	ae_per_visit_mean	visit	n_ae	study_id
P000001	S0001	TRUE	20	4	0.2	1	0	A
P000001	S0001	TRUE	20	4	0.2	2	1	A
P000001	S0001	TRUE	20	4	0.2	3	1	A
P000001	S0001	TRUE	20	4	0.2	4	2	A
P000001	S0001	TRUE	20	4	0.2	5	2	A
P000001	S0001	TRUE	20	4	0.2	6	2	A
P000001	S0001	TRUE	20	4	0.2	7	2	A
P000001	S0001	TRUE	20	4	0.2	8	3	A
P000001	S0001	TRUE	20	4	0.2	9	4	A
P000001	S0001	TRUE	20	4	0.2	10	4	A

df_visit %>%
  select(site_number, is_ur) %>%
  distinct() %>%
  knitr::kable()

site_number	is_ur
S0001	TRUE
S0002	TRUE
S0003	FALSE
S0004	FALSE
S0005	FALSE
S0006	FALSE

In our sample data 2 sites (S0001 and S0002) are under-reporting AEs

Algorithm Execution

S3 interface

We will describe the internal details of the algorithm next. However, it is recommended to use the S3 interface, which manages the execution of the internal functions and stores the intermediate results.

aerep <- simaerep(df_visit)
aerep

## simaerep object:
## Check aerep$df_eval prob_low_prob_ur column for under-reporting probabililty.
## Plot results using plot() generic.

Specifying the Evaluation Point visit_med75

In an ongoing trial all patients will have a different number of consecutive visits. To find a cut-off visit to normalize the data we specify a single evaluation point for a given site based on the number of visits of the patient population.

For determining the visit number at which we are going to evaluate AE reporting we take the maximum visit of each patient at a site and take the median. Then we multiply with 0.75 which will give us a cut-off point determining which patient will be evaluated. Of those patients we will evaluate we take the minimum of all maximum visits hence ensuring that we take the highest visit number possible without excluding more patients from the analysis. In order to ensure that the sampling pool for that visit is large enough we ensure that at least 20% of all patients at the study are available for sampling. We limit visit_med75 to the 80% quantile of all patient maximum visits of the entire study.

For this we use site_aggr to aggregate the simulated visit level data to site level. Adjusting min_pat_pool will change the minimum ratio of patients available for sampling that determines the maximum values for visit_med75.

df_site <- site_aggr(df_visit, method = "med75_adj", min_pat_pool = 0.2)

df_site %>%
  kable()

study_id	site_number	n_pat	n_pat_with_med75	visit_med75	mean_ae_site_med75
A	S0001	20	17	15	3.000000
A	S0002	20	17	14	3.352941
A	S0003	20	16	17	8.500000
A	S0004	20	17	16	8.058824
A	S0005	20	18	16	7.555556
A	S0006	20	20	15	7.850000

plot_visit_med75(df_visit, df_site, study_id_str = "A", n_site = 6)

## purple line:          mean site ae of patients with visit_med75
## grey line:            patient included
## black dashed line:    patient excluded
## dotted vertical line: visit_med75, 0.75 x median of maximum patient visits of site 
## solid vertical line:  visit_med75 adjusted, increased to minimum maximum patient visit of included patients
## dashed vertical line: maximum value for visit_med75 adjusted, 80% quantile of maximum patient visits of study

Using the adjusted visit_med75 will drop a few patients (dashed lines) but will by definition at least include 50% of all patients favoring patients that have higher number of visits. By looking at the mean ae development (purple line) we can already easily spot the two under-reporting sites. The same goes for looking at the mean_ae_site_med75 column in df_site. Next we will use bootstrap simulations to determine the probability for obtaining each mean AE value or lower at visit_med75 by chance.

Bootstrap Simulations

Advantage Over Classic Statistical Tests

We could use a classical parametric test and calculate the probability with which we can reject the NULL hypothesis for the AE counts that we observe at the visit_med75.

As we sample the AEs from a poisson distribution the R implementation of the poisson.test would be appropriate.

But there are four major problems that we often encounter when we try to describe real life count data with the poisson distribution which is only described by a single parameter:

over and underdispersion
skewness (long right tail)
inflated zeros
variance mean relationship might not be fixed

(see Distribution for Modelling Location Scale and Shape 5.1.2)

The true distribution of the AE counts will vary from study to study with the degree of the influence of these 4 problems being unknown unless thoroughly investigated.

With the non-parametric approach that we propose we do not need to worry about these statistical assumptions. Since the distribution of the AE count from the simulated patient pool we draw from will be very close to the unknown true distribution of the underlying AE generating process.

Disadvantages Over Classic Statistical Tests

Upper Limit for Uncompliant Sites. We simulate an underlying compliant patient population using the data we are given. If the fraction of uncompliant examples becomes too high, detection rates will decrease. We find that detection rates start decreasing with 30-50% of under-reporting sites and are not usable anymore if a majority > 50% of all sites are under-reporting see article on usability limits.
Lower Probability Limit The number of repeats determines the smallest probability greater than zero, for example a for r=1000 the smallest value greater than zero is 0.001
Computationally Expensive

Methodology

How likely is it to get a mean AE value that is equal or lower to what we observe at site C with the same number of patients?

For illustration purposes we start to simulate 10 hypothetical patient groups for site C by drawing (with replacement) from all patients in the study marking those for which we have obtained an equal or lower mean AE value than initially observed (middle).

Instead of 10 times we simulate 1000 times and count how many times we have observed an equal or lower mean AE value than initially observed and convert it to a percentage (right).

To illustrate the effect of under-reporting we repeat the same process after having removed 2 AEs per patient from site C. We see how the probability for obtaining an equal or lower mean AE value than initially observed decreases (bottom).

plot_sim_examples(size_dot = 4, size_raster_label = 10)

Application

We can run the above described simulation and as a benchmark also perform a poisson test.

df_sim_sites <- sim_sites(df_site, df_visit, r = 1000, poisson_test = TRUE)

df_sim_sites %>%
  kable()

study_id	site_number	n_pat	n_pat_with_med75	visit_med75	mean_ae_site_med75	mean_ae_study_med75	n_pat_with_med75_study	pval	prob_low
A	S0001	20	17	15	3.000000	6.712644	87	0.0e+00	0
A	S0002	20	17	14	3.352941	6.021739	92	7.5e-06	0
A	S0003	20	16	17	8.500000	6.513514	74	1.0e+00	1
A	S0004	20	17	16	8.058824	6.261905	84	1.0e+00	1
A	S0005	20	18	16	7.555556	6.349398	83	1.0e+00	1
A	S0006	20	20	15	7.850000	5.690476	84	1.0e+00	1

We find that the probability of getting a mean ae at visit_med75 for sites S0001 and S0002 is 0 or near zero.

Alpha Error Correction

Our simulated test data set consists of just 6 sites. However, it is not uncommon for 100 sites or more to participate in the same clinical trial. This would mean that we need to perform 100 statistical tests, and applying a 5% significance threshold would lead to on average of 5 False Positives (FP). We therefore need to adjust the calculated p-values and bootstrapped probabilities using stats::p.adjust(p, method = "BH"), which applies the Benjamin Hochberg Procedure. eval_sites() uses the the inverted adjusted values to calculates the final bootstrapped AE under-reporting probability (prob_low_prob_ur) and includes the poisson test derived under-reporting probability as a reference (pval_prob_ur).

df_eval <- eval_sites(df_sim_sites, method = "BH")

df_eval %>%
  kable()

study_id	site_number	n_pat	n_pat_with_med75	visit_med75	mean_ae_site_med75	mean_ae_study_med75	n_pat_with_med75_study	pval	prob_low	pval_adj	pval_prob_ur	prob_low_adj	prob_low_prob_ur
A	S0001	20	17	15	3.000000	6.712644	87	0.0e+00	0	0.00e+00	1.0000000	0	1
A	S0002	20	17	14	3.352941	6.021739	92	7.5e-06	0	2.24e-05	0.9999776	0	1
A	S0003	20	16	17	8.500000	6.513514	74	1.0e+00	1	1.00e+00	0.0000000	1	0
A	S0004	20	17	16	8.058824	6.261905	84	1.0e+00	1	1.00e+00	0.0000000	1	0
A	S0005	20	18	16	7.555556	6.349398	83	1.0e+00	1	1.00e+00	0.0000000	1	0
A	S0006	20	20	15	7.850000	5.690476	84	1.0e+00	1	1.00e+00	0.0000000	1	0

Plot Results

plot_study will plot mean ae development of all sites and visit level data for each flagged site with an AE under-reporting threshold of 95%.

plot_study(df_visit, df_site, df_eval, study = "A")

Applying {simaerep}

Instead of executing all previous steps manually we can also use the S3 interface which manages all intermediate results and provides more convenient plotting using the plot() generic function.

aerep <- simaerep(df_visit)
aerep

## simaerep object:
## Check aerep$df_eval prob_low_prob_ur column for under-reporting probabililty.
## Plot results using plot() generic.

str(aerep)

## List of 12
##  $ visit           :List of 4
##   ..$ dim        : int [1:2] 2336 9
##   ..$ df_summary : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
##   .. ..$ n_studies : int 1
##   .. ..$ n_sites   : int 6
##   .. ..$ n_patients: int 120
##   .. ..$ n_visits  : int 2336
##   .. ..$ n_aes     : int 942
##   ..$ str_call   : chr "df_visit"
##   ..$ event_names: chr "ae"
##   ..- attr(*, "class")= chr "orivisit"
##  $ df_site         : tibble [6 × 6] (S3: tbl_df/tbl/data.frame)
##   ..$ study_id          : chr [1:6] "A" "A" "A" "A" ...
##   ..$ site_number       : chr [1:6] "S0001" "S0002" "S0003" "S0004" ...
##   ..$ n_pat             : int [1:6] 20 20 20 20 20 20
##   ..$ n_pat_with_med75  : num [1:6] 17 17 16 17 18 20
##   ..$ visit_med75       : Named num [1:6] 15 14 17 16 16 15
##   .. ..- attr(*, "names")= chr [1:6] "80%" "80%" "80%" "80%" ...
##   ..$ mean_ae_site_med75: num [1:6] 3 3.35 8.5 8.06 7.56 ...
##  $ df_sim_sites    : tibble [6 × 9] (S3: tbl_df/tbl/data.frame)
##   ..$ study_id              : chr [1:6] "A" "A" "A" "A" ...
##   ..$ site_number           : chr [1:6] "S0001" "S0002" "S0003" "S0004" ...
##   ..$ n_pat                 : int [1:6] 20 20 20 20 20 20
##   ..$ n_pat_with_med75      : num [1:6] 17 17 16 17 18 20
##   ..$ visit_med75           : Named num [1:6] 15 14 17 16 16 15
##   .. ..- attr(*, "names")= chr [1:6] "80%" "80%" "80%" "80%" ...
##   ..$ mean_ae_site_med75    : num [1:6] 3 3.35 8.5 8.06 7.56 ...
##   ..$ mean_ae_study_med75   : num [1:6] 6.71 6.02 6.51 6.26 6.35 ...
##   ..$ n_pat_with_med75_study: int [1:6] 87 92 74 84 83 84
##   ..$ prob_low              : num [1:6] 0 0 1 1 1 1
##  $ df_eval         : tibble [6 × 11] (S3: tbl_df/tbl/data.frame)
##   ..$ study_id              : chr [1:6] "A" "A" "A" "A" ...
##   ..$ site_number           : chr [1:6] "S0001" "S0002" "S0003" "S0004" ...
##   ..$ n_pat                 : int [1:6] 20 20 20 20 20 20
##   ..$ n_pat_with_med75      : num [1:6] 17 17 16 17 18 20
##   ..$ visit_med75           : Named num [1:6] 15 14 17 16 16 15
##   .. ..- attr(*, "names")= chr [1:6] "80%" "80%" "80%" "80%" ...
##   ..$ mean_ae_site_med75    : num [1:6] 3 3.35 8.5 8.06 7.56 ...
##   ..$ mean_ae_study_med75   : num [1:6] 6.71 6.02 6.51 6.26 6.35 ...
##   ..$ n_pat_with_med75_study: int [1:6] 87 92 74 84 83 84
##   ..$ prob_low              : num [1:6] 0 0 1 1 1 1
##   ..$ prob_low_adj          : num [1:6] 0 0 1 1 1 1
##   ..$ prob_low_prob_ur      : num [1:6] 1 1 0 0 0 0
##  $ r               : num 1000
##  $ visit_med75     : logi TRUE
##  $ inframe         : logi FALSE
##  $ under_only      : logi TRUE
##  $ param_site_aggr :List of 2
##   ..$ method      : chr "med75_adj"
##   ..$ min_pat_pool: num 0.2
##  $ param_sim_sites :List of 4
##   ..$ r           : num 1000
##   ..$ poisson_test: logi FALSE
##   ..$ prob_lower  : logi TRUE
##   ..$ under_only  : logi TRUE
##  $ param_eval_sites:List of 2
##   ..$ method    : chr "BH"
##   ..$ under_only: logi TRUE
##  $ event_names     : chr "ae"
##  - attr(*, "class")= chr "simaerep"

plot(aerep)

## study = NULL, defaulting to study:A

For modifying the default parameters please check out the simaerep() documentation.