Skip to contents

Introduction

This vignette will explore the process and viability of using the {simaerep} algorithm to detect patient discontinuities and flag sites where there are more discontinuations than expected.

Patient Discontinuations

A discontinuation occurs any time a patient leaves a clinical trial for any reason. It is important to minimize patient discontinuation in order to maintain patient participation and therefore robust data collection in clinical studies.

Sample Data

The sample data is from the open source r package {clindata}. Sampled data is from data frames rawplus_visdt and rawplus_studcomp.

Implementation

Cumulative vs Binary events

The main difference between detecting cumulative clinical events such as AEs and and binary events such as discontinuations is that with cumulative events there can be multiple occurrences per patient, but with binary events there can only be one event per patient. This difference needs to be addressed and the data must be formulated into a version that {simaerep} can process, otherwise the algorithm’s output will not make sense. The other major difference is that AE detection is focused on determining sites below the norm (under-reporting), but with discontinuation detection we want to look for sites that are above the norm (over-reporting).

Data Preparation

rawplus_studycomp contains one entry for each patient that has discontinued. We interpret column mincreated_dts to be the timestamp the discontinuation has been entered and therefore to be a good proxy for the actual discontinuation date.

rawplus_studcomp
## # A tibble: 133 × 31
##    studyid        siteid invid scrnid subjid subjectid   datapagename datapageid
##    <chr>          <chr>  <chr> <chr>  <chr>  <chr>       <chr>        <chr>     
##  1 AA-AA-000-0000 58     0X091 020    1236   X0911236-0… Study Compl… 1         
##  2 AA-AA-000-0000 128    0X149 123    1023   X1491023-1… Study Compl… 1         
##  3 AA-AA-000-0000 155    0X125 058    1346   X1251346-0… Study Compl… 1         
##  4 AA-AA-000-0000 43     0X159 113    0760   X1590760-1… Study Compl… 1         
##  5 AA-AA-000-0000 127    0X043 058    0854   X0430854-0… Study Compl… 1         
##  6 AA-AA-000-0000 71     0X083 142    0561   X0830561-1… Study Compl… 1         
##  7 AA-AA-000-0000 140    0X161 091    0290   X1610290-0… Study Compl… 1         
##  8 AA-AA-000-0000 53     0X015 142    1127   X0151127-1… Study Compl… 1         
##  9 AA-AA-000-0000 71     0X083 020    1152   X0831152-0… Study Compl… 1         
## 10 AA-AA-000-0000 184    0X123 123    0720   X1230720-1… Study Compl… 1         
## # ℹ 123 more rows
## # ℹ 23 more variables: foldername <chr>, instancename <chr>, recordid <chr>,
## #   record_dt <chr>, recordposition <dbl>, mincreated_dts <chr>,
## #   maxupdated_dts <chr>, compyn <chr>, compreas <chr>, subjid_nsv <chr>,
## #   scrnid_nsv <chr>, subjinit_nsv <chr>, invid_nsv <chr>, subject_nsv <chr>,
## #   instanceid_nsv <dbl>, folder_nsv <chr>, folderseq_nsv <dbl>,
## #   compyn_std_nsv <chr>, compreas_std_nsv <chr>, compfu_nsv <chr>, …
df_disc <- rawplus_studcomp %>%
  rename(date = mincreated_dts) %>%
  mutate(event = "disc") %>%
  distinct(studyid, siteid, subjid, date, event)

df_disc %>%
  head() %>%
  knitr::kable()
studyid siteid subjid date event
AA-AA-000-0000 58 1236 2009-07-08T05:50:04 disc
AA-AA-000-0000 128 1023 2016-04-23T01:48:05 disc
AA-AA-000-0000 155 1346 2018-10-23T05:37:28 disc
AA-AA-000-0000 43 0760 2004-10-14T06:54:46 disc
AA-AA-000-0000 127 0854 2011-09-30T11:51:19 disc
AA-AA-000-0000 71 0561 2015-12-15T02:17:47 disc

We continue to build another event table for visits and horizontally bind both event tables

df_vs <- rawplus_visdt %>%
  rename(date = visit_dt) %>%
  mutate(event = "visit") %>%
  # We ignore visits that have no date
  filter(! is.na(date)) %>%
  # We are not interested in same day visits
  distinct(studyid, siteid, subjid, date, event)

df_events <- bind_rows(df_vs, df_disc) %>%
  arrange(studyid, siteid, subjid, date)

df_events %>%
  filter(subjid == "0002") %>%
  knitr::kable()
studyid siteid subjid date event
AA-AA-000-0000 76 0002 2017-04-03 visit
AA-AA-000-0000 76 0002 2017-04-10 visit
AA-AA-000-0000 76 0002 2017-04-18 visit
AA-AA-000-0000 76 0002 2017-05-05 visit
AA-AA-000-0000 76 0002 2017-05-22 visit
AA-AA-000-0000 76 0002 2017-06-05T19:31:19 disc

We add the cumulative event counts and aggregate on visit level to prepare data in a format that is ready to use by {simaerep}.

df_visit <- df_events %>%
  mutate(
    cum_visit = cumsum(ifelse(event == "visit", 1, 0)),
    cum_disc = cumsum(ifelse(event == "disc", 1, 0)),
    .by = c("studyid", "siteid", "subjid")
  ) %>%
  # aggregate counts on visit level
  summarise(
    cum_disc = max(cum_disc),
    .by = c("studyid", "siteid", "subjid", "cum_visit")
  ) %>%
  # remove patients with 0 visits
  filter(max(cum_visit) > 0 , .by = c("studyid", "siteid", "subjid"))

df_visit %>%
  filter(subjid == "0002") %>%
  knitr::kable()
studyid siteid subjid cum_visit cum_disc
AA-AA-000-0000 76 0002 1 0
AA-AA-000-0000 76 0002 2 0
AA-AA-000-0000 76 0002 3 0
AA-AA-000-0000 76 0002 4 0
AA-AA-000-0000 76 0002 5 1

Sampling Correction

Notice that patient 0002 only has 5 entries, due to their early departure. The limited records of discontinued patients will create a problem with {simaerep}’s sampling algorithm, as the algorithm samples patients that have at least the same number of visits as the patient that is being replaced. This leads to an survivor bias where discontinued patients, since they have less visits, are unable to be used as replacement values for patients who had more visits.

To address this we would have to use the planned visits instead of the occurred visits, but {clindata} does not provide those.

As an approximation to the planned visits the discontinued patient’s records will be artificially inflated to 15 visits, a cut off point that includes roughly 80% of the patients not discontinued have reached during the study. This change allows for the proper sampling to occur and for the production of correct results.

subj_disc <- df_visit %>%
  filter(cum_disc == 1) %>%
  pull(subjid) %>%
  unique()

df_fill <- df_visit %>%
  distinct(studyid, siteid, subjid) %>%
  filter(subjid %in% subj_disc) %>%
  cross_join(
    tibble(
      cum_visit = seq(1, 15),
      disc_fill = 1
    )
  )

df_visit_disc <- df_visit %>%
  filter(subjid %in% subj_disc) %>%
  full_join(
    df_fill,
    by = c("studyid", "siteid" ,"subjid" ,"cum_visit")
  ) %>%
  mutate(
    cum_disc = coalesce(cum_disc, disc_fill)
  ) %>%
  select(- disc_fill) %>%
  arrange(subjid, cum_visit)

df_visit_not_disc <- df_visit %>%
  filter(! subjid %in% subj_disc)

df_visit_fill <- bind_rows(df_visit_disc, df_visit_not_disc) %>%
  rename(n_disc = cum_disc)

# Displays patient 0002, modified to 15 visits
df_visit_fill %>%
  filter(subjid == "0002") %>%
  arrange(cum_visit) %>%
  kable()
studyid siteid subjid cum_visit n_disc
AA-AA-000-0000 76 0002 1 0
AA-AA-000-0000 76 0002 2 0
AA-AA-000-0000 76 0002 3 0
AA-AA-000-0000 76 0002 4 0
AA-AA-000-0000 76 0002 5 1
AA-AA-000-0000 76 0002 6 1
AA-AA-000-0000 76 0002 7 1
AA-AA-000-0000 76 0002 8 1
AA-AA-000-0000 76 0002 9 1
AA-AA-000-0000 76 0002 10 1
AA-AA-000-0000 76 0002 11 1
AA-AA-000-0000 76 0002 12 1
AA-AA-000-0000 76 0002 13 1
AA-AA-000-0000 76 0002 14 1
AA-AA-000-0000 76 0002 15 1

{simaerep}

Since discontinuities are inherently more rare than AEs, it is necessary to run {simaerep} with a larger bootstrap iteration so that the results in between runs are more stable. The following code is run with 50,000 bootstrap repetitions, which is much higher than {simaereps}’s default of 1,000. This change allows the algorithm to provide a more stable and therefore more accurate model.

Here is the output of a {simaerep} call on df_visit modified for discontinuities.

discrep <- simaerep(
    df_visit_fill,
    r = 50000,
    event_names = "disc",
    col_names = list(
      study_id = "studyid",
      site_id = "siteid",
      patient_id = "subjid",
      visit = "cum_visit"
    )
  )

discrep
## simaerep object:
## ----------------
## Plot results using plot() generic.
## Full results available in "df_eval".
## 
## Summary:
## Number of sites: 176
## Number of studies: 1
## 
## Multiplicity correction applied to '*_prob' columns.
## 
## First 10 rows of df_eval:
## # A tibble: 10 × 10
##    studyid        siteid disc_count disc_per_visit_site visits n_pat
##    <chr>          <chr>       <dbl>               <dbl>  <dbl> <int>
##  1 AA-AA-000-0000 10              2             0.00494    405    20
##  2 AA-AA-000-0000 100             0             0           41     2
##  3 AA-AA-000-0000 101             0             0           63     3
##  4 AA-AA-000-0000 102             0             0           68     3
##  5 AA-AA-000-0000 103             0             0           86     4
##  6 AA-AA-000-0000 104             2             0.0106     188     9
##  7 AA-AA-000-0000 105             0             0           65     3
##  8 AA-AA-000-0000 106             0             0           69     3
##  9 AA-AA-000-0000 107             0             0           63     3
## 10 AA-AA-000-0000 109             0             0           89     4
## # ℹ 4 more variables: disc_per_visit_study <dbl>, disc_prob_no_mult <dbl>,
## #   disc_prob <dbl>, disc_delta <dbl>
plot(discrep)
## study = NULL, defaulting to study:AA-AA-000-0000

Top 100 out of 176 sites by discontinuation probability without multiplicity correction in descending order.

discrep$df_eval %>%
  mutate(disc_ratio_per_pat = disc_count / n_pat) %>%
  select(
    siteid,
    disc_count,
    visits,
    n_pat,
    disc_ratio_per_pat,
    disc_per_visit_site,
    disc_per_visit_study,
    disc_prob_no_mult,
    disc_prob,
    disc_delta
  ) %>%
  arrange(desc(disc_prob_no_mult)) %>%
  mutate(rank = row_number(), .before = siteid) %>%
  head(100) %>%
  knitr::kable(digits = 3)
rank siteid disc_count visits n_pat disc_ratio_per_pat disc_per_visit_site disc_per_visit_study disc_prob_no_mult disc_prob disc_delta
1 28 4 99 5 0.800 0.040 0.003 1.000 0.993 3.746
2 166 6 470 20 0.300 0.013 0.002 1.000 0.986 5.141
3 60 3 78 4 0.750 0.038 0.002 1.000 0.986 2.817
4 161 3 89 5 0.600 0.034 0.002 0.999 0.975 2.787
5 26 3 143 7 0.429 0.021 0.003 0.996 0.879 2.641
6 13 3 158 8 0.375 0.019 0.002 0.996 0.879 2.639
7 115 2 54 3 0.667 0.037 0.002 0.995 0.874 1.868
8 77 4 365 17 0.235 0.011 0.002 0.992 0.823 3.172
9 52 2 53 3 0.667 0.038 0.003 0.991 0.821 1.833
10 15 2 108 5 0.400 0.019 0.002 0.985 0.757 1.799
11 43 5 719 33 0.152 0.007 0.002 0.985 0.757 3.536
12 165 1 16 1 1.000 0.062 0.001 0.977 0.715 0.977
13 154 1 16 1 1.000 0.062 0.001 0.977 0.715 0.977
14 89 2 127 6 0.333 0.016 0.002 0.974 0.715 1.735
15 114 2 161 7 0.286 0.012 0.002 0.973 0.715 1.734
16 174 2 124 6 0.333 0.016 0.002 0.973 0.715 1.720
17 159 1 17 1 1.000 0.059 0.002 0.971 0.715 0.971
18 48 1 18 1 1.000 0.056 0.002 0.970 0.715 0.970
19 170 4 626 27 0.148 0.006 0.002 0.969 0.715 2.790
20 175 1 20 1 1.000 0.050 0.002 0.968 0.715 0.968
21 190 2 154 7 0.286 0.013 0.002 0.962 0.681 1.672
22 34 2 172 8 0.250 0.012 0.002 0.951 0.620 1.633
23 104 2 188 9 0.222 0.011 0.002 0.948 0.620 1.620
24 71 2 217 10 0.200 0.009 0.002 0.948 0.620 1.620
25 67 2 223 11 0.182 0.009 0.002 0.944 0.608 1.610
26 145 1 15 1 1.000 0.067 0.004 0.941 0.600 0.941
27 50 2 192 8 0.250 0.010 0.002 0.936 0.581 1.570
28 46 1 36 2 0.500 0.028 0.003 0.910 0.453 0.909
29 182 1 37 2 0.500 0.027 0.003 0.902 0.453 0.899
30 185 1 48 2 0.500 0.021 0.002 0.889 0.453 0.885
31 3 1 39 2 0.500 0.026 0.003 0.888 0.453 0.885
32 130 1 63 3 0.333 0.016 0.002 0.885 0.453 0.880
33 116 1 57 3 0.333 0.018 0.002 0.881 0.453 0.877
34 16 1 56 3 0.333 0.018 0.002 0.881 0.453 0.876
35 140 5 1456 69 0.072 0.003 0.002 0.881 0.453 2.382
36 179 1 106 4 0.250 0.009 0.001 0.879 0.453 0.874
37 64 1 82 4 0.250 0.012 0.002 0.879 0.453 0.874
38 53 1 65 3 0.333 0.015 0.002 0.877 0.453 0.872
39 144 1 80 4 0.250 0.013 0.002 0.875 0.453 0.870
40 173 3 602 27 0.111 0.005 0.002 0.873 0.453 1.752
41 40 1 57 3 0.333 0.018 0.002 0.873 0.453 0.867
42 96 1 82 4 0.250 0.012 0.002 0.869 0.452 0.862
43 62 3 711 34 0.088 0.004 0.002 0.865 0.450 1.727
44 91 2 366 16 0.125 0.005 0.002 0.859 0.450 1.336
45 168 1 59 3 0.333 0.017 0.003 0.859 0.450 0.852
46 81 1 78 4 0.250 0.013 0.002 0.856 0.450 0.849
47 124 1 85 4 0.250 0.012 0.002 0.849 0.434 0.840
48 163 1 72 3 0.333 0.014 0.002 0.839 0.412 0.831
49 10 2 405 20 0.100 0.005 0.002 0.836 0.412 1.277
50 127 1 102 5 0.200 0.010 0.002 0.832 0.409 0.819
51 25 1 66 3 0.333 0.015 0.003 0.828 0.408 0.818
52 135 1 109 5 0.200 0.009 0.002 0.823 0.407 0.809
53 128 1 95 5 0.200 0.011 0.002 0.822 0.407 0.808
54 39 1 92 4 0.250 0.011 0.002 0.814 0.393 0.799
55 131 1 141 7 0.143 0.007 0.001 0.808 0.387 0.790
56 86 2 477 22 0.091 0.004 0.002 0.798 0.366 1.169
57 184 1 104 5 0.200 0.010 0.002 0.788 0.345 0.767
58 58 2 459 20 0.100 0.004 0.002 0.780 0.337 1.119
59 29 1 142 7 0.143 0.007 0.002 0.778 0.337 0.752
60 134 1 130 6 0.167 0.008 0.002 0.773 0.333 0.748
61 186 1 119 5 0.200 0.008 0.002 0.751 0.281 0.722
62 157 1 155 7 0.143 0.006 0.002 0.733 0.252 0.698
63 83 1 142 7 0.143 0.007 0.002 0.732 0.252 0.696
64 59 1 180 8 0.125 0.006 0.002 0.714 0.213 0.670
65 4 1 196 9 0.111 0.005 0.002 0.668 0.102 0.606
66 155 1 230 10 0.100 0.004 0.002 0.648 0.060 0.576
67 51 1 237 11 0.091 0.004 0.002 0.615 0.000 0.524
68 94 2 768 36 0.056 0.003 0.002 0.608 0.000 0.655
69 69 1 271 12 0.083 0.004 0.002 0.574 0.000 0.459
70 63 1 240 10 0.100 0.004 0.002 0.570 0.000 0.455
71 92 1 312 15 0.067 0.003 0.002 0.561 0.000 0.432
72 76 1 273 12 0.083 0.004 0.002 0.547 0.000 0.414
73 167 1 321 15 0.067 0.003 0.002 0.536 0.000 0.393
74 61 1 346 15 0.067 0.003 0.002 0.460 0.000 0.245
75 54 1 390 17 0.059 0.003 0.002 0.436 0.000 0.189
76 172 1 601 28 0.036 0.002 0.002 0.343 0.000 -0.053
77 56 1 545 25 0.040 0.002 0.002 0.334 0.000 -0.071
78 143 1 601 28 0.036 0.002 0.002 0.318 0.000 -0.120
79 132 0 34 1 0.000 0.000 0.000 0.000 0.000 0.000
80 171 0 33 1 0.000 0.000 0.000 0.000 0.000 0.000
81 126 0 21 1 0.000 0.000 0.001 -0.031 0.000 -0.031
82 14 0 21 1 0.000 0.000 0.001 -0.031 0.000 -0.031
83 42 0 21 1 0.000 0.000 0.001 -0.031 0.000 -0.031
84 36 0 21 1 0.000 0.000 0.001 -0.031 0.000 -0.031
85 33 0 21 1 0.000 0.000 0.002 -0.032 0.000 -0.032
86 158 0 21 1 0.000 0.000 0.002 -0.032 0.000 -0.032
87 45 0 21 1 0.000 0.000 0.002 -0.032 0.000 -0.032
88 99 0 20 1 0.000 0.000 0.002 -0.032 0.000 -0.032
89 180 0 21 1 0.000 0.000 0.002 -0.032 0.000 -0.032
90 38 0 21 1 0.000 0.000 0.002 -0.032 0.000 -0.032
91 20 0 20 1 0.000 0.000 0.002 -0.032 0.000 -0.032
92 110 0 21 1 0.000 0.000 0.002 -0.033 0.000 -0.033
93 6 0 19 1 0.000 0.000 0.002 -0.036 0.000 -0.036
94 78 0 22 1 0.000 0.000 0.002 -0.041 0.000 -0.041
95 55 0 22 1 0.000 0.000 0.002 -0.042 0.000 -0.042
96 87 0 22 1 0.000 0.000 0.002 -0.042 0.000 -0.042
97 66 0 22 1 0.000 0.000 0.002 -0.042 0.000 -0.042
98 123 0 22 1 0.000 0.000 0.002 -0.043 0.000 -0.043
99 65 0 22 1 0.000 0.000 0.002 -0.043 0.000 -0.043
100 117 0 23 1 0.000 0.000 0.002 -0.045 0.000 -0.045

We find 4 sites that have a high reporting probability (>= 99% w/o multiplicity correction and >= 95% with) for discontinuations. As is expected the discontinuation ratio per patient which is a common monitoring metric is a poor indicator for which sites have high over-reporting probability as ratios from small samples tend to have more variability and are more likely to have outlier.

Summary

In order to use {simaerep} for binary events such as patient discontinuation we have to account for survival bias by using planned visits rather than actual visits. Then {simaerep} can produce very plausible over-reporting probabilities with the same advantages we expect {simaerep} to have over parametric statistical methods for other event reporting metrics.