Note: this page is “under construction” until the clinsight package developers create helper functions that transform at least one other EDC data source into the desired format. In the meantime, this page documents the existing data specifications. Eventually, it will also share guidelines about how to combine your input data with metadata sources, etc.
Intro
To get started and plug your organization’s EDC data into the clinsight application, note that the app ships with a ./inst/golem-config.yml file that configures several elements of deployment. Below is a typical configuration file:
default:
  golem_name: clinsight
  golem_version: 0.0.0.9004
  app_prod: no
  user_identification: test_user
  study_data: !expr clinsight::clinsightful_data
  meta_data: !expr clinsight::metadata
  user_db: user_db.sqlite
  user_roles:
    Administrator: admin
    Medical Monitor: medical_monitor
    Data Manager: data_manager
shinymanager:
  app_prod: yes
  user_identification: shinymanager
  study_data: study_data.rds
  meta_data: metadata.rds
  credentials_db: credentials_db.sqlite
First and foremost, notice that the configuration can vary depending on the deployment use case. For example, the file above runs (by default) with test data built into the app. Only in production mode does the app use the data you’ve gathered from your EDC system, stored as RDS files. As seen above, several other elements defined in golem-config.yml can be configured before launching the application for the first time.
The two main elements are meta_data and study_data, which accept file paths to the app’s primary data sources, stored as RDS files. The study_data object should be created with the pre-processing helper function merge_meta_with_data(), which accepts raw data sources and merges them with the meta_data object defined in this article.
Other elements are:
- user_db: character string providing the path to the app’s review database. If it does not exist, one will be created based on the study_data and meta_data, with all data labeled as new/not yet reviewed.
- credentials_db: character string providing the path to the credentials database. Only needed when using shinymanager for user identification. The database will be created automatically if needed.
Depending on the EDC vendor used, the size, shape, and format of the “raw data” may vary. We’ve compiled a few functions that help admin users pre-process the data they own. However, before we dive into how to use those, let’s focus on the final result of those pre-processing steps to understand what’s being fed to the app.
Data Specifications
Baked into the clinsight package is an internal data.frame called clinsightful_data, which consists of randomly generated test data for this application. Here is a preview of that data:
# Run to load the data into an R session.
# pkg_name <- "clinsight"
# library(pkg_name, character.only = TRUE)
# data("clinsightful_data")
head(clinsight::clinsightful_data)
#> # A tibble: 6 × 24
#> site_code subject_id event_repeat event_id event_name event_date form_id
#> <chr> <chr> <int> <chr> <chr> <date> <chr>
#> 1 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> 2 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> 3 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> 4 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> 5 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> 6 Site 08 BEL_08_885 1 SCR Screening 2023-06-05 LBHM
#> # ℹ 17 more variables: form_repeat <int>, edit_date_time <dttm>,
#> # db_update_time <dttm>, region <chr>, day <drtn>, vis_day <dbl>,
#> # vis_num <dbl>, event_label <chr>, item_name <chr>, item_type <chr>,
#> # item_group <chr>, item_value <chr>, item_unit <chr>, lower_lim <dbl>,
#> # upper_lim <dbl>, significance <chr>, reason_notdone <chr>
This object is an example of a healthy study_data object, which should preferably be stored as an RDS file prior to launching the app. Let’s inspect it a little more:
study_data
The RDS file (or data.frame) passed to the study_data element must contain the following required columns:
- site_code: character or integer, identifier for the study site. If an integer, adding the prefix “Site” is recommended, as this displays more intuitively in the application’s UI.
- subject_id: character, unique identifier for a subject.
- event_repeat: integer, helps keep track of unique event_id values for a single subject_id and event_date.
- event_id: character, names that help classify types of event_name values into like groups, generally characterized by site visits. For example, “SCR” for the screening visit, “VIS” for Visit X (where X is some integer), and “EXIT” for when the patient exits the study. However, some event_id values track events that could apply outside of any visit, like AE, ConMed, Medical History, etc.
- event_name: character, an “event” generally characterizes some sort of site visit, whether that be a “Screening”, “Visit X” (where X is some integer), “Exit”, or “Any Visit”.
- event_date: Date, the date associated with event_name.
- form_id: character, a unique identifier for the form the item_name metric and item_value were pulled from. Note: when item_type is ‘continuous’, form_id can contain several different item_group values. However, when item_type is ‘other’, item_group can be made up of several form_id values.
- form_repeat: integer, helps keep track of unique item_name values collected from a specific form_id for a given subject_id. form_repeat is particularly helpful when consolidating data like Adverse Events into this data format. Specifically, if more than one AE is collected on a patient, they’ll have more than one form_repeat.
- edit_date_time: datetime (POSIXct), the last time this record was edited.
- db_update_time: datetime (POSIXct), the last time the database storing this record was updated.
- region: character, the region code that site_code falls under.
- day: a difftime value, meaning it contains both a number and a unit of time. It measures the number of days between each visit and screening.
- vis_day: numeric, a numeric representation of day.
- vis_num: numeric, a numeric representation of event_name.
- event_label: character, an abbreviation of event_name.
- item_name: character, describes a metric or parameter of interest.
- item_type: character, classifies item_name values as either ‘continuous’ or ‘other’, where continuous types are those generally associated with the CDISC “basic data structure” (BDS). That is, each item_name metric is collected over time at a patient visit (event_name). The ‘other’ type represents all non-time-dependent measures, like demographic info, adverse events, medications, medical history, etc.
- item_group: character, a high-level category that groups like item_name values together. For example, item_group = ‘Vital Signs’ will group together pertinent item_name metrics like BMI, Pulse, Blood pressure, etc.
- item_value: character, the measurement collected for a given item_name. The value collected may be a number like 150 (when collecting a patient’s weight) or a word (such as ‘white’ for the subject’s race).
- item_unit: character, the unit of measurement for item_name and item_value.
- lower_lim: numeric, some item_name values (particularly the ‘continuous’ type) have a pre-defined range of values that are considered normal. This is the lower limit of that range.
- upper_lim: numeric, some item_name values (particularly the ‘continuous’ type) have a pre-defined range of values that are considered normal. This is the upper limit of that range.
- significance: character, either ‘CS’, which means ‘Clinically Significant’, or ‘NCS’, which means ‘Not Clinically Significant’.
- reason_notdone: character, describes why the item_value field is NA / missing.
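As a quick sanity check before launch, the presence of the required columns listed above can be verified with a few lines of base R. The sketch below is only an illustration: the helper name check_study_data_columns() is hypothetical and not part of clinsight, and it checks column presence only, not column types.

```r
# Required study_data columns, copied from the list above.
required_cols <- c(
  "site_code", "subject_id", "event_repeat", "event_id", "event_name",
  "event_date", "form_id", "form_repeat", "edit_date_time",
  "db_update_time", "region", "day", "vis_day", "vis_num", "event_label",
  "item_name", "item_type", "item_group", "item_value", "item_unit",
  "lower_lim", "upper_lim", "significance", "reason_notdone"
)

# Hypothetical helper (not a clinsight function): returns the names of
# required columns that are absent from `data`.
check_study_data_columns <- function(data, required = required_cols) {
  stopifnot(is.data.frame(data))
  setdiff(required, names(data))
}

# Toy example: a data frame that only has two of the required columns.
toy <- data.frame(site_code = "Site 08", subject_id = "BEL_08_885")
missing_cols <- check_study_data_columns(toy)
length(missing_cols)
```

An empty result from check_study_data_columns() means every required column is present; any names returned point to columns that still need to be mapped from your EDC export.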
Processing your Raw Data
So, the next logical question is “How do I get my EDC’s data into the study_data format?” This package currently offers a pre-processing helper function called merge_meta_with_data(), which accepts raw data sources from the Viedoc EDC vendor and merges them with the meta_data object (defined below) to create a viable study_data object. As such, we’ll spend some time covering what this helper function expects of your raw data and meta_data object, and how it transforms them into the study_data object we need for app launch.
First, let’s discuss the app’s metadata needs!
meta_data
The meta_data object is a list of data.frames which (not surprisingly) contains metadata for the application. It provides study-specific settings and controls where study data will be visible in the application. It can be created in the right format by editing the Excel template in data-raw/metadata.xlsx and then creating a metadata object with the function get_metadata() (e.g. meta <- clinsight::get_metadata("path-to-custom-metadata.xlsx")).
The meta_data object should be saved as a .rds file so that clinsight can use it (e.g. saveRDS(meta, "data_folder/metadata.rds")).
As stated previously, the metadata should also be used to shape the raw study data into the right format, and to dictate which variables will be included in study_data. This can be done with the function merge_meta_with_data(), which is described in detail later in this article. The goal is that most, if not all, study-specific data will be captured in the metadata, leaving the scripts that run the application largely unaltered between studies.
Just like for study_data, this package also bundles a built-in metadata object called metadata. To view this example metadata, run the following chunk of code:
meta_data <- clinsight::metadata
lapply(meta_data, head)
#> $column_names
#> # A tibble: 6 × 2
#> name_raw name_new
#> <chr> <chr>
#> 1 SiteCode site_code
#> 2 SubjectId subject_id
#> 3 EventId event_id
#> 4 EventDate event_date
#> 5 EventName event_name
#> 6 EventSeq event_repeat
#>
#> $events
#> # A tibble: 6 × 3
#> event_number event_name event_label
#> <chr> <chr> <chr>
#> 1 0 Screening V0
#> 2 1 Visit 1 V1
#> 3 2 Visit 2 V2
#> 4 3 Visit 3 V3
#> 5 4 Visit 4 V4
#> 6 5 Visit 5 V5
#>
#> $common_forms
#> item_name item_type item_group
#> 1 AE Number other Adverse events
#> 2 AE Name other Adverse events
#> 3 AESI other Adverse events
#> 4 AE start date other Adverse events
#> 5 AE end date other Adverse events
#> 6 AE date of worsening other Adverse events
#>
#> $study_forms
#> item_name item_type item_group unit lower_limit
#> 1 Systolic blood pressure continuous Vital signs mmHg 90
#> 2 Diastolic blood pressure continuous Vital signs mmHg 55
#> 3 Pulse continuous Vital signs beats/min 60
#> 4 Resp continuous Vital signs breaths/min 12
#> 5 Temperature continuous Vital signs °C 35
#> 6 Weight change since screening continuous Vital signs <NA> <NA>
#> upper_limit
#> 1 160
#> 2 90
#> 3 100
#> 4 20
#> 5 38.5
#> 6 <NA>
#>
#> $general
#> item_name item_type item_group
#> 1 Age other General
#> 2 Sex other General
#> 3 ECOG other General
#> 4 Eligible other General
#> 5 Eligible_Date other General
#> 6 WHO.classification other General
#>
#> $form_level_data
#> item_group item_scale use_unscaled_limits review_required
#> 1 Adverse events NA NA TRUE
#> 2 Medication NA NA TRUE
#> 3 Conc. Procedures NA NA TRUE
#> 4 Medical History NA NA TRUE
#> 5 Vital signs FALSE TRUE TRUE
#> 6 Electrolytes TRUE FALSE TRUE
#>
#> $table_names
#> # A tibble: 6 × 2
#> table_name raw_name
#> <chr> <chr>
#> 1 Edit date edit_date_time
#> 2 Date event_date
#> 3 Event event_label
#> 4 Event event_name
#> 5 eN event_repeat
#> 6 Form item_group
#>
#> $items_expanded
#> # A tibble: 6 × 8
#> var suffix item_name item_type item_group unit lower_limit upper_limit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 AE_AESPID NA AE Number other Adverse e… NA NA NA
#> 2 AE_AETERM NA AE Name other Adverse e… NA NA NA
#> 3 AE_AESI NA AESI other Adverse e… NA NA NA
#> 4 AE_AESTDAT NA AE start… other Adverse e… NA NA NA
#> 5 AE_AEENDAT NA AE end d… other Adverse e… NA NA NA
#> 6 AE_AESTDA… NA AE date … other Adverse e… NA NA NA
Specifications for the list of data.frames include:
- column_names: Used to map raw data variable names to new names.
  - name_raw: character, variable name in the raw data source
  - name_new: character, desired variable name to use in study_data and in the application
- events: Used to create a simple timeline in the application, with a predefined number of planned visits, N. It contains the following columns:
  - event_number: integer. Example: 0, 1, 2, …, N
  - event_name: character. Example: “Screening”, “Visit 1”, “Visit 2”, …, “Visit N”
  - event_label: character. Example: “V0”, “V1”, “V2”, …, “VN”
- common_forms: Used to select and rename the variables of interest in the raw data when transformed into the desired study_data format. Note: the study_data data.frame should be created with merge_meta_with_data(), where (not surprisingly) the metadata is merged with the raw study data. common_forms contains the columns below:
  - var: character, the variable name to display in the table, mapped from a known item_name provided in study_data. Example: item_name = "AE Name" will be replaced by “AE_AETERM” when var = "AE_AETERM".
  - suffix: Usually blank in this data.frame. This column is more commonly used in the study_forms data.frame.
  - item_name: character, known item_name values found in study_data. Certain item_name values are required, even if missing in study_data, including: “AE Name”, “AE start date”, “AE end date”, “AE date of worsening”, “AE CTCAE severity”, “AE CTCAE severity worsening”, “Serious Adverse Event”, and “SAE Start date”.
  - item_type: character, known item_type corresponding to those found for item_name values in study_data.
  - item_group: character, known item_group corresponding to those found for item_name values in study_data.
- study_forms: Contains the same columns as the data.frame common_forms, plus the columns unit, lower_limit, and upper_limit. Used to select and rename the raw data variables of interest. In addition, the suffix column is used more regularly in this “study” context, because these variable names may share a consistent trunk / stem, with varying suffixes describing a similar style of measurement. So instead of creating a new row for each of these variables in the meta_data data.frame, we allow several suffixes to be listed. For example, VS_PULSE measures beats/min. Typically, these measures are collected using VS_PULSE_VSORRES & VS_PULSE_VSREAND, but in this format we can list the stem “VS_PULSE” as the var and "VSORRES, VSREAND" in the suffix field. The new columns are defined as follows:
  - unit: character, unit of measure
  - lower_limit: numeric, the lower limit of what’s considered clinically significant
  - upper_limit: numeric, the upper limit of what’s considered clinically significant
- general: Contains the same columns as common_forms and is used in the same way. That is, it’s used to select and rename the raw data when transformed into the desired study_data format. Please refer to the common_forms spec above. Note, however, that certain item_name values are required, even if missing in study_data, including: “Age”, “Sex”, “ECOG”, “Eligible”, “WHO.classification”, “DiscontinuationReason”, “DrugAdminDate”, and “DrugAdminDose”.
- groups: Contains the columns item_group, item_type, item_scale, use_unscaled_limits.
- table_names: Used for renaming table column names into a more readable format. It is not required to name all columns; if a column name is not defined here, the raw name will be used instead.
  - table_name: character, the readable column name to display
  - raw_name: character, the raw column name being renamed
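The fallback behaviour described for table_names (use the readable name when one is defined, otherwise keep the raw name) can be pictured with the base R sketch below. It is an illustration under assumptions, not clinsight’s implementation; rename_with_fallback() is a hypothetical helper, and the lookup rows are taken from the metadata preview above.

```r
# A small table_names lookup, using rows shown in the metadata preview.
table_names <- data.frame(
  table_name = c("Edit date", "Date", "Form"),
  raw_name   = c("edit_date_time", "event_date", "item_group"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map raw column names to readable names, keeping the
# raw name whenever no mapping is defined.
rename_with_fallback <- function(raw, lookup) {
  idx <- match(raw, lookup$raw_name)
  ifelse(is.na(idx), raw, lookup$table_name[idx])
}

rename_with_fallback(c("event_date", "item_value", "item_group"), table_names)
```

Here item_value has no entry in the lookup, so it is passed through unchanged while the other two columns receive their readable names.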
get_metadata() & items_expanded
Lastly, you may notice that there is a 7th data.frame called items_expanded. This data.frame is derived when the user runs a helper function called get_metadata(), which takes an XLSX file containing the first 6 data.frames (one per tab) and expands the tabs of your choosing by the column of your choosing. In other words, get_metadata() makes sure your existing metadata is in the correct format and helps create something a bit more polished and digestible, like items_expanded for clinsight::metadata. As you’ll see later, performing this step is crucial for merge_meta_with_data(), which comes next. So, after you’ve compiled your metadata XLSX spreadsheet with the first 6 data.frames (tabs) mentioned above, you’re ready to run code that looks like the following:
# Optionally define METADATA_PATH in your .Renviron file first:
# usethis::edit_r_environ()
library(clinsight)

meta_path <- Sys.getenv("METADATA_PATH")
meta_data <- get_metadata(
  file.path(meta_path, "my_metadata.xlsx"),
  expand_tab_items = c("common_forms", "study_forms", "general"),
  expand_cols = "suffix"
)
In summary, get_metadata() will initiate the meta_data object with the first 6 data.frames directly from the Excel file, and then items_expanded will be created by expanding the common_forms, study_forms, and general data.frames by the values stored in the suffix column. The result will be appended onto the meta_data object as the 7th data.frame in the list.
Once complete, save your metadata to an .RDS file:
saveRDS(meta_data, file.path(Sys.getenv("METADATA_PATH"), "meta_data.rds"))
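To make the suffix expansion described above more concrete, here is a minimal base R sketch of the idea. It assumes that suffixes are comma-separated and joined to the stem with an underscore, as in the VS_PULSE example; expand_suffix() is a hypothetical helper written for illustration, and the real get_metadata() may work differently.

```r
# Two metadata rows: one with a stem in `var` plus two comma-separated
# suffixes, and one without any suffix.
study_forms <- data.frame(
  var       = c("VS_PULSE", "AE_AETERM"),
  suffix    = c("VSORRES, VSREAND", NA),
  item_name = c("Pulse", "AE Name"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: create one row per suffix and paste stem and suffix
# together with "_". Rows without a suffix are kept unchanged.
expand_suffix <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    row <- df[i, , drop = FALSE]
    if (is.na(row$suffix) || !nzchar(row$suffix)) return(row)
    suffixes <- trimws(strsplit(row$suffix, ",", fixed = TRUE)[[1]])
    out <- row[rep(1, length(suffixes)), , drop = FALSE]
    out$var <- paste(row$var, suffixes, sep = "_")
    out$suffix <- suffixes
    out
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

items_expanded <- expand_suffix(study_forms)
items_expanded$var
```

The single VS_PULSE row becomes two rows (one per suffix), while AE_AETERM, which has no suffix, is carried over as-is.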
Read in your study data with get_raw_csv_data()
Now that we have an understanding of our meta_data specs, we can discuss how they interact with your raw data. The rest of this vignette will feel more like an R script as we walk through how this happens. First, we need to read in our raw data. There is yet another helper function bundled in this package, called get_raw_csv_data(), to help us do that. It’s a simple wrapper function that reads in raw data files stored as CSVs from a designated folder. As such, your function call should look something like the following code chunk. Notice that you can set your raw data path either explicitly or in your .Renviron file.
# usethis::edit_r_environ()
data_path <- Sys.getenv("DATA_PATH")
raw_data <- get_raw_csv_data(data_path)
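Conceptually, reading a folder of CSV exports looks like the base R sketch below. It is a stand-in for illustration only: read_csv_folder() is a hypothetical helper, returning a named list is an assumption about the shape of the result, and the sketch does not reproduce any additional clean-up that get_raw_csv_data() performs. The file contents are made-up toy values.

```r
# Hypothetical stand-in: read every CSV in `dir` into a named list of
# data frames, one element per file.
read_csv_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Toy demonstration with two small files written to a temporary folder.
demo_dir <- file.path(tempdir(), "raw_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(SubjectId = "BEL_08_885", VS_PULSE_VSORRES = 72),
          file.path(demo_dir, "VS.csv"), row.names = FALSE)
write.csv(data.frame(SubjectId = "BEL_08_885", AE_AETERM = "Headache"),
          file.path(demo_dir, "AE.csv"), row.names = FALSE)

raw_demo <- read_csv_folder(demo_dir)
names(raw_demo)
```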
Finish the job with merge_meta_with_data()
Now that our data has been read in and minimally cleaned up, we can finally use the merge_meta_with_data() function as shown below:
study_data <- merge_meta_with_data(data = raw_data, meta = meta_data)
This function uses the rest of the metadata data.frames to further organize your raw data into something usable by the app, including but not limited to the following:
- renaming columns to the required column names needed in the application
  - it will use the columns provided in the metadata tab column_names, and will verify that the new names still match the expected/required names. Notice that all of these variables are required, so it’s important that you identify which variables from your EDC mirror them.
- converting columns to the required column type (e.g. dates, character, integer)
- fixing multiple choice variables using a function called fix_multiple_choice_vars()
- adding derivative values and columns such as:
  - vis_day, vis_num, and day (variables based on event_id and event_date).
  - event_name & event_label have some clean-up performed on their values to standardize them for presentation in the app. Last, the data is ordered by site_code and subject_id.
- merging the raw data with meta_data$items_expanded
- applying any study-specific fixes using a function called apply_study_specific_fixes().
Once complete, save your study data to an .RDS file:
saveRDS(study_data, file.path(Sys.getenv("DATA_PATH"), "study_data.rds"))
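Two of the steps listed above, renaming columns via column_names and deriving day and vis_day from event_date, can be sketched in base R as follows. This is an illustration only and not how merge_meta_with_data() is implemented: it assumes that day counts the days since each subject’s earliest event_date (taken here to be screening), and the toy data is made up.

```r
# Mapping in the shape of meta_data$column_names.
column_names <- data.frame(
  name_raw = c("SiteCode", "SubjectId", "EventDate"),
  name_new = c("site_code", "subject_id", "event_date"),
  stringsAsFactors = FALSE
)

# Toy raw data using the raw (EDC) column names.
raw <- data.frame(
  SiteCode  = c("Site 08", "Site 08"),
  SubjectId = c("BEL_08_885", "BEL_08_885"),
  EventDate = as.Date(c("2023-06-05", "2023-06-19"))
)

# Step 1: rename raw columns; unmapped columns keep their original name.
idx <- match(names(raw), column_names$name_raw)
names(raw)[!is.na(idx)] <- column_names$name_new[idx[!is.na(idx)]]

# Step 2 (assumption): `day` is the difftime between each event_date and
# the subject's earliest event_date; `vis_day` is its numeric version.
first_date <- ave(as.numeric(raw$event_date), raw$subject_id, FUN = min)
raw$day <- difftime(raw$event_date,
                    as.Date(first_date, origin = "1970-01-01"),
                    units = "days")
raw$vis_day <- as.numeric(raw$day)
raw$vis_day
```

Keeping day as a difftime and vis_day as a plain number mirrors the column definitions in the study_data specification.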
Launch the app
Circling back to the configuration file we shared at the beginning of this vignette, you’re now ready to launch this application.
default:
  golem_name: clinsight
  golem_version: 0.0.0.9004
  app_prod: no
  user_identification: test_user
  study_data: !expr clinsight::clinsightful_data
  meta_data: !expr clinsight::metadata
  user_db: user_db.sqlite
  user_roles:
    Administrator: admin
    Medical Monitor: medical_monitor
    Data Manager: data_manager
shinymanager:
  app_prod: yes
  user_identification: shinymanager
  study_data: study_data.rds
  meta_data: metadata.rds
  credentials_db: credentials_db.sqlite