App Configuration & Input Data Specifications

Note: this page contains the old package documentation and will gradually be replaced by or merged into newer vignettes. Keep in mind that the information in this vignette could be outdated. See vignette("clinsight"), vignette("Metadata") and vignette("Deployment") for the latest information.

Introduction

To plug your organization’s EDC data into the clinsight application, note that the app ships with a ./inst/golem-config.yml file that configures several elements of deployment. Below is a typical configuration file:

default:
  golem_name: clinsight
  golem_version: 0.1.1.9022
  app_prod: no
  user_identification: test_user
  study_data: !expr clinsight::clinsightful_data
  meta_data: !expr clinsight::metadata
  user_db: user_db.sqlite
  user_roles:
    Administrator: admin
    Medical Monitor: medical_monitor
    Data Manager: data_manager
  allow_to_review: [admin, medical_monitor]
  allow_listing_download: TRUE
  allow_query_inputs: TRUE
shinymanager:
  app_prod: yes
  user_identification: shinymanager
  study_data: study_data.rds
  meta_data: metadata.rds
  credentials_db: credentials_db.sqlite
shinyproxy:
  app_prod: yes
  user_identification: http_headers
  study_data: study_data/study_data.rds
  meta_data: study_data/metadata.rds
  user_db: study_data/user_db.sqlite
posit_connect:
  app_prod: yes
  user_identification: shiny_session
  study_data: study_data.rds
  meta_data: metadata.rds
  user_db: user_db.sqlite
  allow_to_review: [admin, medical_monitor, data_manager]

First and foremost, notice that the configuration can vary depending on the deployment use case. For example, the above file is designed to run (by default) with test data built into the app. Only in production mode does the app use data you’ve gathered from your EDC system, stored as RDS files. As seen above, several other elements defined in golem-config.yml can be configured before launching the application for the first time.

The two main elements are meta_data and study_data, which accept file paths to the app’s primary data sources, stored as RDS files. The study_data object should be created with the pre-processing helper function merge_meta_with_data(), which accepts raw data sources and merges them with the meta_data object defined in this article.

Other elements are:

user_db: character string, the path to the app’s review database. If it does not exist, one will be created based on the study_data and meta_data, with all data labeled as new/not yet reviewed.
credentials_db: character string, the path to the credentials database. Only needed when using shinymanager for user identification. The database will be created automatically if needed.
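
How a named configuration such as shinymanager layers on top of default can be checked outside the app. The following is a minimal sketch, assuming the config package (which golem-based apps conventionally use to read golem-config.yml) is installed. The temporary file and the trimmed set of keys are for illustration only and are not the full file shown above.

```r
# Illustrative only: a trimmed copy of the configuration, written to a temp file.
cfg_file <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  app_prod: no",
  "  user_identification: test_user",
  "  user_db: user_db.sqlite",
  "shinymanager:",
  "  app_prod: yes",
  "  user_identification: shinymanager",
  "  study_data: study_data.rds",
  "  meta_data: metadata.rds",
  "  credentials_db: credentials_db.sqlite"
), cfg_file)

# Read the 'shinymanager' configuration. Values it does not set itself
# (here: user_db) are inherited from the 'default' configuration.
cfg <- config::get(config = "shinymanager", file = cfg_file)

cfg$user_identification
cfg$user_db
```

In this sketch, user_db is not defined under shinymanager, so its value comes from default. The same inheritance is why the production configurations above only list the keys that differ.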

Depending on the EDC vendor used, the size, shape, and format of the “raw data” may vary. We’ve compiled a few functions that help admin users pre-process the data they own. However, before we dive into how to use those, let’s focus on the final result of those pre-processing steps to understand what’s being fed to the app.

Data Specifications

Baked into the clinsight package is an internal data.frame called clinsightful_data, which consists of randomly generated test data for this application. Here is a preview of that data:

# Run to load the data into an R session. 
# pkg_name <- "clinsight"
# library(pkg_name, character.only = TRUE)
# data("clinsightful_data") 

head(clinsight::clinsightful_data)
#> # A tibble: 6 × 22
#>   site_code subject_id event_id event_date event_repeat form_id form_repeat
#>   <chr>     <chr>      <chr>    <date>            <int> <chr>         <int>
#> 1 BEL04     BEL_04_133 SCR      2023-06-05            1 DM                1
#> 2 BEL04     BEL_04_133 SCR      2023-06-05            1 DM                1
#> 3 BEL04     BEL_04_133 SCR      2023-06-05            1 DM                1
#> 4 BEL04     BEL_04_133 SCR      2023-06-05            1 DM                1
#> 5 BEL04     BEL_04_133 SCR      2023-06-05            1 DM                1
#> 6 BEL04     BEL_04_133 SCR      2023-06-05            1 ECOG              1
#> # ℹ 15 more variables: edit_date_time <dttm>, day <drtn>, event_name <chr>,
#> #   event_label <fct>, form_type <chr>, item_name <chr>, item_type <chr>,
#> #   item_group <chr>, item_unit <chr>, lower_lim <dbl>, upper_lim <dbl>,
#> #   item_value <chr>, significance <chr>, reason_notdone <chr>, region <chr>

This object is an example of a healthy study_data object, which should preferably be stored as an RDS file prior to launching the app. Let’s inspect it a little more:

`study_data`

The RDS file (or data.frame) passed to the study_data element must contain the following columns:

site_code: character or integer, identifier for the study site. If an integer, adding the prefix “Site” is recommended, as this will display more intuitively in the application’s UI
subject_id: character, unique identifier for a subject
event_repeat: integer, helps keep track of unique event_id for a single subject_id and event_date
event_id: character, names that help classify types of event_names into like-groups, generally characterized by site visits. For example, “SCR” for the screening visit, “VIS” for Visit X (where X is some integer), and “EXIT” for when the patient exits the study trial. However, some event_ids track events that could apply outside of any visit, like AE, ConMed, Medical History, etc.
event_name: character, an “event” generally characterizes some sort of site visit, whether that be a “Screening”, “Visit X” (where X is some integer), “Exit”, or “Any Visit”.
event_date: Date, the date associated with event_name
form_id: character, a unique identifier for the form the item_name metric and item_value were pulled from. Note: when item_type is continuous, form_id can contain several different item_groups. However, when item_type is ‘other’, item_group can be made up of several form_id values.
form_repeat: integer, helps keep track of unique item_names collected from a specific form_id for a given subject_id. form_repeat is particularly helpful when consolidating data like Adverse Events into this data format. Specifically, if more than one AE is collected on a patient, they’ll have more than one form_repeat
edit_date_time: datetime (POSIXct), the last time this record was edited
db_update_time: datetime (POSIXct), the last time the database storing this record was updated.
region: character, describing the region code that site_code falls under
day: a difftime number, meaning it contains both a number and unit of time. It measures the number of days each visit is from screening
vis_day: numeric, a numeric representation of day
vis_num: numeric, a numeric representation of event_name
event_label: character (stored as a factor in the bundled example data previewed above), an abbreviation of event_name
item_name: character, describes a metric or parameter of interest.
item_type: character, classifies item_names into either ‘continuous’ or ‘other’, where continuous types are those generally associated with the CDISC “basic data structure” (BDS). That is, each item_name metric is collected over time at a patient visit (event_name). The ‘other’ type represents all non-time dependent measures, like demographic info, adverse events, Medications, medical history, etc.
item_group: character, provides a high-level category that groups like item_names together. For example, an item_group = ‘Vital Signs’ will group together pertinent item_name metrics like BMI, Pulse, Blood pressure, etc.
item_value: character, the measurement collected for a given item_name. The value collected may be a number like 150 (when collecting a patient’s weight) or a word (such as ‘white’ for the subject’s race).
item_unit: character, tracking the unit of measurement for item_name and item_value.
lower_lim: numeric, some item_names (particularly the ‘continuous’ type) have a pre-defined range of values that are considered normal. This is the lower limit to that range.
upper_lim: numeric, some item_names (particularly the ‘continuous’ type) have a pre-defined range of values that are considered normal. This is the upper limit to that range.
significance: character, either ‘CS’ which means ‘Clinically Significant’ or ‘NCS’ which means ‘Not Clinically Significant’
reason_notdone: character, an effort to describe why the item_value field is NA / missing.
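
Before pointing the app at a new file, it can help to confirm that a candidate data.frame carries the columns listed above. The following is a minimal base R sketch. The helper missing_study_cols() is hypothetical and not part of clinsight, and the column list is copied from this page, so it may lag behind the current package.

```r
# Column names copied from the list above (may be outdated, see the note
# at the top of this page).
required_cols <- c(
  "site_code", "subject_id", "event_repeat", "event_id", "event_name",
  "event_date", "form_id", "form_repeat", "edit_date_time", "db_update_time",
  "region", "day", "vis_day", "vis_num", "event_label", "item_name",
  "item_type", "item_group", "item_value", "item_unit", "lower_lim",
  "upper_lim", "significance", "reason_notdone"
)

# Hypothetical helper (not part of clinsight): return the required columns
# that are absent from a candidate study_data data.frame.
missing_study_cols <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Toy data.frame that only contains a few of the columns
candidate <- data.frame(
  site_code  = "Site 1",
  subject_id = "S_001",
  event_date = as.Date("2023-06-05")
)

missing_cols <- missing_study_cols(candidate)
head(missing_cols)
```

An empty result would mean every listed column is present. The check only covers column names, not column types.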

Processing your Raw Data

So, the next logical question is “How do I get my EDC’s data into the study_data format?” This package currently offers a pre-processing helper function called merge_meta_with_data(), which accepts raw data sources from the Viedoc EDC vendor and merges them with the meta_data object (defined below) to create a viable study_data object. As such, we’ll spend some time covering what this helper function expects of your raw data and meta_data object, and how it transforms them into the study_data object we need for app launch.

First, let’s discuss the app’s metadata needs!

`meta_data`

The meta_data object is a list of data.frames which (not surprisingly) contains metadata information for the application. It provides study-specific settings and controls where study data will be visible in the application. It can be created in the right format by editing the Excel template in data-raw/metadata.xlsx and then creating a metadata object with the function get_metadata() (e.g. meta <- clinsight::get_metadata("path-to-custom-metadata.xlsx")). The meta_data object should be saved as a .rds file so that clinsight can use it (e.g. saveRDS(meta, "data_folder/metadata.rds")).

As stated previously, the metadata should also be used to shape the raw study_data into the right format, and to dictate which variables will be included in the study_data. This can be done with the function merge_meta_with_data(), which is described in detail in the next section. The goal is that most, if not all, study-specific data will be captured in the metadata, leaving the scripts to run the application largely unaltered between studies.

Just like for study_data, this package also bundles a built-in metadata object called metadata. To view an example metadata file, run the following chunk of code:


meta_data <- clinsight::metadata
lapply(meta_data, head)
#> $column_names
#> # A tibble: 6 × 2
#>   name_raw  name_new    
#>   <chr>     <chr>       
#> 1 SiteCode  site_code   
#> 2 SubjectId subject_id  
#> 3 EventId   event_id    
#> 4 EventDate event_date  
#> 5 EventSeq  event_repeat
#> 6 FormId    form_id     
#> 
#> $events
#> # A tibble: 6 × 10
#>   event_id event_id_pattern is_regular_visit event_label_custom
#>   <chr>    <chr>            <lgl>            <chr>             
#> 1 SCR      ^SCR$            TRUE             NA                
#> 2 VIS1     ^VIS1$           TRUE             V1                
#> 3 VIS2     ^VIS2$           TRUE             V2                
#> 4 VIS3     ^VIS3$           TRUE             V3                
#> 5 VIS4     ^VIS4$           TRUE             V4                
#> 6 VIS5     ^VIS5$           TRUE             V5                
#> # ℹ 6 more variables: event_name_custom <chr>, is_baseline_event <lgl>,
#> #   generate_labels <lgl>, meta_event_order <int>, add_visit_number <lgl>,
#> #   add_event_repeat_number <lgl>
#> 
#> $common_forms
#>              item_name item_type     item_group merge_with
#> 1            AE Number     other Adverse events       <NA>
#> 2              AE Name     other Adverse events       <NA>
#> 3                 AESI     other Adverse events       <NA>
#> 4        AE start date     other Adverse events       <NA>
#> 5          AE end date     other Adverse events       <NA>
#> 6 AE date of worsening     other Adverse events       <NA>
#> 
#> $study_forms
#>                       item_name  item_type  item_group        unit lower_limit
#> 1       Systolic blood pressure continuous Vital signs        mmHg          90
#> 2      Diastolic blood pressure continuous Vital signs        mmHg          55
#> 3                         Pulse continuous Vital signs   beats/min          60
#> 4                          Resp continuous Vital signs breaths/min          12
#> 5                   Temperature continuous Vital signs          °C          35
#> 6 Weight change since screening continuous Vital signs        <NA>        <NA>
#>   upper_limit
#> 1         160
#> 2          90
#> 3         100
#> 4          20
#> 5        38.5
#> 6        <NA>
#> 
#> $general
#>            item_name item_type item_group
#> 1                Age     other    General
#> 2                Sex     other    General
#> 3               ECOG     other    General
#> 4           Eligible     other    General
#> 5      Eligible_Date     other    General
#> 6 WHO.classification     other    General
#> 
#> $form_level_data
#>         item_group item_scale use_unscaled_limits review_required
#> 1   Adverse events         NA                  NA            TRUE
#> 2       Medication         NA                  NA            TRUE
#> 3 Conc. Procedures         NA                  NA            TRUE
#> 4  Medical History         NA                  NA            TRUE
#> 5      Vital signs      FALSE                TRUE            TRUE
#> 6     Electrolytes       TRUE               FALSE            TRUE
#> 
#> $table_names
#> # A tibble: 6 × 2
#>   table_name raw_name      
#>   <chr>      <chr>         
#> 1 Edit date  edit_date_time
#> 2 Date       event_date    
#> 3 Event      event_label   
#> 4 Event      event_name    
#> 5 eN         event_repeat  
#> 6 Form       item_group    
#> 
#> $settings
#> $settings$pre_pivot_fns
#> [1] "apply_study_specific_suffix_fixes"
#> 
#> $settings$post_pivot_fns
#> [1] "apply_edc_specific_changes"
#> 
#> $settings$post_merge_fns
#> [1] "apply_study_specific_fixes"
#> 
#> $settings$treatment_label
#> [1] "💊 Tₓ"
#> 
#> 
#> $items_expanded
#> # A tibble: 6 × 9
#>   form_type    var       suffix item_name item_type item_group unit  lower_limit
#>   <chr>        <chr>     <chr>  <chr>     <chr>     <chr>      <chr> <chr>      
#> 1 common_forms AE_AESPID NA     AE Number other     Adverse e… NA    NA         
#> 2 common_forms AE_AETERM NA     AE Name   other     Adverse e… NA    NA         
#> 3 common_forms AE_AESI   NA     AESI      other     Adverse e… NA    NA         
#> 4 common_forms AE_AESTD… NA     AE start… other     Adverse e… NA    NA         
#> 5 common_forms AE_AEEND… NA     AE end d… other     Adverse e… NA    NA         
#> 6 common_forms AE_AESTD… NA     AE date … other     Adverse e… NA    NA         
#> # ℹ 1 more variable: upper_limit <chr>

Specifications for the list of data.frames include:

column_names: Used to map raw data variable names over to new names.
- name_raw: character, variable name in the raw data source
- name_new: character, desired variable name to use in study_data and in the application
events: Used to create a simple timeline in the application, with predefined number of planned visits, N. It contains the following columns:
- event_number: integer. Example: 0, 1, 2, …, N
- event_name: character. Example: “Screening”, “Visit 1”, “Visit 2”, …, “Visit N”
- event_label: character. Example: “V0”, “V1”, “V2”, …, “VN”
common_forms: Used to select and rename the variables of interest in the raw data when transformed into the desired study_data format. Note: the study_data data.frame should be created with merge_meta_with_data(), which merges the metadata with the raw study data. common_forms contains the columns below:
- var: character, the variable name to display in the table, mapped from a known item_name provided in study_data. Example: item_name = "AE Name" will be replaced by “AE_AETERM” when var = "AE_AETERM".
- suffix: Usually blank in this data.frame. This column is more commonly used in the study_forms data.frame
- item_name: character, known item_names found in study_data. There are certain item_names that are required, even if missing in study_data, including: “AE Name”, “AE start date”, “AE end date”, “AE date of worsening”, “AE CTCAE severity”, “AE CTCAE severity worsening”, “Serious Adverse Event”, and “SAE Start date”.
- item_type: character, known item_type corresponding to those found for item_names in study_data.
- item_group: character, known item_group corresponding to those found for item_names in study_data.
study_forms: Contains the same columns as the data.frame common_forms, plus the columns unit, lower_limit, and upper_limit. Used to select and rename the raw data variables of interest. The suffix column is used more regularly in this “study” context, because these variable names may share a consistent trunk / stem, with varying suffixes describing related measurements. So instead of creating a new row for each of these variables in the meta_data data.frame, we allow for inclusion of several suffixes. For example, VS_PULSE measures beats/min. Typically, these measures are collected using VS_PULSE_VSORRES & VS_PULSE_VSREAND, but in this format, we can list the stem “VS_PULSE” as the var and "VSORRES, VSREAND" in the suffix field. The new columns are defined as follows:
- unit: character, unit of measure
- lower_limit: numeric, the lower limit of what’s considered clinically significant
- upper_limit: numeric, the upper limit of what’s considered clinically significant
general: Contains the same columns as common_forms and is used in the same way. That is, it’s used to select and rename the raw data when transformed into the desired study_data format; please refer to the common_forms spec above. Note that certain item_names are required, even if missing in study_data, including: “Age”, “Sex”, “ECOG”, “Eligible”, “WHO.classification”, “DiscontinuationReason”, “DrugAdminDate”, and “DrugAdminDose”.
groups: Contains the columns item_group, item_type, item_scale, use_unscaled_limits.
table_names: Used for renaming table column names into a more readable format. It is not required to name all column names; if the column names are not defined here, the raw name will be used instead.
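- table_name: character, the readable column name to display in the application’s tables
- raw_name: character, the column name as it appears in the data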
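
The fallback behaviour described for table_names (use the readable name when one is defined, otherwise keep the raw name) can be sketched in base R as follows. The helper readable_names() is hypothetical and is not the function clinsight uses internally. The mapping values are taken from the metadata preview above.

```r
# Toy version of the table_names mapping (values from the preview above)
table_names <- data.frame(
  table_name = c("Edit date", "Date", "Form"),
  raw_name   = c("edit_date_time", "event_date", "item_group")
)

# Hypothetical helper: look up the readable name for each column and
# keep the raw name when no mapping is defined.
readable_names <- function(cols, mapping) {
  idx <- match(cols, mapping$raw_name)
  ifelse(is.na(idx), cols, mapping$table_name[idx])
}

renamed <- readable_names(c("event_date", "item_group", "item_value"), table_names)
renamed
```

Here item_value has no entry in the mapping, so it is passed through unchanged.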

`get_metadata()` & `items_expanded`

Lastly, you may notice that the metadata object printed above ends with a data.frame called items_expanded. This data.frame is derived when the user runs a helper function called get_metadata(), which takes an XLSX file containing the metadata data.frames (one per tab) and expands the tabs of your choosing by the column of your choosing. In other words, get_metadata() makes sure your existing metadata is in the correct format and helps create something a bit more polished and digestible, like items_expanded for clinsight::metadata. As you’ll see later, performing this step is crucial for merge_meta_with_data(), which comes next. So, after you’ve compiled your metadata XLSX spreadsheet with the tabs described above, you’re ready to run code that looks like the following:

# usethis::edit_r_environ()
meta_path <- Sys.getenv("METADATA_PATH")
meta_data <- get_metadata(file.path(meta_path, "my_metadata.xlsx"),
                          expand_tab_items = c("common_forms", "study_forms", "general"),
                          expand_cols = "suffix")

In summary, get_metadata() initiates the meta_data object with the data.frames read directly from the Excel file, and then creates items_expanded by expanding the common_forms, study_forms, and general data.frames by the values stored in the suffix column. The result is appended to the meta_data object as the last data.frame in the list.
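
To make the idea of expanding by the suffix column concrete, here is a minimal base R sketch. It is not the implementation used by get_metadata(). The helper expand_suffix() is hypothetical, the variable VS_TEMP is made up for the example, and joining stem and suffix with an underscore is an assumption based on the VS_PULSE_VSORRES example given earlier.

```r
# Toy study_forms-like table; 'suffix' holds a comma-separated list
forms <- data.frame(
  var       = c("VS_PULSE", "VS_TEMP"),
  suffix    = c("VSORRES, VSREAND", NA),
  item_name = c("Pulse", "Temperature")
)

# Hypothetical helper: return one row per var/suffix combination.
# Rows without a suffix are kept as they are.
expand_suffix <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    sfx <- df$suffix[i]
    if (is.na(sfx)) {
      full <- df$var[i]
    } else {
      parts <- trimws(strsplit(sfx, ",", fixed = TRUE)[[1]])
      full <- paste(df$var[i], parts, sep = "_")
    }
    data.frame(var = full, item_name = df$item_name[i])
  })
  do.call(rbind, rows)
}

expanded <- expand_suffix(forms)
expanded$var
```

The single VS_PULSE row becomes two rows, one per suffix, which is what lets a later merge step match each raw variable name to its metadata.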

Once complete, save your metadata to an .RDS file:


saveRDS(meta_data, file.path(Sys.getenv("METADATA_PATH"), "meta_data.rds"))

Read in your study data with `get_raw_csv_data()`

Now that we have an understanding of our meta_data specs, we can discuss how they interact with your raw data. The rest of this vignette will feel more like an R script as we walk through this. First, we need to read in our raw data. Another helper function bundled in this package, get_raw_csv_data(), helps us do that. It is a simple wrapper that reads in raw data files stored as CSVs from a designated folder. Your function call should look something like the following code chunk. Notice that you can either set your raw data path explicitly or define it in your .Renviron file.

# usethis::edit_r_environ()
data_path <- Sys.getenv("DATA_PATH")
raw_data <- get_raw_csv_data(data_path)
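
For readers unfamiliar with the pattern, reading a folder of CSV files can be sketched in base R as follows. This does not reproduce get_raw_csv_data() or its return value. It only illustrates the general idea, using a throwaway folder and made-up file and column names.

```r
# Create a throwaway folder with two tiny CSV files (illustrative data only)
data_path <- file.path(tempdir(), "raw_csv_demo")
dir.create(data_path, showWarnings = FALSE)
write.csv(data.frame(SubjectId = "S_001", Age = 54),
          file.path(data_path, "DM.csv"), row.names = FALSE)
write.csv(data.frame(SubjectId = "S_001", Pulse = 72),
          file.path(data_path, "VS.csv"), row.names = FALSE)

# Read every CSV in the folder into a list of data.frames,
# named after the file each one came from.
csv_files <- list.files(data_path, pattern = "\\.csv$", full.names = TRUE)
raw_list <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
names(raw_list) <- tools::file_path_sans_ext(basename(csv_files))

names(raw_list)
```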

Finish the job with `merge_meta_with_data()`

Now that our data has been read in and minimally cleaned up, we can finally use the merge_meta_with_data() function as shown below:


study_data <- merge_meta_with_data(data = raw_data, meta = meta_data)

This function uses the rest of the metadata data.frames to further organize your raw data into something usable by the app, including but not limited to the following:

renaming columns to the required column names needed in the application
- it will use the column provided in the metadata tab column_names, and will verify if the new names still match with the expected/required names. Notice that all of these variables are required, so it’s important that you identify which variables from your EDC mirror them.
converting columns to the required column type (e.g. dates, character, integer)
fixing multiple choice variables using a function called fix_multiple_choice_vars()
adding derivative values and columns such as:
- vis_day, vis_num, and day (variables based on event_id and event_date).
cleaning up the values of event_name and event_label to standardize them for presentation in the app. Last, the data is ordered by site_code and subject_id
merging the raw data with meta_data$items_expanded
applying any study-specific fixes using a function called apply_study_specific_fixes().
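
The first few steps in the list above (renaming, type conversion, and deriving day) can be sketched in base R as follows. This is an illustration on toy data, not the code inside merge_meta_with_data(). The second visit date is made up, and treating each subject’s earliest event date as the screening date is an assumption of the sketch.

```r
# Toy raw export using raw names from the column_names preview
raw <- data.frame(
  SiteCode  = c("BEL04", "BEL04"),
  SubjectId = c("BEL_04_133", "BEL_04_133"),
  EventId   = c("SCR", "VIS1"),
  EventDate = c("2023-06-05", "2023-06-19")
)
column_names <- data.frame(
  name_raw = c("SiteCode", "SubjectId", "EventId", "EventDate"),
  name_new = c("site_code", "subject_id", "event_id", "event_date")
)

# 1. Rename raw columns to the names the app expects
idx <- match(names(raw), column_names$name_raw)
names(raw)[!is.na(idx)] <- column_names$name_new[idx[!is.na(idx)]]

# 2. Convert columns to the required type
raw$event_date <- as.Date(raw$event_date)

# 3. Derive 'day' as a difftime relative to each subject's first event date
#    (assumed here to be the screening visit)
first_date <- ave(as.numeric(raw$event_date), raw$subject_id, FUN = min)
raw$day <- raw$event_date - as.Date(first_date, origin = "1970-01-01")

raw[, c("subject_id", "event_id", "day")]
```

Subtracting two Date vectors yields a difftime in days, which matches the description of the day column in the study_data specification.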

Once complete, save your study data to an .RDS file:


saveRDS(study_data, file.path(Sys.getenv("DATA_PATH"), "study_data.rds"))

Launch the app

Circling back to the configuration file we shared at the beginning of this vignette, you’re now ready to launch this application.

default:
  golem_name: clinsight
  golem_version: 0.1.1.9022
  app_prod: no
  user_identification: test_user
  study_data: !expr clinsight::clinsightful_data
  meta_data: !expr clinsight::metadata
  user_db: user_db.sqlite
  user_roles:
   Administrator: admin
   Medical Monitor: medical_monitor
   Data Manager: data_manager
shinymanager:
  app_prod: yes
  user_identification: shinymanager
  study_data: study_data.rds
  meta_data: metadata.rds
  credentials_db: credentials_db.sqlite
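
A launch script could then look like the sketch below. Two things are assumptions here rather than facts confirmed on this page: that the active configuration is selected through the GOLEM_CONFIG_ACTIVE environment variable, as is conventional for golem-based apps, and that clinsight exposes the conventional run_app() entry point. See vignette("Deployment") for the authoritative instructions.

```r
# Select one of the configurations defined in golem-config.yml.
# Assumption: the app follows the golem convention of reading
# the GOLEM_CONFIG_ACTIVE environment variable.
Sys.setenv(GOLEM_CONFIG_ACTIVE = "shinymanager")

# Assumption: clinsight exposes golem's conventional run_app() entry point.
# Guarded so that nothing is launched in a non-interactive session or
# when the package is not installed.
if (interactive() && requireNamespace("clinsight", quietly = TRUE)) {
  clinsight::run_app()
}
```

With the shinymanager configuration, the files study_data.rds and metadata.rds saved in the previous steps need to be available at the paths given in the configuration file.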
