Slice database rows — db_slice

Selects rows in a database with the maximum value of one or more variables within every group, and collects the resulting table in memory. Mainly designed to be used with date-time variables, thus selecting on latest time, but can be used with any variable for row selection.

Usage

db_slice_rows(
  data,
  db_table = "all_review_data",
  slice_vars = c("timestamp", "edit_date_time"),
  group_vars = c("subject_id", "event_name", "item_group", "form_repeat", "item_name"),
  warn_duplicates = TRUE
)

Arguments

data: Can either be a data frame, or a character string with the file path to the database.
db_table: Character string. Name of the table to collect. Will only be used if data is a character string to a database.
slice_vars: Character vector. Names of the variables that will be used to slice the data frame. Note that the order matters: Slicing will occur for each variable in this vector successively,
group_vars: Character vector. Variable names of the variables to perform the grouping on.
warn_duplicates: Logical. Whether to warn if duplicate rows are detected after performing the grouping.

Value

A data frame with the same or less amount of rows as the data frame in the database.

Details

Needs dbplyr version 2.4.0 or later, otherwise the .data pronoun will not work within dplyr::slice_max() when used on a SQL database connection (see this issue).

In the application, we slice for two date-time variables to collect active review data. The first date-time variable is the timestamp added after reviewing a data point in the application. This allows us to keep an audit trail, by keeping data of all review actions in the database, but only selecting the latest review data each timestamp allows us to add an audit trail of the review actions in the main database and allow the user to update their review, while only showing the main data in the application.