Selects rows in a database with the maximum value of one or more variables within every group, and collects the resulting table in memory. Mainly designed to be used with date-time variables, thus selecting on latest time, but can be used with any variable for row selection.
Arguments
- data
Can either be a data frame, or a character string with the file path to the database.
- db_table
Character string. Name of the table to collect. Will only be used if
data
is a character string to a database.- slice_vars
Character vector. Names of the variables that will be used to slice the data frame. Note that the order matters: Slicing will occur for each variable in this vector successively,
- group_vars
Character vector. Variable names of the variables to perform the grouping on.
- warn_duplicates
Logical. Whether to warn if duplicate rows are detected after performing the grouping.
Details
Needs dbplyr
version 2.4.0 or later, otherwise the .data
pronoun will not
work within dplyr::slice_max()
when used on a SQL database connection (see
this issue).
In the application, we slice for two date-time variables to
collect active review data. The first date-time variable is the timestamp
added after reviewing a data point in the application. This allows
us to keep an audit trail, by keeping data of all review actions in the
database, but only selecting the latest review data each timestamp allows us
to add an audit trail of the review actions in the main database and allow
the user to update their review, while only showing the main data in the
application.