Introduction
The rbmi multiple imputation pipeline produces several intermediate objects – draws, imputation, analysis, and pool – each containing useful metadata about the imputation model and results. rbmiUtils v0.3.0 adds tools to inspect these objects and extract diagnostic statistics, making it easier to verify that the MI pipeline behaved as expected.
This vignette covers three features:
- describe_draws(): structured metadata from draws objects (method, formula, samples, MCMC convergence)
- describe_imputation(): structured metadata from imputation objects (method, M, references, missingness breakdown)
- pool_to_ard() MI diagnostic enrichment: fraction of missing information (FMI), lambda, relative increase in variance (RIV), and other Rubin’s rules diagnostics embedded in the ARD output
Setup
We load the required packages and prepare data for the
pool_to_ard() diagnostic enrichment examples in Section 5.
This setup uses analyse_mi_data(), which works directly with
the pre-imputed ADMI dataset (which contains an
IMPID column), so there is no need to call draws()
or impute().
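If the stacked layout is unfamiliar, the base-R sketch below builds a tiny hypothetical dataset in the same shape: one complete (imputed) copy of the data per value of IMPID. All values are invented, and the difference in means is a simplified stand-in for the ancova() analysis used in this vignette; it only illustrates that each IMPID is analysed separately before pooling.

```r
# Hypothetical toy data in the stacked layout: one complete (imputed)
# dataset per value of IMPID. All values are invented for illustration.
toy <- data.frame(
  IMPID   = rep(1:3, each = 4),
  USUBJID = rep(c("S1", "S2", "S3", "S4"), times = 3),
  TRT     = rep(c("Placebo", "Drug A"), times = 6),
  CHG     = c(1.2, -0.4, 0.8, -1.1,
              1.0, -0.6, 0.9, -1.3,
              1.3, -0.2, 0.7, -1.0)
)

# Splitting on IMPID recovers the M completed datasets
by_imp <- split(toy, toy$IMPID)
length(by_imp)          # M = 3
sapply(by_imp, nrow)    # each imputed dataset holds all 4 subjects

# A simplified per-imputation estimate (difference in mean CHG); in the
# vignette, analyse_mi_data() applies ancova() to each IMPID instead.
est <- sapply(by_imp, function(d) {
  mean(d$CHG[d$TRT == "Drug A"]) - mean(d$CHG[d$TRT == "Placebo"])
})
est
```

These per-imputation estimates (and their standard errors) are what Rubin’s rules later combine into a single pooled result.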
library(rbmiUtils)
library(rbmi)
library(dplyr)
data("ADMI", package = "rbmiUtils")
ADMI <- ADMI |>
mutate(
TRT = factor(TRT, levels = c("Placebo", "Drug A")),
USUBJID = factor(USUBJID),
AVISIT = factor(AVISIT)
)
vars <- set_vars(
subjid = "USUBJID", visit = "AVISIT", group = "TRT",
outcome = "CHG", covariates = c("BASE", "STRATA", "REGION")
)
method <- method_bayes(
n_samples = 100,
control = control_bayes(warmup = 200, thin = 5)
)
ana_obj <- analyse_mi_data(ADMI, vars, method, fun = ancova)
pool_obj <- pool(ana_obj)
Inspecting Draws with describe_draws()
The describe_draws() function extracts structured
metadata from an rbmi draws object, providing a quick
summary of the imputation model configuration and, for Bayesian methods,
MCMC convergence diagnostics.
The code below uses the ADEFF dataset and defines its own
vars and method objects. It is shown with
eval = FALSE because draws() runs MCMC
sampling, which is too slow for vignette builds.
data("ADEFF", package = "rbmiUtils")
ADEFF <- ADEFF |>
mutate(
TRT = factor(TRT01P, levels = c("Placebo", "Drug A")),
USUBJID = factor(USUBJID),
AVISIT = factor(AVISIT, levels = c("Week 24", "Week 48"))
)
vars <- set_vars(
subjid = "USUBJID", visit = "AVISIT", group = "TRT",
outcome = "CHG", covariates = c("BASE", "STRATA", "REGION")
)
method <- method_bayes(
n_samples = 100,
control = control_bayes(warmup = 200, thin = 2)
)
dat <- ADEFF |> select(USUBJID, STRATA, REGION, TRT, BASE, CHG, AVISIT)
draws_obj <- draws(data = dat, vars = vars, method = method)
desc <- describe_draws(draws_obj)
print(desc)
Example output from describe_draws():
-- Draws Summary --
Method: Bayesian (MCMC via Stan)
Formula: CHG ~ 1 + BASE + STRATA + REGION + TRT + AVISIT + TRT:AVISIT
Samples: 100
Failures: 0
Covariance: us
Same covariance across groups: Yes
--
-- MCMC Convergence --
v All Rhat < 1.1 (42 parameters)
Max Rhat: 1.003
Min ESS: 245.2
The returned object is a list with programmatic access to all fields:
- $method – human-readable method name (e.g., “Bayesian (MCMC via Stan)”)
- $method_class – raw class: "bayes", "approxbayes", or "condmean"
- $formula – the deparsed model formula string
- $n_samples – total number of samples drawn
- $n_failures – number of failed samples
- $mcmc – (Bayesian only) list with rhat, ess, max_rhat, min_ess, n_params, converged
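Because the result is an ordinary list, it can drive automated checks in a pipeline. The sketch below is hypothetical: check_draws_summary() is not part of rbmiUtils, the mock object mimics only a subset of the documented fields (values copied from the example output above), and the ESS cut-off of 100 is an arbitrary illustration. With real data you would pass describe_draws(draws_obj) instead of the mock.

```r
# Hypothetical stand-in for the list returned by describe_draws(); field
# names follow the documented list, values come from the example output.
desc_mock <- list(
  method       = "Bayesian (MCMC via Stan)",
  method_class = "bayes",
  n_samples    = 100,
  n_failures   = 0,
  mcmc         = list(max_rhat = 1.003, min_ess = 245.2, converged = TRUE)
)

# Hypothetical helper: return a character vector of problems (empty if none)
check_draws_summary <- function(desc, rhat_threshold = 1.1, ess_threshold = 100) {
  problems <- character(0)
  if (desc$n_failures > 0) {
    problems <- c(problems, sprintf("%d failed samples", as.integer(desc$n_failures)))
  }
  # The $mcmc element is only expected for Bayesian draws
  if (identical(desc$method_class, "bayes") && !is.null(desc$mcmc)) {
    if (desc$mcmc$max_rhat >= rhat_threshold) {
      problems <- c(problems, sprintf("max Rhat %.3f is not below %.2f",
                                      desc$mcmc$max_rhat, rhat_threshold))
    }
    if (desc$mcmc$min_ess < ess_threshold) {
      problems <- c(problems, sprintf("min ESS %.1f is below %.0f",
                                      desc$mcmc$min_ess, ess_threshold))
    }
  }
  problems
}

check_draws_summary(desc_mock)   # character(0): nothing flagged

# Simulated problem case: sampler failures and poor mixing
bad <- desc_mock
bad$n_failures <- 2
bad$mcmc$max_rhat <- 1.25
check_draws_summary(bad)
```

Returning messages rather than stopping lets the caller decide whether to warn, log, or abort.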
Inspecting Imputations with describe_imputation()
The describe_imputation() function extracts metadata
from an rbmi imputation object, including the method,
number of imputations (M), reference arm mappings, and a missingness
breakdown by visit and treatment arm.
This section continues from the draws_obj created in the
code above (Section 3). Again, we use eval = FALSE because
the pipeline requires MCMC.
impute_obj <- impute(
draws_obj,
references = c("Placebo" = "Placebo", "Drug A" = "Placebo")
)
desc <- describe_imputation(impute_obj)
print(desc)
Example output from describe_imputation():
-- Imputation Summary --
Method: Bayesian (MCMC via Stan)
Imputations (M): 100
Subjects: 200
--
-- References --
Placebo -> Placebo
Drug A -> Placebo
-- Missingness by Visit and Arm --
visit group n_total n_miss pct_miss
Week 24 Placebo 100 8 8.0
Week 24 Drug A 100 10 10.0
Week 48 Placebo 100 15 15.0
Week 48 Drug A 100 18 18.0
The returned object provides programmatic access to:
- $method – human-readable method name
- $n_imputations – number of imputations (M)
- $n_subjects – total number of unique subjects
- $references – named character vector of reference arm mappings (or NULL)
- $missingness – a data.frame with columns visit, group, n_total, n_miss, pct_miss
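To make the missingness columns concrete, here is a base-R sketch of one way to compute a visit-by-arm breakdown with the same column names. It is not the rbmiUtils implementation: summarise_missingness() is a hypothetical helper, the toy data are invented, and it assumes a row counts as missing when its outcome is NA.

```r
# Hypothetical long data before imputation: one row per subject and visit,
# NA marks a missing outcome. Values are invented.
long <- data.frame(
  visit = rep(c("Week 24", "Week 48"), each = 6),
  group = rep(rep(c("Placebo", "Drug A"), each = 3), times = 2),
  CHG   = c(1.1, NA, 0.4, -0.5, -1.2, NA,
            0.9, NA, NA, -0.8, NA, -1.5)
)

# Hypothetical helper: count missing outcomes per visit x group cell
summarise_missingness <- function(data, outcome, visit = "visit", group = "group") {
  cells <- split(data, list(data[[visit]], data[[group]]), drop = TRUE)
  rows <- lapply(cells, function(d) {
    n_total <- nrow(d)
    n_miss  <- sum(is.na(d[[outcome]]))
    data.frame(
      visit    = d[[visit]][1],
      group    = d[[group]][1],
      n_total  = n_total,
      n_miss   = n_miss,
      pct_miss = round(100 * n_miss / n_total, 1)
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$visit, out$group), ]
  rownames(out) <- NULL
  out
}

summarise_missingness(long, outcome = "CHG")
```

Here group is a character column, so rows sort alphabetically; with a factor, order() would follow the factor levels instead (e.g., Placebo before Drug A).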
MI Diagnostic Statistics in ARD
The pool_to_ard() function converts a pool object to the
pharmaverse Analysis Results Dataset (ARD) format. When you also pass
the analysis_obj, it enriches the ARD with MI diagnostic
statistics computed from Rubin’s rules.
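For reference, the textbook Rubin’s rules quantities behind these diagnostics are written out below. Here \hat{\theta}_m and U_m are the estimate and its squared standard error from imputation m = 1, ..., M, \nu_{com} is the complete-data degrees of freedom, and \nu_{adj} corresponds to df.adjusted. This is the standard (mice-style) formulation; the exact computations inside pool_to_ard() may differ in detail.

```latex
\begin{aligned}
\bar{\theta} &= \frac{1}{M}\sum_{m=1}^{M} \hat{\theta}_m,
  \qquad \bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m,
  \qquad B = \frac{1}{M-1}\sum_{m=1}^{M} \bigl(\hat{\theta}_m - \bar{\theta}\bigr)^2 \\
T &= \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr) B \\
\lambda &= \frac{(1 + 1/M)\,B}{T},
  \qquad \mathrm{riv} = \frac{(1 + 1/M)\,B}{\bar{U}} = \frac{\lambda}{1 - \lambda} \\
\nu_{\mathrm{old}} &= \frac{M - 1}{\lambda^{2}},
  \qquad \nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}} + 1}{\nu_{\mathrm{com}} + 3}\,\nu_{\mathrm{com}}\,(1 - \lambda),
  \qquad \nu_{\mathrm{adj}} = \frac{\nu_{\mathrm{old}}\,\nu_{\mathrm{obs}}}{\nu_{\mathrm{old}} + \nu_{\mathrm{obs}}} \\
\mathrm{fmi} &= \frac{\mathrm{riv} + 2/(\nu_{\mathrm{adj}} + 3)}{1 + \mathrm{riv}},
  \qquad \mathrm{re} = \Bigl(1 + \frac{\mathrm{fmi}}{M}\Bigr)^{-1}
\end{aligned}
```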
# Base ARD (no diagnostics)
ard <- pool_to_ard(pool_obj)
# Enriched ARD with MI diagnostics
ard_enriched <- pool_to_ard(pool_obj, analysis_obj = ana_obj)
The enriched ARD includes additional rows for each parameter containing the diagnostic statistics. We can filter and display them:
ard_enriched |>
dplyr::filter(stat_name %in% c("fmi", "lambda", "riv", "df.adjusted", "re")) |>
dplyr::select(group1_level, variable_level, stat_name, stat)
#> {cards} data frame: 30 x 4
#> group1_level variable_level stat_name stat
#> 1 Week 24 NA fmi 0.023
#> 2 Week 24 NA lambda 0.019
#> 3 Week 24 NA riv 0.019
#> 4 Week 24 NA df.adjusted 480.772
#> 5 Week 24 NA re 1
#> 6 Week 24 NA fmi 0.021
#> 7 Week 24 NA lambda 0.017
#> 8 Week 24 NA riv 0.018
#> 9 Week 24 NA df.adjusted 481.809
#> 10 Week 24 NA re 1
#> ℹ 20 more rows
#> ℹ Use `print(n = ...)` to see more rows
Each diagnostic statistic has a specific interpretation:
- FMI (fraction of missing information) – the adjusted proportion of total sampling variance attributable to missing data, following the mice convention: (riv + 2/(df + 3)) / (1 + riv), where df is df.adjusted
- lambda – the proportion of total variance due to between-imputation variance (missingness): (1 + 1/M) * B / T, where B is the between-imputation variance and T the total variance
- RIV (relative increase in variance) – the ratio of between-imputation variance to within-imputation variance, scaled by (1 + 1/M)
- df.adjusted – Barnard-Rubin adjusted degrees of freedom, which account for finite complete-data degrees of freedom
- re (relative efficiency) – 1 / (1 + fmi/M), the efficiency of the MI estimator relative to an estimator with infinite imputations
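These definitions can be checked by hand. The base-R sketch below applies them to per-imputation estimates and standard errors; the function name mi_diagnostics() is hypothetical, the complete-data degrees of freedom df_com are assumed known, and the input numbers are invented. pool_to_ard() may obtain its inputs and handle edge cases differently.

```r
# Minimal sketch of Rubin's rules diagnostics (mice-style conventions).
# est: per-imputation estimates; se: their standard errors;
# df_com: complete-data degrees of freedom (assumed known).
mi_diagnostics <- function(est, se, df_com) {
  M      <- length(est)
  ubar   <- mean(se^2)                 # within-imputation variance
  b      <- var(est)                   # between-imputation variance
  t_var  <- ubar + (1 + 1 / M) * b     # total variance
  riv    <- (1 + 1 / M) * b / ubar
  lambda <- (1 + 1 / M) * b / t_var

  # Barnard-Rubin adjusted degrees of freedom; if b == 0, df_old is
  # infinite and the adjusted df reduces to df_obs
  df_old <- (M - 1) / lambda^2
  df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
  df_adj <- if (is.finite(df_old)) df_old * df_obs / (df_old + df_obs) else df_obs

  fmi <- (riv + 2 / (df_adj + 3)) / (1 + riv)
  re  <- 1 / (1 + fmi / M)

  c(estimate = mean(est), se = sqrt(t_var), fmi = fmi, lambda = lambda,
    riv = riv, df.adjusted = df_adj, re = re)
}

# Invented per-imputation results for one parameter (M = 5)
est <- c(-1.75, -1.90, -1.60, -1.82, -1.71)
se  <- c(0.40, 0.41, 0.39, 0.40, 0.42)
round(mi_diagnostics(est, se, df_com = 194), 3)
```

Two identities make quick sanity checks on any enriched ARD: lambda equals riv / (1 + riv), and fmi always exceeds lambda by the small-sample term 2/(df + 3) / (1 + riv).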
When Diagnostics Are Not Available
Non-Rubin pooling methods (e.g., conditional mean with jackknife) do
not produce MI diagnostic statistics because the variance decomposition
does not apply. When pool_to_ard() is called with an
analysis_obj from a non-Rubin method, it emits an
informative message and omits diagnostic rows from the ARD.
The describe_draws() and
describe_imputation() functions work with all method types
(Bayesian, approximate Bayesian, and conditional mean).
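As a hedged illustration of the behaviour described above, the sketch below runs the pipeline with conditional mean imputation and jackknife pooling (rbmi’s method_condmean(type = "jackknife")), which needs no MCMC. It repeats the ADEFF preparation from the draws section so it can run on its own. The comments state what we would expect based on the text above, not verified output, and the jackknife still fits one model per subject, so it may take a while.

```r
library(rbmiUtils)
library(rbmi)
library(dplyr)

# Same data preparation as in the describe_draws() section
data("ADEFF", package = "rbmiUtils")
dat <- ADEFF |>
  mutate(
    TRT = factor(TRT01P, levels = c("Placebo", "Drug A")),
    USUBJID = factor(USUBJID),
    AVISIT = factor(AVISIT, levels = c("Week 24", "Week 48"))
  ) |>
  select(USUBJID, STRATA, REGION, TRT, BASE, CHG, AVISIT)

vars <- set_vars(
  subjid = "USUBJID", visit = "AVISIT", group = "TRT",
  outcome = "CHG", covariates = c("BASE", "STRATA", "REGION")
)

# Conditional mean imputation with jackknife variance: no MCMC, no Rubin's rules
method_cm <- method_condmean(type = "jackknife")

draws_cm  <- draws(data = dat, vars = vars, method = method_cm)
impute_cm <- impute(draws_cm, references = c("Placebo" = "Placebo", "Drug A" = "Placebo"))
ana_cm    <- analyse(impute_cm, fun = ancova, vars = vars)
pool_cm   <- pool(ana_cm)

describe_draws(draws_cm)       # expected: no MCMC convergence section
describe_imputation(impute_cm)

# Expected: an informative message, and no fmi/lambda/riv rows in the ARD
ard_cm <- pool_to_ard(pool_cm, analysis_obj = ana_cm)
```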
Learn More
- From rbmi Analysis to Regulatory Tables – the full end-to-end pipeline vignette
- pool_to_ard() – function documentation with ARD format details
- describe_draws() and describe_imputation() – function documentation with full field descriptions
