Statistical Specifications

1 Scope of this document

This document describes the statistical methods implemented in the bonsaiforest2 R package for Bayesian subgroup analysis of continuous, binary, count, and time-to-event outcomes in randomized clinical trials (RCTs).

The package enable the implementation of both a global modeling approach (Wolbers et al. 2025), that estimates all prognostic and predictive effects in a single model, and a One-variable-at a time (Wang et al. 2024), that estimates the predictive effects using a different model for each subgrouping variable.

The core features described in this document are:

Flexible Model Specification: A unified framework to distinguish between:
- Prognostic (main) vs. Predictive (interaction) effects.
- Shrunk vs. Unshrunk coefficients. Users can specify unshrunk terms (with weakly informative priors), and add shrunk prognostic effects and/or shrunk predictive effects (with strong regularization priors) as needed for their analysis.
- Customizable Priors: Users can specify custom priors for individual coefficients or coefficient groups when domain expertise is available. If no custom priors are specified, the package applies automatic, well-calibrated default priors.
Advanced Shrinkage Priors: Supports state-of-the-art shrinkage priors, including the Regularized Horseshoe (Piironen and Vehtari 2017; Wolbers et al. 2025) and R2D2 (Zhang et al. 2022).
Marginal Effect Estimation: Use of standardization (G-computation) to derive interpretable marginal subgroup treatment effects, which average over the covariate distributions within each subgroup (Hernán and Robins 2023). This is implemented through:
- estimate_subgroup_effects() for the computational G-computation workflow.
- summary_subgroup_effects() for user-friendly effect summaries.

This document is structured as follows: Section 2 provides a conceptual introduction to subgroup analysis challenges and the Bayesian shrinkage solution. Section 3 details the statistical methodology, including notation, the global and the one-variable at a time models, endpoint-specific likelihoods, prior specifications, and the standardization procedure. Section 4 maps these statistical methods to the core functions in the bonsaiforest2 package.

2 Introduction to Subgroup Analysis and Estimation

2.1 The Challenge of Subgroup Analysis

Exploratory subgroup analyses are a standard part of RCT reporting, typically visualized in forest plots to assess the consistency of treatment effects. However, their interpretation is notoriously difficult due to several statistical challenges:

Low Power: Subgroups have smaller sample sizes, leading to low precision (wide confidence intervals) and an inability to detect true, modest differences in effect.
Multiplicity: Examining many subgroups (e.g., 20 or more) inflates the risk of false-positive findings. Investigators may focus on the most extreme results, which are often spurious.
Overfitting: Standard subgroup-specific estimates (fitting a model only to data within that subgroup) are unbiased but have high variance, leading to an overestimation of heterogeneity.

This has led to the common recommendation that the overall treatment effect is often a more reliable estimate for a subgroup than the estimate derived from the subgroup’s data alone.

2.2 Prognostic vs. Predictive Effects

To build a meaningful model, it is crucial to distinguish how a baseline variable relates to the outcome:

Prognostic Effect: A variable is prognostic if it is associated with the clinical outcome, regardless of the treatment received. It describes the natural course of the disease. In a model, this is a main effect.
Predictive Effect: A variable is predictive if its value modifies the magnitude or direction of the treatment’s effect. It identifies heterogeneity. In a model, this is a treatment-by-variable interaction.

bonsaiforest2 is built to model both types of effects explicitly and apply different modeling strategies (e.g., shrinkage) to each.

2.3 The Role of Bayesian Shrinkage

Bayesian shrinkage methods directly address the challenges of subgroup analysis by providing a principled compromise between the two extremes of “no pooling” (subgroup-specific estimates) and “full pooling” (overall population estimate).

This compromise, often called partial pooling, is achieved using hierarchical models. Subgroup effects are assumed to be exchangeable (drawn from a common distribution). This assumption allows subgroups to “borrow strength” from each other:

Subgroups with small sample sizes and extreme results are “shrunk” heavily toward the overall mean, reducing their variance.
Subgroups with large sample sizes and strong evidence of a real effect are shrunk less, allowing their data to dominate.

This results in estimates with a better bias-variance trade-off: we accept a small amount of bias (by pulling real effects slightly toward the mean) in exchange for a large reduction in variance, leading to a lower overall error and better control of spurious findings.

2.4 Conditional vs. Marginal Estimands

When covariates are included in a non-linear model (like a logistic or Cox model), the resulting coefficients represent conditional effects. For example, the treatment effect is the effect for a patient with a specific set of covariate values (e.g., the reference values).

This is problematic for forest plots, where we want to know the marginal effect: “What is the average treatment effect for all patients in the ‘Female’ subgroup, averaging over their different ages, regions, etc.?”.

Because different subgroups have different covariate distributions (e.g., the age >= 65 subgroup will have a different health status profile than the age < 65 group), these conditional coefficients are not directly comparable.

bonsaiforest2 solves this by using Standardization (G-computation) (Hernán and Robins 2023). This procedure correctly averages over the specific covariate distribution of each subgroup to estimate a valid marginal treatment effect, ensuring all subgroups are compared on the same interpretable scale. (See Section 3.10 for a detailed explanation of how this standardization is performed in the package.)

3 Statistical Methodology

3.1 Setting and Notation

We establish the following notation for the global model:

\(i = 1, \dots, N\): Index for individual patients.
\(y_i\): The outcome for patient \(i\). We will consider that this variable can only be a binary, count, continuous or time-to-event endpoint.
\(s_i\): The binary treatment indicator (\(s_i=1\) for treatment, \(s_i=0\) for control).
\(l = 1, \dots, L\): Index for an overall subgroup level (e.g., “Male”, “Female”, “EU”, “USA”).
\(x_{il}\): Indicator (0/1) for patient \(i\) belonging to subgroup level \(l\) (used for prognostic effects).
\(z_{il} = s_i \cdot x_{il}\): The interaction between treatment and subgroup level \(l\) (used for predictive effects).
\(u_{iv}\): Value of the \(v\)-th additional non-subgroup covariate for patient \(i\). This additional covariates don’t need to be binary or categorical.
\(\boldsymbol{\mu}\): The \(N \times 1\) vector of expected outcomes.
\(g(\cdot)\): A link function. In Section 3.7 we define the used link function for each type of endpoint.
\(\boldsymbol{\eta}\): The \(N \times 1\) vector of the linear predictor.

3.2 Design Matrix Considerations

A critical implementation detail is the construction of the design matrix, which varies based on whether coefficients are subject to shrinkage.

The package applies different contrast coding schemes for shrunk versus unshrunk terms:

For unshrunk terms (prognostic or predictive): Standard dummy coding is used, where one level per factor is treated as the reference category and omitted from the design matrix. This approach:
- Ensures interpretability of coefficients relative to a baseline
- Avoids collinearity with the intercept
- Results in L−J total columns for J subgrouping variables with L total levels
- Is appropriate when coefficients are given weakly informative priors without strong regularization
For shrunk terms (prognostic or predictive): One-hot encoding is used, creating explicit dummy variables for all levels of each factor, with no reference level omitted. For example, a subgrouping variable with levels {A, B, C} creates three columns: subgroupA, subgroupB, subgroupC (for prognostic effects) or trt:subgroupA, trt:subgroupB, trt:subgroupC (for predictive effects). This approach ensures:
- All subgroup levels are treated symmetrically under the exchangeability assumption of the shrinkage prior
- No single level serves as an anchor point toward which other levels are shrunk
- Each level has its own interpretable parameter that can be independently regularized
- Predictions work correctly through G-computation (see Section 3.10)

The rationale for one-hot encoding in shrunk terms is that shrinkage priors assume exchangeability among coefficients—all levels are drawn from the same prior distribution. Having a reference level would break this symmetry and bias the shrinkage process.

3.3 The Principle of Model Hierarchy

A fundamental concept in regression modeling is the principle of model hierarchy (or marginality). This principle states that if a model includes an interaction term (e.g., A:B), it should also include the corresponding main effects (e.g., A and B).

In the context of bonsaiforest2, this means that any subgrouping variable included as predictive (i.e., in an interaction with treatment, like trt:subgroup) should also be included as prognostic (i.e., as a main effect, subgroup).

Including the main effect ensures that the interaction term purely captures the modification of the treatment effect, rather than a mix of the main prognostic effect and the interaction, leading to a more stable and interpretable model.

The bonsaiforest2 package checks adherence to this principle during model specification. The prepare_formula_model function examines whether a main effect is specified as prognostic for every variable included in a predictive term. If it is not, the function issues a warning recommending that the user include the corresponding main effect to ensure proper model hierarchy. However, the package does not enforce this automatically, allowing users the flexibility to override this recommendation if justified by domain knowledge or specific modeling goals.

3.4 The Global Model

The package enables the implementation of a global model (Wolbers et al. 2025) where all effects are estimated in a single, comprehensive regression formula.

3.4.1 Full Model Equation

Matrix form:

The model for the linear predictor, \(\boldsymbol{\eta}\), for all \(N\) patients is given by:

\[g(\boldsymbol{\mu}) = \boldsymbol{\eta} = \underbrace{\mathbf{X_p}\boldsymbol{\beta_{1,p}}}_{\text{Penalized prognostic effects}} +\underbrace{\mathbf{X_n}\boldsymbol{\beta_{1,n}}}_{\text{Non-penalized prognostic effects}} +\underbrace{\mathbf{Z_p}\boldsymbol{\beta_{2,p}}}_{\text{Penalized predictive effects}} +\] \[ \underbrace{\mathbf{Z_n}\boldsymbol{\beta_{2,n}}}_{\text{Non-penalized predictive effects}}+ \underbrace{\mathbf{U}\boldsymbol{\beta_{3}}}_{\text{Extra prognostic effects}} \]

The Bayesian hierarchical structure is applied as follows:

Penalized Coefficients: \(\boldsymbol{\beta_{1,p}}\) and \(\boldsymbol{\beta_{2,p}}\) are given shrinkage priors.
Non-Penalized Coefficients: \(\boldsymbol{\beta_{1,n}}\) and \(\boldsymbol{\beta_{2,n}}\) are given standard, weakly informative priors.
Extra prognostic effects Coefficients: \(\boldsymbol{\beta_3}\) are given standard, weakly informative priors.

Components definition:

\(\mathbf{X}= [\mathbf{X_p} \;\; \mathbf{X}_n]\) is the \(N \times M\) design matrix for the prognostic effects. It typically includes:
- An intercept column. In Section 3.6 we discuss the inclusion and the prior we give to the intercept in detail
- A column for the main treatment effect.
- Columns for the prognostic effects of each subgroup level. The contrast coding depends on shrinkage status:
  - Unshrunk terms (\(\mathbf{X}_n\)): Use dummy coding (one reference level per factor is dropped)
  - Shrunk terms (\(\mathbf{X}_p\)): Use one-hot encoding (all levels included to maintain exchangeability)
  Example:
```
# Example: Unshrunk prognostic effects using dummy coding
design_matrix_df <- data.frame(
  Patient    = c(1, 2, 3, 4),
  Intercept  = c(1, 1, 1, 1),
  trt        = c(0, 1, 0, 1),
  sexF       = c(0, 1, 1, 0),
  regionUS   = c(0, 0, 1, 1),
  regionAsia = c(0, 1, 0, 0)
)

knitr::kable(
  design_matrix_df,
  caption = "Example design matrix for unshrunk prognostic effects (dummy coding).",
  align = c('l', 'c', 'c', 'c', 'c')
)
```
Table 3.1: Example design matrix for unshrunk prognostic effects (dummy coding).
Patient Intercept trt sexF regionUS regionAsia

1 1 0 0 0 0

2 1 1 1 0 1

3 1 0 1 1 0

4 1 1 0 1 0

Note: For unshrunk terms, M=1+1+L-J= 1 for intercept + 1 for treatment + L (total subgroup levels) - J (number of subgrouping variables) because reference levels are dropped. For shrunk terms, all L levels are included (no reference level dropped).
\(\boldsymbol{\beta_{1,p}}\) is the \(P \times 1\) vector of coefficients for the prognostic effects that we want to shrink. We will give an shrinkage prior to this coefficients.
\(\boldsymbol{\beta_{1,n}}\) is the \((M-P) \times 1\) vector of coefficients for the prognostic effects that we don’t want to shrink.
\(\mathbf{Z}= [\mathbf{Z_p} \;\; \mathbf{Z}_n]\) is the design matrix for the predictive effects. The contrast coding depends on shrinkage status:
- Unshrunk predictive terms (\(\mathbf{Z}_n\)): Use dummy coding (one reference interaction per factor is dropped)
- Shrunk predictive terms (\(\mathbf{Z}_p\)): Use one-hot encoding where each column trt_subgroupLEVEL equals 1 when patient \(i\) is both on treatment (\(s_i = 1\)) AND belongs to that specific subgroup level, and 0 otherwise. All \(L\) interaction dummies are included to ensure all subgroups are treated symmetrically under the exchangeability assumption.

Table 3.1: Example design matrix for unshrunk prognostic effects (dummy coding).
Patient	Intercept	trt	sexF	regionUS	regionAsia
1	1	0	0	0	0
2	1	1	1	0	1
3	1	0	1	1	0
4	1	1	0	1	0

# Example: Shrunk predictive effects using one-hot encoding
# Each column represents trt_subgroupLEVEL (1 if treated AND in that level, 0 otherwise)
interaction_matrix <- data.frame(
  Patient           = 1:6,
  `trt_sexM`        = c(1, 0, 1, 1, 0, 0),
  `trt_sexF`        = c(0, 1, 0, 0, 1, 1),
  `trt_regionEU`    = c(0, 1, 0, 0, 1, 0),
  `trt_regionUS`    = c(1, 0, 0, 1, 0, 0),
  `trt_regionAsia`  = c(0, 0, 1, 0, 0, 1)
)

knitr::kable(
  interaction_matrix,
  align = 'c',
  col.names = c("Patient", "trt_sexM", "trt_sexF", "trt_regionEU", "trt_regionUS", "trt_regionAsia"),
  caption = "Example design matrix for shrunk predictive effects (one-hot encoding)"
)

Table 3.2: Example design matrix for shrunk predictive effects (one-hot encoding)
Patient	trt_sexM	trt_sexF	trt_regionEU	trt_regionUS	trt_regionAsia
1	1	0	0	1	0
2	0	1	1	0	0
3	1	0	0	0	1
4	1	0	0	1	0
5	0	1	1	0	0
6	0	1	0	0	1

\(\boldsymbol{\beta_{2,p}}\) is the \(R \times 1\) vector of coefficients for the predictive effects that we want to shrink. We will give an shrinkage prior to this coefficients.
\(\boldsymbol{\beta_{2,n}}\) is the \((L-R) \times 1\) vector of coefficients for the predictive effects that we don’t want to shrink.
\(U\) is the \(N \times V\) design matrix for the extra variables we want to take into account in the fitting of the model. This will also be unpenalized.
\(\boldsymbol{\beta_3}\) is the \(V \times 1\) vector of extra prognostic effects coefficients

	Shrinkage	Not Shrinkage
Prognostic Effect	\(\mathbf{X_p}\boldsymbol{\beta_{1,p}}\)	\(\mathbf{X_n}\boldsymbol{\beta_{1,n}}\)
Predictive Effect	\(\mathbf{Z_p}\boldsymbol{\beta_{2,p}}\)	\(\mathbf{Z_n}\boldsymbol{\beta_{2,n}}\)

Linear form:

To make it more concrete, the linear predictor for a single patient \(i\) can be written as:

\[ \eta_i = \underbrace{\beta_0 + \beta_{\text{treat}}s_i + \underbrace{\sum_{l=1}^{P} \beta^l_{1,p}x^{il}_n}_\text{penalized}+ \underbrace{\sum_{l=P+1}^{M} \beta^l_{1,n}x^{il}_n}_\text{Not penalized}+\underbrace{\sum_{v=1}^{V} \beta_{3}^vu^{iv}}_{\text{Extra }}}_{\text{Prognostic Effects}} + \underbrace{\underbrace{\sum_{l=1}^{L-R} \beta_{2,p}^l z_p^{il}}_{\text{penalized }} + \underbrace{\sum_{l=L-R+1}^{L} \beta_{2,n}^l z_n^{il}}_{\text{Not penalized}} }_{\text{Predictive effects}} \]

Note: See that even if \(z^{il}=x^{il}\cdot s^{i}\), as the prognostic effect or the predictive effect of the same subgrouping variable may be penalized or not, we separate it in \(x\) and \(z\) in the formula.

3.4.2 Example

A randomized, double-blind trial comparing a new drug, DrugX, against a placebo in patients with hypertension.

Primary Endpoint (\(Y_{i}\)): Change from baseline in systolic blood pressure (SBP) in mmHg at 6 months.
Subgroups of Interest:
- Age Group: <65 vs. ≥65 years.
- Sex: Male vs. Female.
- Kidney Function: eGFR <60 (impaired) vs. ≥60 (normal).
Covariates: BaselineSBPᵢ, BMIᵢ, Smokerᵢ (binary).

Global Model with Shrinkage

The outcome is assumed to be normally distributed: \(Y_{i} \sim N(\mu_{i}, \sigma^2)\).

The linear predictor \(\mu_{i}\) is modeled as:

\[ \mu_{i} = \underbrace{\beta_0}_{\text{Intercept}} + \underbrace{\beta_{\text{treat}} \cdot \text{Treatment}_i}_{\text{Main Trt Effect}} + \underbrace{\sum_{k=1}^{K} \alpha_k \cdot \text{Subgroup}_{ik}}_{\text{Prognostic Subgroup Effects}} + \underbrace{\sum_{k=1}^{K} \gamma_k \cdot (\text{Subgroup}_{ik} \cdot \text{Treatment}_i)}_{\text{Shrunken Predictive Effects}} + \underbrace{\sum_{j=1}^{J} \delta_j \cdot \text{Covariate}_{ij}}_{\text{Extra Prognostic effects}} \]

\[ =\underbrace{\beta_0}_{\text{Intercept}} + \underbrace{\beta_{\text{treat}} \cdot \text{Treatment}_i}_{\text{Main Trt Effect}} + \underbrace{\alpha_1 \cdot \text{Age Group}_i+\alpha_2 \cdot \text{Sex}_i+\alpha_3 \cdot \text{eGFR}_i}_{\text{Prognostic Subgroup Effects}}+ \]

\[ + \underbrace{\gamma_1 \cdot (\text{Age Group}_i\cdot \text{Treatment}_i)+\gamma_2 \cdot (\text{Sex}_i\cdot \text{Treatment}_i)+\gamma_3 \cdot (\text{eGFR}_i\cdot \text{Treatment}_i)}_{\text{Predictive Effects}} + \underbrace{\delta_1 \cdot \text{BaselineSBP}_i+\delta_2 \cdot \text{BMI}_i+\delta_3 \cdot \text{Smoker}_i}_{\text{Extra Prognostic effects}} \]

3.5 One-Variable-at-a-Time Model

The one-variable-at-a-time model (Wang et al. 2024) is a Bayesian hierarchical model (BHM) that improves upon standard subgroup analysis by borrowing information across subgroup levels to produce more stable and reliable estimates. Instead of analyzing each subgroup level in isolation, it analyzes all levels within a single subgrouping variable (e.g., all age groups) together in one model.

For example, for the variable Sex (with levels Male and Female), you would fit one hierarchical model. In this single model, the treatment effects for Male (\(\beta_{2,\text{sex,male}}\)) and Female (\(\beta_{2,\text{sex,female}}\)) are treated as exchangeable and are linked by a common prior distribution. Then, a completely separate hierarchical model would be fit for the variable Region.

3.5.1 Full Model Equation

For a given subgrouping variable \(j\), we can specify a flexible linear predictor that allows for some covariate effects to be constant across the levels of \(j\), while others are allowed to vary.

Let’s partition the patient-level covariates into two sets:

\(\mathbf{u}_i\): A vector of covariates assumed to have a constant prognostic effect across all levels of the subgrouping variable \(j\).
\(\mathbf{v}_i\): A vector of covariates for which there is a strong a priori reason to believe their prognostic effect varies across the levels \(k\) of the subgrouping variable \(j\).

The linear predictor for patient \(i\) in level \(k\) of variable \(j\) is then: \[g(\mu_{i,j,k}) = \beta_{1,j,k} + \beta_{2,j,k}z_i + \boldsymbol{\beta}_{3,j}\mathbf{u}_i + \boldsymbol{\beta}_{4,j,k}\mathbf{v}_i\]

\(\beta_{2,j,k}\): The predictive effect for level \(k\) of variable \(j\). This is the primary parameter of interest for shrinkage.
\(\boldsymbol{\beta}_{3,j}\mathbf{u}_i\): The effect of covariates assumed to be constant across levels. The coefficient vector \(\boldsymbol{\beta}_{3,j}\) does not have a \(k\) subscript.
\(\boldsymbol{\beta}_{4,j,k}\mathbf{v}_i\): The effect of covariates assumed to be level-specific. The coefficient vector \(\boldsymbol{\beta}_{4,j,k}\) does have a \(k\) subscript, allowing its effect to be different for each subgroup level. This can be seen as an interaction between variable \(x_j\) and \(v_i\).

Note that for coefficients \(\boldsymbol{\beta}_{3,j}\) and \(\boldsymbol{\beta}_{4,j,k}\) we can use shrinkage or standard priors depending in our prior beliefs.

3.5.2 Example

Similarly to the example showed for the Global model. The example below is for the Age Group variable, where \(k \in \{ \text{<65}, \text{≥65} \}\).

The outcome for a patient i in age subgroup k is: \(Y_{ik} \sim N(\mu_{ik}, \sigma_k^2)\).

The linear predictor \(\mu_{ik}\) is:

\[ \mu_{ik} = \underbrace{\beta_{1,k}}_{\text{Subgroup Intercept}} + \underbrace{\beta_{2,k} \cdot \text{Treatment}_i}_{\text{Predictive Effect}} + \underbrace{\delta_1 \cdot \text{BMI}_i + \delta_2 \cdot \text{Smoker}_i}_{\text{Constant Prognostic Effects}} + \underbrace{\delta_{3,k} \cdot \text{BaselineSBP}_i}_{\text{Varying Prognostic Effect}} \]

Here, \(\beta_{1,k}\) (subgroup-specific intercept) and \(\beta_{2,k}\) (subgroup-specific predictive effect) are given hierarchical priors to borrow information across age groups. For example: \(\beta_{2,k} \sim N(\mu_2, \tau_2^2)\).
The effect of Baseline SBP is also allowed to vary by age group (\(\delta_{3,k}\)), while BMI and Smoking effects are assumed constant

3.6 The Logic of the Intercept

While the intercept (\(\beta_0\)) is not subject to shrinkage (it is a fixed effect within the unshrunktermeffect component), it is important to understand how it is specified in the package.

Default Specification: The package uses NULL (brms default) for the intercept prior, allowing brms to automatically calibrate a weakly informative prior.

User Customization: Users can specify custom intercept priors via the intercept_prior parameter in fit_brms_model() or run_brms_analysis(). For example:

For continuous outcomes: "normal(0, 10)" provides a weakly informative prior centered at 0.
For binary and count outcomes: "normal(0, 5)" provides a prior on the log-odds or log-rate scale covering plausible effect sizes (e.g., ratios from approximately 0.08 to 12).

Special Case: Time-to-Event Endpoints. For Cox proportional hazards models, the intercept is not identifiable and is omitted from the linear predictor.

Mechanism: The intercept is effectively absorbed into the non-parametric baseline hazard function, \(h_0(t)\).
Implementation: The brms backend handles this automatically by estimating the baseline hazard (via bhaz()) rather than a fixed intercept parameter. Therefore, no intercept prior is applied for survival models.

Additional Note: The intercept prior is automatically omitted if the user specifies a formula without an intercept (using ~ 0 + ... syntax).

3.7 Endpoint-Specific Models

The package supports four primary endpoint types by specifying the appropriate likelihood and link function for \(g(\boldsymbol{\mu}) = \boldsymbol{\eta}\).

Endpoint	Distribution / Model	Link Function	Notes
Continuous	Normal: \(y_i \sim N(\mu_i, \sigma^2)\)	Identity: \(\mu_i = \eta_i\)	\(\sigma\) can be stratified (e.g., `sigma ~ 0 + clinic_site`).
Binary	Bernoulli: \(y_i \sim \text{Bernoulli}(p_i)\)	Logit: \(\text{logit}(p_i) = \eta_i\)
Count	Negative Binomial: \(y_i \sim \text{NegBin}(\mu_i, \phi)\)	Log: \(\log(\mu_i) = \eta_i\)	\(\phi\) (overdispersion) can be stratified. Supports `offset()`.
Time-to-event	Cox Proportional Hazards: \(h_i(t) = h_0(t)\exp(\eta_i)\)	Log (on the hazard)	Intercept is absorbed by the baseline hazard \(h_0(t)\). \(h_0(t)\) can be stratified.

3.8 Prior Specifications

3.8.1 Default Priors

When no user-specified priors are provided, the package applies the following automatic defaults:

For all response types (continuous, binary, count, survival):

Intercept: NULL — Uses brms default prior (weakly informative)
Unshrunk terms: NULL — Uses brms default priors (weakly informative)
Shrunk prognostic: horseshoe(1) — Strong regularization for main effects
Shrunk predictive: horseshoe(1) — Strong regularization for treatment interactions

Special cases:

Random effects (one-way model, pipe-pipe || syntax): SD parameters automatically receive normal(0, 1) priors on the standard deviation scale, regardless of the shrunk_predictive_prior setting.
Survival models: No intercept prior is applied, as Cox proportional hazards models do not have a global intercept; it is absorbed into the non-parametric baseline hazard.
Formulas without intercepts: If the user specifies a formula using ~ 0 + ... syntax (no intercept), the intercept prior is omitted.

User customization: Users can override any default prior by specifying custom priors as character strings (e.g., "normal(0, 2.5)") or as brmsprior objects via the parameters intercept_prior, unshrunk_prior, shrunk_prognostic_prior, and shrunk_predictive_prior in fit_brms_model() or run_brms_analysis().

3.8.2 Weakly Informative Priors (Unshrunk Terms)

For all unshrunk coefficients (both prognostic and predictive terms in the unshrunktermeffect component, as well as extra covariates \(\boldsymbol{\beta_{3}}\)), the package uses brms default weakly informative priors to aid computational stability without strongly influencing the posterior.

Default: NULL (uses brms defaults, typically normal(0, ...)). These priors are weakly regularizing on the linear predictor scale.
User customization: Users can override this by specifying custom priors (e.g., "normal(0, 2.5)") via the unshrunk_prior parameter in fit_brms_model() or run_brms_analysis().

3.8.3 Regularized Horseshoe Prior (Shrunk Terms)

This is the default shrinkage prior in bonsaiforest2, recommended for its excellent adaptive shrinkage properties (Piironen and Vehtari 2017).

Concept: A global-local prior. It has an infinitely tall spike at zero (to aggressively shrink noise) and heavy tails (to leave true, large signals unshrunk).
Hierarchical Specification:

\[\begin{aligned} \beta_{2,l} &\sim N(0, \tau^2 \tilde{\lambda}_l^2) \\ \tilde{\lambda}_l^2 &= \frac{c^2 \lambda_l^2}{c^2 + \tau^2 \lambda_l^2} \\ \lambda_l &\sim C^+(0, 1) \quad (\text{Local shrinkage}) \\ \tau &\sim C^+(0, \tau_0) \quad (\text{Global shrinkage}) \\ c^2 &\sim \text{Inverse-Gamma}(\nu/2, \nu s^2/2) \end{aligned}\]
Hyperparameter Justification: The package supports two ways to set the crucial global scale \(\tau_0\):
1. Fixed Default (scale_global): horseshoe(scale_global = 1). This is the package default, a robust, general-purpose choice.
2. Elicited Prior (par_ratio): Sets \(\tau_0\) based on the prior belief about the number of effective (non-zero) subgroups, \(p_{eff}\), out of the total \(L\) (Bornkamp 2025; Piironen and Vehtari 2017). This is specified via \(\text{par\_ratio} = \frac{p_{eff}}{L - p_{eff}}\). This scales the prior based on sample size (\(N\)) and the expected sparsity.

Regularized Horseshoe Implementation in brms

The brms package implements the Regularized Horseshoe prior using the function horseshoe(...). The regularization term (the “slab”) is added to the standard Horseshoe to ensure robustness and improve computational stability in Stan.

Horseshoe Function and Key Arguments

he Horseshoe prior is set using the function call:

horseshoe(df = 1, scale_global = 1, df_global = 1, scale_slab = 2, df_slab = 4, par_ratio = NULL, autoscale = TRUE, main = FALSE)

Horseshoe Argument	Theoretical Equivalent	Role and Implication
\(\mathbf{scale\_global}\)	\(\tau_0\)	Scale of the \(\text{Student-}t\) prior on the global shrinkage parameter \(\tau\) Lower values \(\implies\) more aggressive global shrinkage.
\(\mathbf{par\_ratio}\)	\(\frac{p_{eff}}{L - p_{eff}}\)	Alternative to \(\mathbf{scale\_global}\). Specifies the ratio of expected non-zero to zero coefficients, automatically calculating \(\tau_0\).
\(\mathbf{df}\)	DOF for \(\lambda_l\)	Degrees of freedom of \(\text{Student-}t\) prior on local shrinkage parameters (\(\lambda_l\)). Default \(\mathbf{df=1}\) gives the standard half-Cauchy.
\(\mathbf{scale\_slab}\)	\(s\)	Scale of the regularization slab. Default \(\mathbf{scale\_slab=2}\) prevents large coefficients from being over-shrunk.
\(\mathbf{autoscale}\)	Scaling by \(\sigma\)	Logical. If \(\mathbf{TRUE}\) (default), the prior is automatically scaled using the residual standard deviation (\(\sigma\)).

Relationship to Justification: The \(\mathbf{brms}\) function offers two primary ways to set the global shrinkage level, which controls the overall sparsity of the model:

Fixed Global Scale: Using the \(\mathbf{scale\_global}\) argument, often set to \(\mathbf{1.0}\) as a general-purpose default. This value is internally multiplied by the residual standard deviation (\(\sigma\)) when autoscale = TRUE.
Elicited Global Scale (Recommended): Using the \(\mathbf{par\_ratio}\) argument, which allows the user to specify their prior belief about the sparsity of the model (i.e., the expected number of true non-zero coefficients, \(p_{eff}\)). This is the recommended approach as it accounts for the number of coefficients \(L\) and the sample size \(N\) to determine the optimal \(\tau_0\).

Choosing \(\mathbf{scale\_global}\) and \(\mathbf{par\_ratio}\): The choice of \(\tau_0\) (controlled by \(\mathbf{scale\_global}\) or \(\mathbf{par\_ratio}\)) is crucial, as it sets the overall magnitude of shrinkage.
- Fixed Default (\(\mathbf{scale\_global=1}\)):
  - Use when: You want a simple, robust, general-purpose prior without explicit sparsity knowledge.
  - Caveat: May result in insufficient shrinkage if the true number of non-zero coefficients is small.
- Elicited Sparsity (\(\mathbf{par\_ratio}\)):
  - Calculation: \(\mathbf{par\_ratio} = \frac{p_{eff}}{L - p_{eff}}\), where \(p_{eff}\) is the expected number of non-zero coefficients and \(L\) is the total number of coefficients.
  - Use when: You have a strong prior belief about the number of relevant effects. This is generally preferred for adaptive shrinkage as it scales \(\tau_0\) appropriately.

\(\mathbf{par\_ratio}\) Value	Expected \(p_{eff}\) (Sparsity)	Interpretation of Shrinkage
Small (e.g., 0.05)	Very Few Non-Zero Coefficients	High Shrinkage: \(\tau_0\) is small, aggressively shrinking coefficients towards zero.
Moderate (e.g., 0.2)	A Moderate Fraction is Non-Zero	Moderate Shrinkage: A less aggressive \(\tau_0\), allowing more coefficients to remain unshrunk.
\(\mathbf{par\_ratio}\) Ignored	\(\mathbf{scale\_global}\) used	Uses a fixed \(\tau_0\) that doesn’t explicitly depend on expected sparsity.

Recommendation: While \(\mathbf{scale\_global=1}\) is the quick default, specifying \(\mathbf{par\_ratio}\) based on expert knowledge of expected sparsity is the most theoretically sound and adaptive method for setting the global shrinkage scale.

3.8.4 R2D2 Prior (Shrunk Terms)

Concept: This prior is uniquely derived by first placing a prior on the model’s coefficient of determination, \(R^2\), which quantifies the proportion of variance explained by the predictors. This is arguably more intuitive than specifying priors directly on coefficients.
Hierarchical Specification (Marginal Version): \[ \begin{aligned} \beta_{2,l} &\sim DE(\sigma \sqrt{\phi_l \omega / 2}) \quad (\text{Double-Exponential/Laplace kernel}) \\ \boldsymbol{\phi} &= (\phi_1, \dots, \phi_L) \sim \text{Dirichlet}(a_\pi, \dots, a_\pi) \quad (\text{Local shrinkage}) \\ \omega &\sim \text{Beta-Prime}(a, b) \quad (\text{Global shrinkage}) \end{aligned} \] This is equivalent to assuming that the proportion of variance explained by the interaction terms follows \(R^2 = \frac{\omega}{1+\omega} \sim \text{Beta}(a,b)\).
Justification of Hyperparameters:
- Zhang et al. (2022) provide clear guidance. A fully automatic approach is to set \(b=0.5\) to achieve Cauchy-like heavy tails.
- The concentration parameter \(a_\pi\) controls sparsity. A smaller \(a_\pi\) (e.g., 0.2) implies higher shrinkage, concentrating the prior variance on fewer coefficients, while a larger \(a_\pi\) (e.g., 0.5) spreads the variance more evenly, implying lower shrinkage.
- Default Recommendation: Set \(b=0.5\) and offer options for the user based on desired shrinkage strength:
  - High Shrinkage: \(a_\pi=0.2\)
  - Low Shrinkage: \(a_\pi=0.5\)

R2D2 Prior Implementation in brms

The brms package implements this prior using the function R2D2(...), providing a high-level parametrization that is more intuitive than setting the parameters \(a\), \(b\), and \(a_\pi\) directly. This function translates desired \(\mathbf{R^2}\) properties and shrinkage strength into the underlying prior distribution’s parameters.

R2D2 Function and Key Arguments

The R2D2 prior is set using the function call:

R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 0.5, autoscale = TRUE, main = FALSE)

R2D2 Argument	Theoretical Equivalent	Role and Implication
\(\mathbf{mean\_R2}\)	\(\mathbb{E}[R^2]\)	Sets the prior expected value for the model’s fixed effect \(R^2\).
\(\mathbf{prec\_R2}\)	\(a+b\) (Concentration)	Controls the precision of the Beta prior on \(R^2\). Lower values mean a flatter prior.
\(\mathbf{cons\_D2}\)	\(a_{\pi}\)	The concentration parameter for the \(\text{Dirichlet}\) distribution on the variance components (local shrinkage). Lower values imply higher shrinkage/sparsity.
\(\mathbf{autoscale}\)	Scaling by \(\sigma\)	Logical. If \(\mathbf{TRUE}\) (default), the prior is automatically scaled using the residual standard deviation (\(\sigma\)). This ensures the prior is scale-invariant.

Relationship to Justification: The brms function parameters are directly related to the justified hierarchical parameters:
- The \(\mathbf{cons\_D2}\) parameter is the direct counterpart to the sparsity parameter \(\mathbf{a_{\pi}}\). The default recommendation \(\mathbf{cons\_D2 = 0.5}\) corresponds precisely to the \(\mathbf{a_{\pi}=0.5}\) low shrinkagesetting.
- The \(\mathbf{mean\_R2}\) and \(\mathbf{prec\_R2}\) arguments define the \(\text{Beta}(a,b)\) prior on \(R^2\), where: \[a = \text{mean_R2} \times \text{prec_R2}\] \[b = (1 - \text{mean_R2}) \times \text{prec_R2}\] The choice of \(\mathbf{prec\_R2=2}\) (the default) alongside \(\mathbf{mean\_R2=0.5}\) results in \(a=1\) and \(b=1\), which is a Uniform prior on \(R^2 \sim \text{Beta}(1, 1)\), representing a maximally weak prior belief on global fit.
- The essential \(\mathbf{b=0.5}\) setting for Cauchy-like heavy tails (preventing over-shrinkage) is implicitly handled by the R2D2 function’s underlying structure and does not require manual specification.
Choosing \(\mathbf{mean\_R2}\) and \(\mathbf{prec\_R2}\) : Choosing these parameters involves quantifying your prior knowledge about the model’s overall predictive power before observing the data. They define a \(\text{Beta}(a,b)\) distribution on \(R^2\).
- Choosing \(\mathbf{mean\_R2}\) (Prior Expected \(R^2\)): This sets the center of your prior belief.
  - Default (0.5): Use if you have no strong prior knowledge. It implies a neutral expectation that the fixed effects will explain half of the variance.
  - Informative Low (e.g., 0.1 to 0.3): Use if you expect a very weak or noisy relationship.
  - Informative High (e.g., 0.7 to 0.9): Use if you expect a highly predictive model based on strong prior evidence.
- Choosing \(\mathbf{prec\_R2}\) (Prior Confidence in \(R^2\)): This determines the concentration or certainty of the Beta prior around the chosen \(\text{mean_R2}\).

\(\mathbf{prec\_R2}\)	Resulting Beta Parameters (\(\text{mean_R2}=0.5\))	Interpretation
2 (Default)	\(a=1, b=1\) (\(\text{Uniform}\))	Weakly Informative: All \(R^2\) values are equally likely.
5 to 10	\(a=2.5, b=2.5\) to \(a=5, b=5\)	Moderately Informative: The prior is bell-shaped and centered at 0.5, constraining \(R^2\) away from extreme values (0 and 1).
> 20	Highly Peaked	Highly Informative: Use only with very strong external evidence for the \(R^2\) value.

2 (Default)

\(a=1, b=1\) (\(\text{Uniform}\))

Weakly Informative: All \(R^2\) values are equally likely.

5 to 10

\(a=2.5, b=2.5\) to \(a=5, b=5\)

Moderately Informative: The prior is bell-shaped and centered at 0.5, constraining \(R^2\) away from extreme values (0 and 1).

> 20

Highly Peaked

Highly Informative: Use only with very strong external evidence for the \(R^2\) value.

Recommendation: The Default \(\mathbf{mean\_R2 = 0.5}\) and \(\mathbf{prec\_R2 = 2}\) provides the most flexible, weakly informative setting for \(R^2\), which is generally recommended unless strong prior knowledge dictates otherwise.

3.8.5 Normal Hierarchical Prior (for One-Variable-at-a-Time Model)

This is the standard hierarchical model for subgroup analysis, which assumes that treatment effects for levels within a subgrouping variable are drawn from a common normal distribution. It’s less complex than the Horseshoe or R2D2 priors but provides effective shrinkage, especially when the number of subgroup levels is small.

Hierarchical Specification: For a given subgrouping variable \(j\) with levels \(k=1, \dots, K_j\):

\[\begin{aligned} \beta_{2,j,k} &\sim \mathcal{N}(\mu_{2,j}, \tau^2_{2,j}) \quad (\text{Level-specific treatment effects}) \\ \mu_{2,j} &\sim \mathcal{N}(0, s^2) \quad (\text{Common mean effect}) \\ \tau_{2,j} &\sim \text{Half-Normal}(a) \quad (\text{Between-subgroup heterogeneity}) \end{aligned} \]

Justification of Hyperparameters: The key is setting the scale parameter a for the Half-Normal prior on the between-subgroup standard deviation, \(\tau_{2,j}\), as this controls the degree of shrinkage. As recommended by Bornkamp (2025), this can be linked to the planned treatment effect ( \(\delta_{plan}\)) from the trial protocol, making the prior choice interpretable and consistent across different endpoints. The choice of a implies a prior on the plausible difference in treatment effects between any two subgroups.

3.9 Estimation (MCMC)

The joint posterior distribution of all parameters is complex and has no closed-form solution. We use Markov Chain Monte Carlo (MCMC) methods to draw samples from this distribution. Specifically, the package uses the No-U-Turn Sampler (NUTS), an efficient variant of Hamiltonian Monte Carlo (HMC), as implemented in Stan via the brms package. The output is a set of posterior samples (e.g., 4000 draws) representing the joint posterior distribution.

3.10 Standardization for Marginal Effects (G-computation)

To obtain interpretable marginal effects for each subgroup, the package implements a standardization (G-computation) procedure. This process is repeated for each MCMC draw to generate a full posterior distribution of the marginal effect.

For a single MCMC draw, the step-by-step process is:

Select Parameters: Take one draw from the joint posterior distribution for all model parameters (all \(\boldsymbol{\beta}\)’s, \(\sigma\), \(\phi\), etc.).
Create Counterfactuals: For every patient \(i\) in the original dataset, create two scenarios:
- Scenario A: Patient \(i\)’s covariates, with treatment \(s_i=0\) (Control).
- Scenario B: Patient \(i\)’s covariates, with treatment \(s_i=1\) (Treatment).
For Scenario A (control), all treatment-by-subgroup interaction dummies are set to 0. For Scenario B (treatment), each interaction dummy trt_subgroupLEVEL is set to 1 if the patient belongs to that subgroup level, 0 otherwise.
Predict Outcomes: Use the model formula and the parameters from Step 1 to predict the outcome for every patient under both Scenario A (\(\hat{\mu}_{i,0}\)) and Scenario B (\(\hat{\mu}_{i,1}\)). The brms::posterior_epred() function automatically handles the computation using the model’s design matrix and parameters.
Average within Subgroups: For a subgroup of interest \(l\) (e.g., “Female”):
- Calculate the average predicted outcome under control: \(\hat{\mu}_{l,0} = \text{mean}(\hat{\mu}_{i,0})\) for all \(i\) where \(x_{il}=1\).
- Calculate the average predicted outcome under treatment: \(\hat{\mu}_{l,1} = \text{mean}(\hat{\mu}_{i,1})\) for all \(i\) where \(x_{il}=1\).
Calculate Effect Measure: Compute the marginal effect for subgroup \(l\) from these averaged predictions. This depends on the endpoint type:
- Continuous: Mean Difference = \(\hat{\mu}_{l,1} - \hat{\mu}_{l,0}\).
- Binary: Odds Ratio = \(\frac{\hat{\mu}_{l,1} / (1 - \hat{\mu}_{l,1})}{\hat{\mu}_{l,0} / (1 - \hat{\mu}_{l,0})}\) (where \(\hat{\mu}\) is the predicted probability).
- Count: Rate Ratio = \(\hat{\mu}_{l,1} / \hat{\mu}_{l,0}\) (where \(\hat{\mu}\) is the predicted rate).
- Time-to-Event: Average Hazard Ratio (AHR). This is the most complex, as it requires predicting the full survival curve \(S_i(t)\) for each patient. The marginal survival curves \(\hat{S}_{l,0}(t)\) and \(\hat{S}_{l,1}(t)\) are calculated, and the AHR is computed from them, as the marginal curves may not be proportional.
Repeat: This process (Steps 1-5) is repeated for all MCMC draws, yielding a full posterior distribution (e.g., 4000 values) for the marginal effect of each subgroup. The final point estimate (median) and 95% credible interval are taken from this distribution.

4 Mapping of Statistical Methods to `bonsaiforest2` Functions

This section connects the statistical methodology (Section 3) to the package functions. The package provides a three-level architecture: high-level user interface functions, modular worker functions for advanced control, and internal helpers.

4.1 High-Level User Interface

These are the main functions users typically call in their analysis workflow:

run_brms_analysis()
- Maps to: Sections 3.4 (Global Model), 3.5 (One-Variable-at-a-Time Model), 3.7 (Endpoints), 3.8 (Priors), 3.9 (Estimation).
- Action: Main entry point that orchestrates the complete modeling workflow.
  - Internally calls prepare_formula_model() to construct the three-component brmsformula structure and validate inputs.
  - Then calls fit_brms_model() to run MCMC sampling via brms::brm() and Stan.
  - Stores essential metadata (treatment variable, response type, original data) as attributes on the fitted model object for downstream analysis.
  - Handles all formula components: unshrunktermeffect (unshrunk terms), shprogeffect (shrunk prognostic), and shpredeffect (shrunk predictive).
  - Returns a brmsfit object with enriched attributes.
summary_subgroup_effects()
- Maps to: Section 3.10 (Standardization / G-computation).
- Action: User-facing wrapper for marginal effect summarization.
  - Internally calls estimate_subgroup_effects() to perform the G-computation procedure.
  - Automatically extracts treatment variable, response type, and data from model attributes (set by fit_brms_model()).
  - Auto-detects subgroups from all formula components by searching for treatment interactions in unshrunktermeffect, shprogeffect, and shpredeffect.
  - Returns a subgroup_summary S3 object containing estimates, credible intervals, and metadata.
plot()
- Maps to: Final visualization of Section 3.10 results.
- Action: S3 method for subgroup_summary objects. Generates publication-ready forest plots from marginal effect estimates with median and 95% credible intervals.

4.2 Advanced/Modular Interface

For users requiring fine-grained control over the modeling process, the package exposes intermediate worker functions:

prepare_formula_model()
- Parameters:
  - data: Dataset containing all variables
  - response_formula: Response specification (e.g., outcome ~ trt or Surv(time, status) ~ trt)
  - unshrunk_terms_formula: Unshrunk terms specification (may include main effects and treatment interactions)
  - shrunk_prognostic_formula: Prognostic main effects to be regularized (optional)
  - shrunk_predictive_formula: Treatment interactions to be regularized (optional)
  - response_type: One of "binary", "count", "continuous", or "survival"
  - stratification_formula: Stratification variable (optional)
- Action: Constructs and validates the three-component brmsformula structure:
  - unshrunktermeffect: Unshrunk terms with weakly informative priors. Can include both prognostic effects (intercept, main treatment effect, pre-specified covariates) and predictive effects (treatment interactions). Specified via unshrunk_terms_formula.
  - shprogeffect: Shrunk prognostic effects with strong regularization priors. Specified via shrunk_prognostic_formula.
  - shpredeffect: Shrunk predictive effects (treatment interactions) with strong regularization priors. Specified via shrunk_predictive_formula.
- Supports flexible interaction syntax: colon notation (trt:subgroup), star notation (trt*subgroup), or random effects notation ((trt || subgroup)), which can be mixed in the same model.
- Validates model hierarchy: checks if predictive terms have corresponding prognostic main effects, issuing warnings if violations are detected (star notation automatically includes main effects and is excluded from this check).
- Applies automatic contrast coding: one-hot encoding for shrunk terms (all levels represented for exchangeability) and dummy encoding for unshrunk terms (reference level dropped).
- Constructs design matrices \(\mathbf{X_n}, \mathbf{X_p}, \mathbf{Z_n}, \mathbf{Z_p}, \mathbf{U}\) from formula specifications.
- Returns a list containing the complete brmsformula, transformed data with proper contrasts, response type, treatment variable name, Stan parameter names, and metadata.
fit_brms_model()
- Action: Executes the MCMC sampling and model fitting.
  - Takes output from prepare_formula_model() or manually specified formula components.
  - Assigns priors to each formula component based on user specifications or defaults.
  - Calls brms::brm() to compile Stan code and run MCMC chains.
  - Handles endpoint-specific configurations (Section 3.7): likelihood functions, link functions, and stratification.
  - Attaches essential metadata as attributes to the fitted model for use by estimate_subgroup_effects().
estimate_subgroup_effects()
- Action: Implements the complete G-computation procedure described in Section 3.10:
  1. For each MCMC draw, creates counterfactual datasets (all patients as treated vs. control).
  2. Generates posterior predictions using brms::posterior_epred() or brms::posterior_survfit().
  3. Averages predictions within each subgroup to obtain marginal expected outcomes.
  4. Computes endpoint-specific effect measures:
    - Continuous: Mean difference (\(\hat{\mu}_{l,1} - \hat{\mu}_{l,0}\))
    - Binary: Odds ratio
    - Count: Rate ratio
    - Survival: Average hazard ratio (AHR)
- Auto-detects subgroups by parsing all formula components for treatment interactions (both colon : syntax and pipe || syntax for random effects).
- Returns raw posterior draws and summary statistics (median, credible intervals).

References

Bornkamp, Björn. 2025. “Benchmarking Bayesian subgroup shrinkage methods on clinical data.” In PSI Conference 2025. London, UK.

Hernán, Miguel A., and James M. Robins. 2023. Causal Inference: What If. Boca Raton: CRC Press.

Piironen, J., and A. Vehtari. 2017. “Sparsity information and regularization in the horseshoe and other shrinkage priors.” Electronic Journal of Statistics 11: 5018–51.

Wang, Yun, Wenda Tu, William Koh, James Travis, Robert Abugov, Kiya Hamilton, Mengjie Zheng, Roberto Crackel, Pablo Bonangelino, and Mark Rothmann. 2024. “Bayesian hierarchical models for subgroup analysis.” Pharmaceutical Statistics 23: 1065–83.

Wolbers, Marcel, Mar Vázquez Rabuñal, Ke Li, Kaspar Rufibach, and Daniel Sabanés Bové. 2025. “Using shrinkage methods to estimate treatment effects in overlapping subgroups in randomized clinical trials with a time-to-event endpoint.” Statistical Methods in Medical Research, 1–12.

Zhang, Yan Dora, Brian P. Naughton, Howard D. Bondell, and Brian J. Reich. 2022. “Bayesian regression using a prior on the model fit: The R2-D2 shrinkage prior.” Journal of the American Statistical Association 117 (538): 862–74.

Miriam Pedrera Gomez, Isaac Gravestock, and Marcel Wolbers