
Statistical Specifications
Miriam Pedrera Gomez, Isaac Gravestock, and Marcel Wolbers
Source: vignettes/Statistical_Specifications.Rmd
1 Scope of this document
This document describes the statistical methods implemented in the bonsaiforest2 R package for Bayesian subgroup analysis of continuous, binary, count, and time-to-event outcomes in randomized clinical trials (RCTs).
The package enables both a global modeling approach (Wolbers et al. 2025), which estimates all prognostic and predictive effects in a single model, and a one-variable-at-a-time approach (Wang et al. 2024), which estimates the predictive effects using a separate model for each subgrouping variable.
The core features described in this document are:
- Flexible Model Specification: A unified framework to distinguish between:
  - Prognostic (main) vs. Predictive (interaction) effects.
  - Shrunk vs. Unshrunk coefficients. Users can specify unshrunk terms (with weakly informative priors), and add shrunk prognostic effects and/or shrunk predictive effects (with strong regularization priors) as needed for their analysis.
- Customizable Priors: Users can specify custom priors for individual coefficients or coefficient groups when domain expertise is available. If no custom priors are specified, the package applies automatic, well-calibrated default priors.
- Advanced Shrinkage Priors: Supports state-of-the-art shrinkage priors, including the Regularized Horseshoe (Piironen and Vehtari 2017; Wolbers et al. 2025) and R2D2 (Zhang et al. 2022).
- Marginal Effect Estimation: Use of standardization (G-computation) to derive interpretable marginal subgroup treatment effects, which average over the covariate distributions within each subgroup (Hernán and Robins 2023). This is implemented through:
  - estimate_subgroup_effects() for the computational G-computation workflow.
  - summary_subgroup_effects() for user-friendly effect summaries.
This document is structured as follows: Section 2 provides a conceptual introduction to subgroup analysis challenges and the Bayesian shrinkage solution. Section 3 details the statistical methodology, including notation, the global and one-variable-at-a-time models, endpoint-specific likelihoods, prior specifications, and the standardization procedure. Section 4 maps these statistical methods to the core functions in the bonsaiforest2 package.
2 Introduction to Subgroup Analysis and Estimation
2.1 The Challenge of Subgroup Analysis
Exploratory subgroup analyses are a standard part of RCT reporting, typically visualized in forest plots to assess the consistency of treatment effects. However, their interpretation is notoriously difficult due to several statistical challenges:
- Low Power: Subgroups have smaller sample sizes, leading to low precision (wide confidence intervals) and an inability to detect true, modest differences in effect.
- Multiplicity: Examining many subgroups (e.g., 20 or more) inflates the risk of false-positive findings. Investigators may focus on the most extreme results, which are often spurious.
- Overfitting: Standard subgroup-specific estimates (fitting a model only to data within that subgroup) are unbiased but have high variance, leading to an overestimation of heterogeneity.
This has led to the common recommendation that the overall treatment effect is often a more reliable estimate for a subgroup than the estimate derived from the subgroup’s data alone.
2.2 Prognostic vs. Predictive Effects
To build a meaningful model, it is crucial to distinguish how a baseline variable relates to the outcome:
- Prognostic Effect: A variable is prognostic if it is associated with the clinical outcome, regardless of the treatment received. It describes the natural course of the disease. In a model, this is a main effect.
- Predictive Effect: A variable is predictive if its value modifies the magnitude or direction of the treatment’s effect. It identifies heterogeneity. In a model, this is a treatment-by-variable interaction.
bonsaiforest2 is built to model both types of effects explicitly and apply different modeling strategies (e.g., shrinkage) to each.
2.3 The Role of Bayesian Shrinkage
Bayesian shrinkage methods directly address the challenges of subgroup analysis by providing a principled compromise between the two extremes of “no pooling” (subgroup-specific estimates) and “full pooling” (overall population estimate).
This compromise, often called partial pooling, is achieved using hierarchical models. Subgroup effects are assumed to be exchangeable (drawn from a common distribution). This assumption allows subgroups to “borrow strength” from each other:
- Subgroups with small sample sizes and extreme results are “shrunk” heavily toward the overall mean, reducing their variance.
- Subgroups with large sample sizes and strong evidence of a real effect are shrunk less, allowing their data to dominate.
This results in estimates with a better bias-variance trade-off: we accept a small amount of bias (by pulling real effects slightly toward the mean) in exchange for a large reduction in variance, leading to a lower overall error and better control of spurious findings.
2.4 Conditional vs. Marginal Estimands
When covariates are included in a non-linear model (like a logistic or Cox model), the resulting coefficients represent conditional effects. For example, the treatment effect is the effect for a patient with a specific set of covariate values (e.g., the reference values).
This is problematic for forest plots, where we want to know the marginal effect: “What is the average treatment effect for all patients in the ‘Female’ subgroup, averaging over their different ages, regions, etc.?”.
Because different subgroups have different covariate distributions (e.g., the age >= 65 subgroup will have a different health status profile than the age < 65 group), these conditional coefficients are not directly comparable.
bonsaiforest2 solves this by using Standardization (G-computation) (Hernán and Robins 2023). This procedure correctly averages over the specific covariate distribution of each subgroup to estimate a valid marginal treatment effect, ensuring all subgroups are compared on the same interpretable scale. (See Section 3.10 for a detailed explanation of how this standardization is performed in the package.)
3 Statistical Methodology
3.1 Setting and Notation
We establish the following notation for the global model:
- \(i = 1, \dots, N\): Index for individual patients.
- \(y_i\): The outcome for patient \(i\). This may be a continuous, binary, count, or time-to-event endpoint.
- \(s_i\): The binary treatment indicator (\(s_i=1\) for treatment, \(s_i=0\) for control).
- \(l = 1, \dots, L\): Index for an overall subgroup level (e.g., “Male”, “Female”, “EU”, “USA”).
- \(x_{il}\): Indicator (0/1) for patient \(i\) belonging to subgroup level \(l\) (used for prognostic effects).
- \(z_{il} = s_i \cdot x_{il}\): The interaction between treatment and subgroup level \(l\) (used for predictive effects).
- \(u_{iv}\): Value of the \(v\)-th additional non-subgroup covariate for patient \(i\). These additional covariates need not be binary or categorical.
- \(\boldsymbol{\mu}\): The \(N \times 1\) vector of expected outcomes.
- \(g(\cdot)\): A link function. In Section 3.7 we define the link function used for each type of endpoint.
- \(\boldsymbol{\eta}\): The \(N \times 1\) vector of the linear predictor.
3.2 Design Matrix Considerations
A critical implementation detail is the construction of the design matrix, which varies based on whether coefficients are subject to shrinkage.
The package applies different contrast coding schemes for shrunk versus unshrunk terms:
- For unshrunk terms (prognostic or predictive): Standard dummy coding is used, where one level per factor is treated as the reference category and omitted from the design matrix. This approach:
  - Ensures interpretability of coefficients relative to a baseline
  - Avoids collinearity with the intercept
  - Results in L − J total columns for J subgrouping variables with L total levels
  - Is appropriate when coefficients are given weakly informative priors without strong regularization
- For shrunk terms (prognostic or predictive): One-hot encoding is used, creating explicit dummy variables for all levels of each factor, with no reference level omitted. For example, a subgrouping variable with levels {A, B, C} creates three columns: subgroupA, subgroupB, subgroupC (for prognostic effects) or trt:subgroupA, trt:subgroupB, trt:subgroupC (for predictive effects). This approach ensures:
  - All subgroup levels are treated symmetrically under the exchangeability assumption of the shrinkage prior
  - No single level serves as an anchor point toward which other levels are shrunk
  - Each level has its own interpretable parameter that can be independently regularized
  - Predictions work correctly through G-computation (see Section 3.10)
The rationale for one-hot encoding in shrunk terms is that shrinkage priors assume exchangeability among coefficients—all levels are drawn from the same prior distribution. Having a reference level would break this symmetry and bias the shrinkage process.
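The difference between the two coding schemes can be illustrated with base R's model.matrix() (a minimal sketch; bonsaiforest2 applies the corresponding contrasts internally):

``` r
# A single factor with levels A, B, C
df <- data.frame(subgroup = factor(c("A", "B", "C", "A")))

# Dummy coding (unshrunk terms): reference level "A" is dropped
model.matrix(~ subgroup, data = df)
# Columns: (Intercept), subgroupB, subgroupC

# One-hot encoding (shrunk terms): all three levels get their own column
model.matrix(~ 0 + subgroup, data = df)
# Columns: subgroupA, subgroupB, subgroupC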
3.3 The Principle of Model Hierarchy
A fundamental concept in regression modeling is the principle of model hierarchy (or marginality). This principle states that if a model includes an interaction term (e.g., A:B), it should also include the corresponding main effects (e.g., A and B).
In the context of bonsaiforest2, this means that any subgrouping variable included as predictive (i.e., in an interaction with treatment, like trt:subgroup) should also be included as prognostic (i.e., as a main effect, subgroup).
Including the main effect ensures that the interaction term purely captures the modification of the treatment effect, rather than a mix of the main prognostic effect and the interaction, leading to a more stable and interpretable model.
The bonsaiforest2 package checks adherence to this principle during model specification. The prepare_formula_model function examines whether a main effect is specified as prognostic for every variable included in a predictive term. If it is not, the function issues a warning recommending that the user include the corresponding main effect to ensure proper model hierarchy. However, the package does not enforce this automatically, allowing users the flexibility to override this recommendation if justified by domain knowledge or specific modeling goals.
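As a sketch of a hierarchy-respecting specification using the prepare_formula_model() interface documented in Section 4.2 (the dataset trial_data is hypothetical):

``` r
spec <- prepare_formula_model(
  data = trial_data,
  response_formula = outcome ~ trt,
  shrunk_prognostic_formula = ~ region,      # main effect of region (prognostic)
  shrunk_predictive_formula = ~ trt:region,  # treatment interaction (predictive)
  response_type = "continuous"
)
# Omitting the prognostic main effect of `region` while keeping trt:region
# would trigger the model hierarchy warning described above.
```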
3.4 The Global Model
The package enables the implementation of a global model (Wolbers et al. 2025) where all effects are estimated in a single, comprehensive regression formula.
3.4.1 Full Model Equation
Matrix form:
The model for the linear predictor, \(\boldsymbol{\eta}\), for all \(N\) patients is given by:
\[ g(\boldsymbol{\mu}) = \boldsymbol{\eta} = \underbrace{\mathbf{X_p}\boldsymbol{\beta_{1,p}}}_{\text{Penalized prognostic effects}} + \underbrace{\mathbf{X_n}\boldsymbol{\beta_{1,n}}}_{\text{Non-penalized prognostic effects}} + \underbrace{\mathbf{Z_p}\boldsymbol{\beta_{2,p}}}_{\text{Penalized predictive effects}} + \underbrace{\mathbf{Z_n}\boldsymbol{\beta_{2,n}}}_{\text{Non-penalized predictive effects}} + \underbrace{\mathbf{U}\boldsymbol{\beta_{3}}}_{\text{Extra prognostic effects}} \]
The Bayesian hierarchical structure is applied as follows:
Penalized Coefficients: \(\boldsymbol{\beta_{1,p}}\) and \(\boldsymbol{\beta_{2,p}}\) are given shrinkage priors.
Non-Penalized Coefficients: \(\boldsymbol{\beta_{1,n}}\) and \(\boldsymbol{\beta_{2,n}}\) are given standard, weakly informative priors.
Extra Prognostic Coefficients: \(\boldsymbol{\beta_3}\) are given standard, weakly informative priors.
Components definition:
- \(\mathbf{X} = [\mathbf{X_p} \;\; \mathbf{X_n}]\) is the \(N \times M\) design matrix for the prognostic effects. It typically includes:
  - An intercept column. In Section 3.6 we discuss the inclusion of the intercept and its prior in detail.
  - A column for the main treatment effect.
  - Columns for the prognostic effects of each subgroup level. The contrast coding depends on shrinkage status:
    - Unshrunk terms (\(\mathbf{X}_n\)): Use dummy coding (one reference level per factor is dropped)
    - Shrunk terms (\(\mathbf{X}_p\)): Use one-hot encoding (all levels included to maintain exchangeability)

Example:

``` r
# Example: Unshrunk prognostic effects using dummy coding
design_matrix_df <- data.frame(
  Patient = c(1, 2, 3, 4),
  Intercept = c(1, 1, 1, 1),
  trt = c(0, 1, 0, 1),
  sexF = c(0, 1, 1, 0),
  regionUS = c(0, 0, 1, 1),
  regionAsia = c(0, 1, 0, 0)
)
knitr::kable(
  design_matrix_df,
  caption = "Example design matrix for unshrunk prognostic effects (dummy coding).",
  align = c("l", "c", "c", "c", "c", "c")
)
```

Table 3.1: Example design matrix for unshrunk prognostic effects (dummy coding).

| Patient | Intercept | trt | sexF | regionUS | regionAsia |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 0 | 1 |
| 3 | 1 | 0 | 1 | 1 | 0 |
| 4 | 1 | 1 | 0 | 1 | 0 |

Note: For unshrunk terms, \(M = 1 + 1 + L - J\): 1 for the intercept, 1 for the treatment, plus L (total subgroup levels) minus J (number of subgrouping variables), because reference levels are dropped. For shrunk terms, all L levels are included (no reference level dropped).
- \(\boldsymbol{\beta_{1,p}}\) is the \(P \times 1\) vector of coefficients for the prognostic effects that we want to shrink. We assign a shrinkage prior to these coefficients.
- \(\boldsymbol{\beta_{1,n}}\) is the \((M-P) \times 1\) vector of coefficients for the prognostic effects that we do not want to shrink.
- \(\mathbf{Z} = [\mathbf{Z_p} \;\; \mathbf{Z_n}]\) is the design matrix for the predictive effects. The contrast coding depends on shrinkage status:
  - Unshrunk predictive terms (\(\mathbf{Z}_n\)): Use dummy coding (one reference interaction per factor is dropped)
  - Shrunk predictive terms (\(\mathbf{Z}_p\)): Use one-hot encoding, where each column trt_subgroupLEVEL equals 1 when patient \(i\) is both on treatment (\(s_i = 1\)) AND belongs to that specific subgroup level, and 0 otherwise. All \(L\) interaction dummies are included to ensure all subgroups are treated symmetrically under the exchangeability assumption.
``` r
# Example: Shrunk predictive effects using one-hot encoding
# Each column represents trt_subgroupLEVEL (1 if treated AND in that level, 0 otherwise)
interaction_matrix <- data.frame(
  Patient = 1:6,
  trt_sexM = c(1, 0, 1, 1, 0, 0),
  trt_sexF = c(0, 1, 0, 0, 1, 1),
  trt_regionEU = c(0, 1, 0, 0, 1, 0),
  trt_regionUS = c(1, 0, 0, 1, 0, 0),
  trt_regionAsia = c(0, 0, 1, 0, 0, 1)
)
knitr::kable(
  interaction_matrix,
  align = "c",
  col.names = c("Patient", "trt_sexM", "trt_sexF", "trt_regionEU", "trt_regionUS", "trt_regionAsia"),
  caption = "Example design matrix for shrunk predictive effects (one-hot encoding)"
)
```

Table 3.2: Example design matrix for shrunk predictive effects (one-hot encoding).

| Patient | trt_sexM | trt_sexF | trt_regionEU | trt_regionUS | trt_regionAsia |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 | 1 | 0 |
| 5 | 0 | 1 | 1 | 0 | 0 |
| 6 | 0 | 1 | 0 | 0 | 1 |
- \(\boldsymbol{\beta_{2,p}}\) is the \(R \times 1\) vector of coefficients for the predictive effects that we want to shrink. We assign a shrinkage prior to these coefficients.
- \(\boldsymbol{\beta_{2,n}}\) is the \((L-R) \times 1\) vector of coefficients for the predictive effects that we do not want to shrink.
- \(\mathbf{U}\) is the \(N \times V\) design matrix for the extra variables we want to take into account when fitting the model. These effects are also unpenalized.
- \(\boldsymbol{\beta_3}\) is the \(V \times 1\) vector of extra prognostic effect coefficients.
| | Shrinkage | No Shrinkage |
|---|---|---|
| Prognostic Effect | \(\mathbf{X_p}\boldsymbol{\beta_{1,p}}\) | \(\mathbf{X_n}\boldsymbol{\beta_{1,n}}\) |
| Predictive Effect | \(\mathbf{Z_p}\boldsymbol{\beta_{2,p}}\) | \(\mathbf{Z_n}\boldsymbol{\beta_{2,n}}\) |
Linear form:
To make it more concrete, the linear predictor for a single patient \(i\) can be written as:
\[ \eta_i = \underbrace{\beta_0 + \beta_{\text{treat}}s_i + \underbrace{\sum_{l=1}^{P} \beta^l_{1,p}x^{il}_p}_\text{Penalized} + \underbrace{\sum_{l=P+1}^{M} \beta^l_{1,n}x^{il}_n}_\text{Not penalized} + \underbrace{\sum_{v=1}^{V} \beta_{3}^v u^{iv}}_{\text{Extra}}}_{\text{Prognostic effects}} + \underbrace{\underbrace{\sum_{l=1}^{R} \beta_{2,p}^l z_p^{il}}_{\text{Penalized}} + \underbrace{\sum_{l=R+1}^{L} \beta_{2,n}^l z_n^{il}}_{\text{Not penalized}}}_{\text{Predictive effects}} \]
Note: Although \(z^{il} = x^{il} \cdot s^{i}\), the prognostic and predictive effects of the same subgrouping variable may differ in whether they are penalized, so we keep separate \(x\) and \(z\) terms in the formula.
3.4.2 Example
A randomized, double-blind trial comparing a new drug, DrugX, against a placebo in patients with hypertension.
- Primary Endpoint (\(Y_{i}\)): Change from baseline in systolic blood pressure (SBP) in mmHg at 6 months.
- Subgroups of Interest:
  - Age Group: <65 vs. ≥65 years.
  - Sex: Male vs. Female.
  - Kidney Function: eGFR <60 (impaired) vs. ≥60 (normal).
- Covariates: BaselineSBPᵢ, BMIᵢ, Smokerᵢ (binary).
Global Model with Shrinkage
The outcome is assumed to be normally distributed: \(Y_{i} \sim N(\mu_{i}, \sigma^2)\).
The linear predictor \(\mu_{i}\) is modeled as:
\[ \mu_{i} = \underbrace{\beta_0}_{\text{Intercept}} + \underbrace{\beta_{\text{treat}} \cdot \text{Treatment}_i}_{\text{Main Trt Effect}} + \underbrace{\sum_{k=1}^{K} \alpha_k \cdot \text{Subgroup}_{ik}}_{\text{Prognostic Subgroup Effects}} + \underbrace{\sum_{k=1}^{K} \gamma_k \cdot (\text{Subgroup}_{ik} \cdot \text{Treatment}_i)}_{\text{Shrunken Predictive Effects}} + \underbrace{\sum_{j=1}^{J} \delta_j \cdot \text{Covariate}_{ij}}_{\text{Extra Prognostic effects}} \]
\[ =\underbrace{\beta_0}_{\text{Intercept}} + \underbrace{\beta_{\text{treat}} \cdot \text{Treatment}_i}_{\text{Main Trt Effect}} + \underbrace{\alpha_1 \cdot \text{Age Group}_i+\alpha_2 \cdot \text{Sex}_i+\alpha_3 \cdot \text{eGFR}_i}_{\text{Prognostic Subgroup Effects}}+ \]
\[ + \underbrace{\gamma_1 \cdot (\text{Age Group}_i\cdot \text{Treatment}_i)+\gamma_2 \cdot (\text{Sex}_i\cdot \text{Treatment}_i)+\gamma_3 \cdot (\text{eGFR}_i\cdot \text{Treatment}_i)}_{\text{Predictive Effects}} + \underbrace{\delta_1 \cdot \text{BaselineSBP}_i+\delta_2 \cdot \text{BMI}_i+\delta_3 \cdot \text{Smoker}_i}_{\text{Extra Prognostic effects}} \]
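As an illustration, this example model could be specified through the interface described in Section 4.2; the dataset and variable names below (hypertension_data, chg_sbp, etc.) are hypothetical:

``` r
spec <- prepare_formula_model(
  data = hypertension_data,
  response_formula = chg_sbp ~ trt,
  unshrunk_terms_formula = ~ baseline_sbp + bmi + smoker,  # extra prognostic effects (delta_j)
  shrunk_prognostic_formula = ~ age_group + sex + egfr_group,  # alpha_k
  shrunk_predictive_formula = ~ trt:age_group + trt:sex + trt:egfr_group,  # gamma_k
  response_type = "continuous"
)
```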
3.5 One-Variable-at-a-Time Model
The one-variable-at-a-time model (Wang et al. 2024) is a Bayesian hierarchical model (BHM) that improves upon standard subgroup analysis by borrowing information across subgroup levels to produce more stable and reliable estimates. Instead of analyzing each subgroup level in isolation, it analyzes all levels within a single subgrouping variable (e.g., all age groups) together in one model.
For example, for the variable Sex (with levels Male and Female), you would fit one hierarchical model. In this single model, the treatment effects for Male (\(\beta_{2,\text{sex,male}}\)) and Female (\(\beta_{2,\text{sex,female}}\)) are treated as exchangeable and are linked by a common prior distribution. Then, a completely separate hierarchical model would be fit for the variable Region.
3.5.1 Full Model Equation
For a given subgrouping variable \(j\), we can specify a flexible linear predictor that allows for some covariate effects to be constant across the levels of \(j\), while others are allowed to vary.
Let’s partition the patient-level covariates into two sets:

- \(\mathbf{u}_i\): A vector of covariates assumed to have a constant prognostic effect across all levels of the subgrouping variable \(j\).
- \(\mathbf{v}_i\): A vector of covariates for which there is a strong a priori reason to believe their prognostic effect varies across the levels \(k\) of the subgrouping variable \(j\).
The linear predictor for patient \(i\) in level \(k\) of variable \(j\) is then: \[g(\mu_{i,j,k}) = \beta_{1,j,k} + \beta_{2,j,k}z_i + \boldsymbol{\beta}_{3,j}\mathbf{u}_i + \boldsymbol{\beta}_{4,j,k}\mathbf{v}_i\]
- \(\beta_{2,j,k}\): The predictive effect for level \(k\) of variable \(j\). This is the primary parameter of interest for shrinkage.
- \(\boldsymbol{\beta}_{3,j}\mathbf{u}_i\): The effect of covariates assumed to be constant across levels. The coefficient vector \(\boldsymbol{\beta}_{3,j}\) does not have a \(k\) subscript.
- \(\boldsymbol{\beta}_{4,j,k}\mathbf{v}_i\): The effect of covariates assumed to be level-specific. The coefficient vector \(\boldsymbol{\beta}_{4,j,k}\) does have a \(k\) subscript, allowing its effect to be different for each subgroup level. This can be seen as an interaction between variable \(x_j\) and \(v_i\).
Note that for the coefficients \(\boldsymbol{\beta}_{3,j}\) and \(\boldsymbol{\beta}_{4,j,k}\) we can use shrinkage or standard priors depending on our prior beliefs.
3.5.2 Example
Analogously to the example shown for the global model, the example below is for the Age Group variable, where \(k \in \{ \text{<65}, \text{≥65} \}\).
The outcome for patient \(i\) in age subgroup \(k\) is: \(Y_{ik} \sim N(\mu_{ik}, \sigma_k^2)\).
The linear predictor \(\mu_{ik}\) is:
\[ \mu_{ik} = \underbrace{\beta_{1,k}}_{\text{Subgroup Intercept}} + \underbrace{\beta_{2,k} \cdot \text{Treatment}_i}_{\text{Predictive Effect}} + \underbrace{\delta_1 \cdot \text{BMI}_i + \delta_2 \cdot \text{Smoker}_i}_{\text{Constant Prognostic Effects}} + \underbrace{\delta_{3,k} \cdot \text{BaselineSBP}_i}_{\text{Varying Prognostic Effect}} \]
- Here, \(\beta_{1,k}\) (subgroup-specific intercept) and \(\beta_{2,k}\) (subgroup-specific predictive effect) are given hierarchical priors to borrow information across age groups. For example: \(\beta_{2,k} \sim N(\mu_2, \tau_2^2)\).
- The effect of Baseline SBP is also allowed to vary by age group (\(\delta_{3,k}\)), while the BMI and Smoking effects are assumed constant.
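One way this age-group model could be written directly in brms (a sketch using the same hypothetical variable names, not the package's internal code): the subgroup-specific treatment effect arises as the population-level trt coefficient plus a random slope, which is equivalent to \(\beta_{2,k} \sim N(\mu_2, \tau_2^2)\).

``` r
library(brms)

fit_age <- brm(
  chg_sbp ~ 0 + age_group +          # subgroup-specific intercepts beta_{1,k}
    trt + (0 + trt || age_group) +   # beta_{2,k} = mu_2 + b_k, with b_k ~ N(0, tau_2^2)
    bmi + smoker +                   # constant prognostic effects delta_1, delta_2
    age_group:baseline_sbp,          # varying prognostic effect delta_{3,k}
  data = hypertension_data,
  family = gaussian()
)
```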
3.6 The Logic of the Intercept
While the intercept (\(\beta_0\)) is not subject to shrinkage (it is a fixed effect within the unshrunktermeffect component), specifying its prior requires care to ensure the model remains weakly informative without introducing bias or numerical instability.
Default Specification: The package automatically scales the intercept prior based on the reference scale parameter (sigma_ref) provided by the user:

- For Continuous Endpoints: The prior is centered at the observed outcome mean with scale proportional to sigma_ref: \[ \beta_0 \sim \mathcal{N}(\bar{y}, (5 \times \sigma_{ref})^2) \] where \(\bar{y}\) is the sample mean of the outcome variable and sigma_ref is the protocol standard deviation from the trial design. Centering at the outcome mean improves computational efficiency, while the scale of \(5 \times \sigma_{ref}\) provides sufficient flexibility relative to the trial's expected variability.
- For Binary and Count Endpoints: The prior is centered at 0 and scaled by sigma_ref (typically set to 1): \[ \beta_0 \sim \mathcal{N}(0, (5 \times \sigma_{ref})^2) \] For binary and count endpoints, sigma_ref is typically set to 1, yielding normal(0, 5). A standard deviation of 5 on the log-odds or log-rate scale covers a wide range of plausible effect sizes (e.g., ratios from approximately 0.08 to 12) while still penalizing extreme, unrealistic values. This choice is aligned with the methodology in Wolbers et al. (2025).
Linking to Trial Design:

- For continuous outcomes: Set sigma_ref to the population standard deviation assumed during the trial design and sample size calculation phase. This ensures the prior is automatically calibrated to the expected magnitude of effects in the specific trial.
- For binary, count, and survival outcomes: Use sigma_ref = 1 (the typical default), as these outcomes are modeled on transformed scales (logit, log, or log-hazard) where the original outcome standard deviation is not directly meaningful.
Special Case: Time-to-Event Endpoints. For Cox proportional hazards models, the intercept is not identifiable and is omitted from the linear predictor.

- Mechanism: The intercept is effectively absorbed into the non-parametric baseline hazard function, \(h_0(t)\).
- Implementation: The brms backend handles this automatically by estimating the baseline hazard (via bhaz()) rather than a fixed intercept parameter. Therefore, no prior specification for \(\beta_0\) is required for survival models.
3.7 Endpoint-Specific Models
The package supports four primary endpoint types by specifying the appropriate likelihood and link function for \(g(\boldsymbol{\mu}) = \boldsymbol{\eta}\).
| Endpoint | Distribution / Model | Link Function | Notes |
|---|---|---|---|
| Continuous | Normal: \(y_i \sim N(\mu_i, \sigma^2)\) | Identity: \(\mu_i = \eta_i\) | \(\sigma\) can be stratified (e.g., sigma ~ 0 + clinic_site). |
| Binary | Bernoulli: \(y_i \sim \text{Bernoulli}(p_i)\) | Logit: \(\text{logit}(p_i) = \eta_i\) | |
| Count | Negative Binomial: \(y_i \sim \text{NegBin}(\mu_i, \phi)\) | Log: \(\log(\mu_i) = \eta_i\) | \(\phi\) (overdispersion) can be stratified. Supports offset(). |
| Time-to-event | Cox Proportional Hazards: \(h_i(t) = h_0(t)\exp(\eta_i)\) | Log (on the hazard) | Intercept is absorbed by the baseline hazard \(h_0(t)\). \(h_0(t)\) can be stratified. |
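The stratification options in the Notes column correspond to brms distributional formulas; for example (a sketch with illustrative variable names):

``` r
library(brms)

# Continuous endpoint: residual SD stratified by site
f_continuous <- bf(chg_sbp ~ trt + age_group, sigma ~ 0 + clinic_site)

# Count endpoint: exposure offset plus overdispersion stratified by region
f_count <- bf(events ~ trt + region + offset(log(followup)), shape ~ 0 + region)
```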
3.8 Prior Specifications
3.8.1 Default Priors
The package requires the user to specify sigma_ref, a reference scale parameter used to calibrate all default priors. The interpretation of sigma_ref depends on the endpoint type:
- For continuous outcomes: sigma_ref represents the protocol standard deviation of the trial (typically obtained from the sample size calculation).
- For binary, count, and survival outcomes: sigma_ref is typically set to 1, as these outcomes are modeled on transformed scales (logit, log, or log-hazard) where standard deviations are not directly meaningful.
All default priors are automatically scaled relative to sigma_ref to ensure they are appropriately calibrated:
For continuous outcomes:

- Intercept: normal(outcome_mean, 5 * sigma_ref), centered at the observed outcome mean
- Unshrunk terms: normal(0, 5 * sigma_ref)
- Shrunk prognostic: horseshoe(1)
- Shrunk predictive: horseshoe(1)

For binary, count, and survival outcomes:

- Intercept: normal(0, 5 * sigma_ref) (typically normal(0, 5) when sigma_ref = 1)
- Unshrunk terms: normal(0, 5 * sigma_ref) (typically normal(0, 5) when sigma_ref = 1)
- Shrunk prognostic: horseshoe(1)
- Shrunk predictive: horseshoe(1)
Note: The intercept prior is not applied for response_type = "survival", as Cox models do not have a global intercept. Additionally, the intercept prior is omitted if the user specifies a formula without an intercept (using ~ 0 + ... syntax).
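For orientation, these defaults roughly correspond to the following brms prior statements, assuming sigma_ref = 1 and assuming the three formula components appear as brms non-linear parameters with the names used throughout this document (a sketch, not the package's internal code):

``` r
library(brms)

default_priors <- c(
  set_prior("normal(0, 5)", class = "b", nlpar = "unshrunktermeffect"),
  set_prior("horseshoe(1)", class = "b", nlpar = "shprogeffect"),
  set_prior("horseshoe(1)", class = "b", nlpar = "shpredeffect")
)
```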
3.8.2 Weakly Informative Priors (Unshrunk Terms)
For all unshrunk coefficients (both prognostic and predictive terms in the unshrunktermeffect component, as well as extra covariates \(\boldsymbol{\beta_{3}}\)), we use weakly informative priors to aid computational stability without strongly influencing the posterior.
- Default: normal(0, 5 * sigma_ref). This prior is automatically scaled to the trial's expected effect size, making it weakly regularizing on the linear predictor scale. Users can override this by specifying custom priors using the sigma_ref placeholder (e.g., "normal(0, 2.5 * sigma_ref)"), which will be automatically substituted with the numeric value during model fitting.
3.8.3 Regularized Horseshoe Prior (Shrunk Terms)
This is the default shrinkage prior in bonsaiforest2, recommended for its excellent adaptive shrinkage properties (Piironen and Vehtari 2017).
Concept: A global-local prior. It has an infinitely tall spike at zero (to aggressively shrink noise) and heavy tails (to leave true, large signals unshrunk).
- Hierarchical Specification:
  \[\begin{aligned} \beta_{2,l} &\sim N(0, \tau^2 \tilde{\lambda}_l^2) \\ \tilde{\lambda}_l^2 &= \frac{c^2 \lambda_l^2}{c^2 + \tau^2 \lambda_l^2} \\ \lambda_l &\sim C^+(0, 1) \quad (\text{Local shrinkage}) \\ \tau &\sim C^+(0, \tau_0) \quad (\text{Global shrinkage}) \\ c^2 &\sim \text{Inverse-Gamma}(\nu/2, \nu s^2/2) \end{aligned}\]
- Hyperparameter Justification: The package supports two ways to set the crucial global scale \(\tau_0\):
  - Fixed Default (scale_global): horseshoe(scale_global = 1). This is the package default, a robust, general-purpose choice.
  - Elicited Prior (par_ratio): Sets \(\tau_0\) based on the prior belief about the number of effective (non-zero) subgroups, \(p_{eff}\), out of the total \(L\) (Bornkamp 2025; Piironen and Vehtari 2017). This is specified via \(\text{par\_ratio} = \frac{p_{eff}}{L - p_{eff}}\). This scales the prior based on sample size (\(N\)) and the expected sparsity.
Regularized Horseshoe Implementation in brms
The brms package implements the Regularized Horseshoe prior using the function horseshoe(...). The regularization term (the “slab”) is added to the standard Horseshoe to ensure robustness and improve computational stability in Stan.
- Horseshoe Function and Key Arguments

The Horseshoe prior is set using the function call:

``` r
horseshoe(df = 1, scale_global = 1, df_global = 1, scale_slab = 2, df_slab = 4, par_ratio = NULL, autoscale = TRUE, main = FALSE)
```

| Horseshoe Argument | Theoretical Equivalent | Role and Implication |
|---|---|---|
| \(\mathbf{scale\_global}\) | \(\tau_0\) | Scale of the \(\text{Student-}t\) prior on the global shrinkage parameter \(\tau\). Lower values \(\implies\) more aggressive global shrinkage. |
| \(\mathbf{par\_ratio}\) | \(\frac{p_{eff}}{L - p_{eff}}\) | Alternative to \(\mathbf{scale\_global}\). Specifies the ratio of expected non-zero to zero coefficients, automatically calculating \(\tau_0\). |
| \(\mathbf{df}\) | DOF for \(\lambda_l\) | Degrees of freedom of \(\text{Student-}t\) prior on local shrinkage parameters (\(\lambda_l\)). Default \(\mathbf{df=1}\) gives the standard half-Cauchy. |
| \(\mathbf{scale\_slab}\) | \(s\) | Scale of the regularization slab. Default \(\mathbf{scale\_slab=2}\) prevents large coefficients from being over-shrunk. |
| \(\mathbf{autoscale}\) | Scaling by \(\sigma\) | Logical. If \(\mathbf{TRUE}\) (default), the prior is automatically scaled using the residual standard deviation (\(\sigma\)). |
- Relationship to Justification: The \(\mathbf{brms}\) function offers two primary ways to set the global shrinkage level, which controls the overall sparsity of the model:
  - Fixed Global Scale: Using the \(\mathbf{scale\_global}\) argument, often set to \(\mathbf{1.0}\) as a general-purpose default. This value is internally multiplied by the residual standard deviation (\(\sigma\)) when autoscale = TRUE.
  - Elicited Global Scale (Recommended): Using the \(\mathbf{par\_ratio}\) argument, which allows the user to specify their prior belief about the sparsity of the model (i.e., the expected number of true non-zero coefficients, \(p_{eff}\)). This is the recommended approach as it accounts for the number of coefficients \(L\) and the sample size \(N\) to determine the optimal \(\tau_0\).
- Choosing \(\mathbf{scale\_global}\) and \(\mathbf{par\_ratio}\): The choice of \(\tau_0\) (controlled by \(\mathbf{scale\_global}\) or \(\mathbf{par\_ratio}\)) is crucial, as it sets the overall magnitude of shrinkage.
  - Fixed Default (\(\mathbf{scale\_global=1}\)):
    - Use when: You want a simple, robust, general-purpose prior without explicit sparsity knowledge.
    - Caveat: May result in insufficient shrinkage if the true number of non-zero coefficients is small.
  - Elicited Sparsity (\(\mathbf{par\_ratio}\)):
    - Calculation: \(\text{par\_ratio} = \frac{p_{eff}}{L - p_{eff}}\), where \(p_{eff}\) is the expected number of non-zero coefficients and \(L\) is the total number of coefficients.
    - Use when: You have a strong prior belief about the number of relevant effects. This is generally preferred for adaptive shrinkage as it scales \(\tau_0\) appropriately.
| \(\mathbf{par\_ratio}\) Value | Expected \(p_{eff}\) (Sparsity) | Interpretation of Shrinkage |
|---|---|---|
| Small (e.g., 0.05) | Very Few Non-Zero Coefficients | High Shrinkage: \(\tau_0\) is small, aggressively shrinking coefficients towards zero. |
| Moderate (e.g., 0.2) | A Moderate Fraction is Non-Zero | Moderate Shrinkage: A less aggressive \(\tau_0\), allowing more coefficients to remain unshrunk. |
| \(\mathbf{par\_ratio}\) Ignored | \(\mathbf{scale\_global}\) used | Uses a fixed \(\tau_0\) that doesn’t explicitly depend on expected sparsity. |
Recommendation: While \(\mathbf{scale\_global=1}\) is the quick default, specifying \(\mathbf{par\_ratio}\) based on expert knowledge of expected sparsity is the most theoretically sound and adaptive method for setting the global shrinkage scale.
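As a worked example of the par_ratio calculation:

``` r
# Prior belief: about 2 of the 10 candidate interaction coefficients are non-zero
p_eff <- 2
L <- 10
par_ratio <- p_eff / (L - p_eff)  # 2 / 8 = 0.25

# Corresponding prior specification in brms syntax:
# horseshoe(df = 1, par_ratio = 0.25)
```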
3.8.4 R2D2 Prior (Shrunk Terms)
Concept: This prior is uniquely derived by first placing a prior on the model’s coefficient of determination, \(R^2\), which quantifies the proportion of variance explained by the predictors. This is arguably more intuitive than specifying priors directly on coefficients.
Hierarchical Specification (Marginal Version): \[ \begin{aligned} \beta_{2,l} &\sim DE(\sigma \sqrt{\phi_l \omega / 2}) \quad (\text{Double-Exponential/Laplace kernel}) \\ \boldsymbol{\phi} &= (\phi_1, \dots, \phi_L) \sim \text{Dirichlet}(a_\pi, \dots, a_\pi) \quad (\text{Local shrinkage}) \\ \omega &\sim \text{Beta-Prime}(a, b) \quad (\text{Global shrinkage}) \end{aligned} \] This is equivalent to assuming that the proportion of variance explained by the interaction terms follows \(R^2 = \frac{\omega}{1+\omega} \sim \text{Beta}(a,b)\).
- Justification of Hyperparameters: Zhang et al. (2022) provide clear guidance. A fully automatic approach is to set \(b=0.5\) to achieve Cauchy-like heavy tails. The concentration parameter \(a_\pi\) controls sparsity: a smaller \(a_\pi\) (e.g., 0.2) implies higher shrinkage, concentrating the prior variance on fewer coefficients, while a larger \(a_\pi\) (e.g., 0.5) spreads the variance more evenly, implying lower shrinkage.
- Default Recommendation: Set \(b=0.5\) and offer options for the user based on desired shrinkage strength:
  - High Shrinkage: \(a_\pi=0.2\)
  - Low Shrinkage: \(a_\pi=0.5\)
R2D2 Prior Implementation in brms
The brms package implements this prior using the function R2D2(...), providing a high-level parametrization that is more intuitive than setting the parameters \(a\), \(b\), and \(a_\pi\) directly. This function translates desired \(\mathbf{R^2}\) properties and shrinkage strength into the underlying prior distribution’s parameters.
- R2D2 Function and Key Arguments
The R2D2 prior is set using the function call:

``` r
R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 0.5, autoscale = TRUE, main = FALSE)
```

| R2D2 Argument | Theoretical Equivalent | Role and Implication |
|---|---|---|
| \(\mathbf{mean\_R2}\) | \(\mathbb{E}[R^2]\) | Sets the prior expected value for the model’s fixed effect \(R^2\). |
| \(\mathbf{prec\_R2}\) | \(a+b\) (Concentration) | Controls the precision of the Beta prior on \(R^2\). Lower values mean a flatter prior. |
| \(\mathbf{cons\_D2}\) | \(a_{\pi}\) | The concentration parameter for the \(\text{Dirichlet}\) distribution on the variance components (local shrinkage). Lower values imply higher shrinkage/sparsity. |
| \(\mathbf{autoscale}\) | Scaling by \(\sigma\) | Logical. If \(\mathbf{TRUE}\) (default), the prior is automatically scaled using the residual standard deviation (\(\sigma\)). This ensures the prior is scale-invariant. |
- Relationship to Justification: The brms function parameters are directly related to the justified hierarchical parameters:
  - The \(\mathbf{cons\_D2}\) parameter is the direct counterpart to the sparsity parameter \(a_{\pi}\). The default recommendation \(\mathbf{cons\_D2 = 0.5}\) corresponds precisely to the \(a_{\pi}=0.5\) low-shrinkage setting.
  - The \(\mathbf{mean\_R2}\) and \(\mathbf{prec\_R2}\) arguments define the \(\text{Beta}(a,b)\) prior on \(R^2\), where: \[a = \text{mean\_R2} \times \text{prec\_R2}\] \[b = (1 - \text{mean\_R2}) \times \text{prec\_R2}\] The choice of \(\mathbf{prec\_R2=2}\) (the default) alongside \(\mathbf{mean\_R2=0.5}\) results in \(a=1\) and \(b=1\), which is a uniform prior on \(R^2 \sim \text{Beta}(1, 1)\), representing a maximally weak prior belief on global fit.
  - The essential \(b=0.5\) setting for Cauchy-like heavy tails (preventing over-shrinkage) is implicitly handled by the R2D2 function's underlying structure and does not require manual specification.
- Choosing \(\mathbf{mean\_R2}\) and \(\mathbf{prec\_R2}\): Choosing these parameters involves quantifying your prior knowledge about the model's overall predictive power before observing the data. They define a \(\text{Beta}(a,b)\) distribution on \(R^2\).
  - Choosing \(\mathbf{mean\_R2}\) (Prior Expected \(R^2\)): This sets the center of your prior belief.
    - Default (0.5): Use if you have no strong prior knowledge. It implies a neutral expectation that the fixed effects will explain half of the variance.
    - Informative Low (e.g., 0.1 to 0.3): Use if you expect a very weak or noisy relationship.
    - Informative High (e.g., 0.7 to 0.9): Use if you expect a highly predictive model based on strong prior evidence.
  - Choosing \(\mathbf{prec\_R2}\) (Prior Confidence in \(R^2\)): This determines the concentration or certainty of the Beta prior around the chosen \(\text{mean\_R2}\).

| \(\mathbf{prec\_R2}\) | Resulting Beta Parameters (\(\text{mean\_R2}=0.5\)) | Interpretation |
|---|---|---|
| 2 (Default) | \(a=1, b=1\) (\(\text{Uniform}\)) | Weakly Informative: All \(R^2\) values are equally likely. |
| 5 to 10 | \(a=2.5, b=2.5\) to \(a=5, b=5\) | Moderately Informative: The prior is bell-shaped and centered at 0.5, constraining \(R^2\) away from extreme values (0 and 1). |
| > 20 | Highly Peaked | Highly Informative: Use only with very strong external evidence for the \(R^2\) value. |
Recommendation: The Default \(\mathbf{mean\_R2 = 0.5}\) and \(\mathbf{prec\_R2 = 2}\) provides the most flexible, weakly informative setting for \(R^2\), which is generally recommended unless strong prior knowledge dictates otherwise.
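As a worked example of the mapping from (mean_R2, prec_R2) to the implied Beta prior, using the formulas above:

``` r
# A moderately informative prior expecting R^2 around 0.3
mean_R2 <- 0.3
prec_R2 <- 5
a <- mean_R2 * prec_R2        # 1.5
b <- (1 - mean_R2) * prec_R2  # 3.5

# R2D2(mean_R2 = 0.3, prec_R2 = 5, cons_D2 = 0.2) therefore places a
# Beta(1.5, 3.5) prior on R^2 (prior mean 0.3), with cons_D2 = 0.2 giving
# relatively sparse local shrinkage.
```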
3.8.5 Normal Hierarchical Prior (for One-Variable-at-a-Time Model)
This is the standard hierarchical model for subgroup analysis, which assumes that treatment effects for levels within a subgrouping variable are drawn from a common normal distribution. It’s less complex than the Horseshoe or R2D2 priors but provides effective shrinkage, especially when the number of subgroup levels is small.
- Hierarchical Specification: For a given subgrouping variable \(j\) with levels \(k=1, \dots, K_j\):
\[\begin{aligned} \beta_{2,j,k} &\sim \mathcal{N}(\mu_{2,j}, \tau^2_{2,j}) \quad (\text{Level-specific treatment effects}) \\ \mu_{2,j} &\sim \mathcal{N}(0, s^2) \quad (\text{Common mean effect}) \\ \tau_{2,j} &\sim \text{Half-Normal}(a) \quad (\text{Between-subgroup heterogeneity}) \end{aligned} \]
- Justification of Hyperparameters: The key is setting the scale parameter \(a\) for the Half-Normal prior on the between-subgroup standard deviation, \(\tau_{2,j}\), as this controls the degree of shrinkage. As recommended by Bornkamp (2025), this can be linked to the planned treatment effect (\(\delta_{plan}\)) from the trial protocol, making the prior choice interpretable and consistent across different endpoints. The choice of \(a\) implies a prior on the plausible difference in treatment effects between any two subgroups.
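To see how this calibration works, note that conditional on \(\tau_{2,j}\), the difference in treatment effect between any two levels \(k\) and \(k'\) satisfies
\[ \beta_{2,j,k} - \beta_{2,j,k'} \mid \tau_{2,j} \sim \mathcal{N}(0,\, 2\tau_{2,j}^2), \]
so a choice such as \(a = \delta_{plan}/2\) (one possible calibration, not a package default) implies that, a priori, typical between-subgroup differences in the treatment effect are on the order of the planned overall effect \(\delta_{plan}\) or smaller.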
3.9 Estimation (MCMC)
The joint posterior distribution of all parameters is complex and has no closed-form solution. We use Markov Chain Monte Carlo (MCMC) methods to draw samples from this distribution. Specifically, the package uses the No-U-Turn Sampler (NUTS), an efficient variant of Hamiltonian Monte Carlo (HMC), as implemented in Stan via the brms package. The output is a set of posterior samples (e.g., 4000 draws) representing the joint posterior distribution.
3.10 Standardization for Marginal Effects (G-computation)
To obtain interpretable marginal effects for each subgroup, the package implements a standardization (G-computation) procedure. This process is repeated for each MCMC draw to generate a full posterior distribution of the marginal effect.
For a single MCMC draw, the step-by-step process is:
1. Select Parameters: Take one draw from the joint posterior distribution of all model parameters (all \(\boldsymbol{\beta}\)'s, \(\sigma\), \(\phi\), etc.).
2. Create Counterfactuals: For every patient \(i\) in the original dataset, create two scenarios:
   - Scenario A: Patient \(i\)'s covariates, with treatment \(s_i=0\) (Control).
   - Scenario B: Patient \(i\)'s covariates, with treatment \(s_i=1\) (Treatment).
   For Scenario A (control), all treatment-by-subgroup interaction dummies are set to 0. For Scenario B (treatment), each interaction dummy trt_subgroupLEVEL is set to 1 if the patient belongs to that subgroup level, 0 otherwise.
3. Predict Outcomes: Use the model formula and the parameters from Step 1 to predict the outcome for every patient under both Scenario A (\(\hat{\mu}_{i,0}\)) and Scenario B (\(\hat{\mu}_{i,1}\)). The brms::posterior_epred() function automatically handles the computation using the model's design matrix and parameters.
4. Average within Subgroups: For a subgroup of interest \(l\) (e.g., "Female"):
   - Calculate the average predicted outcome under control: \(\hat{\mu}_{l,0} = \text{mean}(\hat{\mu}_{i,0})\) for all \(i\) where \(x_{il}=1\).
   - Calculate the average predicted outcome under treatment: \(\hat{\mu}_{l,1} = \text{mean}(\hat{\mu}_{i,1})\) for all \(i\) where \(x_{il}=1\).
5. Calculate Effect Measure: Compute the marginal effect for subgroup \(l\) from these averaged predictions. This depends on the endpoint type:
   - Continuous: Mean Difference = \(\hat{\mu}_{l,1} - \hat{\mu}_{l,0}\).
   - Binary: Odds Ratio = \(\frac{\hat{\mu}_{l,1} / (1 - \hat{\mu}_{l,1})}{\hat{\mu}_{l,0} / (1 - \hat{\mu}_{l,0})}\) (where \(\hat{\mu}\) is the predicted probability).
   - Count: Rate Ratio = \(\hat{\mu}_{l,1} / \hat{\mu}_{l,0}\) (where \(\hat{\mu}\) is the predicted rate).
   - Time-to-Event: Average Hazard Ratio (AHR). This is the most complex case, as it requires predicting the full survival curve \(S_i(t)\) for each patient. The marginal survival curves \(\hat{S}_{l,0}(t)\) and \(\hat{S}_{l,1}(t)\) are calculated, and the AHR is computed from them, as the marginal curves may not be proportional.
6. Repeat: This process (Steps 1-5) is repeated for all MCMC draws, yielding a full posterior distribution (e.g., 4000 values) for the marginal effect of each subgroup. The final point estimate (median) and 95% credible interval are taken from this distribution.
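A stripped-down sketch of this procedure for a binary endpoint, using a fitted brms model (fit), a hypothetical dataset trial_data with treatment variable trt, and the "Female" subgroup (the package automates this across endpoints, subgroups, and formula components):

``` r
library(brms)

dat0 <- transform(trial_data, trt = 0)      # Scenario A: everyone on control
dat1 <- transform(trial_data, trt = 1)      # Scenario B: everyone on treatment

p0 <- posterior_epred(fit, newdata = dat0)  # draws x patients matrix of probabilities
p1 <- posterior_epred(fit, newdata = dat1)

idx <- trial_data$sex == "Female"           # subgroup membership indicator
mu0 <- rowMeans(p0[, idx])                  # marginal control risk, per draw
mu1 <- rowMeans(p1[, idx])                  # marginal treated risk, per draw

or_draws <- (mu1 / (1 - mu1)) / (mu0 / (1 - mu0))
quantile(or_draws, c(0.025, 0.5, 0.975))    # median and 95% credible interval
```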
4 Mapping of Statistical Methods to bonsaiforest2 Functions
This section connects the statistical methodology (Section 3) to the package functions. The package provides a three-level architecture: high-level user interface functions, modular worker functions for advanced control, and internal helpers.
4.1 High-Level User Interface
These are the main functions users typically call in their analysis workflow:
- Model-fitting entry point:
  - Maps to: Sections 3.4 (Global Model), 3.5 (One-Variable-at-a-Time Model), 3.7 (Endpoints), 3.8 (Priors), 3.9 (Estimation).
  - Action: Main entry point that orchestrates the complete modeling workflow.
    - Internally calls prepare_formula_model() to construct the three-component brmsformula structure and validate inputs.
    - Then calls fit_brms_model() to run MCMC sampling via brms::brm() and Stan.
    - Stores essential metadata (treatment variable, response type, original data) as attributes on the fitted model object for downstream analysis.
    - Handles all formula components: unshrunktermeffect (unshrunk terms), shprogeffect (shrunk prognostic), and shpredeffect (shrunk predictive).
    - Returns a brmsfit object with enriched attributes.
- summary_subgroup_effects():
  - Maps to: Section 3.10 (Standardization / G-computation).
  - Action: User-facing wrapper for marginal effect summarization.
    - Internally calls estimate_subgroup_effects() to perform the G-computation procedure.
    - Automatically extracts the treatment variable, response type, and data from model attributes (set by fit_brms_model()).
    - Auto-detects subgroups from all formula components by searching for treatment interactions in unshrunktermeffect, shprogeffect, and shpredeffect.
    - Returns a subgroup_summary S3 object containing estimates, credible intervals, and metadata.
- plot() method for subgroup_summary objects:
  - Maps to: Final visualization of Section 3.10 results.
  - Action: S3 method for subgroup_summary objects. Generates publication-ready forest plots from marginal effect estimates with median and 95% credible intervals.
4.2 Advanced/Modular Interface
For users requiring fine-grained control over the modeling process, the package exposes intermediate worker functions:
- prepare_formula_model():
  - Parameters:
    - data: Dataset containing all variables
    - response_formula: Response specification (e.g., outcome ~ trt or Surv(time, status) ~ trt)
    - unshrunk_terms_formula: Unshrunk terms specification (may include main effects and treatment interactions)
    - shrunk_prognostic_formula: Prognostic main effects to be regularized (optional)
    - shrunk_predictive_formula: Treatment interactions to be regularized (optional)
    - response_type: One of "binary", "count", "continuous", or "survival"
    - stratification_formula: Stratification variable (optional)
  - Action: Constructs and validates the three-component brmsformula structure:
    - unshrunktermeffect: Unshrunk terms with weakly informative priors. Can include both prognostic effects (intercept, main treatment effect, pre-specified covariates) and predictive effects (treatment interactions). Specified via unshrunk_terms_formula.
    - shprogeffect: Shrunk prognostic effects with strong regularization priors. Specified via shrunk_prognostic_formula.
    - shpredeffect: Shrunk predictive effects (treatment interactions) with strong regularization priors. Specified via shrunk_predictive_formula.
  - Supports flexible interaction syntax: colon notation (trt:subgroup), star notation (trt*subgroup), or random effects notation ((trt || subgroup)), which can be mixed in the same model.
  - Validates model hierarchy: checks whether predictive terms have corresponding prognostic main effects, issuing warnings if violations are detected (star notation automatically includes main effects and is excluded from this check).
  - Applies automatic contrast coding: one-hot encoding for shrunk terms (all levels represented for exchangeability) and dummy encoding for unshrunk terms (reference level dropped).
  - Constructs the design matrices \(\mathbf{X_n}, \mathbf{X_p}, \mathbf{Z_n}, \mathbf{Z_p}, \mathbf{U}\) from the formula specifications.
  - Returns a list containing the complete brmsformula, transformed data with proper contrasts, response type, treatment variable name, Stan parameter names, and metadata.
- fit_brms_model():
  - Action: Executes the MCMC sampling and model fitting.
    - Takes the output from prepare_formula_model() or manually specified formula components.
    - Assigns priors to each formula component based on user specifications or defaults.
    - Calls brms::brm() to compile the Stan code and run the MCMC chains.
    - Handles endpoint-specific configurations (Section 3.7): likelihood functions, link functions, and stratification.
    - Attaches essential metadata as attributes to the fitted model for use by estimate_subgroup_effects().
- estimate_subgroup_effects():
  - Action: Implements the complete G-computation procedure described in Section 3.10:
    - For each MCMC draw, creates counterfactual datasets (all patients as treated vs. control).
    - Generates posterior predictions using brms::posterior_epred() or brms::posterior_survfit().
    - Averages predictions within each subgroup to obtain marginal expected outcomes.
    - Computes endpoint-specific effect measures:
      - Continuous: Mean difference (\(\hat{\mu}_{l,1} - \hat{\mu}_{l,0}\))
      - Binary: Odds ratio
      - Count: Rate ratio
      - Survival: Average hazard ratio (AHR)
    - Auto-detects subgroups by parsing all formula components for treatment interactions (both colon : syntax and pipe || syntax for random effects).
    - Returns raw posterior draws and summary statistics (median, credible intervals).