Skip to contents

RobinCar2 implements survival analysis methods for testing treatment effects with log-rank tests and estimating hazard ratios using score functions. This vignette provides an overview of the underlying algorithms.

Overview of Survival Analysis Methods

Survival analysis is performed with the robin_surv function, and the syntax is:

robin_surv(
  Surv(time, event) ~ group * strata + covariates,
  treatment = group ~ strata,
  data = df
)

Depending on the choice of strata and covariates, the following four methods are available:

  • Standard log-rank test without strata or covariates by omitting strata and covariates
  • Stratified log-rank test by specifying strata but no covariates
  • Covariate adjusted log-rank test by specifying covariates but no strata
  • Covariate adjusted and stratified log-rank test by specifying both strata and covariates

Let’s go through these in a simple example. We start with the standard log-rank test:

library(RobinCar2)
robin_surv(
  Surv(time, status) ~ sex,
  treatment = sex ~ 1,
  data = surv_data
)
#> Model        :  Surv(time, status) ~ sex 
#> Randomization:  sex ~ 1  ( Simple )
#> 
#> Contrast     :  Hazard ratio
#> 
#>                  Estimate Std.Err Z Value Pr(>|z|)   
#> Male v.s. Female  0.53343 0.16727   3.189 0.001428 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Test         :  Log-Rank
#> 
#>                  Test Stat. Pr(>|z|)   
#> Male v.s. Female     3.2135 0.001311 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can perform the stratified log-rank test by adding a strata argument:

robin_surv(
  Surv(time, status) ~ sex * strata,
  treatment = sex ~ strata,
  data = surv_data
)
#> Model        :  Surv(time, status) ~ sex * strata 
#> Randomization:  sex ~ strata  ( Simple )
#> 
#> Contrast     :  Hazard ratio
#> 
#>                  Estimate Std.Err Z Value Pr(>|z|)   
#> Male v.s. Female  0.55482 0.17063  3.2516 0.001147 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Test         :  Log-Rank
#> 
#>                  Test Stat. Pr(>|z|)   
#> Male v.s. Female     3.2856 0.001018 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We could also just use covariate adjustment:

robin_surv(
  Surv(time, status) ~ sex + age + meal.cal,
  treatment = sex ~ 1,
  data = surv_data
)
#> Model        :  Surv(time, status) ~ sex + age + meal.cal 
#> Randomization:  sex ~ 1  ( Simple )
#> 
#> Contrast     :  Hazard ratio
#> 
#>                  Estimate Std.Err Z Value Pr(>|z|)  
#> Male v.s. Female  0.47686 0.18608  2.5626  0.01039 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Test         :  Log-Rank
#> 
#>                  Test Stat. Pr(>|z|)   
#> Male v.s. Female     2.6858 0.007236 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or we combine both stratification and covariate adjustment:

robin_surv(
  Surv(time, status) ~ sex * strata + age + meal.cal,
  treatment = sex ~ strata,
  data = surv_data
)
#> Model        :  Surv(time, status) ~ sex * strata + age + meal.cal 
#> Randomization:  sex ~ strata  ( Simple )
#> 
#> Contrast     :  Hazard ratio
#> 
#>                  Estimate Std.Err Z Value Pr(>|z|)   
#> Male v.s. Female  0.55219 0.19133  2.8861   0.0039 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Test         :  Log-Rank
#> 
#>                  Test Stat. Pr(>|z|)   
#> Male v.s. Female     2.9496 0.003181 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Details of the Methods

The four methods introduced above are implemented based on the corresponding score functions U(θ)U(\theta) with corresponding variance estimators σ2(θ)\sigma^{2}(\theta), where θ\theta is the log hazard ratio.

The log-rank test statistic is defined as

𝒯L=nUL(0)/σL(0) \mathcal{T}_{L} = \sqrt{n} U_{L}(0) / \sigma_{L}(0)

and it asymptotically follows a standard normal distribution under the null hypothesis of no treatment effect, i.e. θ=0\theta = 0.

For estimation of θ\theta, we find the root θ̂\hat{\theta} of the score function U(θ)U(\theta) such that U(θ̂)=0U(\hat{\theta}) = 0. The standard error of θ̂\hat{\theta} is derived from the variance estimator σ(θ̂)\sigma(\hat{\theta}), with specific formulas depending on the method used and detailed below.

Standard analysis without strata or covariates

Following Section 2 in Ye, Shao, and Yi (2023), the score function from the partial likelihood under the Cox proportional hazards model λ1(t)=λ0(t)exp(θ)\lambda_{1}(t) = \lambda_{0}(t) \exp(\theta) is given by:

UL(θ)=1ni=1n0τ{Iiexp(θ)Y¯1(t)Y¯0(t)+exp(θ)Y¯1(t)}dNi(t)=1nj=1mIjexp(θ)Y¯1(tj)Y¯0(tj)+exp(θ)Y¯1(tj) \begin{align*} U_{L}(\theta) &= \frac{1}{n} \sum_{i=1}^{n} \int_{0}^{\tau} \left\{ I_{i} - \frac{\exp(\theta) \overline{Y}_{1}(t)}{\overline{Y}_{0}(t) + \exp(\theta) \overline{Y}_{1}(t)} \right\} dN_i(t)\\ &= \frac{1}{n} \sum_{j=1}^{m} I_{j} - \frac{\exp(\theta) \overline{Y}_{1}(t_{j})}{\overline{Y}_{0}(t_{j}) + \exp(\theta) \overline{Y}_{1}(t_{j})} \end{align*}

where we switched from the summation over all patients i=1,,ni = 1, \dotsc, n to the summation over all patients with events j=1,,mj = 1, \dotsc, m where mnm \leq n. The treatment indicator is Ij=1I_{j} = 1 if patient jj is in treatment group 1, and Ij=0I_{j} = 0 if patient jj is in treatment group 0. The proportions of patients at risk at time tjt_{j}, per arm k=0,1k = 0, 1 is given by

Y¯k(tj)=1ni=1n𝕀(Ii=k)𝕀(titj) \overline{Y}_{k}(t_{j}) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(I_{i} = k) \mathbb{I}(t_{i} \geq t_{j})

where tit_{i} is the potentially right-censored time of patient ii and 𝕀()\mathbb{I}(\cdot) is the indicator function.

The corresponding variance estimate is given by this adapted version using the correction factor cθ(t)c_{\theta}(t) which accounts for ties in the survival times in the log-rank test.

σL2(θ)=1ni=1n0τexp(θ)Y¯0(t)Y¯1(t)cθ(t)Y¯θ(t)2dNi(t)=1nj=1mexp(θ)Y¯0(tj)Y¯1(tj)cθ(tj)Y¯θ(tj)2 \begin{align*} \sigma_{L}^{2}(\theta) &= \frac{1}{n} \sum_{i=1}^{n} \int_{0}^{\tau} \frac{\exp(\theta) \overline{Y}_{0}(t) \overline{Y}_{1}(t) c_{\theta}(t)}{\overline{Y}_{\theta}(t)^{2}} dN_i(t) \\ &= \frac{1}{n} \sum_{j=1}^{m} \frac{\exp(\theta) \overline{Y}_{0}(t_{j}) \overline{Y}_{1}(t_{j}) c_{\theta}(t_{j})}{\overline{Y}_{\theta}(t_{j})^{2}} \end{align*}

where we abbreviated

Y¯θ(t)=Y¯0(t)+exp(θ)Y¯1(t) \overline{Y}_{\theta}(t) = \overline{Y}_{0}(t) + \exp(\theta) \overline{Y}_{1}(t)

and similarly

Y¯θ(tj)=Y¯0(tj)+exp(θ)Y¯1(tj). \overline{Y}_{\theta}(t_{j}) = \overline{Y}_{0}(t_{j}) + \exp(\theta) \overline{Y}_{1}(t_{j}).

Let δi\delta_{i} be the binary event indicator for patient ii, and let the number of events at time tjt_{j} be denoted by m(tj)=i=1nδi𝕀(ti=tj)m(t_{j}) = \sum_{i=1}^{n} \delta_{i} \mathbb{I}(t_{i} = t_{j}). We define then the correction factor cθ(tj)c_{\theta}(t_{j}) as follows:

cθ(tj)={1if m(tj)=1nY¯θ(tj)m(tj)nY¯θ(tj)1if m(tj)>1 c_{\theta}(t_{j}) = \begin{cases} 1 & \text{if } m(t_{j}) = 1 \\ \frac{n \overline{Y}_{\theta}(t_{j}) - m(t_{j})}{n \overline{Y}_{\theta}(t_{j}) - 1} & \text{if } m(t_{j}) > 1 \end{cases}

Furthermore, we note that this correction factor is only used when calculating the log-rank test statistic, which is given at θ=0\theta = 0 by

𝒯L=nUL(0)/σL(0). \mathcal{T}_{L} = \sqrt{n} U_{L}(0) / \sigma_{L}(0).

However, when we estimate the log hazard ratio θ\theta by finding the root θ̂L\hat{\theta}_{L} of the score function such that UL(θ̂L)=0U_{L}(\hat{\theta}_{L}) = 0, we set cθ(tj)1c_{\theta}(t_{j}) \equiv 1.

The standard error of θ̂L\hat{\theta}_{L} is given by

1nσL(θ̂L). \frac{1}{\sqrt{n} \sigma_{L}(\hat{\theta}_{L})}.

This score function and the variance estimator are implemented in the internal function RobinCar2:::h_lr_score_no_strata_no_cov().

Stratified analysis without covariates

Following Section S2.3 in Ye, Shao, and Yi (2023), the score function for the stratified log-rank test is given by:

USL(θ)=1nzi:Zi=z0τ{Iiexp(θ)Y¯z1(t)Y¯z0(t)+exp(θ)Y¯z1(t)}dNi(t)=1nzj(z)Ijexp(θ)Y¯z1(tj)Y¯z0(tj)+exp(θ)Y¯z1(tj) \begin{align*} U_{SL}(\theta) &= \frac{1}{n} \sum_{z} \sum_{i: Z_{i} = z} \int_{0}^{\tau} \left\{ I_{i} - \frac{\exp(\theta) \overline{Y}_{z1}(t)}{\overline{Y}_{z0}(t) + \exp(\theta) \overline{Y}_{z1}(t)} \right\} dN_i(t)\\ &= \frac{1}{n} \sum_{z} \sum_{j(z)} I_{j} - \frac{\exp(\theta) \overline{Y}_{z1}(t_{j})}{\overline{Y}_{z0}(t_{j}) + \exp(\theta) \overline{Y}_{z1}(t_{j})} \end{align*}

where the last sum is over the unique event times jj in stratum zz, therefore written here simply as indices j(z)j(z). The proportions of patients at risk at time tjt_{j}, per arm k=0,1k = 0, 1 in stratum zz are denoted by Y¯zk(tj)\overline{Y}_{zk}(t_{j}). Importantly, the proportion is with regards to the total number of patients across all strata, not just within stratum zz.

So we can see that the score function is a sum over strata zz of the standard log-rank score function. In the same way, the variance estimator σSL2(θ)\sigma_{SL}^{2}(\theta) is the sum over strata zz of the standard log-rank variance estimator σL2(θ)\sigma_{L}^{2}(\theta).

The standard error of the correspondingly estimated log hazard ratio θ̂SL\hat{\theta}_{SL} is given by

1nσSL(θ̂SL). \frac{1}{\sqrt{n} \sigma_{SL}(\hat{\theta}_{SL})}.

One small important detail is that the number of patients nn in the denominator is always the total number of patients across all strata, not the number of patients in stratum zz. This is the reason why the standard log-rank score function RobinCar2:::h_lr_score_no_strata_no_cov() has an argument n, which is used here by the stratified log-rank score function RobinCar2:::h_lr_score_strat() to pass on the total number of patients.

Covariate adjusted analysis without strata

A little bit more complex is the calculation of the score function for the covariate adjusted log-rank test, which following Section 3 in Ye, Shao, and Yi (2023) is given by:

UCL(θ)=UL(θ)1ni=1n{Ii(XiX¯)β̂1(θL)(1Ii)(XiX¯)β̂0(θL)}. U_{CL}(\theta) = U_{L}(\theta) - \frac{1}{n} \sum_{i=1}^{n} \{ I_{i}(X_{i} - \overline{X})^{\top} \hat{\beta}_{1}(\theta_{L}) - (1 - I_{i})(X_{i} - \overline{X})^{\top} \hat{\beta}_{0}(\theta_{L}) \}.

Here we have to explain a lot of notation. The score function UCL(θ)U_{CL}(\theta) is based on the standard log-rank score function UL(θ)U_{L}(\theta), but we subtract a correction term that accounts for the covariates XiX_{i} of patient ii. The covariates are assumed to be centered, i.e. X¯=1ni=1nXi\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_{i} is the mean of the covariates across all patients.

Interestingly on the right hand side we see also θL\theta_{L}. This is supposed to be the log hazard ratio estimated by the standard log-rank analysis, i.e. θ̂L\hat{\theta}_{L}, when finding the root of UCL(θ)U_{CL}(\theta) to estimate the log hazard ratio θ̂CL\hat{\theta}_{CL}. The reason for this is that the asymptotic guaranteed efficiency gain of the covariate adjusted log hazard ratio estimator is only proven using this version of the score function, see Section 3 in Ye, Shao, and Yi (2023). On the other hand, for calculating the covariate adjusted log-rank score test statistic, we set both θ=0\theta = 0 and θL=0\theta_{L} = 0.

How are the regression coefficients β̂1\hat{\beta}_{1} and β̂0\hat{\beta}_{0} in the correction term estimated given a log hazard ratio θ\theta?

First, the so-called derived outcomes OikO_{ik} for patients ii and arms k=0,1k=0,1 need to be calculated. Let Yik(t)Y_{ik}(t) be the indicator whether patient ii in treatment arm k=0,1k = 0,1 is at risk at time tt, and let δi\delta_{i} be the binary event indicator for patient ii. Then the derived outcomes are defined as:

Oik(θ)=0τ{exp(θ)Y¯1(t)}(1k){Y¯0(t)}kY¯θ(t){dNik(t)Yik(t)exp(θ)dN¯(t)Y¯θ(t)}=j=1m{exp(θ)Y¯1(tj)}(1k){Y¯0(tj)}kY¯θ(tj){δi𝕀(tj=ti)Yik(tj)exp(θ)m(tj)/nY¯θ(tj)} \begin{align*} O_{ik}(\theta) &= \int_{0}^{\tau} \frac{ \{\exp(\theta) \overline{Y}_{1}(t)\}^{(1-k)} \{\overline{Y}_{0}(t)\}^{k} }{\overline{Y}_{\theta}(t)} \left\{ dN_{ik}(t) - \frac{Y_{ik}(t) \exp(\theta) d\overline{N}(t)} {\overline{Y}_{\theta}(t)} \right\} \\ &= \sum_{j=1}^{m} \frac{ \{\exp(\theta) \overline{Y}_{1}(t_{j})\}^{(1-k)} \{\overline{Y}_{0}(t_{j})\}^{k} }{\overline{Y}_{\theta}(t_{j})} \left\{ \delta_{i} \mathbb{I}(t_{j} = t_{i}) - \frac{Y_{ik}(t_{j}) \exp(\theta) m(t_{j}) / n} {\overline{Y}_{\theta}(t_{j})} \right\} \end{align*}

where we can see that the count measure integral was again replaced by a sum over the unique event times tj,j=1,,mt_{j}, j = 1, \dotsc, m.

Second, given the derived outcomes Oik(θ)O_{ik}(\theta), the coefficients estimate β̂k(θ)\hat{\beta}_{k}(\theta) for treatment arm k=0,1k = 0, 1 is defined as the solution to the least squares problem with the centered covariates XiX¯kX_{i} - \overline{X}_{k} (therefore no constant intercept column in XiX_{i}) and responses Oik(θ)O_{ik}(\theta).

Finally, we can define the variance estimator as follows:

σCL2(θ)=σL2(θ)π̂(1π̂)(β̂0(θL)+β̂1(θL))Σ̂X(β̂0(θL)+β̂1(θL)) \sigma_{CL}^{2}(\theta) = \sigma_{L}^{2}(\theta) - \hat{\pi}(1 - \hat{\pi}) (\hat{\beta}_{0}(\theta_{L}) + \hat{\beta}_{1}(\theta_{L}))^{\top} \hat{\Sigma}_{X} (\hat{\beta}_{0}(\theta_{L}) + \hat{\beta}_{1}(\theta_{L}))

where π̂=1ni=1nIi\hat{\pi} = \frac{1}{n} \sum_{i=1}^{n} I_{i} is the proportion of patients in treatment arm 1, and Σ̂X\hat{\Sigma}_{X} is the sample covariance matrix of the centered covariates XiX¯X_{i} - \overline{X} (where the average is now taken across all patients, not just within treatment arm kk).

Similarly as the for the score function defined above, the right hand side of the variance estimator is evaluated at θ=0\theta = 0 and θL=0\theta_{L} = 0 when calculating the covariate adjusted log-rank test statistic.

However, when estimating the covariate adjusted log hazard ratio θ̂CL\hat{\theta}_{CL}, the regression coefficients β̂0(θ̂L)\hat{\beta}_{0}(\hat{\theta}_{L}) and β̂1(θ̂L)\hat{\beta}_{1}(\hat{\theta}_{L}) are fixed at the unadjusted log hazard ratio θ̂L\hat{\theta}_{L} in this expression for the standard error of θ̂CL\hat{\theta}_{CL}:

σL2(θ̂CL)π̂(1π̂)(β̂0(θ̂L)+β̂1(θ̂L))Σ̂X(β̂0(θ̂L)+β̂1(θ̂L))nσL2(θ̂CL) \frac{\sqrt{ \sigma_{L}^{2}(\hat{\theta}_{CL}) - \hat{\pi}(1 - \hat{\pi}) (\hat{\beta}_{0}(\hat{\theta}_{L}) + \hat{\beta}_{1}(\hat{\theta}_{L}))^{\top} \hat{\Sigma}_{X} (\hat{\beta}_{0}(\hat{\theta}_{L}) + \hat{\beta}_{1}(\hat{\theta}_{L})) }}{ \sqrt{n} \sigma_{L}^{2}(\hat{\theta}_{CL}) }

This is the published version of the variance estimator when used for estimating the standard error of θ̂CL\hat{\theta}_{CL} via σL2(θ̂CL)\sigma_{L}^{2}(\hat{\theta}_{CL}), which is the default in RobinCar2. However, there is also an alternative version which we call “unadjusted” where σL2(θ̂CL)\sigma_{L}^{2}(\hat{\theta}_{CL}) on the right hand side of the definition is replaced by σL2(θ̂L)\sigma_{L}^{2}(\hat{\theta}_{L}). That is, the log-rank score variance estimator is evaluated at the unadjusted log hazard ratio estimate θ̂L\hat{\theta}_{L}, rather than at the covariate adjusted log hazard ratio estimate θ̂CL\hat{\theta}_{CL}. This version is available via the option se_method = "unadjusted", and is the version implemented in the RobinCar package.

Note that the results will differ only very slightly. Both resulting standard errors are consistent and valid. One motivation for the unadjusted version is that it is guaranteed to be smaller than the standard error of the unadjusted log hazard ratio, due to the fact that the correction term is always positive.

This score function and the variance estimator are implemented in the internal function RobinCar2:::h_lr_score_cov().

Covariate adjusted and stratified analysis

Finally, using both covariate adjustment and stratification, following Section S2.3 in Ye, Shao, and Yi (2023), the score function is given by:

UCSL(θ)=USL(θ)1nzi:Zi=z{Ii(XiX¯z)β̂1(θSL)(1Ii)(XiX¯z)β̂0(θSL)}, U_{CSL}(\theta) = U_{SL}(\theta) - \frac{1}{n} \sum_{z} \sum_{i: Z_{i} = z} \{ I_{i}(X_{i} - \overline{X}_{z})^{\top} \hat{\beta}_{1}(\theta_{SL}) - (1 - I_{i})(X_{i} - \overline{X}_{z})^{\top} \hat{\beta}_{0}(\theta_{SL}) \},

which shows us that it is based on the score function USL(θ)U_{SL}(\theta) from the stratified but not covariate-adjusted log-rank test. The correction term is added up over the strata zz, but based on overall regression coefficients β̂0(θSL)\hat{\beta}_{0}(\theta_{SL}) and β̂1(θSL)\hat{\beta}_{1}(\theta_{SL}).

Similarly as for the unstratified covariate adjusted log-rank test, the derived outcomes Ozik(θ)O_{zik}(\theta) in stratum zz are defined as:

Ozik(θ)=0τ{exp(θ)Y¯z1(t)}(1k){Y¯z0(t)}kY¯zθ(t){dNik(t)Yik(t)exp(θ)dN¯z(t)Y¯zθ(t)}=j(z){exp(θ)Y¯z1(tj)}(1k){Y¯z0(tj)}kY¯zθ(tj){δi𝕀(tj=ti)Yik(tj)exp(θ)mz(tj)/nY¯zθ(tj)} \begin{align*} O_{zik}(\theta) &= \int_{0}^{\tau} \frac{ \{\exp(\theta) \overline{Y}_{z1}(t)\}^{(1-k)} \{\overline{Y}_{z0}(t)\}^{k} }{\overline{Y}_{z\theta}(t)} \left\{ dN_{ik}(t) - \frac{Y_{ik}(t) \exp(\theta) d\overline{N}_{z}(t)} {\overline{Y}_{z\theta}(t)} \right\} \\ &= \sum_{j(z)} \frac{ \{\exp(\theta) \overline{Y}_{z1}(t_{j})\}^{(1-k)} \{\overline{Y}_{z0}(t_{j})\}^{k} }{\overline{Y}_{z\theta}(t_{j})} \left\{ \delta_{i} \mathbb{I}(t_{j} = t_{i}) - \frac{Y_{ik}(t_{j}) \exp(\theta) m_{z}(t_{j}) / n} {\overline{Y}_{z\theta}(t_{j})} \right\} \end{align*}

where mz(tj)=i:Zi=zδi𝕀(tj=ti)m_{z}(t_{j}) = \sum_{i: Z_{i} = z} \delta_{i} \mathbb{I}(t_{j} = t_{i}) is the number of events in stratum zz at the unique event time tjt_{j}, and we further abbreviated

Yzθ(t)=Y¯z0(t)+exp(θ)Y¯z1(t). Y_{z\theta}(t) = \overline{Y}_{z0}(t) + \exp(\theta) \overline{Y}_{z1}(t).

Now as a second step, given the derived outcomes Ozik(θ)O_{zik}(\theta), the coefficients estimate β̂k(θ)\hat{\beta}_{k}(\theta) for treatment arm k=0,1k = 0, 1 is defined as:

β̂k(θ)={zi:Ii=k,Zi=z(XiX¯zk)(XiX¯zk)}1zi:Ii=k,Zi=z(XiX¯zk)Ozik(θ). \hat{\beta}_{k}(\theta) = \left\{ \sum_{z} \sum_{i: I_{i} = k, Z_{i} = z} (X_{i} - \overline{X}_{zk}) (X_{i} - \overline{X}_{zk})^{\top} \right\}^{-1} \sum_{z} \sum_{i: I_{i} = k, Z_{i} = z} (X_{i} - \overline{X}_{zk}) O_{zik}(\theta).

So we see that within each treatment group and within each stratum, the covariates are centered separately, and the required cross-products are added from all strata. Finally the coefficient estimates are obtained by solving the least squares problem with the derived outcomes Ozik(θ)O_{zik}(\theta) as responses.

Finally, the variance estimator is defined very similarly to the covariate adjusted but unstratified log-rank test:

σCSL2(θ)=σSL2(θ)π̂(1π̂)(β̂0(θSL)+β̂1(θSL)){znz/nΣ̂X|z}(β̂0(θSL)+β̂1(θSL)) \sigma_{CSL}^{2}(\theta) = \sigma_{SL}^{2}(\theta) - \hat{\pi}(1 - \hat{\pi}) (\hat{\beta}_{0}(\theta_{SL}) + \hat{\beta}_{1}(\theta_{SL}))^{\top} \left\{\sum_{z} n_z / n \hat{\Sigma}_{X\vert z}\right\} (\hat{\beta}_{0}(\theta_{SL}) + \hat{\beta}_{1}(\theta_{SL}))

where nzn_z is the number of patients in stratum zz and Σ̂X|z\hat{\Sigma}_{X\vert z} is the sample covariance matrix of the original covariates XiX_{i} within stratum zz.

In the same way as for the unstratified case, when estimating the covariate adjusted stratified log hazard ratio θ̂CSL\hat{\theta}_{CSL}, the regression coefficients β̂0(θ̂SL)\hat{\beta}_{0}(\hat{\theta}_{SL}) and β̂1(θ̂SL)\hat{\beta}_{1}(\hat{\theta}_{SL}) are fixed at the unadjusted log hazard ratio θ̂SL\hat{\theta}_{SL} in this expression for the standard error of θ̂CSL\hat{\theta}_{CSL}:

σSL2(θ̂CSL)π̂(1π̂)(β̂0(θ̂SL)+β̂1(θ̂SL)){znz/nΣ̂X|z}(β̂0(θ̂SL)+β̂1(θ̂SL))nσSL2(θ̂CSL) \frac{\sqrt{ \sigma_{SL}^{2}(\hat{\theta}_{CSL}) - \hat{\pi}(1 - \hat{\pi}) (\hat{\beta}_{0}(\hat{\theta}_{SL}) + \hat{\beta}_{1}(\hat{\theta}_{SL}))^{\top} \{\sum_{z} n_z / n \hat{\Sigma}_{X\vert z}\} (\hat{\beta}_{0}(\hat{\theta}_{SL}) + \hat{\beta}_{1}(\hat{\theta}_{SL})) }}{ \sqrt{n} \sigma_{SL}^{2}(\hat{\theta}_{CSL}) }

This is the published version of the variance estimator when used for estimating the standard error of θ̂CSL\hat{\theta}_{CSL} via σSL2(θ̂CSL)\sigma_{SL}^{2}(\hat{\theta}_{CSL}), which is the default in RobinCar2. However, there is also an alternative version which we call “unadjusted” where σSL2(θ̂CSL)\sigma_{SL}^{2}(\hat{\theta}_{CSL}) on the right hand side of the definition is replaced by σSL2(θ̂SL)\sigma_{SL}^{2}(\hat{\theta}_{SL}). That is, the log-rank score variance estimator is evaluated at the unadjusted log hazard ratio estimate θ̂SL\hat{\theta}_{SL}, rather than at the covariate adjusted log hazard ratio estimate θ̂CSL\hat{\theta}_{CSL}. This version is available via the option se_method = "unadjusted", and is the version implemented in the RobinCar package.

Note that the results will differ only very slightly. Both resulting standard errors are consistent and valid. One motivation for the unadjusted version is that it is guaranteed to be smaller than the standard error of the unadjusted log hazard ratio, due to the fact that the correction term is always positive.

This score function and the variance estimator are implemented in the internal function RobinCar2:::h_lr_score_strat_cov().

References

Ye, Ting, Jun Shao, and Yanyao Yi. 2023. “Covariate-Adjusted Log-Rank Test: Guaranteed Efficiency Gain and Universal Applicability.” Biometrika 111 (2): 691–705. https://doi.org/10.1093/biomet/asad045.