Survival Analysis with RobinCar2
Source:vignettes/articles/robincar-survival.Rmd
robincar-survival.Rmd
RobinCar2
implements survival analysis methods for
testing treatment effects with log-rank tests and estimating hazard
ratios using score functions. This vignette provides an overview of the
underlying algorithms.
Overview of Survival Analysis Methods
Survival analysis is performed with the robin_surv
function, and the syntax is:
robin_surv(
Surv(time, event) ~ group * strata + covariates,
treatment = group ~ strata,
data = df
)
Depending on the choice of strata
and
covariates
, the following four methods are available:
- Standard log-rank test without strata or covariates by omitting
strata
andcovariates
- Stratified log-rank test by specifying
strata
but nocovariates
- Covariate adjusted log-rank test by specifying
covariates
but nostrata
- Covariate adjusted and stratified log-rank test by specifying both
strata
andcovariates
Let’s go through these in a simple example. We start with the standard log-rank test:
library(RobinCar2)
robin_surv(
Surv(time, status) ~ sex,
treatment = sex ~ 1,
data = surv_data
)
#> Model : Surv(time, status) ~ sex
#> Randomization: sex ~ 1 ( Simple )
#>
#> Contrast : Hazard ratio
#>
#> Estimate Std.Err Z Value Pr(>|z|)
#> Male v.s. Female 0.53343 0.16727 3.189 0.001428 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Test : Log-Rank
#>
#> Test Stat. Pr(>|z|)
#> Male v.s. Female 3.2135 0.001311 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can perform the stratified log-rank test by adding a
strata
argument:
robin_surv(
Surv(time, status) ~ sex * strata,
treatment = sex ~ strata,
data = surv_data
)
#> Model : Surv(time, status) ~ sex * strata
#> Randomization: sex ~ strata ( Simple )
#>
#> Contrast : Hazard ratio
#>
#> Estimate Std.Err Z Value Pr(>|z|)
#> Male v.s. Female 0.55482 0.17063 3.2516 0.001147 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Test : Log-Rank
#>
#> Test Stat. Pr(>|z|)
#> Male v.s. Female 3.2856 0.001018 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We could also just use covariate adjustment:
robin_surv(
Surv(time, status) ~ sex + age + meal.cal,
treatment = sex ~ 1,
data = surv_data
)
#> Model : Surv(time, status) ~ sex + age + meal.cal
#> Randomization: sex ~ 1 ( Simple )
#>
#> Contrast : Hazard ratio
#>
#> Estimate Std.Err Z Value Pr(>|z|)
#> Male v.s. Female 0.47686 0.18608 2.5626 0.01039 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Test : Log-Rank
#>
#> Test Stat. Pr(>|z|)
#> Male v.s. Female 2.6858 0.007236 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or we combine both stratification and covariate adjustment:
robin_surv(
Surv(time, status) ~ sex * strata + age + meal.cal,
treatment = sex ~ strata,
data = surv_data
)
#> Model : Surv(time, status) ~ sex * strata + age + meal.cal
#> Randomization: sex ~ strata ( Simple )
#>
#> Contrast : Hazard ratio
#>
#> Estimate Std.Err Z Value Pr(>|z|)
#> Male v.s. Female 0.55219 0.19133 2.8861 0.0039 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Test : Log-Rank
#>
#> Test Stat. Pr(>|z|)
#> Male v.s. Female 2.9496 0.003181 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Details of the Methods
The four methods introduced above are implemented based on the corresponding score functions with corresponding variance estimators , where is the log hazard ratio.
The log-rank test statistic is defined as
and it asymptotically follows a standard normal distribution under the null hypothesis of no treatment effect, i.e. .
For estimation of , we find the root of the score function such that . The standard error of is derived from the variance estimator , with specific formulas depending on the method used and detailed below.
Standard analysis without strata or covariates
Following Section 2 in Ye, Shao, and Yi (2023), the score function from the partial likelihood under the Cox proportional hazards model is given by:
where we switched from the summation over all patients to the summation over all patients with events where . The treatment indicator is if patient is in treatment group 1, and if patient is in treatment group 0. The proportions of patients at risk at time , per arm is given by
where is the potentially right-censored time of patient and is the indicator function.
The corresponding variance estimate is given by this adapted version using the correction factor which accounts for ties in the survival times in the log-rank test.
where we abbreviated
and similarly
Let be the binary event indicator for patient , and let the number of events at time be denoted by . We define then the correction factor as follows:
Furthermore, we note that this correction factor is only used when calculating the log-rank test statistic, which is given at by
However, when we estimate the log hazard ratio by finding the root of the score function such that , we set .
The standard error of is given by
This score function and the variance estimator are implemented in the
internal function
RobinCar2:::h_lr_score_no_strata_no_cov()
.
Stratified analysis without covariates
Following Section S2.3 in Ye, Shao, and Yi (2023), the score function for the stratified log-rank test is given by:
where the last sum is over the unique event times in stratum , therefore written here simply as indices . The proportions of patients at risk at time , per arm in stratum are denoted by . Importantly, the proportion is with regards to the total number of patients across all strata, not just within stratum .
So we can see that the score function is a sum over strata of the standard log-rank score function. In the same way, the variance estimator is the sum over strata of the standard log-rank variance estimator .
The standard error of the correspondingly estimated log hazard ratio is given by
One small important detail is that the number of patients
in the denominator is always the total number of patients across all
strata, not the number of patients in stratum
.
This is the reason why the standard log-rank score function
RobinCar2:::h_lr_score_no_strata_no_cov()
has an argument
n
, which is used here by the stratified log-rank score
function RobinCar2:::h_lr_score_strat()
to pass on the
total number of patients.
Covariate adjusted analysis without strata
A little bit more complex is the calculation of the score function for the covariate adjusted log-rank test, which following Section 3 in Ye, Shao, and Yi (2023) is given by:
Here we have to explain a lot of notation. The score function is based on the standard log-rank score function , but we subtract a correction term that accounts for the covariates of patient . The covariates are assumed to be centered, i.e. is the mean of the covariates across all patients.
Interestingly on the right hand side we see also . This is supposed to be the log hazard ratio estimated by the standard log-rank analysis, i.e. , when finding the root of to estimate the log hazard ratio . The reason for this is that the asymptotic guaranteed efficiency gain of the covariate adjusted log hazard ratio estimator is only proven using this version of the score function, see Section 3 in Ye, Shao, and Yi (2023). On the other hand, for calculating the covariate adjusted log-rank score test statistic, we set both and .
How are the regression coefficients and in the correction term estimated given a log hazard ratio ?
First, the so-called derived outcomes for patients and arms need to be calculated. Let be the indicator whether patient in treatment arm is at risk at time , and let be the binary event indicator for patient . Then the derived outcomes are defined as:
where we can see that the count measure integral was again replaced by a sum over the unique event times .
Second, given the derived outcomes , the coefficients estimate for treatment arm is defined as the solution to the least squares problem with the centered covariates (therefore no constant intercept column in ) and responses .
Finally, we can define the variance estimator as follows:
where is the proportion of patients in treatment arm 1, and is the sample covariance matrix of the centered covariates (where the average is now taken across all patients, not just within treatment arm ).
Similarly as the for the score function defined above, the right hand side of the variance estimator is evaluated at and when calculating the covariate adjusted log-rank test statistic.
However, when estimating the covariate adjusted log hazard ratio , the regression coefficients and are fixed at the unadjusted log hazard ratio in this expression for the standard error of :
This is the published version of the variance estimator when used for
estimating the standard error of
via
,
which is the default in RobinCar2
. However, there is also
an alternative version which we call “unadjusted” where
on the right hand side of the definition is replaced by
.
That is, the log-rank score variance estimator is evaluated at the
unadjusted log hazard ratio estimate
,
rather than at the covariate adjusted log hazard ratio estimate
.
This version is available via the option
se_method = "unadjusted"
, and is the version implemented in
the RobinCar
package.
Note that the results will differ only very slightly. Both resulting standard errors are consistent and valid. One motivation for the unadjusted version is that it is guaranteed to be smaller than the standard error of the unadjusted log hazard ratio, due to the fact that the correction term is always positive.
This score function and the variance estimator are implemented in the
internal function RobinCar2:::h_lr_score_cov()
.
Covariate adjusted and stratified analysis
Finally, using both covariate adjustment and stratification, following Section S2.3 in Ye, Shao, and Yi (2023), the score function is given by:
which shows us that it is based on the score function from the stratified but not covariate-adjusted log-rank test. The correction term is added up over the strata , but based on overall regression coefficients and .
Similarly as for the unstratified covariate adjusted log-rank test, the derived outcomes in stratum are defined as:
where is the number of events in stratum at the unique event time , and we further abbreviated
Now as a second step, given the derived outcomes , the coefficients estimate for treatment arm is defined as:
So we see that within each treatment group and within each stratum, the covariates are centered separately, and the required cross-products are added from all strata. Finally the coefficient estimates are obtained by solving the least squares problem with the derived outcomes as responses.
Finally, the variance estimator is defined very similarly to the covariate adjusted but unstratified log-rank test:
where is the number of patients in stratum and is the sample covariance matrix of the original covariates within stratum .
In the same way as for the unstratified case, when estimating the covariate adjusted stratified log hazard ratio , the regression coefficients and are fixed at the unadjusted log hazard ratio in this expression for the standard error of :
This is the published version of the variance estimator when used for
estimating the standard error of
via
,
which is the default in RobinCar2
. However, there is also
an alternative version which we call “unadjusted” where
on the right hand side of the definition is replaced by
.
That is, the log-rank score variance estimator is evaluated at the
unadjusted log hazard ratio estimate
,
rather than at the covariate adjusted log hazard ratio estimate
.
This version is available via the option
se_method = "unadjusted"
, and is the version implemented in
the RobinCar
package.
Note that the results will differ only very slightly. Both resulting standard errors are consistent and valid. One motivation for the unadjusted version is that it is guaranteed to be smaller than the standard error of the unadjusted log hazard ratio, due to the fact that the correction term is always positive.
This score function and the variance estimator are implemented in the
internal function RobinCar2:::h_lr_score_strat_cov()
.