
Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.

Model definition

The model definition is the same as in the Details of the model fitting in mmrm vignette, and we use the same notation.

Linear model

For each subject $i$ we observe a vector
$$ Y_i = (y_{i1}, \dotsc, y_{im_i})^\top \in \mathbb{R}^{m_i} $$
and given a design matrix
$$ X_i \in \mathbb{R}^{m_i \times p} $$
and a corresponding coefficient vector $\beta \in \mathbb{R}^{p}$ we assume that the observations are multivariate normal distributed:
$$ Y_i \sim N(X_i\beta, \Sigma_i) $$
where the covariance matrix $\Sigma_i \in \mathbb{R}^{m_i \times m_i}$ is derived by subsetting the overall covariance matrix $\Sigma \in \mathbb{R}^{m \times m}$ appropriately by
$$ \Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2} $$
where the subsetting matrix $S_i \in \{0, 1\}^{m \times m_i}$ contains in each of its $m_i$ columns a single 1 indicating which overall time point matches $t_{ih}$, and $G_i \in \mathbb{R}_{> 0}^{m_i \times m_i}$ is the diagonal weight matrix.

Conditional on the design matrices $X_i$, the coefficient vector $\beta$ and the covariance matrix $\Sigma$ we assume that the observations are independent between the subjects.

We can write the linear model for all subjects together as
$$ Y = X\beta + \epsilon $$
where $Y \in \mathbb{R}^N$ combines all subject-specific observation vectors $Y_i$ such that we have in total $N = \sum_{i = 1}^{n}{m_i}$ observations, $X \in \mathbb{R}^{N \times p}$ combines all subject-specific design matrices, and $\epsilon \in \mathbb{R}^N$ has a multivariate normal distribution
$$ \epsilon \sim N(0, \Omega) $$
where $\Omega \in \mathbb{R}^{N \times N}$ is block-diagonal, containing the subject-specific $\Sigma_i$ covariance matrices on the diagonal and 0 in the remaining entries.
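To make the block-diagonal structure of $\Omega$ concrete, here is a minimal R sketch (the subject-specific covariance matrices and their dimensions are made up for illustration) that assembles $\Omega$ from the $\Sigma_i$ with `Matrix::bdiag()`:

```r
library(Matrix)

# Hypothetical subject-specific covariance matrices Sigma_i
# (two subjects with 3 and 2 visits, respectively).
sigma_list <- list(
  diag(3) + 0.5, # Sigma_1: 3 x 3
  diag(2) + 0.5  # Sigma_2: 2 x 2
)

# Omega is block-diagonal with the Sigma_i on the diagonal and 0 elsewhere.
omega <- bdiag(sigma_list)
dim(omega) # N x N with N = 3 + 2 = 5
```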

Mathematical Details of Kenward-Roger method

The mathematical derivation of the Kenward-Roger method is based on a Taylor expansion of the estimated covariance matrix of $\hat\beta$ to obtain a more accurate estimate of it. All derivations are based on the restricted maximum likelihood (REML). Following the same notation, the covariance matrix $\Omega$ can be represented as a function of the covariance matrix parameters $\theta = (\theta_1, \dotsc, \theta_k)^\top$, i.e. $\Omega(\theta)$. After model fitting with mmrm, we obtain the estimate $\hat\beta = \Phi(\hat\theta)X^\top\Omega(\hat\theta)^{-1}Y$, where $\Phi(\theta) = \left\{X^\top \Omega(\theta)^{-1} X\right\}^{-1}$ is the asymptotic covariance matrix of $\hat\beta$. However, Kackar and Harville (1984) suggest that although $\hat\beta$ is unbiased for $\beta$, the covariance matrix $\hat\Phi = \left\{X^\top \hat\Omega^{-1} X\right\}^{-1}$ can be biased. They showed that the variability of $\hat\beta$ can be partitioned into two components,

$$ \Phi_A = \Phi + \Lambda $$

where $\Phi$ is the variance-covariance matrix of the asymptotic distribution of $\hat\beta$ as $n \rightarrow \infty$ as defined above, and $\Lambda$ represents the amount by which the asymptotic variance-covariance matrix underestimates $\Phi_A$.

Based on a Taylor series expansion around $\theta$, $\Lambda$ can be approximated by

$$ \Lambda \simeq \Phi \left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}(Q_{hj} - P_h \Phi P_j)}}\right\} \Phi $$
where
$$ P_h = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} X $$
$$ Q_{hj} = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} \Omega \frac{\partial{\Omega^{-1}}}{\partial \theta_j} X $$

$W$ is the inverse of the Hessian matrix of the log-likelihood function of $\theta$ evaluated at the estimate $\hat\theta$, i.e. the inverse of the observed Fisher information matrix, which serves as a consistent estimator of the variance-covariance matrix of $\hat\theta$.

Again based on a Taylor series expansion about $\theta$, Kenward and Roger (1997) show that
$$ \hat\Phi \simeq \Phi + \sum_{h=1}^k{(\hat\theta_h - \theta_h)\frac{\partial{\Phi}}{\partial{\theta_h}}} + \frac{1}{2} \sum_{h=1}^k{\sum_{j=1}^k{(\hat\theta_h - \theta_h)(\hat\theta_j - \theta_j)\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}} $$
Ignoring the possible bias in $\hat\theta$,
$$ E(\hat\Phi) \simeq \Phi + \frac{1}{2} \sum_{h=1}^k{\sum_{j=1}^k{W_{hj}\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}} $$
Using the previously defined notation, this can be further written as
$$ \frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}} = \Phi (P_h \Phi P_j + P_j \Phi P_h - Q_{hj} - Q_{jh} + R_{hj}) \Phi $$
where
$$ R_{hj} = X^\top\Omega^{-1}\frac{\partial^2\Omega}{\partial{\theta_h}\partial{\theta_j}} \Omega^{-1} X $$

Substituting $\Phi$ and $\Lambda$ back into $\hat\Phi_A$, we have

$$ \hat\Phi_A = \hat\Phi + 2\hat\Phi \left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}\left(Q_{hj} - P_h \hat\Phi P_j - \frac{1}{4}R_{hj}\right)}}\right\} \hat\Phi $$

where $\Omega(\hat\theta)$ replaces $\Omega(\theta)$ on the right-hand side.

Please note that if we ignore $R_{hj}$, the term involving the second-order derivatives of $\Omega$, we obtain a different estimate of the adjusted covariance matrix; we call this the linear Kenward-Roger approximation.
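To illustrate how the adjusted covariance matrix is assembled from these building blocks, here is a minimal R sketch. It assumes `Phi` ($\hat\Phi$), the matrix `W`, and nested lists `P`, `Q`, `R` holding the $P_h$, $Q_{hj}$ and $R_{hj}$ matrices are already available; the helper name `kr_adjusted_vcov` is hypothetical and this is not the package's internal implementation:

```r
# Assemble Phi_A = Phi + 2 * Phi %*% { sum_hj W[h, j] *
#   (Q_hj - P_h %*% Phi %*% P_j - R_hj / 4) } %*% Phi.
# P[[h]] is P_h, Q[[h]][[j]] is Q_hj, R[[h]][[j]] is R_hj.
kr_adjusted_vcov <- function(Phi, W, P, Q, R) {
  k <- nrow(W)
  p <- nrow(Phi)
  middle <- matrix(0, p, p)
  for (h in seq_len(k)) {
    for (j in seq_len(k)) {
      middle <- middle + W[h, j] *
        (Q[[h]][[j]] - P[[h]] %*% Phi %*% P[[j]] - R[[h]][[j]] / 4)
    }
  }
  Phi + 2 * Phi %*% middle %*% Phi
}
```

Dropping the `R[[h]][[j]] / 4` term in this sketch corresponds to the linear Kenward-Roger approximation mentioned above.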

Special Considerations for mmrm models

In mmrm models, $\Omega$ is a block-diagonal matrix, hence we can calculate $P$, $Q$ and $R$ for each subject and add them up:

$$ P_h = \sum_{i=1}^{n}{P_{ih}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i} $$

$$ Q_{hj} = \sum_{i=1}^{n}{Q_{ihj}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i} $$

$$ R_{hj} = \sum_{i=1}^{n}{R_{ihj}} = \sum_{i=1}^{n}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}} \Sigma_i^{-1} X_i} $$
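As a sketch of this per-subject accumulation in R, assuming the per-subject design matrices, covariance matrices and their derivatives are precomputed (all object names below are hypothetical):

```r
# Accumulate P_h, Q_hj and R_hj over subjects for one pair (h, j).
# x_list[[i]]        : X_i
# sigma_list[[i]]    : Sigma_i
# d_sigma_inv_h[[i]] : d Sigma_i^{-1} / d theta_h
# d_sigma_inv_j[[i]] : d Sigma_i^{-1} / d theta_j
# d2_sigma_hj[[i]]   : d^2 Sigma_i / (d theta_h d theta_j)
accumulate_pqr <- function(x_list, sigma_list,
                           d_sigma_inv_h, d_sigma_inv_j, d2_sigma_hj) {
  p <- ncol(x_list[[1]])
  P_h <- Q_hj <- R_hj <- matrix(0, p, p)
  for (i in seq_along(x_list)) {
    Xi <- x_list[[i]]
    Si_inv <- solve(sigma_list[[i]])
    P_h <- P_h + t(Xi) %*% d_sigma_inv_h[[i]] %*% Xi
    Q_hj <- Q_hj + t(Xi) %*% d_sigma_inv_h[[i]] %*% sigma_list[[i]] %*%
      d_sigma_inv_j[[i]] %*% Xi
    R_hj <- R_hj + t(Xi) %*% Si_inv %*% d2_sigma_hj[[i]] %*% Si_inv %*% Xi
  }
  list(P_h = P_h, Q_hj = Q_hj, R_hj = R_hj)
}
```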

Derivative of the overall covariance matrix $\Sigma$

The derivative of the overall covariance matrix $\Sigma$ with respect to the variance parameters can be calculated through the derivatives of its Cholesky factor $L$, which are obtained through automatic differentiation, using the following matrix identities:
$$ \frac{\partial{\Sigma}}{\partial{\theta_h}} = \frac{\partial{(LL^\top)}}{\partial{\theta_h}} = \frac{\partial{L}}{\partial{\theta_h}}L^\top + L\frac{\partial{L^\top}}{\partial{\theta_h}} $$

$$ \frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}} = \frac{\partial^2{L}}{\partial{\theta_h}\partial{\theta_j}}L^\top + L\frac{\partial^2{L^\top}}{\partial{\theta_h}\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_h}}\frac{\partial{L^\top}}{\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_j}}\frac{\partial{L^\top}}{\partial{\theta_h}} $$
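For example, a minimal R sketch of these two identities, with hypothetical inputs `L`, `dL_h`, `dL_j` and `d2L` standing for the Cholesky factor and its parameter derivatives:

```r
# First derivative of Sigma = L L^T given dL = dL / d theta_h.
d_sigma <- function(L, dL) {
  dL %*% t(L) + L %*% t(dL)
}

# Second derivative given d2L = d^2 L / (d theta_h d theta_j)
# and the two first derivatives dL_h and dL_j.
d2_sigma <- function(L, dL_h, dL_j, d2L) {
  d2L %*% t(L) + L %*% t(d2L) + dL_h %*% t(dL_j) + dL_j %*% t(dL_h)
}
```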

Derivative of $\Sigma^{-1}$

The derivatives of $\Sigma^{-1}$ can be calculated by differentiating the identity $\Sigma\Sigma^{-1} = I$:

$$ \frac{\partial{(\Sigma\Sigma^{-1})}}{\partial{\theta_h}} = \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1} + \Sigma\frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} = 0 $$
$$ \frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} = - \Sigma^{-1} \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1} $$
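In R this identity is a one-liner; the finite-difference check below uses a made-up $2 \times 2$ example (off-diagonal element equal to $\theta$) to confirm it numerically:

```r
# Derivative of Sigma^{-1} given Sigma^{-1} and dSigma/dtheta_h.
d_sigma_inv <- function(sigma_inv, d_sigma) {
  -sigma_inv %*% d_sigma %*% sigma_inv
}

# Numerical check on a toy example.
sigma_fun <- function(theta) matrix(c(1, theta, theta, 2), nrow = 2)
theta0 <- 0.3
eps <- 1e-6
d_sigma <- matrix(c(0, 1, 1, 0), nrow = 2) # exact dSigma/dtheta
d_inv_numeric <- (solve(sigma_fun(theta0 + eps)) -
  solve(sigma_fun(theta0 - eps))) / (2 * eps)
all.equal(d_inv_numeric, d_sigma_inv(solve(sigma_fun(theta0)), d_sigma),
  tolerance = 1e-6
)
```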

Subjects with missed visits

If a subject does not have all visits, the corresponding covariance matrix can be represented as
$$ \Sigma_i = S_i^\top \Sigma S_i $$

and the derivatives can be obtained through

$$ \frac{\partial{\Sigma_i}}{\partial{\theta_h}} = S_i^\top \frac{\partial{\Sigma}}{\partial{\theta_h}} S_i $$

$$ \frac{\partial^2{\Sigma_i}}{\partial{\theta_h}\partial{\theta_j}} = S_i^\top \frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}} S_i $$

The derivative of $\Sigma_i^{-1}$, i.e. $\frac{\partial\Sigma_i^{-1}}{\partial{\theta_h}}$, can then be calculated from $\Sigma_i^{-1}$ and $\frac{\partial{\Sigma_i}}{\partial{\theta_h}}$ using the identity above.
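A small R sketch of the subsetting (the visit indices `idx` and the overall covariance matrix are made up for illustration): building $S_i$ explicitly gives the same result as directly indexing rows and columns of $\Sigma$, and the same applies to the derivatives.

```r
m <- 4                                 # total number of visits
idx <- c(1, 3)                         # visits observed for subject i
Sigma <- 0.5^abs(outer(1:m, 1:m, "-")) # some overall covariance matrix

# Subsetting matrix S_i: one column per observed visit, a single 1 per column.
S_i <- diag(m)[, idx, drop = FALSE]

Sigma_i <- t(S_i) %*% Sigma %*% S_i
all.equal(Sigma_i, Sigma[idx, idx])    # TRUE: same as direct indexing
```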

Scenario under group specific covariance estimates

When fitting grouped mmrm models, the covariance matrix for subject $i$ of group $g(i)$ can be written as
$$ \Sigma_i = S_i^\top \Sigma_{g(i)} S_i $$
Assuming there are $B$ groups, the number of covariance parameters is increased by a factor of $B$. Since the $\theta$ parameters of one group do not affect the other groups, we obtain block-diagonal $P$, $Q$ and $R$ matrices, where the blocks are given by:

$$ P_h = \begin{pmatrix} P_{h, 1} & \dots & P_{h, B} \end{pmatrix} $$

$$ Q_{hj} = \begin{pmatrix} Q_{hj, 1} & 0 & \dots & \dots & 0 \\ 0 & Q_{hj, 2} & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ \vdots & & & \ddots & \vdots \\ 0 & \dots & \dots & 0 & Q_{hj, B} \end{pmatrix} $$

$$ R_{hj} = \begin{pmatrix} R_{hj, 1} & 0 & \dots & \dots & 0 \\ 0 & R_{hj, 2} & 0 & \dots & 0 \\ \vdots & & \ddots & & \vdots \\ \vdots & & & \ddots & \vdots \\ 0 & \dots & \dots & 0 & R_{hj, B} \end{pmatrix} $$

Using $P_{h, b}$ to denote the block for group $b$, we have
$$ P_{h,b} = \sum_{g(i) = b}{P_{ih}} = \sum_{g(i) = b}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i} $$

$$ Q_{hj,b} = \sum_{g(i) = b}{Q_{ihj}} = \sum_{g(i) = b}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i} $$

$$ R_{hj,b} = \sum_{g(i) = b}{R_{ihj}} = \sum_{g(i) = b}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}} \Sigma_i^{-1} X_i} $$
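A sketch of the group-wise accumulation in R, assuming a vector `group` holding the group label $g(i)$ of each subject and a list `P_i_list` of the per-subject contributions $P_{ih}$ (both hypothetical):

```r
# P_{h,b}: sum the per-subject contributions P_ih over the subjects in group b only.
p_h_block <- function(P_i_list, group, b) {
  Reduce(`+`, P_i_list[group == b])
}
```

The blocks $Q_{hj,b}$ and $R_{hj,b}$ can be accumulated in exactly the same way.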

Scenario under weighted mmrm

Under a weighted mmrm model, the covariance matrix for subject $i$ can be represented as

$$ \Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2} $$

where $G_i$ is the diagonal matrix of the weights for subject $i$. Since the weights are constants, there are no mathematical differences when deriving $P$, $Q$ and $R$: having $G_i^{-1/2}$ in addition to $S_i$ does not change the algorithm, and we simply multiply the formulas with $G_i^{-1/2}$, analogously to the subsetting matrix above.
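In R, the weight adjustment amounts to pre- and post-multiplying by $G_i^{-1/2}$ (the weights and the subset covariance matrix below are made up):

```r
w_i <- c(1, 2, 0.5)                        # subject i's weights, one per observed visit
G_inv_sqrt <- diag(1 / sqrt(w_i))

Sigma_sub <- 0.5^abs(outer(1:3, 1:3, "-")) # S_i^T Sigma S_i, already subset to i's visits
Sigma_i <- G_inv_sqrt %*% Sigma_sub %*% G_inv_sqrt
```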

Inference

Suppose we are testing the linear combination $C^\top\beta$ of the coefficients, with $C \in \mathbb{R}^{p \times c}$. We can use the following F-statistic
$$ F = \frac{1}{c} (\hat\beta - \beta)^\top C (C^\top \hat\Phi_A C)^{-1} C^\top (\hat\beta - \beta) $$
and the scaled statistic $F^* = \lambda F$ approximately follows an $F_{c,m}$ distribution.

When we have only one coefficient to test, $C$ is a column vector containing a single 1. We can still follow the same calculations as in the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.

$\lambda$ and $m$ can be calculated through

$$ M = C (C^\top \Phi C)^{-1} C^\top $$

$$ A_1 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} \, \mathrm{tr}(M \Phi P_h \Phi) \, \mathrm{tr}(M \Phi P_j \Phi)}} $$

$$ A_2 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} \, \mathrm{tr}(M \Phi P_h \Phi M \Phi P_j \Phi)}} $$

$$ B = \frac{1}{2c}(A_1 + 6A_2) $$

$$ g = \frac{(c+1)A_1 - (c+4)A_2}{(c+2)A_2} $$

$$ c_1 = \frac{g}{3c+2(1-g)} $$

$$ c_2 = \frac{c-g}{3c+2(1-g)} $$

$$ c_3 = \frac{c+2-g}{3c+2(1-g)} $$

$$ E^* = \left\{1-\frac{A_2}{c}\right\}^{-1} $$

$$ V^* = \frac{2}{c}\left\{\frac{1+c_1 B}{(1-c_2 B)^2(1-c_3 B)}\right\} $$

$$ \rho = \frac{V^*}{2(E^*)^2} $$

$$ m = 4 + \frac{c+2}{c\rho - 1} $$

$$ \lambda = \frac{m}{E^*(m-2)} $$
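The following R sketch walks through these quantities for a given contrast matrix `C` (here $p \times c$, matching the convention above), `Phi`, `W` and a list `P` of the $P_h$ matrices; all inputs are assumed to be precomputed and the function name is hypothetical:

```r
kr_df <- function(C, Phi, W, P) {
  c_dim <- ncol(C) # c: number of contrasts
  M <- C %*% solve(t(C) %*% Phi %*% C) %*% t(C)
  k <- nrow(W)
  A1 <- A2 <- 0
  for (h in seq_len(k)) {
    for (j in seq_len(k)) {
      A1 <- A1 + W[h, j] *
        sum(diag(M %*% Phi %*% P[[h]] %*% Phi)) *
        sum(diag(M %*% Phi %*% P[[j]] %*% Phi))
      A2 <- A2 + W[h, j] *
        sum(diag(M %*% Phi %*% P[[h]] %*% Phi %*% M %*% Phi %*% P[[j]] %*% Phi))
    }
  }
  B <- (A1 + 6 * A2) / (2 * c_dim)
  g <- ((c_dim + 1) * A1 - (c_dim + 4) * A2) / ((c_dim + 2) * A2)
  denom <- 3 * c_dim + 2 * (1 - g)
  c1 <- g / denom
  c2 <- (c_dim - g) / denom
  c3 <- (c_dim + 2 - g) / denom
  E_star <- 1 / (1 - A2 / c_dim)
  V_star <- (2 / c_dim) * (1 + c1 * B) / ((1 - c2 * B)^2 * (1 - c3 * B))
  rho <- V_star / (2 * E_star^2)
  m <- 4 + (c_dim + 2) / (c_dim * rho - 1)
  lambda <- m / (E_star * (m - 2))
  list(m = m, lambda = lambda)
}
```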

Parameterization methods and Kenward-Roger

While the Kenward-Roger adjusted covariance matrix uses a Taylor series to approximate the true value, the choice of parameterization can change the result. Take the unstructured covariance structure as a simple example: in our current approach, where the parameters are elements of the Cholesky factor of $\Sigma$ (see parameterization), the second-order derivatives of $\Sigma$ with respect to our parameters are non-zero matrices. If instead we used the elements of $\Sigma$ itself as parameters, the second-order derivatives would be zero matrices. However, the differences are typically small and will not materially affect the inference. If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.

Implementations in mmrm

In the mmrm package, we have implemented the Kenward-Roger calculations based on the previous sections. Specifically, for the first-order and second-order derivatives we use automatic differentiation to obtain the results for non-spatial covariance structures. For spatial covariance structures, we derive the exact results.
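A minimal usage sketch with the `fev_data` example dataset shipped with the package (the exact argument names for requesting the linear Kenward-Roger variant may differ between package versions):

```r
library(mmrm)

# Fit an MMRM with unstructured covariance and Kenward-Roger adjustment.
fit <- mmrm(
  FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  method = "Kenward-Roger"
)

# Adjusted covariance matrix of the coefficients.
vcov(fit)

# Kenward-Roger degrees of freedom for a single coefficient.
contrast <- numeric(length(coef(fit)))
contrast[2] <- 1
df_1d(fit, contrast)
```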

Spatial Exponential Derivatives

For the spatial exponential covariance structure, we have

$$ \theta = (\theta_1, \theta_2) $$
$$ \sigma = e^{\theta_1} $$
$$ \rho = \frac{e^{\theta_2}}{1 + e^{\theta_2}} $$

$$ \Sigma_{ij} = \sigma \rho^{d_{ij}} $$
where $d_{ij}$ is the distance between time point $i$ and time point $j$.

So the first-order derivatives can be written as:

$$ \frac{\partial{\Sigma_{ij}}}{\partial\theta_1} = \frac{\partial\sigma}{\partial\theta_1} \rho^{d_{ij}} = e^{\theta_1}\rho^{d_{ij}} = \Sigma_{ij} $$

$$ \frac{\partial{\Sigma_{ij}}}{\partial\theta_2} = \sigma\frac{\partial{\rho^{d_{ij}}}}{\partial\theta_2} = \sigma\rho^{d_{ij}-1} d_{ij} \frac{\partial\rho}{\partial\theta_2} = \sigma\rho^{d_{ij}-1} d_{ij} \rho(1-\rho) = \sigma \rho^{d_{ij}} d_{ij} (1-\rho) $$

Second-order derivatives can be written as:

$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_1} = \frac{\partial\Sigma_{ij}}{\partial\theta_1} = \Sigma_{ij} $$

$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_2} = \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_1} = \frac{\partial\Sigma_{ij}}{\partial\theta_2} = \sigma\rho^{d_{ij}-1} d_{ij} \rho(1-\rho) = \sigma\rho^{d_{ij}} d_{ij} (1-\rho) $$

$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_2} = \frac{\partial{\left(\sigma\rho^{d_{ij}} d_{ij} (1-\rho)\right)}}{\partial\theta_2} = \sigma\rho^{d_{ij}} d_{ij} (1-\rho)\left(d_{ij} (1-\rho) - \rho\right) $$
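These expressions can be checked numerically in R; the element-wise functions below are a small sketch using the parameterization above with a single made-up distance `d`:

```r
# Spatial exponential element Sigma_ij and its first-order theta-derivatives.
sigma_ij <- function(theta, d) {
  sigma <- exp(theta[1])
  rho <- exp(theta[2]) / (1 + exp(theta[2]))
  sigma * rho^d
}

d1_sigma_ij <- function(theta, d) {
  sigma <- exp(theta[1])
  rho <- exp(theta[2]) / (1 + exp(theta[2]))
  c(
    sigma * rho^d,                # d / d theta_1 = Sigma_ij
    sigma * rho^d * d * (1 - rho) # d / d theta_2
  )
}

# Finite-difference check of the first-order derivatives.
theta0 <- c(0.5, -0.2)
d <- 1.7
eps <- 1e-6
num_grad <- sapply(1:2, function(h) {
  e <- replace(numeric(2), h, eps)
  (sigma_ij(theta0 + e, d) - sigma_ij(theta0 - e, d)) / (2 * eps)
})
all.equal(num_grad, d1_sigma_ij(theta0, d), tolerance = 1e-6)
```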

References

Kackar RN, Harville DA (1984). “Approximations for Standard Errors of Estimators of Fixed and Random Effects in Mixed Linear Models.” Journal of the American Statistical Association, 79(388), 853–862. https://doi.org/10.1080/01621459.1984.10477102.
Kenward MG, Roger JH (1997). “Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.” Biometrics, 53(3), 983–997. https://doi.org/10.2307/2533558.