Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.
Model definition
The model definition is the same as what we have in Details of the model fitting in
mmrm
. We are using the same notations.
Linear model
For each subject we observe a vector and given a design matrix and a corresponding coefficient vector we assume that the observations are multivariate normal distributed: where the covariance matrix is derived by subsetting the overall covariance matrix appropriately by where the subsetting matrix contains in each of its columns contains a single 1 indicating which overall time point is matching . is the diagonal weight matrix.
Conditional on the design matrices , the coefficient vector and the covariance matrix we assume that the observations are independent between the subjects.
We can write the linear model for all subjects together as where combines all subject specific observations vectors such that we have in total observations, combines all subject specific design matrices and has a multivariate normal distribution where is block-diagonal containing the subject specific covariance matrices on the diagonal and 0 in the remaining entries.
Mathematical Details of Kenward-Roger method
The mathematical derivation of the Kenward-Roger method is based on
the Taylor expansion of the obtained covariance matrix of
to get a more accurate estimate for it. All these derivations are based
on the restricted maximum likelihood. Following the same notation, the
covariance matrix,
can be represented as a function of covariance matrix parameters
,
i.e. .
Here after model fitting with mmrm
, we obtain the estimate
,
where
is the asymptotic covariance matrix of
.
However, Kackar and Harville (1984)
suggests that although the
is unbiased for
,
the covariance matrix,
can be biased. They showed that the variability of
can be partitioned into two components,
where is the variance-covariance matrix of the asymptotic distribution of as as defined above, and represents the amount to which the asymptotic variance-covariance matrix underestimates .
Based on a Taylor series expansion around , can be approximated by
where
is the inverse of the Hessian matrix of the log-likelihood function of evaluated at the estimate , i.e. the observed Fisher Information matrix as a consistent estimator of the variance-covariance matrix of .
Again, based on a Taylor series expansion about , Kenward and Roger (1997) show that Ignoring the possible bias in , Using previously defined notations, this can be further written as where
substituting and back to the , we have
where replaces in the right-hand side.
Please note that, if we ignore , the second-order derivatives, we will get a different estimate of adjusted covariance matrix, and we call this the linear Kenward-Roger approximation.
Special Considerations for mmrm models
In mmrm models, is a block-diagonal matrix, hence we can calculate , and for each subject and add them up.
Derivative of the overall covariance matrix
The derivative of the overall covariance matrix with respect to the variance parameters can be calculated through the derivatives of the Cholesky factor, and hence obtained through automatic differentiation, following matrix identities calculations.
Derivative of the
The derivatives of can be calculated through
$$ \frac{\partial{\Sigma\Sigma^{-1}}}{\partial{\theta_h}}\\ = \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1} + \Sigma\frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} \\ = 0 $$
Subjects with missed visits
If a subject do not have all visits, the corresponding covariance matrix can be represented as
and the derivatives can be obtained through
The derivative of the , can be calculated through and using the above.
Scenario under group specific covariance estimates
When fitting grouped mmrm
models, the covariance matrix
for subject i of group
,
can be written as $$
\Sigma_i = S_i^\top \Sigma_{g(i)} S_i$.
$$ Assume there are
groups, the number of parameters is increased by
times. With the fact that for each group, the corresponding
will not affect other parts, we will have block-diagonal
,
and
matrices, where the blocks are given by:
Use to denote the block diagonal part for group , we have
Similarly for .
Scenario under weighted mmrm
Under weights mmrm model, the covariance matrix for subject , can be represented as
Where is a diagonal matrix of the weights. Then, when deriving , and , there are no mathematical differences as they are constant, and having in addition to does not change the algorithms and we can simply multiply the formulas with , similarly as above for the subsetting matrix.
Inference
Suppose we are testing the linear combination of , with , we can use the following F-statistic and follows exact distribution.
When we have only one coefficient to test, then is a column vector containing a single inside. We can still follow the same calculations as for the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.
and can be calculated through
Parameterization methods and Kenward-Roger
While the Kenward-Roger adjusted covariance matrix is adopting a Taylor series to approximate the true value, the choices of parameterization can change the result. In a simple example of unstructured covariance structure, in our current approach, where the parameters are elements of the Cholesky factor of (see parameterization), the second-order derivatives of over our parameters, are non-zero matrices. However, if we use the elements of as our parameters, then the second-order derivatives are zero matrices. However, the differences can be small and will not affect the inference. If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.
Implementations in mmrm
In package mmrm
, we have implemented Kenward-Roger
calculations based on the previous sections. Specially, for the
first-order and second-order derivatives, we use automatic
differentiation to obtain the results easily for non-spatial covariance
structure. For spatial covariance structure, we derive the exact
results.
Spatial Exponential Derivatives
For spatial exponential covariance structure, we have
where is the distance between time point and time point .
So the first-order derivatives can be written as:
$$ \frac{\partial{\Sigma_{ij}}}{\partial\theta_1} = \frac{\partial\sigma}{\partial\theta_1} \rho^{d_{ij}}\\ = e^{\theta_1}\rho^{d_{ij}} \\ = \Sigma_{ij} $$
$$ \frac{\partial{\Sigma_{ij}}}{\partial\theta_2} = \sigma\frac{\partial{\rho^{d_{ij}}}}{\partial\theta_2} \\ = \sigma\rho^{d_{ij}-1}{d_{ij}}\frac{\partial\rho}{\partial\theta_2}\\ = \sigma\rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho) \\ = \sigma \rho^{d_{ij}} {d_{ij}} (1-\rho) $$
Second-order derivatives can be written as:
$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_1}\\ = \frac{\partial\Sigma_{ij}}{\partial\theta_1}\\ = \Sigma_{ij} $$
$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_2} = \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_1} \\ = \frac{\partial\Sigma_{ij}}{\partial\theta_2}\\ = \sigma\rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho)\\ = \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho) $$
$$ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_2}\\ = \frac{\partial{\sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)}}{\partial\theta_2}\\ = \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)(d_{ij} (1-\rho) - \rho) $$