Here we describe the details of the Satterthwaite degrees of freedom calculations.

Satterthwaite degrees of freedom for asymptotic covariance

In Christensen (2018) the Satterthwaite degrees of freedom approximation based on normal models is described in detail, together with the computational approach for models fitted with the lme4 package. We follow that algorithm and explain the implementation in this mmrm package. The model definition is the same as in Details of the model fitting in mmrm.

We are also using the same notation as in the Details of the Kenward-Roger calculations. In particular, we assume we have a contrast matrix $C \in \mathbb{R}^{c\times p}$ with which we want to test the linear hypothesis $C\beta = 0$. Further, $W(\hat\theta)$ is the inverse of the Hessian matrix of the log-likelihood function of $\theta$ evaluated at the estimate $\hat\theta$, i.e. the observed Fisher Information matrix as a consistent estimator of the variance-covariance matrix of $\hat\theta$. $\Phi(\theta) = \left\{X^\top \Omega(\theta)^{-1} X\right\}^{-1}$ is the asymptotic covariance matrix of $\hat\beta$.
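As a minimal illustration of how $\Phi(\theta)$ relates to the design and covariance matrices (a sketch with hypothetical inputs, not the internal mmrm code):

```r
# Sketch (not mmrm internals): asymptotic covariance of beta-hat.
# `X` is the n x p design matrix, `Omega` the n x n covariance matrix
# implied by the variance parameters theta (both assumed available).
phi_matrix <- function(X, Omega) {
  # Phi(theta) = {X' Omega(theta)^{-1} X}^{-1}
  solve(t(X) %*% solve(Omega, X))
}
```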

One-dimensional contrast

We start with the case of a one-dimensional contrast, i.e. $c = 1$. The Satterthwaite adjusted degrees of freedom for the corresponding t-test are then defined as:
$$
\hat\nu(\hat\theta) = \frac{2f(\hat\theta)^2}{f'(\hat\theta)^\top W(\hat\theta) f'(\hat\theta)}
$$
where $f(\hat\theta) = C \Phi(\hat\theta) C^\top$ is the scalar in the numerator and we can identify it as the variance estimate for the estimated scalar contrast $C\hat\beta$. The computational challenge is essentially to evaluate the denominator in the expression for $\hat\nu(\hat\theta)$, which amounts to computing the $k$-dimensional gradient $f'(\hat\theta)$ of $f(\theta)$ (for the given contrast matrix $C$) at the estimate $\hat\theta$. We already have the variance-covariance matrix $W(\hat\theta)$ of the variance parameter vector $\theta$ from the model fitting.
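To make the formula concrete, here is a small sketch (with hypothetical inputs, not the package's internal function) that computes $\hat\nu(\hat\theta)$ from the contrast variance, its gradient, and the covariance matrix of $\hat\theta$:

```r
# Sketch: Satterthwaite df for a one-dimensional contrast.
# `f_value`: scalar f(theta-hat) = C Phi(theta-hat) C'
# `f_grad` : k-dimensional gradient of f at theta-hat
# `W`      : k x k covariance matrix of theta-hat
df_1d <- function(f_value, f_grad, W) {
  denom <- as.numeric(t(f_grad) %*% W %*% f_grad)
  2 * f_value^2 / denom
}
```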

Jacobian approach

However, if we proceeded in a naive way here, we would need to recompute the denominator again for every chosen $C$. This would be slow, e.g. when changing $C$ every time we want to test a single coefficient within $\beta$. It is better to instead evaluate the gradient of the matrix valued function $\Phi(\theta)$, which is therefore the Jacobian, with regards to $\theta$, $\mathcal{J}(\theta) = \nabla_\theta \Phi(\theta)$. Imagine $\mathcal{J}(\theta)$ as the 3-dimensional array with $k$ faces of size $p\times p$. Left and right multiplying each face by $C$ and $C^\top$ respectively leads to the $k$-dimensional gradient $f'(\theta) = C \mathcal{J}(\theta) C^\top$. Therefore for each new contrast $C$ we just need to perform simple matrix multiplications, which is fast (see h_gradient() where this is implemented). Thus, having computed the estimated Jacobian $\mathcal{J}(\hat\theta)$, it is only a matter of putting the different quantities together to compute the estimate of the denominator degrees of freedom, $\hat\nu(\hat\theta)$.
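The following sketch mirrors this idea (the actual implementation is in h_gradient(); the names jac_list and contrast here are illustrative): each element of the Jacobian list is one $p \times p$ face, and pre- and post-multiplying by $C$ gives one element of the gradient.

```r
# Sketch: collapse the Jacobian of Phi(theta) to the gradient of
# f(theta) = C Phi(theta) C' for a one-dimensional contrast C (a 1 x p matrix).
gradient_from_jacobian <- function(jac_list, contrast) {
  vapply(
    jac_list,
    function(jac_face) as.numeric(contrast %*% jac_face %*% t(contrast)),
    numeric(1L)
  )
}
```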

Jacobian calculation

Currently, we evaluate the gradient of Φ(θ)\Phi(\theta) through function h_jac_list(). It uses automatic differentiation provided in TMB.

We first obtain the Jacobian of the inverse of the covariance matrix of the coefficients, i.e. of $\Phi(\theta)^{-1}$, following the Kenward-Roger calculations. Note that for this we only need the $P_h$ matrices.

Then, to obtain the Jacobian of the covariance matrix of the coefficients itself, we use $\Phi(\theta)$ as estimated in the fit, since by the derivative rule for matrix inverses $\partial \Phi(\theta) / \partial \theta_h = - \Phi(\theta) \, \{\partial \Phi(\theta)^{-1} / \partial \theta_h\} \, \Phi(\theta)$.

The result is a list (of length $k$, where $k$ is the dimension of the variance parameter $\theta$) of $p \times p$ matrices, where $p$ is the dimension of $\beta$.
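Assuming the $P_h$ matrices are available as a list, a minimal sketch of this conversion (illustrative names, not the exact h_jac_list() code) is:

```r
# Sketch: Jacobian of Phi(theta) from the Jacobian of Phi(theta)^{-1}.
# `p_list`: list of the k matrices P_h = d Phi(theta)^{-1} / d theta_h
# `phi`   : estimated p x p matrix Phi(theta-hat)
# Each face uses d Phi / d theta_h = -Phi P_h Phi.
jacobian_list <- function(p_list, phi) {
  lapply(p_list, function(p_h) -phi %*% p_h %*% phi)
}
```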

Multi-dimensional contrast

When $c > 1$ we are testing multiple contrasts at once. Here an F-statistic
$$
F = \frac{1}{c} (C\hat\beta)^\top (C \Phi(\hat\theta) C^\top)^{-1} (C\hat\beta)
$$
is calculated, and we are interested in estimating an appropriate denominator degrees of freedom for $F$, while assuming $c$ are the numerator degrees of freedom. Note that only in special cases, such as orthogonal or balanced designs, the F distribution will be exact under the null hypothesis. In general, it is an approximation.

The calculations are described in detail in Christensen (2018), and we don't repeat them here. The implementation is in h_df_md_sat() and starts with an eigen-decomposition of the asymptotic variance-covariance matrix of the contrast estimate, i.e. $C \Phi(\hat\theta) C^\top$. The F-statistic can be rewritten as a sum of $t^2$ statistics based on these eigen-values. The corresponding random variables are independent (by design, because they are derived from the orthogonal eigen-vectors) and essentially have one degree of freedom each. Hence, each of the $t$ statistics is treated as above in the one-dimensional contrast case, i.e. the denominator degrees of freedom are calculated for each of them. Finally, using properties of the F distribution's expectation, the denominator degrees of freedom for the whole F statistic are derived.
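As a rough illustration of this construction (not the exact h_df_md_sat() code; the combination rule at the end assumes the usual moment matching of the F expectation, and that all per-contrast degrees of freedom exceed 2):

```r
# Sketch: denominator df for a multi-dimensional contrast.
# `contrast` : c x p contrast matrix C
# `phi`      : p x p asymptotic covariance Phi(theta-hat)
# `df_1d_fun`: function taking a single contrast vector and returning its
#              one-dimensional Satterthwaite df (as in the section above).
df_md <- function(contrast, phi, df_1d_fun) {
  c_dim <- nrow(contrast)
  eig <- eigen(contrast %*% phi %*% t(contrast), symmetric = TRUE)
  # Rotate the contrast rows so the resulting t statistics are independent.
  rotated <- t(eig$vectors) %*% contrast
  nu_i <- apply(rotated, 1L, df_1d_fun)
  # Match E[F] = (1/c) * sum_i nu_i / (nu_i - 2) against nu / (nu - 2),
  # assuming all nu_i > 2.
  e_t <- sum(nu_i / (nu_i - 2))
  2 * e_t / (e_t - c_dim)
}
```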

Satterthwaite degrees of freedom for empirical covariance

In Bell and McCaffrey (2002) the Satterthwaite degrees of freedom in combination with a sandwich covariance matrix estimator are described.

One-dimensional contrast

For a one-dimensional contrast, following the same notation as in Details of the model fitting in mmrm and Details of the Kenward-Roger calculations, we have the following derivation. Consider a variance estimator of the form

$$
v = s c^\top (X^\top X)^{-1} \sum_{i}{X_i^\top A_i \epsilon_i \epsilon_i^\top A_i X_i} (X^\top X)^{-1} c
$$

where $s$ takes the value of $\frac{n}{n-1}$, $1$ or $\frac{n-1}{n}$, and $A_i$ takes $I_i$, $(I_i - H_{ii})^{-\frac{1}{2}}$, or $(I_i - H_{ii})^{-1}$ respectively, and $c$ is a column vector. Then $v$ can be decomposed into a weighted sum of independent $\chi_1^2$ random variables, where the weights are the eigenvalues of the $n\times n$ matrix $G$ with elements
$$
G_{ij} = g_i^\top V g_j
$$

where

$$
g_i = s^{\frac{1}{2}} (I - H)_i^\top A_i X_i (X^\top X)^{-1} c
$$
$$
H = X(X^\top X)^{-1}X^\top
$$

and $(I - H)_i$ corresponds to the rows of subject $i$.

So the degrees of freedom can be represented as
$$
\nu = \frac{(\sum_{i}\lambda_i)^2}{\sum_{i}{\lambda_i^2}}
$$

where $\lambda_i, i = 1, \dotsc, n$ are the eigenvalues of $G$. Bell and McCaffrey (2002) also suggest that $V$ can be chosen as the identity matrix, so $G_{ij} = g_i^\top g_j$.
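Evaluating $g_i$ requires the adjustment matrices $A_i$; for the choice $(I_i - H_{ii})^{-\frac{1}{2}}$ a symmetric matrix inverse square root is needed. One common way to compute it (a generic sketch, not necessarily how the package does it) is via an eigen-decomposition:

```r
# Sketch: symmetric inverse square root of M = I_i - H_ii, used as A_i.
matrix_inv_sqrt <- function(m) {
  eig <- eigen(m, symmetric = TRUE)
  # M^{-1/2} = Q diag(1 / sqrt(lambda)) Q'
  eig$vectors %*%
    diag(1 / sqrt(eig$values), nrow = length(eig$values)) %*%
    t(eig$vectors)
}
```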

Following the Weighted Least Squares Estimator, we can transform the original $X$ into $\tilde{x}$ to use the above equations.
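One generic way to carry out such a transformation (an assumption for illustration here; the model fitting vignette spells out the exact factorization used by the package) is via the per-subject Cholesky factors of $\Omega_i$, so that the transformed design has identity covariance:

```r
# Sketch (generic approach, illustrative names): transform X so that the
# identity-covariance formulas above apply.
# `x_list` and `omega_list` hold the per-subject design and covariance matrices.
transform_design <- function(x_list, omega_list) {
  Map(
    function(x_i, omega_i) {
      l_i <- t(chol(omega_i))   # Omega_i = L_i L_i'
      solve(l_i, x_i)           # x_tilde_i = L_i^{-1} X_i
    },
    x_list,
    omega_list
  )
}
```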

To avoid repeated computation of the matrices $A_i$, $H$, etc. for different contrasts, we calculate and cache the following:

$$
G^\ast_i = (I - H)_i^\top A_i X_i (X^\top X)^{-1}
$$
which is a $\sum_i{m_i} \times p$ matrix. With a different contrast $c$, we then only need to calculate
$$
g_i = G^\ast_i c
$$
to obtain a $\sum_i{m_i} \times 1$ matrix; $G$ can then be computed from the $g_i$.

To obtain the degrees of freedom, and to avoid an eigen-decomposition of a large matrix, we can use the following equation:

$$
\nu = \frac{(\sum_{i}\lambda_i)^2}{\sum_{i}{\lambda_i^2}} = \frac{tr(G)^2}{\sum_{i}{\sum_{j}{G_{ij}^2}}}
$$

The scale parameter $s$ is not used throughout the package.
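A compact sketch of this caching strategy (with illustrative names; the package's internals differ) is:

```r
# Sketch: degrees of freedom for a new contrast from cached G* matrices.
# `g_star_list`: list with one cached matrix G*_i per subject
# `contrast`   : p x 1 column vector c
df_empirical_1d <- function(g_star_list, contrast) {
  g_list <- lapply(g_star_list, function(g_star_i) g_star_i %*% contrast)
  # With V = identity, G_{ij} = g_i' g_j.
  g_mat <- sapply(g_list, as.numeric)   # columns are the g_i vectors
  big_g <- crossprod(g_mat)             # G with entries g_i' g_j
  # nu = tr(G)^2 / sum_{ij} G_{ij}^2 avoids the eigen-decomposition.
  sum(diag(big_g))^2 / sum(big_g^2)
}
```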

The proof is as follows:

  1. Proof of $tr(AB) = tr(BA)$

Let $A$ have dimension $p\times q$ and $B$ have dimension $q\times p$. Then
$$
tr(AB) = \sum_{i=1}^{p}{(AB)_{ii}} = \sum_{i=1}^{p}{\sum_{j=1}^{q}{A_{ij}B_{ji}}}
$$

$$
tr(BA) = \sum_{i=1}^{q}{(BA)_{ii}} = \sum_{i=1}^{q}{\sum_{j=1}^{p}{B_{ij}A_{ji}}}
$$

so $tr(AB) = tr(BA)$.

  2. Proof of $tr(G) = \sum_{i}(\lambda_i)$ and $\sum_{i}(\lambda_i^2) = \sum_{i}{\sum_{j}{G_{ij}^2}}$ if $G = G^\top$

Following the eigen-decomposition, we have
$$
G = Q \Lambda Q^\top
$$
where $\Lambda$ is a diagonal matrix and $Q$ is an orthogonal matrix.

Using the previous formula that $tr(AB) = tr(BA)$, we have

$$
tr(G) = tr(Q \Lambda Q^\top) = tr(\Lambda Q^\top Q) = tr(\Lambda) = \sum_{i}(\lambda_i)
$$

$$
tr(G^\top G) = tr(Q \Lambda Q^\top Q \Lambda Q^\top) = tr(\Lambda^2 Q^\top Q) = tr(\Lambda^2) = \sum_{i}(\lambda_i^2)
$$

and $tr(G^\top G)$ can be further expressed as

$$
tr(G^\top G) = \sum_{i}{(G^\top G)_{ii}} = \sum_{i}{\sum_{j}{G^\top_{ij}G_{ji}}} = \sum_{i}{\sum_{j}{G_{ij}^2}}
$$
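As a quick numerical check of these identities (purely illustrative), one can compare the eigenvalue-based and trace-based expressions on a random symmetric matrix:

```r
# Quick check of tr(G) = sum(lambda) and sum(G^2) = sum(lambda^2) for symmetric G.
set.seed(123)
m <- matrix(rnorm(25), nrow = 5)
g <- crossprod(m)                          # a symmetric matrix G
lambda <- eigen(g, symmetric = TRUE)$values
all.equal(sum(diag(g)), sum(lambda))       # tr(G) equals the sum of eigenvalues
all.equal(sum(g^2), sum(lambda^2))         # sum of squared entries equals sum of lambda^2
```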

Multi-dimensional contrast

For a multi-dimensional contrast we use the same technique as for the multi-dimensional contrast with the asymptotic covariance.

References

Bell RM, McCaffrey DF (2002). “Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples.” Survey Methodology, 28(2), 169–182.
Christensen RHB (2018). Satterthwaite’s Method for Degrees of Freedom in Linear Mixed Models. Retrieved from https://github.com/runehaubo/lmerTestR/blob/35dc5885205d709cdc395b369b08ca2b7273cb78/pkg_notes/Satterthwaite_for_LMMs.pdf