
Weighted Least Square (WLS) Empirical Covariance

Following the notation used in the non-weighted case, Bell and McCaffrey (2002) and Pustejovsky and Tipton (2018) suggest

$$ v = s C^\top(X^\top W X)^{-1}\sum_{i}{X_i^\top W_i A_i \epsilon_i \epsilon_i^\top A_i W_i X_i} (X^\top W X)^{-1} C $$

where $A_i$ takes $I_i$, $(I_i - H_{ii})^{-\frac{1}{2}}$, or $(I_i - H_{ii})^{-1}$, unchanged from the non-weighted case, but $H$ becomes

$$ H = X (X^\top W X)^{-1} X^\top W $$

For the degrees of freedom, we have

$$ G_{ij} = g_i^\top \Phi g_j $$

where

$$ g_i = s^{\frac{1}{2}} (I - H)_i^\top A_i W_i X_i (X^\top W X)^{-1} C $$
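
In the mmrm package, these estimators are requested through the covariance-method argument when fitting the model. The following is a hedged usage sketch based on the package documentation; the `fev_data` example dataset, its `WEIGHT` column, and the `vcov` option strings are assumptions not stated in this vignette.

```r
library(mmrm)

# Hypothetical weighted fit requesting an empirical (sandwich-type) covariance
# estimator; dataset, column names and option strings are assumed from the
# mmrm documentation rather than from this vignette.
fit <- mmrm(
  formula = FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  weights = fev_data$WEIGHT,
  vcov = "Empirical-Bias-Reduced"  # or "Empirical" / "Empirical-Jackknife"
)

# Covariance matrix of the coefficient estimates under the chosen adjustment.
vcov(fit)
```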

Difference of Implementations

Comparing the previous section with our implementation, we can identify the differences. Since the two use nearly the same symbols, we use the subscript $1$ to denote the implementation suggested by Bell and McCaffrey (2002) and Pustejovsky and Tipton (2018), and the subscript $2$ to denote our implementation of the covariance estimator in mmrm. We have

$$ v_{1} = s C^\top(X^\top W X)^{-1}\sum_{i}{X_i^\top W_i A_{1, i} \epsilon_i \epsilon_i^\top A_{1, i} W_i X_i} (X^\top W X)^{-1} C $$

$$ v_{2} = s C^\top(X^\top W X)^{-1}\sum_{i}{X_i^\top L_i A_{2, i} L_i^\top \epsilon_i \epsilon_i^\top L_i A_{2, i} L_i^\top X_i} (X^\top W X)^{-1} C $$

Here $L_i$ is a matrix satisfying $W_i = L_i L_i^\top$, and $L$ denotes the block-diagonal matrix with blocks $L_i$, so that $W = L L^\top$. We will prove that the two estimators are identical.

Proof of Identity

Proof for Covariance Estimator

First of all, we assume that all $A_i$ matrices, in any form, are positive-definite. Comparing $v_{1}$ and $v_{2}$, we see that the parts that differ are

$$ M_{1, d, i} = W_i A_{1, i} \quad \text{and} \quad M_{2, d, i} = L_i A_{2, i} L_i^\top $$

Substituting $H_{1}$ and $H_{2}$ with their expressions, we have

$$ M_{1, d, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^d $$

$$ M_{2, d, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^d L_i^\top $$

where $d$ takes $0$, $-1/2$ and $-1$ respectively.

Apparently, if $d=0$, these two are identical because $W_i = L_i L_i^\top$.

When $d = -1$, we have

$$ M_{2, -1, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1} L_i^\top \\ = (L_i^{-1})^{-1} (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1} ((L_i^\top)^{-1})^{-1} \\ = [((L_i^\top)^{-1})(I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)(L_i^{-1})]^{-1} \\ = [(L_i^\top)^{-1}L_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top]^{-1} \\ = (W_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top)^{-1} $$

$$ M_{1, -1, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1} \\ = (W_i^{-1})^{-1} (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1} \\ = [(I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)((W_i^{-1}))]^{-1} \\ = (W_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top)^{-1} $$

Obviously, $M_{2, -1, i} = M_{1, -1, i}$. Using the following notation

$$ M_{2, -1, i} = L_i B_{2, i} L_i^\top $$

$$ M_{1, -1, i} = W_i B_{1, i} $$

we have

$$ B_{1, i} = W_i^{-1} L_i B_{2, i} L_i^\top \\ = (L_i^\top)^{-1} B_{2, i} L_i^\top $$

When $d = -1/2$, we have the following

$$ M_{2, -1/2, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1/2} L_i^\top \\ = L_i B_{2, i}^{1/2} L_i^\top $$

$$ M_{1, -1/2, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1/2} \\ = W_i B_{1, i}^{1/2} $$

Apparently, if $B_{1, i}^{1/2} \ne (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top$, then by the uniqueness of the matrix square root with positive eigenvalues we should also have $$ B_{1, i}^{1/2} B_{1, i}^{1/2} \ne (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top $$

leading to

$$ B_{1, i} \ne (L_i^\top)^{-1} B_{2, i} L_i^\top $$

which contradicts our previous result. Hence $B_{1, i}^{1/2} = (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top$, so that $M_{1, -1/2, i} = W_i B_{1, i}^{1/2} = L_i L_i^\top (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top = L_i B_{2, i}^{1/2} L_i^\top = M_{2, -1/2, i}$. Thus, these covariance estimators are identical.
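
As a sanity check, the identity can also be verified numerically. Below is a minimal base R sketch (not mmrm code; the toy data and helper names such as `mat_power` are illustrative assumptions) that builds a small block-diagonal weighted design and confirms that the two constructions of $M_{d, i}$ agree for $d = 0$, $-1/2$ and $-1$.

```r
# Illustrative check only: construct a toy design with 3 subjects and verify
# that W_i A_{1,i} and L_i A_{2,i} L_i^T coincide for d = 0, -1/2, -1.
set.seed(123)
n_sub <- 3  # subjects
m <- 3      # observations per subject
x_list <- lapply(seq_len(n_sub), function(i) cbind(1, rnorm(m)))
w_list <- lapply(seq_len(n_sub), function(i) crossprod(matrix(rnorm(m * m), m)) + diag(m))

x_all <- do.call(rbind, x_list)
w_all <- matrix(0, n_sub * m, n_sub * m)
for (i in seq_len(n_sub)) {
  idx <- (i - 1) * m + seq_len(m)
  w_all[idx, idx] <- w_list[[i]]
}
xtwx_inv <- solve(t(x_all) %*% w_all %*% x_all)

# Principal matrix power via the eigen decomposition (all eigenvalues here are
# real and positive, so the result is real).
mat_power <- function(a, d) {
  ev <- eigen(a)
  Re(ev$vectors %*% diag(ev$values^d) %*% solve(ev$vectors))
}

i <- 1
xi <- x_list[[i]]
wi <- w_list[[i]]
li <- t(chol(wi))  # lower-triangular factor with W_i = L_i L_i^T
inner1 <- diag(m) - xi %*% xtwx_inv %*% t(xi) %*% wi            # I_i - H_ii
inner2 <- diag(m) - t(li) %*% xi %*% xtwx_inv %*% t(xi) %*% li  # symmetric analogue

for (d in c(0, -1 / 2, -1)) {
  m1 <- wi %*% mat_power(inner1, d)            # M_{1, d, i}
  m2 <- li %*% mat_power(inner2, d) %*% t(li)  # M_{2, d, i}
  print(all.equal(m1, m2))  # TRUE for each d
}
```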

Proof for Degrees of Freedom

To prove that $G_{1, ij} = g_{1, i}^\top \Phi g_{1, j}$ and $G_{2, ij} = g_{2, i}^\top g_{2, j}$ are identical, we only need to prove that

$$ L^{-1} g_{1, i} = g_{2, i} $$

where $\Phi = W^{-1}$ according to our previous expression.

We first expand $L^{-1} g_{1, i}$ and $g_{2, i}$:

$$ L^{-1} g_{1, i} = L^{-1} (I - X(X^\top W X)^{-1}X^\top W) S_i^\top A_{1, i}^d W_i X_i (X^\top W X)^{-1} C $$

$$ g_{2, i} = (I - L^\top X(X^\top W X)^{-1}X^\top L) S_i^\top A_{2, i}^d L_i^\top X_i (X^\top W X)^{-1} C $$

where $S_i$ is the row selection matrix for subject $i$.

We will prove that the inner parts are equal, i.e. $$ L^{-1} (I - X(X^\top W X)^{-1}X^\top W) S_i^\top A_{1, i}^d W_i = (I - L^\top X(X^\top W X)^{-1}X^\top L) S_i^\top A_{2, i}^d L_i^\top $$

From the previous proof for the covariance estimators, we already have

$$ M_{1, d, i} = W_i A_{1, i}^d = L_i A_{2, i}^d L_i^\top = M_{2, d, i} $$

and the relationship between $(I - X(X^\top W X)^{-1}X^\top W)$ and $(I - L^\top X(X^\top W X)^{-1}X^\top L)$ has already been established there, so we only need to prove

$$ L^{-1} (I - X(X^\top W X)^{-1}X^\top W) S_i^\top = (I - L^\top X(X^\top W X)^{-1}X^\top L) S_i^\top L_i^{-1} $$

Apparently

$$ L^{-1} (I - X(X^\top W X)^{-1}X^\top W) S_i^\top = L^{-1} S_i^\top - L^{-1} X(X^\top W X)^{-1}X_i^\top W_i $$

$$ (I - L^\top X(X^\top W X)^{-1}X^\top L) S_i^\top L_i^{-1} = S_i^\top L_i^{-1} - L^\top X(X^\top W X)^{-1}X_i^\top $$

And obviously, since $L$ is block-diagonal with blocks $L_i$, $$ L^{-1} S_i^\top = S_i^\top L_i^{-1} $$

$$ L^{-1} X(X^\top W X)^{-1}X_i^\top W_i = L^\top X(X^\top W X)^{-1}X_i^\top $$

because of the following $$ (X(X^\top W X)^{-1}X_i^\top W_i)_{i} = X_i(X^\top W X)^{-1}X_i^\top W_i \\ = W_i X_i(X^\top W X)^{-1}X_i^\top \\ = (W X(X^\top W X)^{-1}X_i^\top)_{i} $$

Special Considerations in Implementations

Pseudo Inverse of a Matrix

The empirical covariance matrix involves the inverse, or the inverse symmetric square root, of a matrix. To calculate these, we usually require that the matrix is positive-definite. However, Young (2016) suggests that this is not always assured in practice.

Thus, following Pustejovsky and Tipton (2018), we use the pseudo inverse to avoid this issue. We use the following logic (see the corresponding C++ function pseudoInverseSqrt) to obtain the pseudo inverse; a hedged R sketch of the same logic follows the list:

  1. Conduct singular value decomposition.
  2. Use cpow to obtain the square root of the reciprocal of each singular value that is larger than a computational threshold; replace the remaining singular values with 0.
  3. Reconstruct the pseudo inverse square root matrix from the modified singular values and the U/V matrices.
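
The sketch below mirrors this logic in base R for a symmetric positive semi-definite input. It is illustrative only: the actual implementation is the C++ function pseudoInverseSqrt, and the function name `pseudo_inverse_sqrt` and the threshold `tol` are assumed stand-ins for the computational threshold mentioned above.

```r
# Illustrative R version of the pseudo inverse square root logic.
pseudo_inverse_sqrt <- function(a, tol = 1e-12) {
  sv <- svd(a)                                         # step 1: SVD, a = U D V^T
  d_inv_sqrt <- ifelse(sv$d > tol, 1 / sqrt(sv$d), 0)  # step 2: threshold, then sqrt of reciprocals
  sv$v %*% diag(d_inv_sqrt) %*% t(sv$u)                # step 3: reconstruct from modified values and U/V
}

# Example: a rank-deficient symmetric matrix, for which a plain inverse would fail.
set.seed(42)
x <- matrix(rnorm(6), 3, 2)
a <- x %*% t(x)                    # 3 x 3 but only rank 2
m <- pseudo_inverse_sqrt(a)
all.equal(a %*% m %*% m %*% a, a)  # m %*% m behaves as the pseudo inverse of a, so TRUE
```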

In the Eigen package, a pseudo inverse method is already implemented in Eigen::CompleteOrthogonalDecomposition<MatrixType_>::pseudoInverse, but it is not used, for the following reasons:

  1. That pseudo inverse method is not stable and can lead to NaN values in calculations.
  2. To obtain the symmetric square root, a singular value decomposition is still needed, so it is simpler to compute the square root of the pseudo inverse directly rather than to use that method.

References

Bell RM, McCaffrey DF (2002). “Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples.” Survey Methodology, 28(2), 169–182.
Pustejovsky JE, Tipton E (2018). “Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models.” Journal of Business & Economic Statistics, 36(4), 672–683.
Young A (2016). “Improved, Nearly Exact, Statistical Inference with Robust and Clustered Covariance Matrices Using Effective Degrees of Freedom Corrections.” Manuscript, London School of Economics. Retrieved from https://personal.lse.ac.uk/YoungA/Improved.pdf