Details of Weighted Least Square Empirical Covariance
Source: vignettes/empirical_wls.Rmd
Weighted Least Square (WLS) Empirical Covariance
Following the notation we have without weights, Bell and McCaffrey (2002) and Pustejovsky and Tipton (2018) suggest

$$ v = s\sum_{i}{L(X^\top W X)^{-1} X_i^\top W_i A_i \epsilon_i\epsilon_i^\top A_i^\top W_i X_i (X^\top W X)^{-1}L^\top} $$

where $A_i$ takes $I_i$, $(I_i - H_{ii})^{-1}$, or $(I_i - H_{ii})^{-1/2}$; $L$, $X_i$, $\epsilon_i$ and $s$ are unchanged, but $H$ is changed to its weighted counterpart

$$ H = X(X^\top W X)^{-1}X^\top W $$

For the degrees of freedom, we use the same Satterthwaite-type expression as in the unweighted case, with the unweighted quantities replaced by the weighted ones defined here.
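For orientation, the following minimal R sketch shows how such a weighted fit with an empirical covariance estimator can be requested; it assumes the current mmrm interface (the weights and vcov arguments and the "Empirical-Bias-Reduced" option) and the bundled fev_data example data set.

```r
library(mmrm)

# Weighted MMRM fit using the WEIGHT column of fev_data as the least square
# weights; the bias-reduced (CR2-type) empirical estimator is requested for
# the coefficient covariance matrix.
fit <- mmrm(
  FEV1 ~ ARMCD + us(AVISIT | USUBJID),
  data = fev_data,
  weights = fev_data$WEIGHT,
  vcov = "Empirical-Bias-Reduced",
  method = "Satterthwaite"
)

# Coefficient covariance matrix based on the WLS empirical estimator.
vcov(fit)
```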
Difference of Implementations
Comparing the previous section with our implementation, we can find out the differences. Since they use nearly the same symbols, to differentiate the two versions we use subscript $1$ to denote the implementation suggested by Bell and McCaffrey (2002) and Pustejovsky and Tipton (2018), and subscript $2$ to denote our implementation of the covariance estimator in mmrm. We have

$$ v_1 = s\sum_{i}{L(X^\top W X)^{-1} X_i^\top W_i A_{1, i} \epsilon_i\epsilon_i^\top A_{1, i}^\top W_i X_i (X^\top W X)^{-1}L^\top} $$

$$ v_2 = s\sum_{i}{L(X^\top W X)^{-1} X_i^\top L_i A_{2, i} L_i^\top \epsilon_i\epsilon_i^\top L_i A_{2, i} L_i^\top X_i (X^\top W X)^{-1}L^\top} $$

where $L_i$ is the lower-triangular Cholesky factor of the subject-level weight matrix, i.e. $W_i = L_i L_i^\top$, and $A_{1, i}$ and $A_{2, i}$ are the corresponding adjustment matrices.
Here we will prove that they are identical.
Proof of Identity
Proof for Covariance Estimator
First of all, we assume that all matrices, in any form, are positive-definite. Comparing $v_1$ and $v_2$, we see that the differing parts are

$$ M_{1, p, i} = W_i A_{1, i} $$

and

$$ M_{2, p, i} = L_i A_{2, i} L_i^\top $$

Substituting $A_{1, i}$ and $A_{2, i}$ with their expressions, we have

$$ M_{1, p, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{p} \\ M_{2, p, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{p} L_i^\top $$

where $p$ takes $0$, $-1$ and $-1/2$ respectively.

Apparently, if $p = 0$, these two are identical because $W_i = L_i L_i^\top$.

When $p = -1$, we have
$$ M_{2, -1, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1} L_i^\top \\ = (L_i^{-1})^{-1} (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1} ((L_i^\top)^{-1})^{-1} \\ = [((L_i^\top)^{-1})(I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)(L_i^{-1})]^{-1} \\ = [(L_i^\top)^{-1}L_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top]^{-1} \\ = (W_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top)^{-1} $$
$$ M_{1, -1, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1} \\ = (W_i^{-1})^{-1} (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1} \\ = [(I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)((W_i^{-1}))]^{-1} \\ = (W_i^{-1} - X_i (X^\top W X)^{-1} X_i^\top)^{-1} $$
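As a quick numeric illustration of this identity (not part of the original derivation), the following R snippet builds hypothetical small matrices for two subjects, with $W_i = L_i L_i^\top$, and checks that both $p = -1$ forms coincide with the common closed form above.

```r
set.seed(123)
n_i <- 4  # observations for subject i
n_j <- 5  # observations for another subject j

# Hypothetical subject-level design matrices and weight (precision) matrices.
x_i <- cbind(1, rnorm(n_i))
x_j <- cbind(1, rnorm(n_j))
w_i <- crossprod(matrix(rnorm(n_i^2), n_i)) + diag(n_i)
w_j <- crossprod(matrix(rnorm(n_j^2), n_j)) + diag(n_j)

# Inverse of the full-data cross product X^T W X.
k <- solve(t(x_i) %*% w_i %*% x_i + t(x_j) %*% w_j %*% x_j)

# Lower-triangular Cholesky factor of W_i, so that W_i = L_i L_i^T.
l_i <- t(chol(w_i))

# p = -1 case: the two adjustments and the common closed form.
m_2 <- l_i %*% solve(diag(n_i) - t(l_i) %*% x_i %*% k %*% t(x_i) %*% l_i) %*% t(l_i)
m_1 <- w_i %*% solve(diag(n_i) - x_i %*% k %*% t(x_i) %*% w_i)
common <- solve(solve(w_i) - x_i %*% k %*% t(x_i))

all.equal(m_1, m_2)     # TRUE
all.equal(m_2, common)  # TRUE
```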
Obviously, we have $M_{1, -1, i} = M_{2, -1, i}$. Using the following notation

$$ B_{1, i} = (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1} \\ B_{2, i} = (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1} $$

we have
$$ B_{1, i} = W_i^{-1} L_i B_{2, i} L_i^\top \\ = (L_i^\top)^{-1} B_{2, i} L_i^\top $$
When $p = -1/2$, we have the following
$$ M_{2, -1/2, i} = L_i (I_i - L_i^\top X_i (X^\top W X)^{-1} X_i^\top L_i)^{-1/2} L_i^\top \\ = L_i B_{2, i}^{1/2} L_i^\top $$
$$ M_{1, -1/2, i} = W_i (I_i - X_i (X^\top W X)^{-1} X_i^\top W_i)^{-1/2} \\ = W_i B_{1, i}^{1/2} $$
Apparently, if $M_{1, -1/2, i} \neq M_{2, -1/2, i}$, i.e. $W_i B_{1, i}^{1/2} \neq L_i B_{2, i}^{1/2} L_i^\top$, we should also have

$$ B_{1, i}^{1/2} \neq (L_i^\top)^{-1} B_{2, i}^{1/2} L_i^\top $$

leading to (squaring both sides, where the square roots are uniquely defined because of the positive-definiteness assumed above)

$$ B_{1, i} \neq (L_i^\top)^{-1} B_{2, i} L_i^\top $$

which is contradictory with our previous result. Thus, these covariance estimators are identical.
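Again as an illustration only, the following R snippet checks the key relations behind this argument on hypothetical small matrices: the induced $B_{1, i}^{1/2}$ really squares to $B_{1, i}$, and "squaring" the symmetric form $M_{2, -1/2, i}$ in the sandwich sense $M W_i^{-1} M$ recovers the $p = -1$ adjustment.

```r
set.seed(123)
n_i <- 4
n_j <- 5
x_i <- cbind(1, rnorm(n_i))
x_j <- cbind(1, rnorm(n_j))
w_i <- crossprod(matrix(rnorm(n_i^2), n_i)) + diag(n_i)
w_j <- crossprod(matrix(rnorm(n_j^2), n_j)) + diag(n_j)
k <- solve(t(x_i) %*% w_i %*% x_i + t(x_j) %*% w_j %*% x_j)
l_i <- t(chol(w_i))

# Symmetric square root of a symmetric positive-definite matrix.
sym_sqrt <- function(a) {
  e <- eigen(a, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
}

# B_2 and the symmetric p = -1/2 adjustment M_{2, -1/2, i} = L_i B_2^{1/2} L_i^T.
b_2 <- solve(diag(n_i) - t(l_i) %*% x_i %*% k %*% t(x_i) %*% l_i)
m_2_half <- l_i %*% sym_sqrt(b_2) %*% t(l_i)

# B_1^{1/2} via the similarity relation B_1^{1/2} = (L_i^T)^{-1} B_2^{1/2} L_i^T
# squares back to B_1 = (I - X_i (X^T W X)^{-1} X_i^T W_i)^{-1}.
b_1_half <- solve(t(l_i)) %*% sym_sqrt(b_2) %*% t(l_i)
b_1 <- solve(diag(n_i) - x_i %*% k %*% t(x_i) %*% w_i)
all.equal(b_1_half %*% b_1_half, b_1)  # TRUE

# "Squaring" M_{2, -1/2, i} in the sandwich sense recovers the p = -1 form.
all.equal(m_2_half %*% solve(w_i) %*% m_2_half, w_i %*% b_1)  # TRUE
```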
Proof for Degrees of Freedom
To prove $\nu_1$ and $\nu_2$ are identical, we only need to prove that the per-subject quantities entering the two degrees of freedom expressions are identical, where these quantities follow our previous expression for the degrees of freedom.

We first expand these quantities by substituting $M_{1, p, i}$ and $M_{2, p, i}$, where $S_i$ is the row selection matrix that selects the rows of subject $i$ from the full-data matrices.

We will prove that the inner parts are equal. With the previous proof of the covariance estimators, we already have $M_{1, p, i} = M_{2, p, i}$. We then need to prove that the remaining parts coincide; and since the relationship between $B_{1, i}$ and $B_{2, i}$ has already been proved in the covariance estimator section, we only need to prove that restricting the full-data matrices to the rows of subject $i$ yields the corresponding subject-level expressions.
Apparently, this holds because of the following

$$ (X(X^\top W X)^{-1}X_i^\top W_i)_{i} = X_i(X^\top W X)^{-1}X_i^\top W_i \\ = (W_i X_i(X^\top W X)^{-1}X_i^\top)^\top \\ = ((W X(X^\top W X)^{-1}X_i^\top)_{i})^\top $$

where the outer subscript $i$ denotes taking the rows corresponding to subject $i$.
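The following R snippet illustrates this row selection identity on hypothetical small matrices, comparing subject $i$'s block of the full-data weighted hat matrix with the subject-level expression.

```r
set.seed(123)
n_i <- 4
n_j <- 5
x_i <- cbind(1, rnorm(n_i))
x_j <- cbind(1, rnorm(n_j))
w_i <- crossprod(matrix(rnorm(n_i^2), n_i)) + diag(n_i)
w_j <- crossprod(matrix(rnorm(n_j^2), n_j)) + diag(n_j)

# Stacked design matrix and block-diagonal weight matrix for the two subjects.
x <- rbind(x_i, x_j)
w <- matrix(0, n_i + n_j, n_i + n_j)
w[1:n_i, 1:n_i] <- w_i
w[(n_i + 1):(n_i + n_j), (n_i + 1):(n_i + n_j)] <- w_j

# Weighted hat matrix H = X (X^T W X)^{-1} X^T W and the rows of subject i.
h <- x %*% solve(t(x) %*% w %*% x) %*% t(x) %*% w
idx_i <- 1:n_i

# Selecting subject i's block of H gives the subject-level expression.
all.equal(
  h[idx_i, idx_i],
  x_i %*% solve(t(x) %*% w %*% x) %*% t(x_i) %*% w_i
)  # TRUE
```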
Special Considerations in Implementations
Pseudo Inverse of a Matrix
The empirical covariance matrix involves the inverse of a matrix, or the symmetric square root of such an inverse. To calculate this, we usually require that the matrix is positive-definite. However, Young (2016) suggests that this is not always assured in practice.
Thus, following Pustejovsky and Tipton (2018), we use the pseudo inverse to avoid this issue. We use the following logic (see the corresponding C++ function pseudoInverseSqrt; a small R sketch is given after the list) to obtain the pseudo inverse:
- Conduct a singular value decomposition.
- Use cpow to obtain the square root of the reciprocals of the singular values, if the value is larger than a computational threshold; otherwise replace the value with 0.
- Reconstruct the pseudo inverse matrix from the modified singular values and the U/V matrices.
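To make these steps concrete, here is a minimal R sketch of the same logic; it is not the mmrm C++ code itself, the function name pseudo_inverse_sqrt is only illustrative, and the tolerance used as the computational threshold is an assumption.

```r
# Minimal R sketch of the SVD-based pseudo inverse square root described in
# the list above (the actual implementation is the C++ function
# pseudoInverseSqrt in mmrm; the tolerance below is illustrative only).
pseudo_inverse_sqrt <- function(a, tol = 1e-12) {
  s <- svd(a)
  # Square root of the reciprocal for singular values above the threshold,
  # zero otherwise.
  d_inv_sqrt <- ifelse(s$d > tol, 1 / sqrt(s$d), 0)
  # Reconstruct the matrix from the modified singular values and U/V matrices.
  s$u %*% diag(d_inv_sqrt, nrow = length(d_inv_sqrt)) %*% t(s$v)
}

# Example on a symmetric positive semi-definite (rank-deficient) matrix:
# the square of the result acts as the Moore-Penrose pseudo inverse of m.
set.seed(123)
v <- matrix(rnorm(6), 3, 2)
m <- tcrossprod(v)  # 3 x 3, rank 2
g <- pseudo_inverse_sqrt(m)
all.equal(m %*% g %*% g %*% m, m)  # TRUE
```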
In the Eigen package, a pseudo inverse method is already implemented in Eigen::CompleteOrthogonalDecomposition<MatrixType_>::pseudoInverse, but it is not used, for the following reasons:
- The pseudo inverse method is not stable and can lead to NaN values in the calculations.
- To obtain the symmetric square root, a singular value decomposition is still needed, so it is simpler not to use that method and instead calculate the square root of the pseudo inverse directly from the decomposition.