Skip to contents

Prediction of conditional mean

Mathematical Derivations

Since residuals can be correlated, potentially existing observed outcomes of the same individual can be informative for predicting the unobserved valued of the same individual.

Assume that the data is sorted such that Yij=yij,j=k+1,k+2,,pY_{ij} = y_{ij}, j = k+1, k+2, \dots, p are observed and Yij,j=1,2,,kY_{ij}, j = 1, 2, \dots, k are not. The special case of all outcomes being unobserved (new individual) is covered with k=pk=p.

Let further Σi(Xi,θ)=(Σinew,new(Xi,θ)Σinew,old(Xi,θ)Σiold,new(Xi,θ)Σiold,old(Xi,θ)) \Sigma_i(X_i, \theta) = \begin{pmatrix} \Sigma_i^{new,new}(X_i,\theta) & \Sigma_i^{new,old}(X_i,\theta)\\ \Sigma_i^{old,new}(X_i,\theta) & \Sigma_i^{old,old}(X_i,\theta)\end{pmatrix}

be a block decomposition where Σinew,new(Xi,θ)=((Σi(Xi,θ))j,l)j=1k,l=1k\Sigma_i^{new,new}(X_i,\theta) = \Big(\big(\Sigma_i(X_i,\theta)\big)_{j,l}\Big)_{j = 1\dots k,\, l = 1\ldots k} and similarly for the other blocks.

Predictions can then be made based on the conditional distribution Yi,1k|Xi,Yi,k+1p=yi,k+1p𝒩(μi,Ai) Y_{i, 1\ldots k}\,|\,X_i,Y_{i,k+1\ldots p}=y_{i, k+1\ldots p}\sim\mathcal{N}(\mu_i, A_i)

with

μi(β,θ)=(Xiβ)1k+Σinew,old(Xi,θ)((Σiold,old(Xi,θ))1(yik+1p(Xiβ)k+1p)) \mu_i(\beta,\theta) = (X_i \ \beta)_{1\ldots k} + \Sigma_i^{new,old}(X_i,\theta) \, \Big(\big(\Sigma_i^{old,old}(X_i,\theta)\big)^{-1} \big(y_i^{k+1\ldots p} - (X_i \ \beta)_{k+1\ldots p}\big)\Big) and

Ai(β,θ)=Σinew,new(Xi,θ)Σiold,new(Xi,θ)(Σiold,old(Xi,θ))1Σinew,old(Xi,θ). A_i(\beta, \theta) = \Sigma_i^{new,new}(X_i,\theta) - \Sigma_i^{old,new}(X_i,\theta) \Big(\Sigma_i^{old,old}(X_i,\theta)\Big)^{-1} \Sigma_i^{new,old}(X_i,\theta) \ . Note that AiA_i does not depend on β\beta.

Implementation of predict

For implementing predict(), only μ̂i:=μi(β̂,θ̂)\widehat{\mu}_i:=\mu_i(\widehat{\beta},\widehat{\theta}) is required.

For predict(interval = "confidence") additionally standard errors are required. These could be derived using the delta methods since μi\mu_i is a function of the estimated model parameters β\beta and θ\theta. This would require the Jacobian μi(β,θ)|(β̂,θ̂)\nabla\mu_i(\beta,\theta)|_{\big(\widehat{\beta},\widehat{\theta}\big)} in addition to the estimated variance covariance matrix of the parameter estimate (β̂,θ̂)\big(\widehat{\beta},\widehat{\theta}\big), Ŝ\widehat{S}. Standard errors for μ̂(i)\widehat{\mu}^{\,(i)} are then given by the square root of the diagonal elements of (μi(β,θ)|(β̂,θ̂))Ŝ(μi(β,θ)|(β̂,θ̂)) \Big(\nabla\mu_i(\beta,\theta)|_{\big(\widehat{\beta},\widehat{\theta}\big)}\Big)^\top\quad \widehat{S} \quad \Big(\nabla\mu_i(\beta,\theta)|_{\big(\widehat{\beta},\widehat{\theta}\big)}\Big) For predict(interval = "prediction") one would use the square root of the diagonal elements of Ai(β̂,θ̂)A_i\big(\widehat{\beta},\widehat{\theta}\big) instead. The delta method could again be used to make upper and lower boundaries reflect parameter estimation uncertainty.

Alternatively, both intervals can be derived using a parametric bootstrap sample of the unrestricted parameters θ\theta. This would probably also be easier for the interval = "prediction" case.

Please note that for these intervals, we assume that the distribution is approximately normal: we use μi,j(β̂,θ̂)±Zα*sqrt(Ai,j,j(β̂,θ̂))\mu_{i,j}(\hat\beta, \hat\theta) \pm Z_{\alpha} * sqrt(A_{i, j, j}(\hat\beta, \hat\theta)) to construct it, where μi,j(β̂,θ̂)\mu_{i,j}(\hat\beta, \hat\theta) is the jjth element of μi(β̂,θ̂)\mu_i(\hat\beta, \hat\theta), and Ai,j,j(β̂,θ̂)A_{i, j, j}(\hat\beta, \hat\theta) is the j,jj,j element of Ai(β̂,θ̂)A_i(\hat\beta, \hat\theta).

Parametric Sampling for Prediction Interval

With the conditional variance formula

Var(Yi)=Var(E(Yi|θ))+E(Var(Yi|θ)) Var(Y_i) = Var(E(Y_i|\theta)) + E(Var(Y_i|\theta))

and the conditional expectation E(Yi|θ)E(Y_i|\theta) and the conditional variance Var(Yi|θ)Var(Y_i|\theta) being already described as

E(Yi|θ)=μi(β,θ) E(Y_i|\theta) = \mu_i(\beta, \theta)

and

Var(Yi|θ)=Ai(β,θ), Var(Y_i|\theta) = A_i(\beta, \theta),

we can sample on θ\theta and obtain β\beta, then calculate the variance of conditional mean and the mean of conditional variance.

Prediction of Conditional Mean for New Subjects

If there are no observations for a subject, then the prediction is quite simple:

Yi=Xiβ̂ Y_i = X_i \hat\beta

Simulate response

To create simulation of responses from a fitted model, we have multiple situations: whether this simulation is conditional on both θ\theta and β\beta estimates, or it is marginal?

Conditional Simulation

Under conditional simulation setting, the variance-covariance matrix, and the expectation of YiY_i are already given in Mathematical Derivations.

Please note that in implementation of predict function, we only use the diagonal elements of AiA_i, however, here we need to make use of the full matrix AiA_i to obtain correctly correlated simulated observations.

Marginal Simulation

To simulate marginally, we take the variance of θ̂\hat\theta and β̂\hat\beta into consideration. For each simulation, we first generate θ\theta assuming it approximately follows a multivariate normal distribution. Then, conditional on the θ\theta we sampled, we generate β\beta also assuming it approximately follows a multivariate normal distribution.

Now we have θ\theta and β\beta estimates, and we just follow the conditional simulation.

Implementation of simulate

To implement simulate function, we first ensure that the expectation (μ\mu) and variance-covariance matrix (AA) are generated in predict function, for each of the subjects.

For simulate(method = "conditional"), we use the estimated θ\theta and β\beta to construct the μ\mu and AA directly, and generate response with N(μ,A)N(\mu, A) distribution.

For simulate(method = "marginal"), for each repetition of simulation, we generate θnew\theta_{new} from the mmrm fit, where the estimate of θ\theta and variance-covariance matrix of θ\theta are provided. Using the generated θnew\theta_{new}, we then obtain the βnew\beta_{new} and its variance-covariance matrix, with θnew\theta_{new} and the data used in fit.

Then we sample β\beta as follows. We note that on the C++ side we already have the robust Cholesky decomposition of the inverse of its asymptotic covariance matrix: cov(β̂)=(XWX)1=(LDL)1 cov(\hat\beta) = (X^\top W X)^{-1} = (LDL^\top)^{-1} Hence we make sure to report the lower triangular matrix LL and the diagonal matrix DD back to the R side, and afterwards we can generate β\beta samples as follows: βsample=βnew+LD1/2zsample \beta_{sample} = \beta_{new} + L^{-\top}D^{-1/2}z_{sample} where zsamplez_{sample} is drawn from the standard multivariate normal distribution, since cov(LD1/2zsample)=LD1/2IpD1/2L1=LD1L1=(LDL)1=cov(β̂) cov(L^{-\top}D^{-1/2}z_{sample}) = L^{-\top}D^{-1/2} I_p D^{-1/2} L^{-1} = L^{-\top}D^{-1}L^{-1} = (LDL^\top)^{-1} = cov(\hat\beta) We note that calculating w=LD1/2zsamplew = L^{-\top}D^{-1/2}z_{sample} is efficient via backwards solving Lw=D1/2zsample L^\top w = D^{-1/2}z_{sample} since LL^\top is upper right triangular.

Then we simulate the observations once with simulate(method = "conditional", beta = beta_sample, theta = theta_new). We pool all the repetitions together and thus obtain the marginal simulation results.

Relationship Between predict and simulate Results

We summarize the different options for predict and simulate methods and explain how they relate to each other.

predict options

  1. predict(type = "confidence") gives the variance of the predictions conditional on the θ\theta estimate, taking into account the uncertainty of estimating β\beta. So here we ignore the uncertainty in estimating θ\theta. It also does not add measurement error ϵ\epsilon. We can use this prediction when we are only interested in predicting the mean of the unobserved yiy_i, assuming the estimated θ\theta as the true variance parameters.
  2. predict(type = "prediction") in contrast takes into account the full uncertainty, including the variance of θ\theta and the measurement error ϵ\epsilon. We can use this prediction when we are interested in reliable confidence intervals for the unobserved yiy_i as well as the observed yiy_i, assuming we would like to predict repeated observations from the same subjects and time points.

simulate options

  1. simulate(type = "conditional") simulates observations while keeping the θ\theta and β\beta estimates fixed. It adds measurement errors ϵ\epsilon when generating the simulated values. Hence the mean of the simulated values will be within the confidence intervals from predict(type = "conditional").
  2. simulate(type = "marginal") simulates observations while taking into account the uncertainty of β\beta and θ\theta through sampling from their asymptotic frequentist distributions. On top of that, it also adds measurement errors ϵ\epsilon. This hence is using the same distribution as predict(type = "prediction").

Comparison with SAS

In SAS, from proc mixed, we are able to generate predictions using the outp argument in the model statement. For example:

PROC MIXED DATA = fev_data method=reml;
  CLASS RACE(ref = 'Asian') AVISIT(ref = 'VIS4') SEX(ref = 'Male') ARMCD(ref = 'PBO') USUBJID;
  MODEL FEV1 = ARMCD / ddfm=Satterthewaite solution chisq outp=pred;
  REPEATED AVISIT / subject=USUBJID type=un r rcorr;
  LSMEANS ARMCD / pdiff=all cl alpha=0.05 slice=AVISIT;
RUN;

However, there are some differences between the SAS implementation and our mmrm package, described as follows:

  1. While mmrm and SAS both provide predicted means (conditional on other observations) for unobserved records, SAS also provides predicted means for observed records while mmrm does not. The rationale is that in the mmrm package we want to be consistent with the notion of predictions conditional on the observed records - which means that observed records are observed and therefore there is no prediction uncertainty anymore.
  2. The prediction standard error is different between mmrm and SAS. While in SAS the prediction standard error is conditional on the estimated variance parameters θ\theta, in mmrm the marginal prediction standard error is provided. The rationale is that in the mmrm package we want to take into account the full uncertainty about parameter estimates including θ\theta.
  3. The prediction intervals in SAS are based on the t distribution, while currently in mmrm we use the normal distribution. We will be considering an extension towards using the t distribution in the future and welcome feedback on this detail.