lavPredictY for fitted model with covariances

70 views
Skip to first unread message

Wen L.

unread,
Nov 4, 2025, 5:32:59 AMNov 4
to lavaan
Dear All,

I am trying to use the lavPredictY function to predict an exogenous variable from a fitted model, but am running into difficulties when the model includes a residual covariance. I think it has to do with how the observed variables ("ov") are defined when there is a residual covariance. An MWE is below.

There is a related post about lavPredictY, but about how using a new dataset (with different exogenous variable values) returns incorrect predictions (https://groups.google.com/g/lavaan/c/lv33nSNQCIo/m/8E4vQERWAwAJ). I think the issues are different, but I'm happy to know if that was addressed.

Thanks in advance for any help!

Best wishes,
Wen Wei

--
rm(list=ls())
library("lavaan")
set.seed(9000)
popModel <- '
  Y ~ 2*1 + 0.5*X
'
D <- simulateData(model=popModel, model.type="sem", sample.nobs=1000)

# regression model only
model1 <- '
  Y ~ X
'
fit1 <- sem(model=model1,data=D,meanstructure=TRUE)
subset(parameterestimates(fit1),lhs=="Y" & op!="~~")

# with residual covariance
model2 <- '
  Y ~ X
  Y ~~ 0.7*X
'
fit2 <- sem(model=model2,data=D,meanstructure=TRUE)
subset(parameterestimates(fit2),lhs=="Y" & op!="~~")

# this is OK
lavPredictY(object=fit1,newdata=D[1,],ynames="Y",xnames="X")
# returns an error message
lavPredictY(object=fit2,newdata=D[1,],ynames="Y",xnames="X")

Felipe Vieira

unread,
Nov 4, 2025, 7:19:14 AMNov 4
to lav...@googlegroups.com
Hi Wen Wei,

Just to confirm: what version of lavaan is that? The current development version would give an error for both cases. From what I can see it just seems to be the case that the function tries to compute variance(s) from the given data, but in this case newdata only has one observation so you get NAs with var().  

Best, 
Felipe. 



--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lavaan/d46b075b-98df-4f03-ba08-5f2db7bec7f4n%40googlegroups.com.

Wen L.

unread,
Nov 5, 2025, 11:08:46 AMNov 5
to lavaan
Hi Felipe,

Thanks for the quick response! I am using lavaan 0.6-20.

I've edited the example so it makes predictions for two observations in the "newdata" instead. However, the predictions from the model with constraints are incorrect - they are identical to those from the unconstrained model (even though the regression coefficients are different).

Best wishes,
Wen Wei
--
rm(list=ls())
library("lavaan")
set.seed(9000)
popModel <- '
  Y ~ 2*1 + 0.5*X
'
D <- simulateData(model=popModel, model.type="sem", sample.nobs=1000)

# regression model only
model1 <- '
  Y ~ X
'
fit1 <- sem(model=model1,data=D,meanstructure=TRUE)
subset(parameterestimates(fit1),lhs=="Y" & op!="~~")
#   lhs op rhs   est    se      z pvalue ci.lower ci.upper
# 1   Y  ~   X 0.524 0.031 16.672      0    0.463    0.586
# 4   Y ~1     2.030 0.032 63.511      0    1.968    2.093


# with residual covariance
model2 <- '
  Y ~ X
  Y ~~ 0.7*X
'
fit2 <- sem(model=model2,data=D,meanstructure=TRUE)
subset(parameterestimates(fit2),lhs=="Y" & op!="~~")
#   lhs op rhs    est    se      z pvalue ci.lower ci.upper
# 1   Y  ~   X -0.153 0.044 -3.508      0   -0.239   -0.068
# 5   Y ~1      2.039 0.039 52.697      0    1.963    2.114

# this is correct
lavPredictY(object=fit1,newdata=D[1:2,],ynames="Y",xnames="X")[,"Y"]
# [1] 1.407290 1.668423

# this is incorrect
lavPredictY(object=fit2,newdata=D[1:2,],ynames="Y",xnames="X")[,"Y"]
# [1] 1.407290 1.668423

Yves Rosseel

unread,
Nov 5, 2025, 1:50:36 PMNov 5
to lav...@googlegroups.com
Note that both models result in the same model-implied
variance-covariance matrix:

> lavInspect(fit1, "implied")$cov
Y X
Y 1.306
X 0.542 1.033
> lavInspect(fit2, "implied")$cov
Y X
Y 1.306
X 0.542 1.033

And lavPredictY() uses these covariance matrices to 'predict' Y values
from 'X values', not merely the beta coefficients.

Yves.

Yves Rosseel

unread,
Nov 5, 2025, 1:58:34 PMNov 5
to lav...@googlegroups.com
> From what I can see it just
> seems to be the case that the function tries to compute variance(s) from
> the given data, but in this case newdata only has one observation so you
> get NAs with var().

Indeed. Fixed now on github.

Yves.

Felipe Vieira

unread,
Nov 5, 2025, 2:28:29 PMNov 5
to lav...@googlegroups.com
Yves already answered it, but I spent around an hour looking at the source code and writing an answer. So, here is a minimal chunk that shows how those predictions are calculated. Maybe it will be useful in the future:  

rm(list = ls())
library(lavaan)

set.seed(9000)

popModel <- '
  Y ~ 2*1 + 0.5*X
'
D <- simulateData(model = popModel, model.type = "sem", sample.nobs = 1000)

model1 <- '
  Y ~ X
'
fit1 <- sem(model = model1, data = D, meanstructure = TRUE)

model2 <- '
  Y ~ X
  Y ~~ 0.7*X
'
fit2 <- sem(model = model2, data = D, meanstructure = TRUE)

newdata <- D[1:2, c("Y","X")]

pred1_lav <- lavPredictY(fit1, newdata = newdata, ynames = "Y", xnames = "X")[, "Y"]
pred2_lav <- lavPredictY(fit2, newdata = newdata, ynames = "Y", xnames = "X")[, "Y"]

# conditional mean prediction from implied (!) mu and sigma
predictions_by_hand <- function(fit, newdata) {
  imp   <- lavInspect(fit, "implied")
  sigma <- imp$cov; mu <- imp$mean
 
  muY <- mu["Y"]; muX <- mu["X"]
  Sxx <- sigma["X","X"]; Syx <- sigma["Y","X"]    
 
  x_new     <- newdata$X
  beta_cond <- Syx / Sxx    
  muY + beta_cond * (x_new - muX)
}

predictions_by_hand(fit1, newdata); predictions_by_hand(fit2, newdata)

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.

Wen L.

unread,
Nov 7, 2025, 3:28:48 AMNov 7
to lavaan
Thanks, Yves and Felipe! 

Yves: I didn't know about the use of the model-implied (co)variance matrix, so it's good to know! Thanks for sharing!

Felipe: That's a great idea to calculate the coefficients manually! However, the function didn't return the predicted values based only on the path coefficients, so I wrote another one instead. The full example is below. Maybe someone else might find this useful in the future.

Thank you both again! Have a nice weekend!

Best wishes,
Wen Wei

--
rm(list=ls())
library("lavaan")
set.seed(9000)
popModel <- '
  Y ~ 2*1 + 0.5*X
'
D <- simulateData(model=popModel, model.type="sem", sample.nobs=1000)

# regression model only
model1 <- '
  Y ~ X
'
fit1 <- sem(model=model1,data=D,meanstructure=TRUE)
subset(parameterestimates(fit1),lhs=="Y" & op!="~~")
#   lhs op rhs   est    se      z pvalue ci.lower ci.upper
# 1   Y  ~   X 0.524 0.031 16.672      0    0.463    0.586
# 4   Y ~1     2.030 0.032 63.511      0    1.968    2.093

# with residual covariance
model2 <- '
  Y ~ X
  Y ~~ 0.7*X
'
fit2 <- sem(model=model2,data=D,meanstructure=TRUE)
subset(parameterestimates(fit2),lhs=="Y" & op!="~~")
#   lhs op rhs    est    se      z pvalue ci.lower ci.upper
# 1   Y  ~   X -0.153 0.044 -3.508      0   -0.239   -0.068
# 5   Y ~1      2.039 0.039 52.697      0    1.963    2.114

# this is correct
lavPredictY(object=fit1,newdata=D[1:2,],ynames="Y",xnames="X")[,"Y"]
# [1] 1.407290 1.668423

# this is incorrect
lavPredictY(object=fit2,newdata=D[1:2,],ynames="Y",xnames="X")[,"Y"]
# [1] 1.407290 1.668423


# conditional mean prediction from implied (!) mu and sigma
predictions_by_hand <- function(fit, newdata) {
  imp   <- lavInspect(fit, "implied")
  sigma <- imp$cov; mu <- imp$mean
 
  muY <- mu["Y"]; muX <- mu["X"]
  Sxx <- sigma["X","X"]; Syx <- sigma["Y","X"]    
 
  x_new     <- newdata$X
  beta_cond <- Syx / Sxx    
  muY + beta_cond * (x_new - muX)
}
# this is correct
predictions_by_hand(fit1, D[1:2,])

# [1] 1.407290 1.668423
# this is incorrect
predictions_by_hand(fit2, D[1:2,])
# [1] 1.407290 1.668423

# manually calculate predicted values using path coefficients
predictions_by_hand_2 <- function(fit, newdata, dv) {
  fit.coef <- subset(parameterEstimates(fit), op=="~" & lhs==dv)
  fit.coef.intercept <- subset(parameterEstimates(fit), op=="~1" & lhs==dv)
  as.numeric(
    fit.coef.intercept[,"est"] +
      (as.matrix(newdata[,fit.coef[,"rhs"],drop=FALSE]) %*%
         unlist(fit.coef[,"est"]))[,1])
}
# these are correct
predictions_by_hand_2(fit1, D[1:2,], dv="Y")
# 1.407290 1.668423
predictions_by_hand_2(fit2, D[1:2,], dv="Y")
# [1] 2.220626 2.144333
Reply all
Reply to author
Forward
0 new messages