lavPredict

Peter McHugh

Jul 28, 2015, 10:46:33 AM

to lavaan

Hello!

I'm using lavaan for a simple path analysis and am interested in using the fitted model and lavPredict to predict values for a new dataset, e.g., along the lines of a cartoon example that looks like:

# relationships among exogenous variables

A ~ B+C

C ~ D

# relationships with endogenous variable

E ~ A+C+B

Everything works fine, however, the predicted values are inconsistent with what I would expect given estimated coefficients, new data values, etc. SO, I guess my question is the following: How exactly are predicted values computed under the hood of lavPredict? Are the path coefficients somehow factored into calculations such that strong associations have greater influence on predicted values?

I searched high and low for documentation in this area but couldn't find an answer. Apologies in advance if I've somehow overlooked an important resource. Thanks in advance,

Pete

yrosseel

Jul 29, 2015, 6:16:57 AM

to lav...@googlegroups.com

On 07/28/2015 04:46 PM, Peter McHugh wrote:
> Everything works fine, however, the predicted values are inconsistent
> with what I would expect given estimated coefficients, new data values,
> etc.

Could you provide a small/simple artificial example (with as few
variables as possible) to explain why the predicted values seem
inconsistent?

I have the impression that in your model, there are no latent variables.
In that case, all that happens is applying the regression formula. This
is an example with three variables (x, m, y), where the model is a
simple mediation setup:

library(lavaan)

set.seed(1234)
x <- rnorm(1000)
m <- 5 + 1.2*x + rnorm(1000, 0, sd = 0.4)
y <- 2 + 0.8*m + rnorm(1000, 0, sd = 0.6)
Data <- data.frame(y,m,x)

model <- ' y ~ m + x
m ~ x '

fit <- sem(model, data = Data, meanstructure = TRUE)
summary(fit)

head(lavPredict(fit, type = "ov"))

# y m x
# [1,] 2.201717 3.531017 -1.2070657
# [2,] 2.316077 5.345494 0.2774292
# [3,] 2.378246 6.331893 1.0844412
# [4,] 2.114000 2.139284 -2.3456977
# [5,] 2.327763 5.530909 0.4291247
# [6,] 2.333690 5.624941 0.5060559

The 'formulas' in this case (i.e., without any latent variables) boil down
to this:

N <- nobs(fit); p <- 3 # number of observations, number of observed variables
X <- matrix(0, N, p); X[,3] <- fit@Data@X[[1]][,3] # only the exogenous x is filled in
BETA <- lavTech(fit, "est")$beta # regression coefficients
int <- lavTech(fit, "est")$alpha # intercepts
INT <- matrix(int, N, p, byrow = TRUE)

Yhat <- INT + X %*% t(BETA)
head(Yhat)

# [,1] [,2] [,3]
# [1,] 2.201717 3.531017 -0.0265972
# [2,] 2.316077 5.345494 -0.0265972
# [3,] 2.378246 6.331893 -0.0265972
# [4,] 2.114000 2.139284 -0.0265972
# [5,] 2.327763 5.530909 -0.0265972
# [6,] 2.333690 5.624941 -0.0265972

where the last column (the exogenous 'x') will be replaced by its
observed values.

Yves.

Peter McHugh

Jul 29, 2015, 12:20:58 PM

to lav...@googlegroups.com

Thank you very much for the detailed reply!

I have worked through your example and I now understand why lavPredict predictions deviate from what I was expecting. Your example is a very relevant caricature of what I'm doing (mediators, etc., plus I have a few more variables and paths). Nonetheless, it effectively gets me to the answer I was seeking.

SO, if I've followed things correctly, in your simulated dataset lavPredict predictions of y (analogous to my response) are a function of the direct influence of x and effectively compute m's influence on y at m = 0 (the matrix operation drops it). Is that fair to say? If so, that is why I found my y's to be inconsistent with the data -- I expected predictions of y from the "y~m+x" portion of model to be computed at observed x and predicted m.

Does that make sense?

Thanks again for your timely response on my question! Cheers,

Pete

yrosseel

Jul 31, 2015, 6:20:03 AM

to lav...@googlegroups.com

On 07/29/2015 06:20 PM, Peter McHugh wrote:
> I have worked through your example and I now understand why lavPredict
> predictions deviate from what I was expecting.

At least, that is that.

> SO, if I've followed things correctly, in your simulated dataset
> lavPredict predictions of y (analogous to my response) are a function of
> the direct influence of x and effectively compute m's influence on y at
> m = 0 (the matrix operation drops it).

It would appear that this is indeed what happens in lavaan 0.5-18 (if y
and m are observed).
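
The effect can be reproduced without lavaan by using a toy coefficient matrix. The sketch below is only an illustration: the coefficients are made-up round numbers (not estimates from any fit), and the names `pass1` and `pass2` are hypothetical. The first pass mimics the single matrix product shown earlier in this thread; the second pass feeds the predicted m back in, which gives y at observed x and predicted m.

```r
# Toy illustration with made-up round coefficients (not estimates from a fit).
vars <- c("y", "m", "x")
BETA <- matrix(0, 3, 3, dimnames = list(vars, vars))
BETA["y", "m"] <- 0.8   # y ~ m
BETA["y", "x"] <- 0.1   # y ~ x (hypothetical direct effect)
BETA["m", "x"] <- 1.2   # m ~ x
int <- c(y = 2, m = 5, x = 0)   # intercepts (hypothetical)

x <- c(-1, 0, 1)
N <- length(x)
INT <- matrix(int, N, 3, byrow = TRUE, dimnames = list(NULL, vars))

# Pass 1: only x is filled in, so the m column of X is zero and
# the y ~ m path contributes nothing to the prediction of y.
X1 <- matrix(0, N, 3, dimnames = list(NULL, vars))
X1[, "x"] <- x
pass1 <- INT + X1 %*% t(BETA)

# Pass 2: feed the predicted m (and the observed x) back in.
# The y column of X2 is irrelevant because nothing is regressed on y.
X2 <- pass1
X2[, "x"] <- x
pass2 <- INT + X2 %*% t(BETA)

# Compare the two predictions of y side by side.
cbind(x = x, y_pass1 = pass1[, "y"], y_pass2 = pass2[, "y"])
```

In this toy setup the first pass gives y = 2 + 0.1*x, while the second gives y = 2 + 0.8*(5 + 1.2*x) + 0.1*x, which is the prediction Pete was expecting.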

> Does that make sense?

Good question. Clearly, if you want to 'predict' y under this model, it
does not.

I realize now that there is a fundamental misunderstanding here: the
lavPredict() function is *not* intended to 'predict' outcome values in a
regression framework. The main purpose of lavPredict() is to 'predict'
values of latent variables. Once we have these values, we can also
'predict' what the values are of the indicators of those latent
variables. That is the purpose of type = "ov".
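
As an illustration of that intended use, here is a minimal sketch based on the three-factor CFA example that ships with lavaan (the HolzingerSwineford1939 data). The object names `fs` and `ov_hat` are just illustrative choices.

```r
library(lavaan)

# Classic three-factor CFA from the lavaan examples.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit_cfa <- cfa(HS.model, data = HolzingerSwineford1939)

# 'Predicted' values of the latent variables (factor scores), one row per case.
fs <- lavPredict(fit_cfa, type = "lv")
head(fs)

# Values of the indicators implied by those factor scores.
ov_hat <- lavPredict(fit_cfa, type = "ov")
head(ov_hat)
```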

The observed-variables-only setting is a special case, and lavPredict()
was not written (explicitly) for this setting. That we get results at
all is an (un)lucky byproduct of the fact that we can regard observed
variables as latent variables with just a single indicator.

But clearly, something needs to be done here.

What I could do is to provide an additional option to the lavPredict()
function to get model-based predictions. I will put this on my TODO
list. Unfortunately, tomorrow (1st of August) is the start of my
vacation, so I am afraid this will have to wait until the end of August.
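
Until such an option exists, one possible workaround for a recursive model with observed variables only is to apply the fitted regression equations by hand, in causal order. The following is a minimal sketch that reuses the simulated mediation example from earlier in this thread; `newdata`, `m_hat`, and `y_hat` are illustrative names, and this is not what lavPredict() does internally.

```r
library(lavaan)

# Same simulated mediation data as earlier in the thread.
set.seed(1234)
x <- rnorm(1000)
m <- 5 + 1.2*x + rnorm(1000, 0, sd = 0.4)
y <- 2 + 0.8*m + rnorm(1000, 0, sd = 0.6)
Data <- data.frame(y, m, x)

model <- ' y ~ m + x
           m ~ x '
fit <- sem(model, data = Data, meanstructure = TRUE)

# Named vector of free parameters, e.g. "y~m", "y~x", "m~x", "y~1", "m~1".
est <- coef(fit)

# Hypothetical new values of the exogenous variable.
newdata <- data.frame(x = c(-1, 0, 1))

# Step 1: predict the mediator from x.
m_hat <- unname(est["m~1"] + est["m~x"] * newdata$x)

# Step 2: predict y at the observed x and the predicted m.
y_hat <- unname(est["y~1"] + est["y~m"] * m_hat + est["y~x"] * newdata$x)

cbind(newdata, m_hat, y_hat)
```

For models with more variables, the same idea applies as long as each endogenous variable is predicted after all of its predictors.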

Thanks for bringing this to my attention.

Yves.

Peter McHugh

Jul 31, 2015, 11:09:09 AM

to lav...@googlegroups.com

Thanks much again for looking into this. Your response was super helpful and illuminated what's 'under the hood' of lavPredict. If you've had similar inquiries for prediction applications, perhaps it's worth taking a run at it (I know a cadre of freshwater ecologists thinking in this direction...). But I'm sure you've got a million better things to do! Thanks for putting this package together and keeping it strong. Cheers,

Pete

P.S. Have a great holiday!
