Using lavPredict on simple model

MB

Mar 5, 2015, 5:13:55 AM

to lav...@googlegroups.com

Hello all,

I have a very simple model of observed variables:

data=data

model <- "
C ~ B + D
A ~ C + D
"

fit <- sem(model, data = data, std.ov = TRUE)

I would like to get predicted values of endogenous variable "D", given a 10% increase in A, and 20% increase in C (for example)

I'm guessing one way to do this is to generate a new dataset (A = A + 10%, B = B, C = C + 20%), leave D empty, and run it through:

lavPredict(fit, newdata = newdata)

Perhaps I am wildly misunderstanding how to take coefficient estimates from one model and apply them to a new data set.

Thanks for advice in the right direction,

Michael

Terrence Jorgensen

Mar 7, 2015, 7:02:58 PM

to lav...@googlegroups.com

> I would like to get predicted values of endogenous variable "D", given a 10% increase in A, and 20% increase in C (for example)

D is not an endogenous variable in your model. It is a predictor of both C and A, but not an outcome in the regressions you specify in your syntax.

But generally, yes, passing a new data frame (without an outcome, but specifying varying levels of your predictors) to the predict function should return fitted values. I've never used it for a path analysis in lavaan, though -- I just know the predict method returns factor scores from a latent variable model (e.g., CFA). But if it doesn't work with predict(), you can still create the new data frame with various levels of your predictors of D (or any outcome), and use the regression paths and intercept to add fitted values as a new variable.
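A minimal sketch of that last suggestion in base R, taking A as the outcome (D is not an outcome in this model). Everything here is an assumption for illustration: the data are simulated, `dat` and `newdata` are hypothetical names, and lm() stands in for the A ~ C + D equation of the fitted path model. With lavaan, the same intercept and slopes should be available from coef(fit) once the model is fitted with meanstructure = TRUE (and without std.ov = TRUE if predictions are wanted on the raw scale).

```r
# Simulated stand-in data for the model  C ~ B + D ;  A ~ C + D
set.seed(1)
n <- 200
B <- rnorm(n, mean = 10, sd = 2)
D <- rnorm(n, mean = 5, sd = 1)
C <- 1 + 0.5 * B + 0.8 * D + rnorm(n, sd = 0.5)
A <- 2 + 0.7 * C - 0.3 * D + rnorm(n, sd = 0.5)
dat <- data.frame(A, B, C, D)

# Stand-in for the A ~ C + D equation of the path model
eq_A <- lm(A ~ C + D, data = dat)
b <- coef(eq_A)  # names: "(Intercept)", "C", "D"

# New predictor levels: C raised by 20%, D left unchanged
newdata <- data.frame(C = dat$C * 1.20, D = dat$D)

# Fitted values by hand: intercept + slope * predictor
A_hat <- b["(Intercept)"] + b["C"] * newdata$C + b["D"] * newdata$D

# Same numbers as predict() on the stand-in regression
stopifnot(isTRUE(all.equal(unname(A_hat), unname(predict(eq_A, newdata)))))
head(A_hat)
```

The hand calculation is the part that carries over to a path model: only the source of the intercept and slopes changes.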

Terry

MB

Mar 9, 2015, 6:21:41 AM

to lav...@googlegroups.com

Hi Terry,

Looks like I should have reread my post - you are correct: I meant to say "Predict values of endogenous variable A, given 10% increase in D...."

And your suggestion of using regression slopes + intercept was the clearest way of predicting A... didn't even need the full new dataframe. Should have thought of that beforehand.
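A sketch of that calculation in base R, without a new data frame. In this model D reaches A both directly and through C, so the slopes combine as (D -> A) + (D -> C) * (C -> A). The coefficient values and baseline levels below are hypothetical placeholders, not estimates from my data.

```r
# Hypothetical coefficients for  C ~ B + D  and  A ~ C + D
# (placeholders; in practice they would be read off the fitted model)
c_int <- 1.0; c_B <- 0.5; c_D <- 0.8    # C = c_int + c_B*B + c_D*D
a_int <- 2.0; a_C <- 0.7; a_D <- -0.3   # A = a_int + a_C*C + a_D*D

# Predict A from the exogenous variables, passing D's effect through C
predict_A <- function(B, D) {
  C_hat <- c_int + c_B * B + c_D * D
  a_int + a_C * C_hat + a_D * D
}

B0 <- 10; D0 <- 5                     # hypothetical baseline levels
A_base <- predict_A(B0, D0)
A_up   <- predict_A(B0, D0 * 1.10)    # D increased by 10%

# Total effect of D on A: direct path plus indirect path via C
total_D <- a_D + c_D * a_C
stopifnot(isTRUE(all.equal(A_up - A_base, total_D * 0.10 * D0)))
A_up - A_base
```

If C were fixed at a chosen value instead of predicted from D, only the direct slope a_D would apply.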

Thanks,

Michael

Marcos Angelini

Mar 9, 2015, 9:56:01 AM

to lav...@googlegroups.com

Hi Terry, Michael and community,

I have a similar question. I have a simple SEM where the target variables to be predicted are both endogenous and exogenous. Let me simplify the model to this:

D ~ A
C ~ D + B
E ~ C

I can calibrate the model with 350 samples. But I want to predict B, C and D for cases where only A and E are present. Is it possible to use the equations (slope and intercept) even when I know that E depends on C and not the other way around? Is there any conceptual restriction?

Thanks in advance,
Marcos

Terrence Jorgensen

Mar 9, 2015, 11:22:57 PM

to lav...@googlegroups.com

> D ~ A
> C ~ D + B
> E ~ C
>
> I can calibrate the model with 350 samples. But I want to predict B, C and D for cases where only A and E are present. Is it possible to use the equations (slope and intercept) even when I know that E depends on C and not the other way around? Is there any conceptual restriction?

You can transform an X --> Y slope into a Y --> X slope (something like multiplying the slope by the ratio of the X variance to the Y variance, at least when there is only one predictor -- you can figure it out by looking at the formula for an OLS simple-regression slope). But I don't see why you'd do it that way. Conceptually, if you want to "predict" B, then why aren't you predicting it? And if only A and E are present, then you have no basis for predicting C (which was only predicted by D and B).
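The single-predictor case can be checked numerically. A small base R sketch on simulated data (the variable names and coefficients are hypothetical): the slope of Y on X is cov(X, Y) / var(X), the slope of X on Y is cov(X, Y) / var(Y), so one is the other times var(X) / var(Y).

```r
# Simulated data with a single predictor: Y regressed on X
set.seed(2)
X <- rnorm(500, mean = 3, sd = 2)
Y <- 1 + 0.6 * X + rnorm(500, sd = 1.5)

b_yx <- unname(coef(lm(Y ~ X))["X"])   # slope of Y on X = cov(X, Y) / var(X)
b_xy <- unname(coef(lm(X ~ Y))["Y"])   # slope of X on Y = cov(X, Y) / var(Y)

# Reverse the slope using the variance ratio
b_xy_from_yx <- b_yx * var(X) / var(Y)

stopifnot(isTRUE(all.equal(b_xy, b_xy_from_yx)))
c(b_yx = b_yx, b_xy = b_xy)
```

Note that the reversed slope is not simply 1 / b_yx unless X and Y are perfectly correlated, and the shortcut does not carry over unchanged to equations with several predictors.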

If you want to "predict" B, C, and D because you only observed A and E, then it's a missing data problem. You can use FIML or multiple imputation if the data are missing at random (MAR) after including any covariates related to missingness. lavaan has FIML estimation, and you can look at the semTools package to easily include auxiliary variables with FIML or to automatically do multiple imputation, fit models, and combine results.
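To illustrate the idea behind treating this as a missing-data problem: under multivariate normality, the expected values of the unobserved variables given the observed ones follow from a mean vector and covariance matrix, E[m | o] = mu_m + Sigma_mo Sigma_oo^-1 (o - mu_o). The base R sketch below is a swapped-in illustration of that formula, not FIML or multiple imputation themselves. It assumes simulated calibration data, and the sample moments stand in for the model-implied moments a fitted SEM would provide; all names and coefficient values are hypothetical.

```r
# Simulate from  D ~ A ;  C ~ D + B ;  E ~ C  (hypothetical coefficients)
set.seed(3)
n <- 350
A <- rnorm(n); B <- rnorm(n)
D <- 0.8 * A + rnorm(n, sd = 0.5)
C <- 0.6 * D + 0.5 * B + rnorm(n, sd = 0.5)
E <- 0.9 * C + rnorm(n, sd = 0.5)
calib <- data.frame(A, B, C, D, E)

# Sample moments as a stand-in for the model-implied mean and covariance
mu    <- colMeans(calib)
Sigma <- cov(calib)

obs  <- c("A", "E")        # observed at the new sites
miss <- c("B", "C", "D")   # to be predicted

# Conditional mean of the missing variables given the observed ones
predict_missing <- function(new_obs) {
  new_obs  <- as.matrix(new_obs[, obs, drop = FALSE])
  centered <- sweep(new_obs, 2, mu[obs])            # o - mu_o
  W   <- solve(Sigma[obs, obs], Sigma[obs, miss])   # Sigma_oo^-1 Sigma_om
  out <- sweep(centered %*% W, 2, mu[miss], "+")    # mu_m + (o - mu_o) W
  colnames(out) <- miss
  out
}

new_sites <- data.frame(A = c(-1, 0, 1), E = c(-0.5, 0, 0.5))
pred <- predict_missing(new_sites)
pred
```

How useful such predictions are depends on how strongly A and E covary with each target, and the causal direction of the paths does not restrict the calculation, since it uses only the joint moments.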

Terry

Marcos Angelini

Mar 10, 2015, 4:34:43 AM

to lav...@googlegroups.com

Hi Terry,

Thanks for your answer. I work in soil science. B, C and D are soil properties, and A and E are environmental covariates (like satellite images).
The reason I present the equations like this is that this is a cause-effect system. I know that D depends on A, that C depends on both D and B, and that E depends on C. If there are missing data, I can only use the intercept. However, if I constructed the equations only to predict B, C and D, I would write this:

D ~ A
B ~ E + (-A)
C ~ E

But it does not represent the cause-effect relationship that I want to show.
I am a beginner in SEM and I am not familiar with semTools, so I will look at this package.

Thank you again for your help,
Marcos
