How to do predictions on lavaan?


Deuterium

Oct 9, 2014, 4:06:24 AM
to lav...@googlegroups.com
Hello,

I fitted a model to some data, and would like to predict future variations given changes in the data values using the coefficients output by the initial model.

How can I change the data while keeping the coefficients constant in order to conduct the prediction for - let's say - variations in the year to come?

Thank you,
Deuterium

Terrence Jorgensen

Oct 9, 2014, 10:23:44 AM
to lav...@googlegroups.com

> I fitted a model to some data, and would like to predict future variations given changes in the data values using the coefficients output by the initial model.
>
> How can I change the data while keeping the coefficients constant in order to conduct the prediction for - let's say - variations in the year to come?


If by "predict future variations" you mean calculating factor scores for observations other than the ones you used to estimate the coefficients, then you can generate a new data set (or use a hold-out sample) and pass it to the predict function using the newdata argument:

fit <- cfa(model, data = dataset1)
dataset2$factor.scores <- predict(fit, newdata = dataset2)
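
A minimal sketch of the hold-out idea above, using the HolzingerSwineford1939 data that ships with lavaan. The three-factor model and the odd/even row split are assumptions chosen only for illustration; they are not the original poster's model or data:

```r
library(lavaan)

# Illustrative three-factor CFA (assumption: not the poster's model)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Arbitrary split: odd rows for estimation, even rows as the hold-out sample
idx      <- seq(1, nrow(HolzingerSwineford1939), by = 2)
dataset1 <- HolzingerSwineford1939[idx, ]
dataset2 <- HolzingerSwineford1939[-idx, ]

# Estimate the coefficients on dataset1 only
fit <- cfa(model, data = dataset1)

# Factor scores for the hold-out rows, computed with the dataset1 coefficients
scores <- predict(fit, newdata = dataset2)

# One column per latent variable, one row per hold-out observation
dataset2 <- cbind(dataset2, scores)
head(dataset2[, c("visual", "textual", "speed")])
```

Binding the score matrix with cbind() gives one ordinary column per factor, which is usually easier to work with than a single matrix-valued column.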


Terry

Deuterium

Oct 10, 2014, 2:21:56 AM
to lav...@googlegroups.com
Thank you, that was useful but I'm not sure it's 100% what I want. To be clearer about what I'm working on, I'm modeling (through SEM) the satisfaction of a sample with commute to work. What I want to predict is actually how the satisfaction would change if I change some of the indicators. It looks something like the attached image (marked in red is what I want to predict).

Is there a way to do that directly in lavaan, or do I have to compute the level of satisfaction with the commute from the predicted factor scores?

Thank you for your help,
Deuterium
Capture.PNG

Mikko Rönkkö

Oct 10, 2014, 2:33:20 AM
to lav...@googlegroups.com
Hi

Why are you using lavaan for this? If you just want to predict one observed variable linearly from others, a simpler alternative would be to use a linear regression model for prediction.

I do not think that lavaan can do predictions directly, and your model does not include paths from the predictors to the criterion. What you can do is get the model-implied covariance matrix from lavaan and then solve the prediction equation from there, like this (I have not tested this code):

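fit <- sem(YOUR_MODEL, data = YOUR_DATA)   # placeholders: your model syntax and data

i <- "y"                                   # placeholder: name of the predicted variable
j <- c("x1", "x2")                         # placeholder: names of the predictor variables

cv <- fitted(fit)$cov                      # model-implied covariance matrix
coef <- solve(cv[j, j], cv[j, i])          # regression weights of i on j

# Center the predictors; the predictions are then deviations from the mean of i
X <- scale(as.matrix(YOUR_DATA[, j]), center = TRUE, scale = FALSE)
predictions <- X %*% coef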
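
A small base-R sketch of the algebra behind this approach, under stated assumptions: the data are simulated, the variable names y, x1, and x2 are hypothetical, and the sample covariance matrix stands in for fitted(fit)$cov. With a model-implied matrix from lavaan the weights would generally differ from OLS unless the model is saturated:

```r
set.seed(1)

# Simulated data (hypothetical): y depends linearly on x1 and x2
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

i <- "y"              # predicted variable
j <- c("x1", "x2")    # predictor variables

# Stand-in for fitted(fit)$cov: here simply the sample covariance matrix
cv <- cov(dat)

# Regression weights: solve Sigma_xx b = sigma_xy
b <- solve(cv[j, j], cv[j, i])

# The intercept follows from the means: b0 = mean(y) - mean(x)' b
b0 <- mean(dat[[i]]) - sum(colMeans(dat[, j]) * b)

# Predictions on the original scale of y
pred <- b0 + as.matrix(dat[, j]) %*% b

# With the sample covariance matrix, this reproduces ordinary least squares
ols <- coef(lm(y ~ x1 + x2, data = dat))
all.equal(unname(c(b0, b)), unname(ols))
```

Note that the covariance matrix alone only determines the slopes; the intercept needs the means as well, which is why the sketch computes b0 separately.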

Mikko



Deuterium

Oct 10, 2014, 5:18:44 AM
to lav...@googlegroups.com
Hello Mikko,

The original purpose of my research is to model the latent structure behind satisfaction with the commute. Prediction is a secondary objective, an extension of the first. I created a function based on your code, but could you point me to any references that cover the theory your code is based on (for my own understanding and to cite in the "Methodology" part of my research)?

Mikko Rönkkö

Oct 10, 2014, 5:27:15 AM
to lav...@googlegroups.com
Hi

I cannot really give you any exact references for the full approach, but it is based on two principles:

1) The model-implied covariance matrix is the maximum likelihood estimate of the population covariance matrix given the model structure (assuming the model was estimated with maximum likelihood).

2) If you have a covariance matrix, then OLS regression gives the optimal linear predictions.

You can find these two facts in any decent SEM and regression textbooks. My favorites that I would cite are

Bollen, K. A. (1989). Structural Equations with Latent Variables. New York, NY: John Wiley & Sons.
Wooldridge, J. M. (2009). Introductory Econometrics: A Modern Approach (4th ed.). Mason, OH: South-Western, Cengage Learning.


Mikko

Terrence Jorgensen

Oct 10, 2014, 6:59:31 PM
to lav...@googlegroups.com
That's not exactly what the SEM in your diagram represents.  The indicators are not predictors of satisfaction.  The indicators are outcomes of the latent variables (just like the satisfaction is an outcome of the latent variables).  In principle, your model is equivalent to one in which satisfaction is a multidimensional indicator of the latent constructs, whereas the other indicators are unidimensional indicators of only one construct each (but that is obviously not representative of your theory). 

Hypothetically, if only ONE of your indicators changed its level while the other indicators of the same construct remained at the same levels, then that change would be due to something unique about that indicator, not something common to all the indicators.  The latent variable represents the source of the common variance among all the indicators.  If you think there is something unique about an indicator that is predictive of satisfaction (beyond what the latent construct itself explains about satisfaction), then you need a regression path from that indicator to satisfaction, or equivalently, another latent variable that points to both that indicator and satisfaction.

If, however, you want to see how the predicted values of satisfaction would change as levels of the predictor(s) (i.e., the latent variables) change, then you already have the information you need in the lavaan output.  Save the regression coefficients, then create a new data object (the way you would to use the predict method for an lm or glm object) that contains combinations of values of your latent variables, then plug those "newdata" values into the regression equation to generate predicted values.

For example, if you have 2 latent variables (L1 and L2), then your regression coefficients can be found in the "Intercepts" and "Regressions" sections of the summary() output from the lavaan object.  Save those values as b0, b1, and b2.  If you used the fixed-factor method of identification, then you can pick 3 meaningful levels of the latent variable: the mean (0) and 1 SD above and below the mean (+/- 1).  Put all combinations of these into a single data frame:

newdata <- expand.grid(L1 = c(-1, 0, 1), L2 = -1:1)
newdata

Then calculate predicted values by plugging them into the regression equation:

newdata$pred <- b0 + b1*newdata$L1 + b2*newdata$L2
newdata
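
As a sketch of where b0, b1, and b2 could come from without copying them by hand from the summary() output: the model below is an illustrative assumption (an observed outcome x7 regressed on two latent variables from the HolzingerSwineford1939 example data), and std.lv = TRUE is used as the fixed-factor identification, so the latent variables have mean 0 and variance 1:

```r
library(lavaan)

# Illustrative model (assumption): x7 plays the role of the observed outcome
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  x7 ~ visual + textual
'

# std.lv = TRUE: fixed-factor identification; meanstructure = TRUE: estimate intercepts
fit <- sem(model, data = HolzingerSwineford1939,
           std.lv = TRUE, meanstructure = TRUE)

# Pull the intercept and the two regression slopes from the parameter table
pe <- parameterEstimates(fit)
b0 <- pe$est[pe$lhs == "x7" & pe$op == "~1"]
b1 <- pe$est[pe$lhs == "x7" & pe$op == "~" & pe$rhs == "visual"]
b2 <- pe$est[pe$lhs == "x7" & pe$op == "~" & pe$rhs == "textual"]

# All combinations of -1 SD, the mean, and +1 SD on both latent variables
newdata <- expand.grid(L1 = c(-1, 0, 1), L2 = -1:1)
newdata$pred <- b0 + b1 * newdata$L1 + b2 * newdata$L2
newdata
```

Here L1 corresponds to visual and L2 to textual; at L1 = 0 and L2 = 0 the predicted value is simply the intercept b0.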

Finally, you can plot the predicted values:

plot(pred ~ L1, data = newdata[newdata$L2 == 0, ], type = "l")
plot(pred ~ L2, data = newdata[newdata$L1 == 1, ], type = "l")

If you want to get fancy, there is a 3D plot in the "rgl" package, that the "car" package capitalizes on:

install.packages(c("car", "rgl"))
library(car)
# fit = "linear" draws the flat regression plane; nine grid points are too few for a smoother
scatter3d(pred ~ L1 + L2, data = newdata, fit = "linear")

But since you can't (easily) model an interaction in SEM (unless you are Bayesian), the prediction plane will be flat, so you don't gain anything from the 3D plot.

Terry

Deuterium

Oct 11, 2014, 7:56:55 AM
to lav...@googlegroups.com
Thank you very much Terrence and Mikko,

I will go over your suggestions after the weekend and try to fully grasp what you posted.

Deuterium



Yves Rosseel

Oct 11, 2014, 9:42:59 AM
to lav...@googlegroups.com
On 10/11/2014 01:56 PM, Deuterium wrote:
> Thank you very much Terrence and Mikko,
>
> I will go over your suggestions after the weekend and try to fully grasp
> what you posted.

In the dev version of lavaan (0.5-18), there is a new function called
'lavPredict()'. The main purpose of this function is the same as
predict(), but with more options. So by default, it will give you
'factor scores' for latent variables. But by setting the 'type' argument
to "yhat", you will get 'predicted' scores for the observed variables of
your model, as in

lavPredict(fit, type = "yhat")

Perhaps, this is what you need?
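
A hedged sketch of the two kinds of output described above, assuming a recent lavaan release in which lavPredict() is available and documented. The documentation of recent releases describes type = "ov" for model-predicted values of the observed indicators, so the sketch uses that value; the model and data are the standard lavaan example, used here only for illustration:

```r
library(lavaan)

# Illustrative three-factor CFA on the example data shipped with lavaan
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Default: factor scores, one column per latent variable
eta <- lavPredict(fit)

# Model-predicted values for the observed indicators, one column per indicator
yhat <- lavPredict(fit, type = "ov")

dim(eta)
dim(yhat)
head(yhat)
```

The predicted indicator values are functions of the estimated factor scores, so they describe what the model expects for each observation, not an out-of-sample forecast.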

Yves.

Tobias Ludwig

Nov 7, 2014, 10:18:09 AM
to lav...@googlegroups.com


On Saturday, October 11, 2014 at 3:42:59 PM UTC+2, Yves Rosseel wrote:

> In the dev version of lavaan (0.5-18), there is a new function called
> 'lavPredict()'. The main purpose of this function is the same as
> predict(), but with more options. So by default, it will give you
> 'factor scores' for latent variables. But by setting the 'type' argument
> to "yhat", you will get 'predicted' scores for the observed variables of
> your model, as in
>
> lavPredict(fit, type = "yhat")
>
> Perhaps, this is what you need?
>
> Yves.


Is there any documentation for lavPredict yet? ?lavPredict doesn't work here (0.5-18).
Thank you, Tobias


 

yrosseel

Nov 27, 2014, 2:11:35 PM
to lav...@googlegroups.com
On 11/07/2014 04:18 PM, Tobias Ludwig wrote:
> Is there any documentation for lavPredict yet? ?lavPredict doesn't work here
> (0.5-18).

Not yet.

Yves.

luggie müller

Apr 27, 2018, 8:12:11 AM
to lavaan
Hi,
I'm trying to do a similar thing. I'd like to predict the change in variables when running the same model with a different data set.
I consider the following model, containing no latent variables:

herb_layer ~ a*soil_type + forest_structure + tree_layer
forest_structure ~ forest_management + soil_type     
tree_layer ~ forest_structure + soil_type
light ~ tree_layer + forest_structure
biodiversity ~ b*herb_layer + forest_management + forest_structure

ab := a*b

To be more precise: I'd like to multiply the values of light by a factor of, say, 1.2 and predict the resulting values of biodiversity.
When using lavPredict() (or predict()), I receive
lavPredict(fit, type="yhat", newdata=new_df)

Error in if (prob < .Machine$double.eps) { :
  missing value where TRUE/FALSE needed
but I don't know what to do with it.

Yves Rosseel

Jul 30, 2018, 2:25:53 PM
to lav...@googlegroups.com
On 04/27/2018 02:12 PM, luggie müller wrote:
> I'm trying to do a similar thing. I'd like to predict the change in
> variables when running the same model with a different data set.
> I consider the following model, containing no latent variables:

https://github.com/yrosseel/lavaan/issues/44