lavPredict - factor scores for the dependent variable

Apr 15, 2019, 1:04:01 PM
to lavaan
Hi lavaan community

I am using the package to run a CFA so that I can get factor scores for my 4 factors, which I have already done. (They will be used as independent variables in an OLS regression.)
I also have another factor in my dataset, which is my dependent variable.

I would also like to get factor scores for my dependent variable using lavPredict, but I got confused by the note in the documentation about predicting dependent variables.

"The predict() function calls the lavPredict() function with its default options. If there are no latent variables in the model, type = "ov" will simply return the values of the observed variables. Note that this function can not be used to ‘predict’ values of dependent variables, given the values of independent values (in the regression sense). In other words, the structural component is completely ignored (for now)."

Can I use the factor scores that lavPredict gives me for my dependent variable (a factor with 5 items) in an OLS regression later on?

Terrence Jorgensen

Apr 16, 2019, 4:40:39 AM
to lavaan
Can I use the factor scores which lavPredict gives me for my dependent variable (factor with 5 items) to do a ols later on?

Yes.  The note is about being unable to obtain predicted values of observed, not latent, variables.  The note is necessary for people coming from a regression mentality, who (quite naturally) expect the generic predict() function to provide predicted values of observed outcomes, since that is how it is employed for objects returned by lm(), glm(), etc.
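
To make that concrete, here is a minimal sketch of the factor-scores-then-OLS workflow. It uses lavaan's built-in HolzingerSwineford1939 data as a stand-in, so the three factors and the choice of speed as the "dependent" factor are assumptions for illustration only, not your model:

```r
library(lavaan)

# Stand-in measurement model (assumption: not the poster's actual factors)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the CFA; all factors are free to correlate
fit <- cfa(model, data = HolzingerSwineford1939)

# lavPredict() returns a matrix of factor scores, one column per factor
fs <- as.data.frame(lavPredict(fit))

# Ordinary OLS on the factor scores, treating one factor as the outcome
ols <- lm(speed ~ visual + textual, data = fs)
summary(ols)
```

The scores for the outcome factor are obtained in exactly the same way as for the other factors; the regression itself happens outside lavaan.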

FYI, the problem with using factor scores in a subsequent analysis is that they do not take into account the uncertainty about their estimation.  So results are biased in most cases.

Yves is working on an integrated 2-step estimation approach in lavaan:

The fsr() function is currently not publicly available (still need to figure out the best defaults, solve some issues), but another approach is plausible-values imputation:

This is available in semTools using the plausibleValues() function.  To use it, you will need the development versions of lavaan and semTools:

install.packages("lavaan", repos = "", type = "source")
?plausibleValues # see examples

The examples in the help page assume you want to run a path analysis afterward, but there is nothing preventing using the list of imputed data sets (i.e., with factors scores) in OLS or other regression analyses.  The mitml package has some particularly useful functions for obtaining pooled estimates as well as a range of pooled test statistics.
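
As a rough sketch of that last idea (plausible values followed by OLS and pooling), something along these lines should work. The model again uses the built-in HolzingerSwineford1939 data as a stand-in, and the number of draws, the seed, and the regression formula are arbitrary choices for illustration. I am assuming as.mitml.list() accepts the list that plausibleValues() returns, so check this against your installed versions:

```r
library(lavaan)
library(semTools)
library(mitml)

# Stand-in measurement model (assumption: not the poster's actual factors)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Draw 20 sets of plausible values; the result is a list of data frames,
# each holding one imputed version of the factor scores
pv <- plausibleValues(fit, nDraws = 20, seed = 12345)

# Treat the list as multiply imputed data and fit the same OLS model to each
imps <- as.mitml.list(pv)
fits <- with(imps, lm(speed ~ visual + textual))

# Pool the estimates across the imputed data sets (Rubin's rules)
testEstimates(fits)
```

Unlike a single set of factor scores, the pooled standard errors reflect the variability across draws, which is the point of using plausible values here.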

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Apr 16, 2019, 4:43:18 AM
to lavaan
The message means that the predict function does not give Y-hat scores (i.e., predicted scores as in regression). It gives factor scores. It can be problematic to use factor scores in OLS regression (see Grice, 2001). If you want to use factor scores, see the fsr function in the latest lavaan development version, which is called with lavaan:::fsr.


Apr 16, 2019, 8:23:08 AM
to lavaan
Thanks for the quick and competent reply, Terrence!

I will have a look at the plausible-values imputation.
Looking forward to the implementation of the 2-step estimation approach.