Predicting factor scores while leaving out certain indicators



Aug 23, 2018, 5:24:13 AM
to lavaan

Hi all,


I have a question concerning prediction in lavaan. We have a CFA model with (for now) 6 latent factors and 13 indicators. We have fitted this model on a cognitively healthy sample, and then applied it to a patient sample using the predict function.


What we wanted to do next is investigate to what extent the (healthy) predictions of the latent factor scores are influenced when data for one or more of the indicators is missing. In the future, the model will be used to give an estimate of cognitive functioning on different domains, and it is very likely that not every person will have completed the full test battery. Thus, it is essentially a test of robustness of the model (e.g., which tests are essential for reliable latent factor scores?).


In a first exploratory and rather crude attempt, we set all the values of one indicator to zero and fed this data frame to lavPredict. In this case, however, lavaan gives an error because that indicator has no variance. Our understanding of lavPredict is that it estimates the factor scores on a subject-by-subject basis, which we thought should be possible in principle when all values of one or more indicators are zero (analogous to a simple regression equation, where a predictor is cancelled out when its weight is multiplied by zero).


My question is therefore both a technical and practical one: does it make theoretical sense to carry out such an analysis, and if so, how would one go about doing this using predict or lavPredict?

Thanks in advance!

Terrence Jorgensen

Aug 25, 2018, 8:19:33 AM
to lavaan
Missing (NA) is very different from saying the value is known to be zero (on the scale of the indicator). To simulate some items missing from a test battery, you should impose missing values (NA) on those items, not set the values to zero.
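
To make the difference concrete, here is the regression-type factor score formula in standard CFA notation. This is only a sketch: the symbols are textbook notation rather than anything from the posts above, and it assumes the regression method, which as far as I know is what lavPredict uses by default for continuous indicators.

```latex
% Regression-type factor scores for person i (standard CFA notation, assumed here):
%   y_i     : vector of observed indicator scores for person i
%   \nu     : indicator intercepts,   \Lambda : factor loadings
%   \alpha  : latent means,           \Psi    : latent (co)variance matrix
%   \Theta  : residual (co)variance matrix
\[
  \hat{\eta}_i \;=\; \alpha + \Psi \Lambda^{\top} \Sigma^{-1}
  \bigl( y_i - \nu - \Lambda\alpha \bigr),
  \qquad
  \Sigma \;=\; \Lambda \Psi \Lambda^{\top} + \Theta .
\]
```

The weights multiply the deviation of each indicator from its model-implied mean, not the raw score. A raw value of zero on indicator j therefore enters as (0 - mu_j), which is usually a large deviation, so the indicator does not drop out of the equation the way a zero-valued predictor would in a regression without centering.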

Factor scores cannot be calculated when any indicators are missing (unless you use Bayesian / MCMC estimation -- available in the blavaan package).  If you want to see how the factor-score estimates differ when leaving an item out, you can fit a model without that item and then calculate the factor scores (for anyone with complete data on the remaining items).
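
A minimal sketch of that last suggestion, in case it helps. Everything here is illustrative: it uses lavaan's built-in HolzingerSwineford1939 data rather than the data from the question, the two schools merely stand in for a "healthy" and a "patient" sample, and x6 is an arbitrary choice of item to leave out.

```r
library(lavaan)

# Built-in example data; the two schools are only stand-ins for the
# healthy (calibration) sample and the patient (scoring) sample.
data(HolzingerSwineford1939)
healthy  <- subset(HolzingerSwineford1939, school == "Pasteur")
patients <- subset(HolzingerSwineford1939, school == "Grant-White")

# Full model with all nine indicators.
full_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Same model with one indicator (x6, chosen arbitrarily) left out.
reduced_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5
  speed   =~ x7 + x8 + x9
'

# Fit both models on the calibration sample only.
fit_full    <- cfa(full_model,    data = healthy, meanstructure = TRUE)
fit_reduced <- cfa(reduced_model, data = healthy, meanstructure = TRUE)

# Score the other sample with each fitted model.
scores_full    <- lavPredict(fit_full,    newdata = patients)
scores_reduced <- lavPredict(fit_reduced, newdata = patients)

# Per-factor correlation between scores from the full and the reduced model.
agreement <- diag(cor(scores_full, scores_reduced))
print(agreement)
```

Repeating this for each indicator (or set of indicators) in turn would give a rough picture of which tests the factor scores depend on most. Correlations only capture rank-order agreement, so it is probably worth also looking at the mean and spread of the differences between the two sets of scores.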

You might find different advice on SEMNET.

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
