factor score estimates in blavaan (Bayesian lavaan)


Bill Shipley

Oct 25, 2017, 11:09:18 AM
to lavaan
I am new to Bayesian methods of fitting and estimating parameters in SEM, and to blavaan. I am trying to understand this via Lee, S-Y. 2007. Structural Equation Modeling: A Bayesian Approach. Three questions:

1) Are the factor score estimates of the latent variables obtained as the mean and variance of the sampled values of these scores in the Gibbs sampler after the chosen burn-in period, i.e. without being expressed in terms of the structural parameter estimates?

2) Are there any studies comparing these Bayesian estimates of latent scores with those obtained in ML SEM via either the regression method or by minimising the sum of squares of the standardized residuals (I don't know which method is used in lavaan)? What are the conclusions?

3) If I obtain new observations of the indicator variables, and want to then predict the latent scores associated with these new observations using blavaan, how can I do it?  This can be done in ML SEM because these are functions of the structural parameters, but I don't know how (if) this can be done in a Bayesian context.  This last point is particularly important to me because the latent scores are of primary importance and others need to use the measurement model to obtain their own latent scores given their data.

Thanks for any help.

Mauricio Garnier-Villarreal

Oct 25, 2017, 1:33:13 PM
to lavaan
Bill

The later book by Song and Lee (2012) is more understandable in my opinion, and the book by Kaplan (2014) also has a couple of chapters on Bayesian SEM. I also teach a 5-day course in Bayesian SEM (https://www.statscamp.org/summer-camp/advanced-sem), in case you are interested.

1) The factor scores are estimated as parameters, one set per subject, at each step of the Gibbs sampler. They can be estimated through data augmentation: given the conditional model and the scale constraints (for example, setting the latent mean to 0 and the variance to 1), there is enough information to identify them. They appear explicitly in the SEM equations. blavaan doesn't export the factor scores by default, but you can ask it to.

2) Not that I know of. A meaningful difference between the ML and Bayesian factor scores is that ML gives you a single estimate of each factor score, while the Bayesian approach gives you a posterior distribution for each factor score of each subject. You can use the posterior mean as the best single estimate, and you also have a measure of its uncertainty.

3) If you get a new sample, you can run the model with blavaan and ask it to export the factor scores through the jagextra argument, monitoring "eta", which is the name of the factors in the JAGS code that blavaan creates:

fit51 <- bcfa(mod51, data = dat, std.lv = TRUE,
              n.chains = 3, burnin = 5000, sample = 2000,
              jagextra = list(monitor = "eta"))
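
On point 1, the data-augmentation step can be written out for a plain CFA. This is only a sketch in assumed notation (the symbols below are not from blavaan's generated code): at each iteration, the factor scores of subject i are drawn from their conditional distribution given the data and the current parameter values.

```latex
% Assumed notation: y_i indicators, \eta_i factor scores, \nu intercepts,
% \Lambda loadings, \Theta residual covariance, \Phi factor covariance.
\begin{aligned}
y_i &= \nu + \Lambda \eta_i + \varepsilon_i, \qquad
  \varepsilon_i \sim N(0, \Theta), \qquad \eta_i \sim N(0, \Phi), \\
\eta_i \mid y_i, \nu, \Lambda, \Theta, \Phi
  &\sim N\!\left( V \Lambda^{\top} \Theta^{-1} (y_i - \nu),\; V \right), \qquad
  V = \left( \Phi^{-1} + \Lambda^{\top} \Theta^{-1} \Lambda \right)^{-1}.
\end{aligned}
```

For fixed parameter values, this conditional mean is the same as the regression-method score \Phi \Lambda^{\top} \Sigma^{-1} (y_i - \nu) with \Sigma = \Lambda \Phi \Lambda^{\top} + \Theta. The Bayesian posterior of a factor score additionally averages over the posterior draws of the parameters.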

Also, blavaan has its own Google group for blavaan-specific questions, where the developer may see them sooner.

bye

Adam S

Feb 22, 2020, 5:00:14 PM
to lavaan
Please find a related question and response from Mauricio:

Hi Mauricio, 

I have estimated a single-factor CFA model in lavaan, and I derived factor scores using lavPredict. My data frame, however, contains missing data, and lavPredict does not appear to be able to calculate SEs for each estimated factor score when observations are incomplete. I attempted to use your suggested blavaan code below, setting save.lvs=T, but I receive a similar error:

blavaan ERROR: lvs cannot currently be saved when data are missing.

Do you have any suggestions to work around this limitation and calculate SEs of the factor scores? 

Thanks,
Adam



Adam

This is an issue with the new default Stan parameterization. If you want to extract factor scores, you can use the old Stan method (target="stanclassic"); its only disadvantage is that it is slower. With missing data, computing the posterior log-likelihood will be slower either way. I show an example at the end.

Now, in general, using factor scores is not recommended, because it introduces factor indeterminacy issues. Some of these can be addressed by treating the factor scores from posterior draws as multiple imputations, which accounts for factor indeterminacy and sampling variability. This can be done with the function plausibleValues from semTools, which works with blavaan objects.

Please continue these questions in the blavaan Google group. That way more people can help, and the answers stay there for other users searching for similar issues.

Hope this helps

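library(blavaan)
## "multiprocess" is no longer supported in recent versions of future;
## "multisession" runs the chains in parallel background R sessions
future::plan("multisession")
library(simsem)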


## The famous Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9
'

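## Impose roughly 20% missingness completely at random (MCAR)
dat_test <- imposeMissing(HolzingerSwineford1939,
                          pmMCAR = .2)
summary(dat_test)

## Fit with the older Stan target so that latent variable draws can be saved
fit <- bcfa(HS.model, std.lv = TRUE,
            save.lvs = TRUE, target = "stanclassic",
            data = dat_test)
summary(fit, standardized = TRUE)

## Posterior means of the factor scores
lvs <- blavInspect(fit, "lvmeans")
head(lvs)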

Ed Merkle

Mar 28, 2020, 3:18:38 PM
to lavaan
I just stumbled on this post. Saving lvs with missing data should now work in the development version of blavaan on github:


You could extract all the lv draws via

lvs <- blavInspect(fit, 'lvs')

and then compute SDs or whatever else you want with those draws. (note: lvs will be a list of matrices containing lv draws, where list length is number of chains)
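
As a rough sketch of that last step, the chains can be stacked and summarised column by column. The lvs object below is simulated as a stand-in so that the snippet runs without a fitted model. The layout assumed here (rows are posterior draws, columns are individual factor scores) and the dimensions are assumptions, so check them against your own blavInspect output.

```r
# Hypothetical stand-in for blavInspect(fit, "lvs"): one matrix per chain,
# rows = posterior draws, columns = individual factor scores (assumed layout).
set.seed(1)
lvs <- replicate(3, matrix(rnorm(200 * 4), nrow = 200, ncol = 4),
                 simplify = FALSE)

# Stack the chains into a single draws-by-scores matrix
draws <- do.call(rbind, lvs)

# Posterior mean and posterior SD of each factor score across all draws
lv_mean <- colMeans(draws)
lv_sd   <- apply(draws, 2, sd)
```

The posterior SDs in lv_sd play the role of the standard errors asked about earlier in the thread.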

Ed