Dealing with multimodality and unconstrained parameters in a generic Confirmatory Factor Analysis model


Erikson Kaszubowski

Nov 4, 2016, 4:27:05 PM
to Stan users mailing list

Greetings,

I'm working on a general confirmatory factor analysis model with Stan, inspired by the lavaan and blavaan packages (https://arxiv.org/pdf/1511.05604.pdf). My idea is to build a pre-compiled model, similar to the strategy used in rstanarm, to quickly fit CFA models with Stan (and general SEM models later on), using lavaan's simple syntax to define the model and generate the parameter matrices.

I have some questions about my implementation:

(1) Especially with standardized latent variables, there are at least two modes for some sets of parameters (loadings lambda and latent covariance matrix Phi) due to sign switching. I have visited many threads about this issue here, but some proposed solutions, like manual initialization and positivity constraints, are not working as expected. For example, when I constrain the loadings to be positive, some chains find the right mode, but other chains keep bouncing at zero. I suspect I should constrain more parameters, but I don't know exactly which. Could someone suggest how to deal with this multimodality issue without post-processing?

(2) I read in another thread that leaving free parameters (parameters not really constrained by any data or prior) can be harmful to HMC. Currently, I'm defining unconstrained parameters in the 'parameters' block (with a Cholesky correlation matrix and a vector of standard deviations for the CFA model matrices), and constraining them in the 'transformed parameters' block -- e.g., fixing some elements of a matrix to zero or one. The likelihood is defined with the constrained parameters, of course, but that leaves some parameters without any true connection to the model. All unconstrained parameters are given somewhat weakly informative priors (like normal(0,1) for loadings, cauchy(0,1) for residual standard deviations, etc.), so they are not completely unconstrained. When estimating the parameters of a CFA with one indicator loading fixed to one for each latent factor, HMC samples very quickly and with very good mixing. Should I be worried about the unconstrained parameters?

Thanks!
Erikson
lavaan.R
lavaan.stan

Stephen Martin

Nov 4, 2016, 4:59:39 PM
to Stan users mailing list
First, this is an excellent project that you're working on. I'll hopefully soon be releasing a DIF/measurement invariance procedure in stan, and may borrow implementation ideas from you so the user can specify general CFA models. Very cool.

As for your questions:
1) You could try first fitting the model via optimization and using the output as the initial values for the chains. This hasn't worked great for me in the past, but you may give it a shot. When I've fit CFAs in Stan, I typically use the standardized solution with positive loadings and haven't hit much multimodality; this also gives better control over the priors. That said, it looks to me like you are using the covariance and multivariate normal approach, which I don't use. Try leaving the covariance-based approach to CFA behind and instead fit it via regressions. E.g., y[i,1] ~ normal(lambda[1,1]*theta[i], residual[1]), if theta is latent ability, lambda[1,1] is the loading of item 1 on factor 1, and residual[1] is the residual standard deviation for item 1. Does that make sense? Don't bother estimating the covariance matrix; instead estimate the factor loadings and individuals' latent abilities and use likelihoods for each item and case. If you want the covariance matrix for later processing, generate it via transformed parameters or generated quantities.
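A minimal numpy sketch of the generative model this regression notation implies (all values here are hypothetical; one factor, three items):

```python
import numpy as np

rng = np.random.default_rng(42)

n, p = 500, 3                      # cases, items on a single factor
lam = np.array([0.8, 0.6, 0.7])    # hypothetical factor loadings
resid = np.array([0.5, 0.6, 0.4])  # residual standard deviations

# latent abilities, standardized as in the thread
theta = rng.normal(0.0, 1.0, size=n)

# regression-notation likelihood: y[i, j] ~ normal(lambda[j] * theta[i], resid[j])
y = lam * theta[:, None] + resid * rng.normal(size=(n, p))
```

Under this model the implied variance of item j is lam[j]^2 + resid[j]^2, which is exactly what the covariance-based formulation would target instead.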

2) The issue with leaving free parameters totally free is that they are totally unbounded, which winds up screwing up the adaptation of HMC. It doesn't know how to adapt the stepsize or evaluate u-turns for a particle that can essentially go off to infinity in any direction. This is fixed by priors, in which case the posterior distribution of the unconstrained, unused values will just be the prior distribution (because they are independent of the data, p(unused | D) = p(unused)). Ideally, you would remove those if possible and only estimate parameters you actually use, but this quickly becomes a programming nightmare. It would speed up adaptation and sampling, though, as there are fewer parameters to sample and adapt for.

Bob Carpenter

Nov 4, 2016, 5:16:18 PM
to stan-...@googlegroups.com
We don't have good general solutions to (1) of which I'm aware.
In most of the sign switching cases we have (like in IRT 3PL for
generally well-posed problems) it's enough to constrain a single
variable.

As for (2), it's not just HMC. If you have a parameter with
unbounded support (not constrained above and below) and an improper
uniform prior, you get an improper uniform posterior.
But if you give them a proper prior (as you do), then you get
a proper posterior. This isn't a problem other than for efficiency.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <lavaan.R><lavaan.stan>

Bob Carpenter

Nov 4, 2016, 5:17:55 PM
to stan-...@googlegroups.com
I'm not exactly sure what you're suggesting because I'm
not familiar with this class of models described this way,
but there's a difference between a covariance prior and
computing covariance in the posterior, if that's what you're
suggesting.

- Bob

Stephen Martin

Nov 4, 2016, 5:22:51 PM
to Stan users mailing list
The likelihood the OP is using is y ~ multi_normal(means, covariance), where the covariance matrix is constructed from factor loadings, factor variances, and residuals.
In covariance-based latent variable modeling, the goal is to estimate factor loadings/variances and residuals, construct the implied covariance matrix from those parameters, and minimize the difference between the implied and observed covariance matrices (there are several ways of assessing the difference between the matrices, hence the ML, WLS, GLS, ADF, etc. estimators for CFA and SEM).

All I'm saying is to try not using the multi_normal likelihood with means and the implied covariance matrix.
Instead, model each response of each individual using latent regression: not y ~ multi_normal(means, covariance), but response[i,j] ~ normal(predicted value, residual), where the predicted value is factor loadings times latent abilities.
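For readers following along, the implied covariance matrix described above can be sketched in a few lines of numpy (hypothetical two-factor model; the general CFA form is Sigma = Lambda * Phi * Lambda' + Theta):

```python
import numpy as np

# hypothetical two-factor model: items 1-3 load on factor 1, items 4-6 on factor 2
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.5],
                   [0.0, 0.7]])

# latent covariance matrix for standardized factors (unit variances)
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# diagonal residual (unique) variances
Theta = np.diag([0.36, 0.51, 0.64, 0.19, 0.75, 0.51])

# model-implied covariance of the observed items
Sigma = Lambda @ Phi @ Lambda.T + Theta
```

Covariance-based estimators pick the parameters that make Sigma close to the observed covariance matrix; the regression suggestion skips this and puts the likelihood on the raw responses instead.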

Stephen Martin

Nov 4, 2016, 5:25:53 PM
to Stan users mailing list
Just to clarify: Use the responses directly in the likelihood rather than use a multinormal likelihood on the entire dataset.
If you still want the implied covariance matrix, then you can construct it in generated quantities for later analysis (e.g., for reporting, for fit metrics that require covariance matrix comparisons).

Erikson Kaszubowski

Nov 4, 2016, 5:54:33 PM
to Stan users mailing list
Stephen, Bob, thanks for the input!

Stephen, my first attempts at CFA in Stan used the regression notation you suggest. I quickly gave up on it because matrix computation makes everything much faster and more elegant to code. Stan's matrix algebra is a joy to work with! I also feel it's more general in terms of model definition, though I guess you could make the regression notation just as general. Do you think the regression notation might help mitigate the multimodality issues? I will try the optimization-based initialization. Is there any way to diagnose what in the model keeps pushing the chain against the constraint instead of converging to the more sensible mode?

Stephen Martin

Nov 4, 2016, 6:03:20 PM
to Stan users mailing list
Yeah, I think the regression notation would take a bit more thought, but should still allow general models implemented via matrices. And I honestly have no idea whether it would help the multimodality, but it's the only way I've implemented CFA/SEM in Stan, and I haven't hit multimodality problems. Moreover, I think it would allow flexibility in the likelihoods too (e.g., if you have dichotomous/polytomous outcomes, you can give some items Bernoulli likelihoods without any need to compute biserial or polychoric correlations). Write up a model that uses this method and see whether you hit the multimodality problems you get with the same data in your implementation. If you don't, you may consider using the regression technique; if you do, stick with your method. Another added benefit of this method is that you get the latent abilities straight from the output along with all the model parameters.

Bob Carpenter

Nov 4, 2016, 6:26:24 PM
to stan-...@googlegroups.com

> On Nov 4, 2016, at 6:03 PM, Stephen Martin <hwki...@gmail.com> wrote:
>
> Yeah, I think the regression notation would take a bit more thought, but should still allow general models implemented via matrices. And I honestly have no idea whether it would help the multimodality,

The only way it's going to reduce multimodality is if
it defines a different density. The only difference
between two different implementations of the same density will
be efficiency and arithmetic stability.

In general, scalars will be faster than matrices.
For example, you'd never want to use a diagonal matrix for a
multivariate normal. A vectorized scalar normal would be
better. That is,

y[n] ~ normal(mu, sigma);

is better than

y[n] ~ multi_normal(mu, diag_matrix(sigma));

- Bob

Erikson Kaszubowski

Nov 5, 2016, 12:05:20 PM
to Stan users mailing list
I have tried some of the suggestions given here. Reparameterizing the model to the regression notation made it considerably slower than the earlier version, without any clear gain in chain mixing or mode avoidance.

Surprisingly, 'optimizing' the model to obtain inits is not working very well: the point estimates are off target, so I ended up using lavaan's point estimates for chain initialization. I also played with the step size via a higher adapt_delta parameter. Models with the loading of one indicator variable fixed seem to avoid other modes without a significant loss of performance. But models with standardized latent variables still suffer from sign switching, so I have to post-process them.
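For anyone needing the sign-switching post-processing mentioned above: it amounts to flipping each posterior draw so that a chosen anchor loading is positive, flipping the matching factor scores with it. A minimal numpy sketch (hypothetical layout: one factor, draws stored row-wise):

```python
import numpy as np

def align_signs(loadings, scores, anchor=0):
    """Flip posterior draws of one factor so the anchor loading is positive.

    loadings: (draws, items) array of loading draws for a single factor
    scores:   (draws, persons) array of matching factor-score draws
    anchor:   index of the loading forced to be positive
    """
    flip = np.sign(loadings[:, anchor])
    flip[flip == 0] = 1.0              # leave exact zeros untouched
    return loadings * flip[:, None], scores * flip[:, None]

# two hypothetical draws caught in mirror-image modes
lo = np.array([[0.8, 0.6],
               [-0.8, -0.6]])
sc = np.array([[1.0, -0.5],
               [-1.0, 0.5]])
lo_aligned, sc_aligned = align_signs(lo, sc)   # second draw gets flipped
```

With more than one factor, the same flip is applied per factor, and the affected rows/columns of the latent correlation matrix must be flipped consistently as well.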

Erikson Kaszubowski

Nov 6, 2016, 2:12:09 PM
to Stan users mailing list
I e-mailed Edgar Merkle, author of the blavaan package, to ask about usual ways of dealing with those pesky multiple modes, especially the sign switching. Turns out that constraining just one loading per factor to be positive should suffice. I tried it with a few models, but I occasionally got chains bumping against zero for those parameters. By using lavaan's point estimates for initialization, though, the chains explored just one mode. I still get divergent transitions in some models, but the parameter estimates are quite close to lavaan's point estimates, and I'm getting very good mixing.

I'm planning to package it all up, but I attached the source files for those who might be interested!
lavaanSem.stan
lavaan2.R
lavaan.stan
lavaan.R

Erikson Kaszubowski

Nov 15, 2016, 6:21:01 PM
to Stan users mailing list
I'm working on a CFA model that can handle equality constraints now. I got it working, even though the code isn't pretty. Stephen, it's now possible to test measurement invariance using WAIC!
If anyone looked into the code I posted before, please disregard the SEM model: it contains some stupid errors that will be fixed soon.

I tried to work out a non-centered parameterization of the model to avoid the occasional divergent transitions that appear in some models. I used the manual's suggestion of reparameterizing the multivariate normal, applied to factor score estimation. But the result is (1) much slower sampling and (2) no reduction in divergent transitions at all. Ideas for other choices of reparameterization are more than welcome!
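For reference, the manual's non-centered multivariate normal draws standard normals and maps them through the Cholesky factor, eta = mu + L * z. A numpy check that this reproduces the target covariance (hypothetical 2x2 latent covariance):

```python
import numpy as np

rng = np.random.default_rng(7)

# hypothetical target latent covariance and means
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])
mu = np.zeros(2)
L = np.linalg.cholesky(Phi)

# non-centered draws: z ~ std normal, then eta = mu + L * z,
# mirroring the manual's multivariate normal reparameterization
z = rng.normal(size=(100_000, 2))
eta = mu + z @ L.T

emp = np.cov(eta, rowvar=False)   # should approximate Phi
```

In Stan the same trick declares z in the parameters block with a standard normal prior and builds eta in transformed parameters, which moves the correlation structure out of the sampled space.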

lavaanCFA.R
lavaanCFA.stan

Stephen Martin

Nov 15, 2016, 11:03:39 PM
to Stan users mailing list
I'll look at it. However:

1) I personally think the standard method of assessing measurement invariance is profoundly flawed for reasons I won't get into here.
2) I have no idea how well the WAIC or many of the information-based criteria work for assessing DIF; it seems like most people who assess DIF use, e.g., CFI, RMSEA, etc. I'd be interested in seeing a simulation study about how well WAIC detects DIF (measurement invariance).

Cool project though; once I'm finished up with my current bDIF procedure for 2PL models, I plan on moving on to implementing the theory in normal-theory CFA models.

Dr. Hans Hansen

Jan 18, 2017, 6:13:19 PM
to Stan users mailing list
Thanks for this thread - it helped me today to set up my first Bayesian CFA, as I followed Stephen's advice to use the regression notation. I am just wondering how I could account for correlated errors (residual correlations) between some of my observed variables - I guess it should go somewhere in here:

y[i,1] ~ normal(lambda[1,1]*theta[i], residual[1])

?

Bob Carpenter

Jan 20, 2017, 4:16:49 PM
to stan-...@googlegroups.com

> On Jan 18, 2017, at 6:13 PM, Dr. Hans Hansen <drhans...@gmail.com> wrote:
>
> Thanks for this thread - it helped me today to set up my first Bayesian CFA, as I followed Stephen's advice to use the regression notation. I am just wondering how I could account for correlated errors (residual correlations) between some of my observed variables - I guess it should go somewhere in here:
>
> y[i,1] ~ normal(lambda[1,1]*theta[i], residual[1])

What's correlated? Do you want to model the correlations or
just observe the posterior correlations? If it's a time
series that's correlated, the usual approach is to write
time-series models.

- Bob

Stephen Martin

Jan 21, 2017, 11:13:55 AM
to Stan users mailing list
You would use a multivariate normal likelihood. E.g., if x1 and x2 should have correlated errors, and they load on f1,
yhat_12[1] = mu[1] + f1*lambda[1];
yhat_12[2] = mu[2] + f1*lambda[2];

y[i, 1:2]' ~ multi_normal(yhat_12, sigma_12);

sigma_12 would be a covariance matrix, either constructed from variances and correlations or given an LKJ prior. Read the manual for optimizing this (using multi_normal_cholesky with an lkj_corr_cholesky prior on the correlation matrix's Cholesky factor).

Dr. Hans Hansen

Jan 24, 2017, 9:15:05 AM
to Stan users mailing list
Thanks, I'll try that!

In case I have ordinal data instead of continuous, I assume I would also model yhat in this way, and then in a second step calculate thresholds with ordered_logistic? Is that reasonable?
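For what it's worth, the ordered-logistic link being discussed turns a latent predictor eta and an ordered vector of cutpoints into category probabilities via differences of logistic CDFs; a small Python sketch with hypothetical cutpoints:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities matching Stan's ordered_logistic(eta, c)."""
    # P(y <= k) = inv_logit(c[k] - eta); pad with 0 and 1 at the extremes
    cdf = np.concatenate(([0.0], inv_logit(cutpoints - eta), [1.0]))
    return np.diff(cdf)            # P(y = k) for k = 1..K

probs = ordered_logistic_probs(eta=0.5, cutpoints=np.array([-1.0, 0.0, 1.5]))
```

In Stan the cutpoints are declared as an ordered vector, which is what keeps the category probabilities positive.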


Bob Carpenter

Jan 24, 2017, 4:55:58 PM
to stan-...@googlegroups.com
Yes, you can do that. Our ordinal logistic isn't vectorized
yet, so it's relatively slow compared to our other distributions.

- Bob

> On Jan 24, 2017, at 9:15 AM, Dr. Hans Hansen <drhans...@gmail.com> wrote:
>
> Thanks, I'll try that!
>
> In case I have ordinal data instead of continuous, I assume I would also model yhat in this way, and then in a second step calculate thresholds with ordered_logistic? Is that reasonable?
>
>
>

Tran

Mar 31, 2017, 7:18:28 PM
to Stan users mailing list
Hi Dr. Hans Hansen,

Have you done a CFA model for ordinal variables?

I am trying to run a CFA model for ordinal data and I get divergent transitions many times.

Hopefully I can discuss this kind of model in Stan with you.

Tran.

Dr. Hans Hansen

Apr 2, 2017, 1:15:08 PM
to Stan users mailing list
I didn't get it to work so far, unfortunately. The model runs, but something about it is fishy - the thresholds seem to be too far apart and the loadings are far off. It would be great if someone could have a look at my code; see attached.
r_script.R
cfa_model.stan

Trung Dung Tran

Apr 2, 2017, 2:43:34 PM
to stan-...@googlegroups.com, drhans...@gmail.com
Hi,

I have one more problem: the input has missing values. In your case it's 'Science', and Stan does not support integer parameters, so I cannot specify data with missing values in this situation.

Tran.

P.S.: Why are the priors for the factor loadings so tight? Have you tried N(0, 1000)?



Dr. Hans Hansen

Apr 2, 2017, 4:03:39 PM
to Stan users mailing list, drhans...@gmail.com
Well, I did not put much thought into the prior, as I wanted to get the model running first. Changing it to N(0,1000) doesn't make a difference. I suspect there is an error somewhere in the abilities - if I compare them to a WLSMV model estimated with lavaan, it turns out that the distribution obtained by Stan is much wider.

Bob Carpenter

Apr 3, 2017, 12:48:28 PM
to stan-...@googlegroups.com
You really want to code all that more efficiently.

The key thing to note is that because your sds_matrix
is diagonal, you don't need a multivariate normal:

y[i] ~ multi_normal(pred[i], sds_matrix);

works out to be a very expensive way to evaluate

y[i] ~ normal(pred[i], sds);

You don't need to declare loadings as a P x 1 matrix; just
make it a P vector.

That cov() function never gets used.

- Bob

Bob Carpenter

Apr 3, 2017, 12:59:28 PM
to stan-...@googlegroups.com, drhans...@gmail.com
People often find this because Stan tends to do a better job of estimating posteriors than diffusive processes. The way to test is to simulate parameters from the priors, then simulate data from the parameters, then test that a posterior fit to the data recovers the parameters in its posterior intervals.
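A toy version of this simulate-and-recover check, sketched with a conjugate normal-mean model so the posterior interval is analytic (all prior values hypothetical; a real check would refit the CFA itself):

```python
import numpy as np

rng = np.random.default_rng(123)

sigma, tau = 1.0, 1.0        # known data sd, prior sd on the mean
n_obs, n_sims = 20, 200
covered = 0
for _ in range(n_sims):
    mu = rng.normal(0.0, tau)                  # 1. parameter from the prior
    y = rng.normal(mu, sigma, size=n_obs)      # 2. data from the parameter
    # 3. analytic conjugate posterior for the mean (prior mean zero)
    post_var = 1.0 / (1.0 / tau**2 + n_obs / sigma**2)
    post_mean = post_var * y.sum() / sigma**2
    lo, hi = post_mean - 1.96 * post_var**0.5, post_mean + 1.96 * post_var**0.5
    covered += (lo <= mu <= hi)

coverage = covered / n_sims    # should sit near the nominal 0.95
```

If the posterior intervals systematically miss the drawn parameters, the model (or its implementation) is misspecified; this is the same logic applied to the full CFA fit.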

- Bob

> On Apr 2, 2017, at 4:03 PM, Dr. Hans Hansen <drhans...@gmail.com> wrote:
>
> Well, I did not put much thought into the prior, as I wanted to get the model running first. Changing it to N(0,1000) doesn't make a difference. I suspect there is an error somewhere in the abilities - if I compare them to a WLSMV model estimated with lavaan, it turns out that the distribution obtained by Stan is much wider.
>
>

Dr. Hans Hansen

Apr 3, 2017, 3:22:13 PM
to Stan users mailing list

Thanks, Bob. The multinormal is because I eventually want to allow correlated errors - I omitted that because I tried to replicate a model with uncorrelated errors first. However, it sounds like the model is correct, although it's not very efficient?

Do you have an idea how to replicate the common factor model for ordinal data? There, one estimates polychoric correlations among the observed variables, and I thought that would be the same as fitting an ordered_logistic in Stan.

Bob Carpenter

Apr 3, 2017, 3:26:30 PM
to stan-...@googlegroups.com
I'm afraid I don't. Somebody else might, though.

For multi_normal, it's really critical you vectorize for efficiency.
Or, if you are estimating the whole covariance matrix, use Cholesky
factors.

- Bob

> On Apr 3, 2017, at 3:22 PM, Dr. Hans Hansen <drhans...@gmail.com> wrote:
>
>
> Thanks, Bob. The multinormal is because I eventually want to allow correlated errors - I omitted that because I tried to replicate a model with uncorrelated errors first. However, it sounds like the model is correct, although it's not very efficient?
>
> Do you have an idea how to replicate the common factor model for ordinal data? There, one estimates polychoric correlations among the observed variables, and I thought that would be the same as fitting an ordered_logistic in Stan.
>