Measurement Invariance and Latent Mean Comparison with Non-Normal Data: MLR or Bootstrapped ML?


Basil Maly

Feb 13, 2026, 8:16:39 AM
to lavaan

Dear all,


I’m testing measurement invariance across three groups and subsequently want to compare latent means. However, the sample sizes are unequal and rather small in two groups (n₁ = 168, n₂ = 484, n₃ = 197). The indicators are not normally distributed, and multivariate normality does not hold. Nevertheless, model estimation seems to be stable with both approaches (MLR and ML with bootstrapped standard errors).


My questions:

  1. Should I use MLR as the estimator, or ML with bootstrapped standard errors?
  2. My model comparisons are based on the global fit indices CFI(robust) and RMSEA(robust). Would it be better to use permuteMeasEq in this case?

Thanks for any input!

Terrence Jorgensen

Mar 26, 2026, 5:52:21 AM
to lavaan
> Should I use MLR as the estimator, or ML with bootstrapped standard errors?

Bootstrapping is only as good as the sample's approximation of the population. So if your population has a lot of kurtosis, modest samples probably won't capture the tails of your true sampling distribution by resampling. Likewise, robust corrections only work as well as the information from your sample approximates the true sampling distribution. I'm not sure which would be preferable in your case, but I can definitely say that you should use MLM if you have complete data. MLR is a less stable approximation whose only advantage is being available with incomplete data (so it can be combined with FIML).
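For readers who want to compare the three options side by side, here is a minimal sketch. It assumes lavaan's built-in HolzingerSwineford1939 data (complete data, two groups defined by "school") as a stand-in for the three-group data in the question, and the three-factor model is purely illustrative. The number of bootstrap draws is kept small only so the example runs quickly.

```r
library(lavaan)

# Illustrative model; replace with your own measurement model.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# MLM: robust SEs and Satorra-Bentler scaled test (requires complete data).
fit_mlm <- cfa(model, data = HolzingerSwineford1939,
               group = "school", estimator = "MLM")

# MLR: sandwich-type robust SEs and a scaled test (also usable with FIML).
fit_mlr <- cfa(model, data = HolzingerSwineford1939,
               group = "school", estimator = "MLR")

# ML with bootstrapped SEs; 200 draws is an assumption made for speed,
# use many more (e.g., 1000+) in a real analysis.
set.seed(2026)
fit_boot <- cfa(model, data = HolzingerSwineford1939,
                group = "school", estimator = "ML",
                se = "bootstrap", bootstrap = 200)

# Put the three sets of standard errors next to each other.
pe <- parameterEstimates(fit_mlm)
se_compare <- data.frame(
  pe[, c("lhs", "op", "rhs", "group", "est")],
  se_mlm  = pe$se,
  se_mlr  = parameterEstimates(fit_mlr)$se,
  se_boot = parameterEstimates(fit_boot)$se
)
print(head(se_compare, 10))
```

Point estimates are identical across the three fits; only the standard errors (and test statistics) differ, so large discrepancies between the columns are a hint that the sample carries limited information about the tails.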

> My model comparisons are based on the global fit indices CFI(robust) and RMSEA(robust). Would it be better to use permuteMeasEq in this case?

Yes, but if you read my permutation paper, you'll see that there is no reason to rely on fit indices for testing. Permutation doesn't make them perform better than an LRT (a scaled test in your case), only as well as that. To test invariance, you should use a test statistic. Fit indices are meant to be interpreted as effect sizes, not as a replacement for (but rather complementary to) test statistics.
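A minimal sketch of that workflow, again assuming the built-in HolzingerSwineford1939 data and an illustrative three-factor model rather than the poster's own: nested invariance models are compared with a scaled LRT, and the robust fit indices are reported alongside as descriptive effect sizes.

```r
library(lavaan)

# Illustrative model; replace with your own measurement model.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Configural, metric, and scalar models, all estimated with MLM.
fit_config <- cfa(model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLM")
fit_metric <- cfa(model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLM",
                  group.equal = "loadings")
fit_scalar <- cfa(model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLM",
                  group.equal = c("loadings", "intercepts"))

# Scaled chi-square difference tests between adjacent nested models.
lrt <- lavTestLRT(fit_config, fit_metric, fit_scalar)
print(lrt)

# Robust fit indices, reported as effect sizes rather than as tests.
fits <- list(config = fit_config, metric = fit_metric, scalar = fit_scalar)
afi <- sapply(fits, fitMeasures,
              fit.measures = c("cfi.robust", "rmsea.robust"))
print(round(afi, 3))

# Latent means: fixed to 0 in the first group, estimated in the others
# once intercepts are constrained equal.
pe <- parameterEstimates(fit_scalar)
latent_means <- pe[pe$op == "~1" &
                   pe$lhs %in% c("visual", "textual", "speed"), ]
print(latent_means)
```

With equal intercepts in place, the estimated latent means in the later groups are differences from the first group, so their Wald tests (with robust standard errors) address the latent mean comparison directly.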

In the case of DWLS for binary/ordinal data, a subsequent paper did show that permutation maintains somewhat better Type I error rates than the scaled-and-shifted test. But I have not investigated the case of nonnormal (approximately) continuous data.
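For completeness, a permutation test of metric invariance with permuteMeasEq (from the semTools package) might look roughly like the sketch below. It assumes the built-in HolzingerSwineford1939 data, an illustrative three-factor model, and default ML estimation; the very small number of permutations is an assumption made only to keep the run time short.

```r
library(lavaan)
library(semTools)

# Illustrative model; replace with your own measurement model.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Unconstrained (configural) and constrained (metric) models.
fit_config <- cfa(model, data = HolzingerSwineford1939, group = "school")
fit_metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

# Permute group membership and refit both models each time.
# nPermute = 20 is far too few for real use; 1000 or more is typical.
perm_metric <- permuteMeasEq(nPermute = 20,
                             uncon = fit_config,
                             con = fit_metric,
                             showProgress = FALSE)
summary(perm_metric)
```

The summary reports permutation-based p values for the change in chi-square and in the fit indices between the two models.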

Best,

Terrence D. Jorgensen    (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen
 