Guidelines for invariance with skewed continuous predictors

Conal Monaghan

Jul 31, 2021, 11:56:24 PM

to lavaan

Hi Everyone,

We have been looking for clear guidelines for investigating invariance with heavily skewed continuous predictors.

Although we are curious about general principles, the case we are looking at is a three-factor structure measured for three different targets of the questions (rules for self, men, and women). We are treating this as a longitudinal model because each participant completes all three scales, so we test equivalence of the same structure across the three targets within subjects (i.e., using longFacNames and long.equal within measEq.syntax()). (No minimal reproducible example is included, since this is a question about theory and guidelines.)

Example indicator distribution attached (Screen Shot 2021-08-01 at 1.33.40 pm.png).

1. When fitting a standard CFA for each model individually, WLS appears to be the better choice given the distribution of some of the indicators. ML/MLR are strong methods for obtaining parameter estimates when distributions are close to normal; when normality is substantially violated, however, WLS is better, essentially treating the indicators as ordinal. Because of convergence difficulties, DWLS is often used instead (Forero et al., 2009; https://doi.org/10.1080/10705510903203573).

When taking this approach, would we need to specify all indicators as categorical? (Note that here we are using a 0-100 scale; often a 1-7 Likert scale is used.) And do the standard fit indices (SRMR, RMSEA, CFI, NNFI) still make sense?

2. When we then test invariance through the standard sequential constraint of parameters, it is unclear whether the usual change-in-CFI criterion is still meaningful under DWLS rather than ML/MLR. Can we still use a change in CFI of .01 as a liberal benchmark, with .005 / .002 as more stringent ones?
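
For concreteness, the change-in-CFI check we have in mind can be sketched as below. This is only the arithmetic, written in Python for illustration; the CFI values are made up, the function names are hypothetical, and whether the .01 cutoff carries over to DWLS is exactly the open question.

```python
def cfi(chisq, df, chisq_base, df_base):
    """Comparative fit index from model and baseline chi-square statistics."""
    num = max(chisq - df, 0.0)
    den = max(chisq - df, chisq_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def delta_cfi_flags(steps, threshold=0.01):
    """Compare each invariance step with the previous, less constrained one.

    steps: ordered list of (label, cfi) pairs, least constrained first.
    Returns (label, drop in CFI, True if the drop exceeds the threshold).
    """
    flags = []
    for (_, prev), (label, cur) in zip(steps, steps[1:]):
        drop = prev - cur
        flags.append((label, round(drop, 4), drop > threshold))
    return flags


# Hypothetical CFI values for a configural -> metric -> scalar sequence.
steps = [("configural", 0.962), ("metric", 0.958), ("scalar", 0.941)]
for label, drop, flagged in delta_cfi_flags(steps):
    print(label, drop, "exceeds threshold" if flagged else "within threshold")
```

Passing threshold=0.005 or threshold=0.002 would apply the stricter benchmarks to the same sequence.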

With Kind Regards and thanks for your help in advance,

Conal Monaghan

Pat Malone

Aug 1, 2021, 9:06:35 AM

to lav...@googlegroups.com

Conal,

ML assumes data are multivariate normal. MLR allows for heavy tails, but does not help with the kind of floor effect and skew you are facing.

WLS is usually considered unstable at best without sample sizes of at least 5,000.

I think your best bet if you want to stay in lavaan is binning your indicators (e.g., 0 to capture the floor, then quartiles among what remains) and using DWLS.
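
A minimal sketch of that binning rule, in Python purely for illustration: 0 marks the floor and 1 to 4 mark the quartile among the remaining responses. The function name, the inclusive quantile method, and the example scores are all assumptions, not anything from lavaan. The binned columns would then be declared as ordered when fitting with DWLS.

```python
from statistics import quantiles


def bin_indicator(values, floor=0):
    """Recode a floor-heavy score into ordered categories.

    0 = at (or below) the floor; 1-4 = quartile among above-floor scores.
    """
    above = [v for v in values if v > floor]
    # Quartile cut points are computed only from responses above the floor.
    cuts = quantiles(above, n=4, method="inclusive")
    binned = []
    for v in values:
        if v <= floor:
            binned.append(0)
        else:
            # Number of cut points exceeded (0..3), shifted to categories 1..4.
            binned.append(1 + sum(v > c for c in cuts))
    return binned


# Hypothetical 0-100 responses with a pile-up at zero.
scores = [0, 0, 0, 0, 5, 10, 20, 35, 50, 60, 80, 100]
print(bin_indicator(scores))
```

With very few above-floor responses the quartiles become unstable, so fewer categories may be the safer choice in small samples.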

Another possibility is blavaan to avoid the distributional assumptions.

Though if your data have a hurdle process you might want to try to model that, but that's trickier. Example: If the variable is number of cigarettes smoked, zero is qualitatively different from 1 and up--the data-generating process has to cross a "hurdle" to get from the binary (0 / >0) part to the continuous part.
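
To make the two parts concrete, here is a small illustrative sketch (Python, with a hypothetical function name and made-up data) that splits one variable into the binary "crossed the hurdle" part and the continuous part, which is defined only for those who crossed. It shows the data layout only, not how to fit a two-part model.

```python
def split_hurdle(values):
    """Split a semicontinuous variable into its two parts.

    crossed: 1 if the value is above zero, else 0 (the binary part).
    amount:  the value itself when above zero, else None (treated as
             missing, because the continuous part is undefined at zero).
    """
    crossed = [1 if v > 0 else 0 for v in values]
    amount = [v if v > 0 else None for v in values]
    return crossed, amount


# Hypothetical cigarettes-per-day data.
cigarettes = [0, 0, 3, 0, 20, 10]
crossed, amount = split_hurdle(cigarettes)
print(crossed)
print(amount)
```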

Pat

--
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/c1b36a16-6055-457b-b13b-3f014b8c2759n%40googlegroups.com.

--

Patrick S. Malone, PhD

Sr Research Statistician, FAR HARBΦR

+1 803.553.4181 | pat@ | farharbor.com


Conal Monaghan

Aug 24, 2021, 8:54:11 PM

to lavaan

Thanks for the reply. WLS is definitely difficult, so maybe the best approach going forward is DWLS, removing as many of the skewed items as possible. Would there be any downside to DWLS relative to MLR in terms of interpreting fit indices with the same criteria, or in implementing invariance analyses?

I agree that anything with a hurdle is much more difficult. When this is the case, we could consider analysing the two groups separately (smokers / non-smokers, in your example).

blavaan could be a good option. Would it be appropriate for most non-normal underlying distributions / count variables? As with DWLS, are you aware of whether blavaan models can be interpreted in line with standard fit criteria (CFI, NNFI, RMSEA, SRMR), or used for invariance modelling (by comparing change in CFI)?

Cheers,

Conal

Pat Malone

Aug 29, 2021, 10:13:50 AM

to lav...@googlegroups.com

Hi, Conal.

Sorry this took a bit.

I'm not going to address fit indices, except to say the pros and cons are pretty much the same, though some that you're used to might not be available in DWLS.

As long as your invariance strategy is appropriate to ordered indicators, you're fine. Terrence Jorgensen has many posts on the subject in the archives, including a recent one that posted this reference:

Kite, B. A., Jorgensen, T. D., & Chen, P. Y. (2018). Random permutation testing applied to measurement invariance testing with ordered-categorical indicators. *Structural Equation Modeling, 25*(4), 573–587. https://doi.org/10.1080/10705511.2017.1421467

If the hurdle variable is the outcome, it's really not that bad. Try searching on the keyword "semicontinuous outcomes." There's quite a lot out there.

blavaan (note case) handles odd distributions just fine. But only model comparisons are available--you won't get fit indices. Instead you'll look at WAIC, LOO, Bayes factors, etc. But for invariance testing, that's what you need.

Pat
