Multiple group modeling with no variance in specific groups

Maher Said

Jan 3, 2022, 4:40:10 PM

to lavaan

Hello!

I'm fitting a multiple-group SEM (grouping by AGE, for example). As it happens, one group has no variance in a missing_dummy variable (that age group has no missing observations).

Is it possible to still do multiple group modeling in this case? Is there a way of telling the code to NOT estimate a coefficient for the missing dummy for specific age groups (in line with the constraining options shown in https://lavaan.ugent.be/tutorial/groups.html )?

As a toy example, here is a reduced version of my SEM model with 2 age groups (senior vs. non-senior). Non-seniors are the ones without any cases of missing income information (and therefore no variance in missing_income_dummy).

LV =~ ind1 + ind2 + ind3

y ~ income + missing_income_dummy + LV

lavaan(..., group = "senior")

I have attempted the following without much success:

LV =~ ind1 + ind2 + ind3

y ~ income + c(NA, 0)*missing_income_dummy + LV

lavaan(..., group = "senior")

Any help is appreciated,
Thank you!

Terrence Jorgensen

Jan 4, 2022, 12:34:58 PM

to lavaan

> I have attempted the following without much success:
>
> LV =~ ind1 + ind2 + ind3
> y ~ income + c(NA, 0)*missing_income_dummy + LV

Right, that won't work because the input data are still a problem: zero variance in that group makes its observed covariance matrix non-positive definite (NPD), and fixing the coefficient to 0 does not remove the variable from that matrix. But you can simply omit that variable from one group's model by using lavaan's block syntax, similar to what is used to define (potentially unique) models at different levels of a multilevel SEM: https://lavaan.ugent.be/tutorial/multilevel.html

Assuming your "senior" variable is a dummy code (or simply labeled "1" and "0"), it would look something like this:

group: 1

LV =~ ind1 + ind2 + ind3

y ~ income + missing_income_dummy + LV

group: 0

LV =~ ind1 + ind2 + ind3

y ~ income + LV
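For anyone who wants to try this end to end, here is a minimal sketch in R. It assumes simulated toy data (the variable names follow the thread, the values and effect sizes are made up), uses sem() rather than the lavaan(...) call from the original post because the arguments there were elided, and lists the blocks in the order the groups first appear in the data, since I am not certain whether blocks are matched by label or by position. It first confirms which group has zero variance in the dummy, then fits the block-syntax model only if lavaan is installed.

```r
# Simulated toy data: names follow the thread, values are hypothetical.
set.seed(1)
n <- 200
senior <- rep(c(1, 0), each = n)          # seniors first, then non-seniors
LV <- rnorm(2 * n)                        # latent score used only to generate data
dat <- data.frame(
  senior = senior,
  ind1   = LV + rnorm(2 * n),
  ind2   = LV + rnorm(2 * n),
  ind3   = LV + rnorm(2 * n),
  income = rnorm(2 * n)
)
# Only seniors have missing income, so the dummy is constant (0) for non-seniors.
dat$missing_income_dummy <- ifelse(senior == 1, rbinom(2 * n, 1, 0.3), 0)
dat$y <- 0.5 * dat$income + 0.4 * LV + rnorm(2 * n)

# Variance of the dummy within each group; a value of 0 flags the problem group.
dummy_var <- tapply(dat$missing_income_dummy, dat$senior, var)
print(dummy_var)

# Smallest eigenvalue of the non-senior covariance matrix. The constant column
# gives a zero row and column, so the matrix is not positive definite.
vars <- c("y", "income", "missing_income_dummy", "ind1", "ind2", "ind3")
S0 <- cov(dat[dat$senior == 0, vars])
min_eig <- min(eigen(S0, symmetric = TRUE)$values)
print(min_eig)

# Block syntax: the dummy is simply left out of the group where it is constant.
model <- '
group: 1
  LV =~ ind1 + ind2 + ind3
  y ~ income + missing_income_dummy + LV
group: 0
  LV =~ ind1 + ind2 + ind3
  y ~ income + LV
'

# Fit only when lavaan is available; sem() is an assumption standing in for
# the elided lavaan(...) call in the original post.
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit <- lavaan::sem(model, data = dat, group = "senior")
  print(lavaan::parameterEstimates(fit))
}
```

Running the per-group variance check on your real data before fitting is a cheap way to find every variable that would need to be dropped from a given group's block.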

However, I would be wary of comparing the other coefficients for y across groups (e.g., for income itself) because their interpretation differs when you are not controlling for the same variables. Ideally, you would use interaction terms (or really, nested effects, but using product terms either way) to estimate a pattern-mixture model, which would allow you to compare (or pool) coefficients across groups. That is one of the recommended methods for modeling data that are missing not at random (MNAR): https://doi.org/10.1080/01621459.1993.10594302

However, product terms cannot be calculated with latent predictors. This article might help you think of options, although the context of the application is a CFA rather than a regression model:

https://doi.org/10.1080/10705511.2016.1250635

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

Maher Said

Jan 20, 2022, 6:04:04 PM

to lavaan

Thank you Terrence!

This is great help and I wouldn't have learned about it had it not been for you. I'll be careful about not comparing differing groups and follow the resources you suggested.
