How to use weights to 'balance' differences in group sample size in MGCFA invariance tests?

Fabrício Fialho

Jul 22, 2018, 1:39:56 PM

to lavaan

Hi all,

I have a question about the use of weights in lavaan, and about whether doing so is appropriate for the analysis I plan to run.

I am planning an MGCFA for ordered-categorical variables where the groups have unequal sizes. I will analyze data from a cross-national survey, and sample sizes vary dramatically from country to country: about 1000 in Country A, 2000 in Country B, and 3000 in Country C. There are actually more countries in the dataset, but these three illustrate the core challenge. It has been noted for a while now (e.g., Chen, 2007) that unequal sample sizes affect the sensitivity of the goodness-of-fit indexes used to assess measurement invariance.

I first considered randomly sampling a subset of equal size from each country's data, but that means throwing out a large amount of data for the countries with larger samples. E.g., if the subsamples have, say, 900 cases each, this would keep most of the cases from Country A but would discard more than half of the Country B sample (55%) and about 70% of the Country C sample.
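
The subsampling idea above can be sketched in a few lines. This is only an illustration of the arithmetic, not part of the original post: the helper `equal_size_subsample` and the stand-in row identifiers are hypothetical, while the group sizes and the subsample size of 900 come from the post.

```python
import random

# Group sizes and subsample size taken from the post's own example.
group_sizes = {"A": 1000, "B": 2000, "C": 3000}
subsample_n = 900


def equal_size_subsample(row_ids_by_group, n, seed=0):
    """Draw n rows WITHOUT replacement from each group (hypothetical helper)."""
    rng = random.Random(seed)
    return {g: rng.sample(ids, n) for g, ids in row_ids_by_group.items()}


# Stand-in row identifiers; real data would supply actual row indices.
row_ids = {g: list(range(size)) for g, size in group_sizes.items()}
subsamples = equal_size_subsample(row_ids, subsample_n)

# Share of each country's cases that the equal-size subsample leaves out.
discarded = {g: 1 - subsample_n / size for g, size in group_sizes.items()}

for g in group_sizes:
    print(g, len(subsamples[g]), f"{discarded[g]:.0%} discarded")
```

With these numbers, the discarded share is 10% for Country A, 55% for Country B, and 70% for Country C.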

To avoid all that data loss, I wondered about weighting the data from each country so that, after weighting, the countries would contribute 'equally' (i.e., with the 'same sample sizes') to the MGCFA. For instance, assigning a weight of 1 to each observation from Country A, 0.5 to observations from Country B, and 0.33 to observations from Country C, so that each country would count as 1000 observations.
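
As a worked version of this weighting scheme: each observation gets the weight `target_n / group_n`, where the target is the smallest group size. This is a minimal sketch of the arithmetic only; the variable names are illustrative, and nothing here says whether lavaan would accept such weights.

```python
# Group sizes taken from the post's example.
group_sizes = {"A": 1000, "B": 2000, "C": 3000}

# Downweight every group to the size of the smallest one.
target_n = min(group_sizes.values())
weights = {g: target_n / n for g, n in group_sizes.items()}

# Weighted ("effective") number of observations contributed by each group.
effective_n = {g: weights[g] * n for g, n in group_sizes.items()}

for g in group_sizes:
    print(g, round(weights[g], 3), round(effective_n[g]))
```

The exact weight for Country C is 1/3; the rounded value 0.33 would make that country count as 990 rather than 1000 observations.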

lavaan 0.6 handles sampling weights, but it rescales the weights to the number of rows in the dataset, and apparently only works for robust ML estimators. Moreover, what I was considering is not to use sampling weights (I don't have them at all) but simply to attribute weights to some groups in order to downweight them and 'equalize' the contributions of groups of different sizes to the analysis.
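
To see what rescaling to the number of rows would do to such weights, here is a small sketch. It assumes the rescaling multiplies all weights by a single constant so that they sum to the total number of rows; whether lavaan rescales over the whole dataset or within each group is not stated in this thread, so treat that as an assumption rather than a description of lavaan's behavior.

```python
# Group sizes and per-observation weights from the post's example.
group_sizes = {"A": 1000, "B": 2000, "C": 3000}
raw_weights = {"A": 1.0, "B": 0.5, "C": 1 / 3}

total_rows = sum(group_sizes.values())
weight_sum = sum(raw_weights[g] * n for g, n in group_sizes.items())

# ASSUMPTION: one global constant makes the weights sum to the row count.
scale = total_rows / weight_sum
rescaled = {g: w * scale for g, w in raw_weights.items()}

# Total weight carried by each group after rescaling.
group_weight = {g: rescaled[g] * n for g, n in group_sizes.items()}

for g in group_sizes:
    print(g, round(rescaled[g], 3), round(group_weight[g]))
```

Under that assumption the weights are doubled (2, 1, and about 0.667), so each country carries a total weight of 2000 instead of 1000, but the three countries still contribute equally relative to one another.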

So these are my questions, any suggestion will be much appreciated:

Is there a way to apply such a weighting strategy in lavaan? If it is not possible (or if it is pure nonsense), what would be an alternative way to prevent the larger groups from 'dominating' model fit and potentially obscuring a lack of invariance between/across groups?

(There is a non-negligible percentage of missing data in the datasets, so I will most probably impute the data. I add this note because I will then have M 'complete' datasets for each country, which increases the computational burden of the analysis. I also considered the 'downweighting' strategy above for its simplicity. Resampling-based strategies would complicate matters considerably, given my limited computational resources.)

Many thanks!

Terrence Jorgensen

Jul 24, 2018, 6:34:58 AM

to lavaan

I would recommend posting this question on SEMNET and/or CrossValidated to get expert advice before doing this (and because this issue would have to be solved for any software, not just lavaan). Chen's conclusions are certainly relevant from a model-based perspective, which is how her simulation operated, but the use of sampling weights is typically more consistent with a design-based perspective (see Sterba, 2009). Which perspective you are operating from should probably inform whether you want the group weights to reflect their proportion of the total sample size, or rather their proportion of the total population size of those countries (or more accurately, the size of their sampling frame defined by the study's selection criteria).

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

Fabrício Fialho

Jul 25, 2018, 8:32:49 AM

to lavaan

Hi Terrence,

Many thanks for your detailed response.

I am afraid that my question might be simpler than it first appeared. The question is actually software-oriented: it is about the possibility of using weights in lavaan. I wondered whether it is possible to arbitrarily assign weights to observations and use those weights in lavaan in order to deal with differences in sample size in MGCFA. Sure, other approaches may exist and be preferable, yet I am just wondering how to implement this strategy in lavaan.

I am not doing a design-based analysis; I just referred to "sampling weights" because, if I understand it correctly, this is the only class of weights currently supported by lavaan.

Best,

F.

Edward Rigdon

Jul 25, 2018, 8:56:20 AM

to lav...@googlegroups.com

So if you could bootstrap out of your various groups but specify that you want bootstrap samples from each group to be all the same size, and conduct your analysis based on the bootstrap samples, that would achieve your goal?
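
The bootstrap idea in this reply could look roughly like the sketch below: draw the same number of rows, with replacement, from every group, and repeat for each replicate. The helper `equal_size_bootstrap`, the per-group size of 1000, and the number of replicates are all illustrative assumptions; the model-fitting step that would follow each draw is left out.

```python
import random


def equal_size_bootstrap(row_ids_by_group, n, rng):
    """Draw n rows WITH replacement from each group (hypothetical helper)."""
    return {g: rng.choices(ids, k=n) for g, ids in row_ids_by_group.items()}


# Stand-in row identifiers for the three example countries.
group_sizes = {"A": 1000, "B": 2000, "C": 3000}
row_ids = {g: list(range(size)) for g, size in group_sizes.items()}

rng = random.Random(2018)  # fixed seed so the draws are reproducible
n_per_group = 1000         # assumed common bootstrap sample size
n_replicates = 5           # assumed; a real analysis would use many more

replicates = [
    equal_size_bootstrap(row_ids, n_per_group, rng) for _ in range(n_replicates)
]

# Every replicate holds the same number of rows for every group.
for i, rep in enumerate(replicates):
    print(i, {g: len(rows) for g, rows in rep.items()})
```

Each replicate would then be analyzed separately and the results summarized across replicates, which is where the extra computational cost comes from.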
