measurement invariance for categorical data


Dr. Hans Hansen

Apr 23, 2017, 4:58:21 PM
to lavaan
Hi,

I have some problems establishing a measurement invariance model for my data, and I hope I can find some help here. I am trying to fit a one-factor, four-indicator model following Millsap's advice and the code provided by Sunthud (http://www.sunthud.com/catInvariance.html).

In order to narrow down the reason for the convergence problems (I get these, or very small SEs resulting in extremely large z-values), I added the constraints proposed by Millsap stepwise, and the model seems to crash only when I constrain the indicator variances to 1 in the reference group and release them in group 2. Yves mentioned that the theta parameterization can lead to numerical issues, and I suspect that my extremely skewed data is the culprit. Is there something I can do about it?

Under MLR, the results from the measurement invariance analysis are quite good, even when I apply the (no longer recommended?) way of constraining loadings and thresholds via group.equal. However, reviewers asked me how I handled the problem that loadings and thresholds can't be constrained independently of each other (so far I have not) and suggested that MLR is not appropriate for ordinal data. The paper referenced today in the other thread about partialInvarianceCat() suggests that it's not so simple.

Finally, it would be great if someone could explain, like I'm five, why I can estimate MGCFA models without imposing Millsap's constraints on thresholds, which are thought to identify the model - shouldn't a non-identified model be difficult to estimate?

Find attached some of my data (my final model has more groups, factors, and indicators, but the problems already appear in this small part of the model) and code:

library(lavaan)
dat = read.table("test.txt", header = TRUE)

# model0: configural model with lavaan's default identification
model0 = "Fac1 =~ P3_1 + P3_2 + P3_3 + P3_4"
# model1: Millsap & Yun-Tein identification -- the first threshold of each
# indicator and the second threshold of the reference indicator are
# constrained equal across groups
model1 =
  "Fac1 =~ P3_1 + P3_2 + P3_3 + P3_4
P3_1 |  c(t1P3_1, t1P3_1) *t1 + c(t2P3_1, t2P3_1)*t2 + t3 + t4
P3_2 |  c(t1P3_2, t1P3_2) *t1 + t2 + t3 + t4
P3_3 |  c(t1P3_3, t1P3_3) *t1 + t2 + t3 + t4
P3_4 |  c(t1P3_4, t1P3_4) *t1 + t2 + t3 + t4
"
# model2: as model1, plus the factor mean is fixed to 0 in group 1 and
# freed in group 2
model2 =
  "Fac1 =~ P3_1 + P3_2 + P3_3 + P3_4
P3_1 |  c(t1P3_1, t1P3_1) *t1 + c(t2P3_1, t2P3_1)*t2 + t3 + t4
P3_2 |  c(t1P3_2, t1P3_2) *t1 + t2 + t3 + t4
P3_3 |  c(t1P3_3, t1P3_3) *t1 + t2 + t3 + t4
P3_4 |  c(t1P3_4, t1P3_4) *t1 + t2 + t3 + t4
Fac1 ~ c(0,NA)*1
"

# model3: as model2, plus the factor variance is freed in both groups
model3 =
  "Fac1 =~ P3_1 + P3_2 + P3_3 + P3_4
P3_1 |  c(t1P3_1, t1P3_1) *t1 + c(t2P3_1, t2P3_1)*t2 + t3 + t4
P3_2 |  c(t1P3_2, t1P3_2) *t1 + t2 + t3 + t4
P3_3 |  c(t1P3_3, t1P3_3) *t1 + t2 + t3 + t4
P3_4 |  c(t1P3_4, t1P3_4) *t1 + t2 + t3 + t4
Fac1 ~ c(0,NA)*1
Fac1 ~~ c(NA,NA)*Fac1
"
# model4: as model3, plus residual variances fixed to 1 in group 1 and
# freed in group 2 (the step where the crash appears)
model4 =
  "Fac1 =~ P3_1 + P3_2 + P3_3 + P3_4
P3_1 |  c(t1P3_1, t1P3_1) *t1 + c(t2P3_1, t2P3_1)*t2 + t3 + t4
P3_2 |  c(t1P3_2, t1P3_2) *t1 + t2 + t3 + t4
P3_3 |  c(t1P3_3, t1P3_3) *t1 + t2 + t3 + t4
P3_4 |  c(t1P3_4, t1P3_4) *t1 + t2 + t3 + t4
Fac1 ~ c(0,NA)*1
Fac1 ~~ c(NA,NA)*Fac1
P3_1 ~~ c(1, NA)*P3_1
P3_2 ~~ c(1, NA)*P3_2
P3_3 ~~ c(1, NA)*P3_3
P3_4 ~~ c(1, NA)*P3_4
"

models = list(model0, model1, model2, model3, model4)
# fit each model to both countries (WLSMV is lavaan's default for ordered data)
x = lapply(models, function(m)
  cfa(m, data = dat, group = "COUNTRY",
      ordered = c("P3_1", "P3_2", "P3_3", "P3_4"),
      parameterization = "theta"))
summary(x[[4]])
summary(x[[5]])

Thanks for looking into this. Best, Felix
test.txt

Dr. Hans Hansen

Apr 23, 2017, 5:04:22 PM
to lavaan
In case it's not entirely clear: I already fail to fit the configural invariance model.

Terrence Jorgensen

Apr 24, 2017, 4:03:28 AM
to lavaan
In case it's not entirely clear: I already fail to fit the configural invariance model.

A colleague and I also found convergence issues with Millsap & Yun-Tein's identification method when we fit it to simulated data, where a statistically equivalent model converged fine (i.e., the "usual" configural invariance model, as fit by Sass et al., with fixed intercepts and factor means/variances, and free loadings and thresholds -- either the delta or theta parameterization converged normally even when Millsap & Yun-Tein's model did not).  A recent Psychometrika article has finally extended that research in much greater depth, calling Millsap & Yun-Tein's advice into question.


Wu and Estabrook (2016) showed that equivalence of particular parameters can never be tested alone, because imposing constraints on those parameters typically allows other parameters to be freed that were previously fixed for identification purposes only.  They therefore recommend (a) starting with a test of thresholds, for reasons explained in the article, and (b) potentially using a different pair of models for each type of parameter to be tested for equivalence.  Given that you have 4 thresholds, here is the practical advice I took away from reading the article, based on their Figure 1 (p. 1034).  Since Millsap & Yun-Tein's model did not converge on your data, you can fit a baseline model that is the default in the cfa(..., std.lv = TRUE) function (fix factor variances and means, use either the delta or theta parameterization, fix intercepts, free loadings and thresholds).  Then test equivalence of thresholds using a model with the following specification (a concrete sketch follows the list):
  • fix factor variances and means to 1 and 0, respectively, in all groups
  • free loadings in all groups 
  • free intercepts in all but one group (fix to zero in reference group)
  • using the delta or theta parameterization, the residual variances in the reference group will be fixed such that the total or residual variance (respectively) equals 1, but in all other groups you can free the latent-response scales or residual variances, respectively.  I would recommend theta (as Millsap & Yun-Tein suggested) so you can work with residual variances instead of latent-response scales.
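In lavaan syntax, a sketch of that baseline and threshold model could look like this, reusing model0, dat, and COUNTRY from your script (the threshold labels are placeholders I made up; this is one way to write it, not the only way):

# baseline: lavaan's default identification, theta parameterization
fit.config <- cfa(model0, data = dat, group = "COUNTRY",
                  ordered = c("P3_1", "P3_2", "P3_3", "P3_4"),
                  parameterization = "theta", std.lv = TRUE)

# threshold-equality model: equal thresholds across groups; intercepts
# and residual variances freed in the non-reference group
mod.thresh <- '
  Fac1 =~ NA*P3_1 + P3_2 + P3_3 + P3_4  # loadings free in all groups
  P3_1 | c(t11, t11)*t1 + c(t12, t12)*t2 + c(t13, t13)*t3 + c(t14, t14)*t4
  P3_2 | c(t21, t21)*t1 + c(t22, t22)*t2 + c(t23, t23)*t3 + c(t24, t24)*t4
  P3_3 | c(t31, t31)*t1 + c(t32, t32)*t2 + c(t33, t33)*t3 + c(t34, t14)*t4
  P3_4 | c(t41, t41)*t1 + c(t42, t42)*t2 + c(t43, t43)*t3 + c(t44, t44)*t4
  Fac1 ~~ c(1, 1)*Fac1   # factor variance fixed to 1 in all groups
  Fac1 ~  c(0, 0)*1      # factor mean fixed to 0 in all groups
  P3_1 ~ c(0, NA)*1      # intercepts: 0 in group 1, free in group 2
  P3_2 ~ c(0, NA)*1
  P3_3 ~ c(0, NA)*1
  P3_4 ~ c(0, NA)*1
  P3_1 ~~ c(1, NA)*P3_1  # residual variances: 1 in group 1, free in group 2
  P3_2 ~~ c(1, NA)*P3_2
  P3_3 ~~ c(1, NA)*P3_3
  P3_4 ~~ c(1, NA)*P3_4
'
fit.thresh <- cfa(mod.thresh, data = dat, group = "COUNTRY",
                  ordered = c("P3_1", "P3_2", "P3_3", "P3_4"),
                  parameterization = "theta")
lavTestLRT(fit.config, fit.thresh)  # test equal thresholds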
After this, testing additional levels of invariance is straightforward (because your indicators have more than 3 categories).  To the model with equal thresholds, simply add the constraint that loadings are equivalent and free the factor variances in all but the reference group, then test the nested models.  To that model, add the constraint that intercepts are equivalent and free the factor means in all but the reference group, then test the nested models.  If you are interested in testing strict invariance, you can add to either the metric/weak or the scalar/strong model the constraint that residual variances are equal across groups, then test the nested models.
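As a sketch of the first of those steps (equal loadings), under the same assumptions -- the shared labels L1-L4 are placeholders, and repeating P3_1 with two modifiers is one way in lavaan syntax to both free the first loading and attach a label to it:

mod.metric <- '
  Fac1 =~ c(NA, NA)*P3_1 + c(L1, L1)*P3_1 + c(L2, L2)*P3_2 +
          c(L3, L3)*P3_3 + c(L4, L4)*P3_4  # loadings equal across groups
  P3_1 | c(t11, t11)*t1 + c(t12, t12)*t2 + c(t13, t13)*t3 + c(t14, t14)*t4
  P3_2 | c(t21, t21)*t1 + c(t22, t22)*t2 + c(t23, t23)*t3 + c(t24, t24)*t4
  P3_3 | c(t31, t31)*t1 + c(t32, t32)*t2 + c(t33, t33)*t3 + c(t34, t34)*t4
  P3_4 | c(t41, t41)*t1 + c(t42, t42)*t2 + c(t43, t43)*t3 + c(t44, t44)*t4
  Fac1 ~~ c(1, NA)*Fac1  # factor variance now free in group 2
  Fac1 ~  c(0, 0)*1
  P3_1 ~ c(0, NA)*1
  P3_2 ~ c(0, NA)*1
  P3_3 ~ c(0, NA)*1
  P3_4 ~ c(0, NA)*1
  P3_1 ~~ c(1, NA)*P3_1
  P3_2 ~~ c(1, NA)*P3_2
  P3_3 ~~ c(1, NA)*P3_3
  P3_4 ~~ c(1, NA)*P3_4
'
fit.metric <- cfa(mod.metric, data = dat, group = "COUNTRY",
                  ordered = c("P3_1", "P3_2", "P3_3", "P3_4"),
                  parameterization = "theta")
lavTestLRT(fit.thresh, fit.metric)  # test equal loadings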

If you had only 2 or 3 categories, then not every type of parameter could be tested for equivalence without assuming other measurement parameters are also invariant.

Hope this helps,

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Dr. Hans Hansen

Apr 24, 2017, 5:32:47 PM
to lavaan
Thanks a lot, I will try it that way. Figure 1 is confusing; I need to learn the Greek alphabet :P

Anyway, I can't really wrap my head around the reason why there are different approaches to configural invariance. Yves wrote me that Millsap and Yun-Tein's approach might actually be less constrained than cfa(..., group = "g"). I figured that the nice property of Millsap and Yun-Tein's approach might be that the latent factor means can actually be compared with a minimal set of constraints, whereas under the other approaches they have to be fixed to 0/1? Is that true?

I got a request from reviewers to compare the impact of imposing constraints across groups on factor scores - is that a sensible thing to do? Wouldn't the comparison be unfair in models where both groups have the 0/1 constraint on the latent variable?

Thanks again, Felix

Terrence Jorgensen

Apr 25, 2017, 5:47:59 AM
to lavaan
Anyway, I can't really wrap my head around the reason why there are different approaches to configural invariance.

It is still an open area of research, but moving relatively slowly in the factor analysis literature.  Much more work on latent trait models with categorical indicators has been (and is being) done in the IRT literature.  The advantages of SEM are great, but having categorical indicators means we not only have latent common factors, but additional latent response variables (one per binary/ordinal indicator).  The latent response variables have the same location/scale-identification issues that common factors have, but it is even more restrictive because there are not multiple indicators of the latent responses -- it is just an assumed normal variable underlying the discretized observed variable.  Anyway, I agree, it is frustratingly complicated.  The Wu & Estabrook (2016) article is a breath of fresh air, but it really just clarified the problem, and the field still needs to work out what the "best practices" should be.  "Watch this space" :-)
 
Yves wrote me that Millsap and Yun-Tein's approach might actually be less constrained than cfa(..., group = "g").

The models are statistically equivalent, but M & Y-T wanted to identify the model without assuming equality of common-factor means/variances or residual variances (although they seemed fine with assuming intercepts were equivalent).

I figured that the nice property of Millsap and Yun-Tein's approach might be that the latent factor means can actually be compared with a minimal set of constraints, whereas under the other approaches they have to be fixed to 0/1? Is that true?

No, the latent distributions (means & (co)variances) can still only be compared meaningfully across groups if the measurement parameters are constrained to equality (just like with continuous indicators).  So M & Y-T's method seems like an unnecessary exercise in defining a "less restrictive" statistically equivalent model to the standard default cfa(..., group = "g").  

I got a request from reviewers to compare the impact of imposing constraints across groups on factor scores - is that a sensible thing to do?  Wouldn't the comparison be unfair in models where both groups have the 0/1 constraint on the latent variable?

Yes, but when you constrain measurement parameters across groups, you only need to fix the common-factor means/variances to 0/1 in a single group, freely estimating them in the remaining groups.
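In lavaan syntax, a minimal sketch of that identification (two groups, using the factor name from your example):

Fac1 ~  c(0, NA)*1      # factor mean: fixed to 0 in group 1, free in group 2
Fac1 ~~ c(1, NA)*Fac1   # factor variance: fixed to 1 in group 1, free in group 2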

Terrence Jorgensen

Apr 25, 2017, 5:52:16 AM
to lavaan
reviewers asked me how I handled the problem that loadings and thresholds can't be constrained independently of each other (so far I have not)

That is only the case for binary data (although Wu & Estabrook (2016) point out a limitation with 3-category data, which I forget).  But you have 5 categories, so you can constrain loadings in one step and thresholds in another to test weak and strong invariance separately.

and suggested that MLR is not appropriate for ordinal data.

FYI, with 5 categories, MLR provides good results (i.e., loading estimates will not be very attenuated), as long as the thresholds are not very asymmetric.

Terrence Jorgensen

Apr 25, 2017, 6:00:46 AM
to lavaan
In case it's not entirely clear: I already fail to fit the configural invariance model.

I don't know if it will help, but I just updated the measurementInvariance() functions (including "...Cat" and "long...") to check for convergence of each model.  Now, it will return a list of results even if there was an error, so if any models do converge, then you can at least see the results for those particular models.  You can install the latest development versions of lavaan and semTools (requiring the devtools package) using:

install.packages("lavaan", repos = "http://www.da.ugent.be", type = "source")
devtools::install_github("simsem/semTools/semTools")

But if the configural model won't even converge, I don't know how likely more restricted models are to converge if they are specified the same way.

Dr. Hans Hansen

Apr 25, 2017, 7:14:08 AM
to lavaan
Thanks so much for your input, that is extremely helpful. It's a bummer that so little guidance exists for something so often used in scale validation - especially for people like me with a limited understanding of mathematics :P


Felix Fischer

Aug 18, 2017, 4:38:00 AM
to lavaan
Hi Terrence,

I need to come back to this - I am confused by your point about having intercepts and thresholds in a model at the same time. I am under the impression that you have either thresholds or intercepts, but not both. Could you expand on that a bit?

Best, Felix

Felix Fischer

Aug 18, 2017, 5:07:57 AM
to lavaan
Just read in Section 9 of the Wu & Estabrook paper that Mplus sets the intercept to zero by default, and one needs a fancy model definition to override this.

Terrence Jorgensen

Aug 21, 2017, 6:20:35 PM
to lavaan
I am under the impression that you have either thresholds or intercepts, but not both. Could you expand on that a bit?

You typically only estimate one or the other (but that is not always necessary); they are, however, both part of the model.  The latent responses are assumed continuous/normal, so they have a mean/intercept, and the observed variables are discrete, so they have thresholds that correspond to how large the latent response needs to be before the observed response is assigned to a higher category.
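Here is a toy R illustration (made up for this purpose, not from any package): the latent response has a mean, and the thresholds cut it into the observed categories.

set.seed(1)
ystar <- 0.5 + rnorm(1000)  # latent response with mean (intercept) 0.5
tau <- c(-1, 0, 1)          # three thresholds -> four observed categories
y <- cut(ystar, breaks = c(-Inf, tau, Inf), labels = FALSE)
table(y)                    # the observed ordinal variable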

Terrence Jorgensen

Aug 21, 2017, 6:24:30 PM
to lavaan
Just read in Section 9 of the Wu & Estabrook paper that Mplus sets the intercept to zero by default, and one needs a fancy model definition to override this.

Same in lavaan, but nothing really fancy is needed to free the intercepts.  Just use the intercept operator (variable ~ NA*1):
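For example, using the same item name as in the threshold snippet below:

var1 ~ NA*1 # free the intercept for item 1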


The threshold operator is specific to which threshold it is for that item:

var1 | NA*t1 # first threshold for item 1
var1 | NA*t2 # second threshold for item 1
var2 | NA*t1 # first threshold for item 2
...

Felix Fischer

Aug 22, 2017, 8:19:34 AM
to lavaan
Thanks so much. 

Have you fitted this kind of model in Mplus? I have trouble going from the invariant thresholds & loadings model to the one where I constrain intercepts and free factor means... Mplus always imposes intercept = 0 in the first group. Have you gotten around this? Below is the adapted model from Wu; can you spot an error? I also tried to impose model constraints to force intercept estimation ([y1-y4*]); finally, fixing the intercepts to an arbitrary value doesn't work.

lavaan doesn't seem to have a problem with this constraint, but I need a weighted analysis, so I am bound to Mplus. This is probably a question for their support, but I thought you might have already run into it.



TITLE: A model with invariant thresholds and loadings of two groups,
    one common factor, four indicators, and five categories per indicator
DATA: FILE IS data.dat;
VARIABLE:
    NAMES ARE group x1-x4;
    USEVARIABLES ARE x1-x4;
    CATEGORICAL ARE x1-x4;
    GROUPING IS group (0=male 1=female);
ANALYSIS: ESTIMATOR IS WLSMV;
MODEL:
! phantom latent responses y1-y4, so their intercepts [y1-y4] can be estimated
y1 BY x1@1;
y2 BY x2@1;
y3 BY x3@1;
y4 BY x4@1;
y1-y4@0;

factor BY y1* y2 y3 y4 (L1-L4);
factor@1;
[factor@0];
{x1-x4@1};
[y1-y4] (int1-int4); 
[x1$1-x1$4*](T1-T4);
[x2$1-x2$4*](T5-T8);
[x3$1-x3$4*](T9-T12);
[x4$1-x4$4*](T13-T16);

MODEL female:
factor BY y1* y2 y3 y4 (L1-L4);
factor*;
[factor@0];
{x1-x4*};
[y1-y4] (int1-int4); 
[x1$1-x1$4*](T1-T4);
[x2$1-x2$4*](T5-T8);
[x3$1-x3$4*](T9-T12);
[x4$1-x4$4*](T13-T16);

Felix Fischer

Aug 24, 2017, 4:26:55 PM
to lavaan
From the Mplus discussion page:

You must mention them in MODEL male.

MODEL male:

[y1-y4] ;
[x1$1-x1$4*];
[x2$1-x2$4*];
[x3$1-x3$4*];
[x4$1-x4$4*];

Terrence Jorgensen

Aug 26, 2018, 6:14:19 PM
to lavaan
Hi Felix,

If you are still following this thread, you might be interested to know there is a new function in the development version of semTools, which is more general and flexible than the previous invariance-testing functions.  Hope it is helpful.
