measEq.syntax (Wu&Estabrook) for binary items

Andres Perez

Dec 16, 2025, 6:17:31 AM

to lavaan

Hi everyone,

I have been working with some (simulated) categorical data for a while and greatly appreciate the developments in the measEq.syntax function, as it has made my life easier when applying the identification constraints from Wu & Estabrook (2016).

There is still one thing that I do not understand, though. In the Wu & Estabrook paper, they mention that it is impossible to test for threshold and loading invariance simultaneously, as it is “statistically equivalent” to the baseline (i.e., configural) model (Section 5.3; Proposition 6). A screenshot of that specific part of the paper is attached to this post. For brevity, I will write “metric invariance” instead of “threshold and loading invariance” from now on.

Looking at the semTools documentation, I see that the entry for the measEq.syntax function covers other constraints that cannot be tested on their own (e.g., Propositions 4 and 5 of Wu & Estabrook) on page 70:

"For binary data, there is no independent test of threshold, intercept, or residual-variance equality. Equivalence of thresholds must also be assumed for three-category indicators."

However, there is no mention of exactly what happens when testing for metric invariance with binary data. Following Wu & Estabrook's explanation, I would expect a configural model and a metric invariance model to be “statistically equivalent” (e.g., to have the same chi-squared test statistic). Is this correct?

I was playing around with a simple (simulated) toy data set with two groups. It was generated with equal thresholds and loadings across the groups. I fitted a configural and a metric invariant model and compared them, but I noticed they are not “statistically equivalent,” at least not in terms of chi-squared values.

Am I misinterpreting what Wu & Estabrook stated in their paper? Or is semTools doing something else in the background? I am just trying to understand precisely what Wu & Estabrook meant by their "Proposition 6".

Attached to this post, you can find the toy dataset and the R script where I ran the two models. I am also copying & pasting it here if that is easier to access.

Thank you very much for your help!

Best,

Andres Perez

Code:

# 2025-12-16
# Testing metric invariance with binary data using semTools

# Load libraries
library(lavaan)
library(semTools)

# Load data
load("binary_data.Rdata")

# Define the model
S1 <- '
# factor loadings
F1 =~ x1 + x2 + x3 + x4 + x5
F2 =~ z1 + z2 + z3 + z4 + z5
F3 =~ m1 + m2 + m3 + m4 + m5
F4 =~ y1 + y2 + y3 + y4 + y5
'

######################################
########### CONFIGURAL FIT ###########
######################################
# Generate the configural syntax; `data` is passed through to lavaan
S1.config <- as.character(
  semTools::measEq.syntax(configural.model = S1,
                          data = binary,
                          parameterization = "delta",
                          ordered = TRUE,
                          ID.fac = "std.lv",
                          ID.cat = "Wu",
                          group = "group")
)

fit.config <- cfa(model = S1.config,
                  data = binary,
                  group = "group",
                  ordered = TRUE)

######################################
### THRESHOLD + LOADING INVARIANT ####
######################################

# Same call as above, now with thresholds and loadings equated across groups
S1.inv <- as.character(
  semTools::measEq.syntax(configural.model = S1,
                          data = binary,
                          parameterization = "delta",
                          ordered = TRUE,
                          ID.fac = "std.lv",
                          ID.cat = "Wu",
                          group = "group",
                          group.equal = c("thresholds", "loadings"))
)

fit.inv <- cfa(model = S1.inv,
               data = binary,
               group = "group",
               ordered = TRUE)

#################################
########### COMPARING ###########
#################################
fit.config
fit.inv

# summary(fit.config)
# summary(fit.inv)

fitmeasures(fit.config , c("chisq")) # chisq 219.339

fitmeasures(fit.inv , c("chisq")) # chisq 255.249

binary_data.Rdata

Screenshot 2025-12-16 104654.png

binary_test.R

Victoria Savalei

Dec 17, 2025, 4:50:43 PM

to lavaan

This paper is not easy to understand. What they are stating in these Propositions are identification conditions -- that is, minimum sets of constraints for a model to be identified. For Proposition 6, I think they are pointing out that for binary data, there is an equivalent parameterization to the default configural model that has equal thresholds and equal loadings. In particular, if all factor loadings are set to 1 (and are thus invariant) and all thresholds are set to 0 (and are thus invariant), but we free all intercepts in all groups (so we have p*G of these estimates) and we free residual variances in all groups (so we have p*G of these estimates), we get a model with equivalent fit (chi-square and df) to the configural model. See Table 4, the line that says T and Lambda.
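The equivalence described above can be checked numerically for a single binary item. The following is a minimal sketch, assuming a probit link, a positive loading, and a configural form whose residual variance is fixed to 1 (theta-style scaling); the values of `lambda` and `tau` are hypothetical and are not taken from the attached data.

```r
# Hypothetical configural parameters for one binary item in one group
lambda <- 0.8   # free loading (assumed positive)
tau    <- 0.3   # free threshold
eta    <- seq(-3, 3, by = 0.5)   # grid of factor scores

# Configural form: y* = lambda * eta + e, Var(e) = 1, y = 1 if y* > tau
p_config <- pnorm(lambda * eta - tau)

# Alternative form: loading fixed to 1, threshold fixed to 0,
# free intercept (nu) and free residual variance (theta)
theta <- 1 / lambda^2
nu    <- -tau / lambda
p_alt <- pnorm((nu + 1 * eta - 0) / sqrt(theta))

# Both forms imply the same response probabilities at every factor score
stopifnot(isTRUE(all.equal(p_config, p_alt)))

# Parameter count for p items in G groups (20 items, 2 groups in this thread)
p <- 20; G <- 2
n_config <- p * G + p * G   # thresholds + loadings
n_alt    <- p * G + p * G   # intercepts + residual variances
stopifnot(n_config == n_alt)
```

The mapping `theta = 1 / lambda^2`, `nu = -tau / lambda` exists for any positive loading, which is why the data alone cannot distinguish the two forms.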

To see this in lavaan, you would have to manually free the intercepts and residual variances in both groups, which currently have some default constraints applied to them (intercepts zero in at least one group, depending on how loading and threshold constraints are imposed; residual variances held at 1 under the "theta" parameterization).
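One way to write such a model by hand is sketched below. It only builds the lavaan syntax string in base R, using the factor and item names from the original script; the helper `mod()` and the object `S1.alt` are hypothetical names introduced here. The fit call is left as a comment because it needs lavaan and the attached data, and whether the resulting chi-square matches the configural fit has not been verified here.

```r
# Items per factor, as in the model from the original post
factors <- list(F1 = paste0("x", 1:5), F2 = paste0("z", 1:5),
                F3 = paste0("m", 1:5), F4 = paste0("y", 1:5))
G <- 2  # number of groups

# Hypothetical helper: a group-specific modifier such as "c(1, 1)*"
mod <- function(value) paste0("c(", paste(rep(value, G), collapse = ", "), ")*")

lines <- c()
for (f in names(factors)) {
  items <- factors[[f]]
  lines <- c(lines,
    paste0(f, " =~ ", paste0(mod("1"), items, collapse = " + ")),  # loadings fixed to 1
    paste0(f, " ~~ ", mod("1"), f),           # factor variances fixed to 1
    paste0(f, " ~ ", mod("0"), "1"),          # factor means fixed to 0
    paste0(items, " | ", mod("0"), "t1"),     # thresholds fixed to 0
    paste0(items, " ~ ", mod("NA"), "1"),     # intercepts free in every group
    paste0(items, " ~~ ", mod("NA"), items))  # residual variances free in every group
}
S1.alt <- paste(lines, collapse = "\n")
cat(S1.alt, "\n")

# Not run here (assumes lavaan and the data frame `binary` from the post):
# fit.alt <- lavaan::cfa(S1.alt, data = binary, group = "group",
#                        ordered = TRUE, parameterization = "theta")
```

Free residual variances require the "theta" parameterization, so a comparison with a configural model fitted under "delta" relies on the two parameterizations giving the same configural fit.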

So threshold and loading constraints are not testable with binary data because the configural model can be "rotated" to imply that they are true, unless you make further assumptions on some of the other parameters. When you find differences in lavaan in your two runs, it means that it has already made some other assumptions that make these constraints testable (such as those above). This is all rather esoteric. The biggest applied contribution of Wu & Estabrook is to point out that intercepts of underlying continuous indicators can be free once thresholds have been constrained.

Andres Perez

Dec 19, 2025, 11:08:13 AM

to lavaan

Hi Victoria,

Thank you very much for your response! The paper is indeed not easy to understand and a bit esoteric, but your explanation made things much clearer to me 😊.

After reading your post, I went to check exactly what semTools (specifically, the measEq.syntax function) is doing. For the baseline model, it basically does the same as in Proposition 3 of Wu & Estabrook:

But instead of fixing the thresholds (tau in the paper), semTools fixes the intercepts to 0. In any case, it leads to the same number of parameters. However, for the threshold-and-loading invariant model, semTools applies different constraints than Wu & Estabrook. It fixes the intercepts to 0 in the first group (as you said) and fixes the factor variances to 1 in the first group, but allows them to be freely estimated in the remaining groups. I played around with the syntax following Wu & Estabrook's paper and managed to obtain a metric invariant model with the same number of parameters and degrees of freedom as the configural model. The test statistic is still not the same, but I may be doing something wrong. I still need to play around with it more.

Regardless of what semTools and lavaan are doing, may I ask you an extra question? Usually, when we force the loadings to be equal across groups, we allow the factor variances to be estimated parameters (except in group 1). lavaan and semTools both do this. Wu & Estabrook do the same when we have ordinal items with more than two categories. However, when we have binary items and "invariant" loadings (fixed to 1 across groups), Wu & Estabrook also force the factor variances to be 1 across all groups:

Is this not a very strong assumption about factor variability? If possible, could you explain to me why this is the case? It probably has to do with some of the previous propositions and transformations, but I am not sure about it.

Again, thank you very much for your help!

Best,

Andres
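On the factor-variance question, it may help to write out what binary data can identify. The following is a sketch under assumed notation, not a reproduction of the paper's argument: one group, one factor, normally distributed latent responses $y^*_j$, intercept $\nu_j$, loading $\lambda_j$, factor variance $\psi$, residual variance $\theta_j$, and threshold $\tau_j$.

```latex
% Latent response model for binary item j
y^*_j = \nu_j + \lambda_j \eta + \varepsilon_j, \qquad
\eta \sim N(0, \psi), \quad \varepsilon_j \sim N(0, \theta_j), \qquad
y_j = 1 \iff y^*_j > \tau_j

% Quantities identified by binary data: response proportions and tetrachoric correlations
P(y_j = 1) = \Phi\!\left( \frac{\nu_j - \tau_j}{\sqrt{\lambda_j^2 \psi + \theta_j}} \right), \qquad
\rho_{jk} = \frac{\lambda_j \lambda_k \psi}
                 {\sqrt{(\lambda_j^2 \psi + \theta_j)(\lambda_k^2 \psi + \theta_k)}}

% With \lambda_j = 1 and \tau_j = 0 for all items
P(y_j = 1) = \Phi\!\left( \frac{\nu_j / \sqrt{\psi}}{\sqrt{1 + \theta_j / \psi}} \right), \qquad
\rho_{jk} = \frac{1}{\sqrt{(1 + \theta_j / \psi)(1 + \theta_k / \psi)}}
```

Under these assumptions the observed data depend on $\psi$ only through $\nu_j / \sqrt{\psi}$ and $\theta_j / \psi$, so $\psi$ cannot be separated from the free intercepts and residual variances. Fixing it to 1 in every group would then be a choice of scale rather than a substantive claim about factor variability. Whether this matches the exact reasoning in Wu & Estabrook (2016) should be checked against the paper.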
