categorical measurement invariance


Aiden

Mar 29, 2022, 7:22:37 PM
to lavaan
Hello, 

I am testing categorical measurement invariance using the Wu and Estabrook (2016) approach with delta parameterisation. I am using semTools::measEq.syntax() to help me (amazing to have this function, thank you!). I would like some clarity on the steps, particularly the scalar invariance step. The way I have done it is as follows.

Configural invariance
Threshold invariance
Metric invariance (threshold + loading)
Scalar invariance (threshold + loading + intercept)
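
For reference, the four models above can be built with measEq.syntax() itself. This is only a sketch, assuming a hypothetical one-factor model; mod, mydata, the item names, and the grouping variable "group" are placeholders:

```r
library(semTools)  # also loads lavaan

mod <- ' f =~ x1 + x2 + x3 + x4 + x5 '
ords <- paste0("x", 1:5)

## constraints added at each step of the Wu & Estabrook (2016) sequence
steps <- list(configural = "",
              threshold  = "thresholds",
              metric     = c("thresholds", "loadings"),
              scalar     = c("thresholds", "loadings", "intercepts"))

fits <- lapply(steps, function(eq) {
  measEq.syntax(configural.model = mod, data = mydata,
                ordered = ords, parameterization = "delta",
                ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                group = "group", group.equal = eq,
                return.fit = TRUE)  # returns the fitted lavaan object
})

## compare each model with the next, more constrained one
lavTestLRT(fits$configural, fits$threshold, fits$metric, fits$scalar)
```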

My understanding is that there are no intercepts here, since we assume the responses are ordinal in nature, and that's why we use threshold invariance to constrain the latent response distributions. However, to estimate the factor means and compare them between groups, I would still need to constrain the intercepts.

From the output of the metric invariance, the intercepts for the reference group were not estimated but the intercepts in group 2 were estimated. Why were they estimated in group 2 and not in group 1 and what are they exactly? 

In the scalar invariance model, the estimated intercepts for both groups are set to 0. Does that mean that, given that the intercepts are set to 0, the factor means of group 2 can be compared against those of group 1?

Finally, in terms of partial MI, would it still be appropriate to use lavaan::lavTestScore() to identify potential parameters to free in each nested model?

I would appreciate any advice. 

Kind regards,
Aiden

Terrence Jorgensen

Mar 30, 2022, 5:33:44 AM
to lavaan
My understanding is that there are no intercepts here since we assume that the responses are ordinal

The latent responses still have intercepts.  They are just fixed to zero by default as an arbitrary identification constraint.  That constraint can be freed once sufficient constraints are placed on thresholds, which is what measEq.syntax() will do in your threshold-invariance model.

Note that if an indicator has only 2 thresholds, they are traded to estimate the intercept and (residual or marginal) variance, so no df are gained.  If all indicators have only 2 thresholds, then threshold invariance is statistically equivalent to configural invariance, so it can only be assumed, not tested.

If an indicator has only 1 threshold, you cannot distinguish threshold, loading, and intercept invariance separately, so constrain them all at once to test scalar invariance relative to configural.

From the output of the metric invariance, the intercepts for the reference group were not estimated but the intercepts in group 2 were estimated. Why were they estimated in group 2 and not in group 1 and what are they exactly? 

They are arbitrarily fixed to zero in group 1 for identification.  They could be estimated if we instead fixed one threshold to an arbitrary value (e.g., 0), but the solutions would be statistically equivalent.  The threshold equivalence does allow them to be compared across groups; although the absolute value of the differences is arbitrary, the difference relative to the (estimated or model-implied) marginal variances is proportionally equivalent across different identification choices. 


In the scalar invariance, the estimated intercepts for both groups are set to 0. Does that mean that, given that the intercepts are set to 0, the factor means of group 2 can be compared against those of group 1?

If group 2's intercepts are hypothesized to be the same as group 1's, and group 1's intercepts are fixed, then the way to test equivalence is to fix group 2's to the same value.

Same principle holds for residual variances or scaling factors, which you might not be investigating.

  
in terms of partial MI, would it still be appropriate to use lavaan::lavTestScore() to identify potential parameters to free in each nested model?

I'm pretty sure yes, but how depends on what type of parameter.
  • If you are testing whether loadings have DIF, you would use the release= argument by specifying the equality constraint(s) of interest. 
  • If you are testing whether intercepts have DIF, you would use the add= argument to specify which intercept you want to free (in group 2 only!), e.g., item1 ~ c(0, NA)*1
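
Both options can be sketched as follows. This is a hedged sketch: fit.scalar, mydata, ord.items, and the item name item1 are placeholders, and the release= indices refer to rows of the equality-constraint list that lavTestScore() prints for the fitted model:

```r
## hypothetical scalar-invariance fit
fit.scalar <- cfa(model.scalar, data = mydata, group = "group",
                  ordered = ord.items)

## (a) loadings with DIF: release specific equality constraints,
##     identified by their index among the "==" rows of
##     parTable(fit.scalar)
lavTestScore(fit.scalar, release = c(1, 2))

## (b) intercepts with DIF: score test for freeing item1's intercept
##     in group 2 only (group 1's stays fixed to 0 for identification)
lavTestScore(fit.scalar, add = 'item1 ~ c(0, NA)*1')
```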

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Aiden

Mar 30, 2022, 7:16:04 PM
to lavaan
Hi Terrence, 

Thank you for the detailed explanation. Sorry for not including the details of my variables. I have 5 response options, so 4 thresholds. In that case, the configural, threshold, metric, and scalar models will have different numbers of df. I attempted lavTestScore() to test partial MI, since the nested metric and scalar models were significantly different. However, the output of lavTestScore() does not present the X2 values for the intercepts, only for the thresholds and loadings. I guess that is expected, since the intercepts are constrained to 0. In that case, how would you know which intercept(s) to release in group 2 within the CFA model so they can be freely estimated?

Kind regards,
Aiden

Aiden

May 3, 2022, 8:28:45 PM
to lavaan
Hello, 

Just following up on the thread above.
I wonder if anyone could provide advice on how best to decide which item intercept(s) to free in group 2? Given that the intercepts are constrained to 0, we can't really use the lavTestScore() function. In that case, what would be the appropriate way to identify item intercepts that may have DIF?

Kind regards,
Aiden 

Terrence Jorgensen

May 4, 2022, 12:23:07 PM
to lavaan
Given that the intercepts are constrained to 0, then we can't really use the lavTestScore() function. In that case

Does it not work to add= the intercept estimates in group 2?

lavTestScore(fit, add = "item1 ~ c(0, NA)*1")
 
what would then be the appropriate way to identify possible item intercepts that may have DIF? 

You can just fit a set of models with 1 intercept freed, then obtain the LRT stat (asymptotically equivalent to the Score test when H0 is true).  That's a lot of models to fit, but if the score test doesn't work, then "them's the breaks."
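
One such model could be sketched via measEq.syntax()'s group.partial= argument; mod, mydata, ord.items, fit.scalar, and item1 are placeholders carried over from a hypothetical scalar-invariance setup:

```r
## scalar model with item1's intercept freed across groups
fit.partial1 <- measEq.syntax(configural.model = mod, data = mydata,
                              ordered = ord.items,
                              parameterization = "delta",
                              ID.fac = "std.lv",
                              ID.cat = "Wu.Estabrook.2016",
                              group = "group",
                              group.equal = c("thresholds", "loadings",
                                              "intercepts"),
                              group.partial = "item1 ~ 1",
                              return.fit = TRUE)

## LRT stat for freeing item1's intercept, relative to the
## fully constrained scalar model
lavTestLRT(fit.scalar, fit.partial1)
```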

Aiden

May 5, 2022, 6:51:18 PM
to lavaan
Thanks Terrence. It doesn't seem to work; it returns an empty row. I've created a reproducible example using the bfi dataset from the psych package.

require(psych)
require(semTools)
data(bfi)

bfi.model <- '
    A =~ A1 + A2 + A3 + A4 + A5
    C =~ C1 + C2 + C3 + C4 + C5
    E =~ E1 + E2 + E3 + E4 + E5
    N =~ N1 + N2 + N3 + N4 + N5
    O =~ O1 + O2 + O3 + O4 + O5
'

ord.items <- c('A1', 'A2', 'A3', 'A4', 'A5',
               'C1', 'C2', 'C3', 'C4', 'C5',
               'E1', 'E2', 'E3', 'E4', 'E5',
               'N1', 'N2', 'N3', 'N4', 'N5',
               'O1', 'O2', 'O3', 'O4', 'O5')

syntax.scalar <- measEq.syntax(configural.model = bfi.model,
                               data = bfi,
                               ordered = ord.items,
                               parameterization = "delta",
                               ID.fac = "std.lv",
                               ID.cat = "Wu.Estabrook.2016",
                               group = "gender",
                               group.equal = c("thresholds",
                                               "loadings",
                                               "intercepts"))

model.scalar <- as.character(syntax.scalar)
cat(model.scalar)

fit.scalar <- cfa(model.scalar, data = bfi, group = "gender",
                  ordered = ord.items)

summary(fit.scalar)

# score test for freeing the (currently fixed-to-zero)
# intercept of A1 in group 2
newpar <- '
A1 ~ c(0, NA)*1
'
# returns 0 rows
lavTestScore(fit.scalar, add = newpar)

I would appreciate any advice. 

Kind regards,
Aiden

Terrence Jorgensen

May 6, 2022, 12:45:24 PM
to lavaan
It doesn't seem to work

Then you will need to fit partial invariance models to obtain LRTs.

Aiden

May 6, 2022, 5:45:46 PM
to lavaan
Ah okay, I see what you mean by
'You can just fit a set of models with 1 intercept freed, then obtain the LRT stat (asymptotically equivalent to the Score test when H0 is true).'

Using the example from the lavTestLRT() function, is it the case that I always have two models?

For example, the first set of models includes:
one model where all intercepts are fully constrained, and a second model where the intercept for item 1 is freed.

Then the next set of models includes:
one model where all intercepts are fully constrained, and a second model where the intercept for item 2 is freed.

Once I have many sets of models, I can rank them by the size of the chi-square difference test, which will allow me to determine which set of models has the largest chi-square difference. Is that about right?

Kind regards,
Aiden

Terrence Jorgensen

May 10, 2022, 1:13:45 PM
to lavaan
is it the case that I always have two models?

For example, the first set of models includes:
one model where all intercepts are fully constrained, and a second model where the intercept for item 1 is freed.

Then the next set of models includes:
one model where all intercepts are fully constrained, and a second model where the intercept for item 2 is freed.

Once I have many sets of models, I can rank them by the size of the chi-square difference test, which will allow me to determine which set of models has the largest chi-square difference. Is that about right?

Correct.
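
The confirmed procedure can be illustrated with the bfi example from earlier in the thread, restricted here to the Agreeableness factor for brevity. This is a sketch, not a definitive implementation:

```r
require(psych)
require(semTools)
data(bfi)

mod.A <- ' A =~ A1 + A2 + A3 + A4 + A5 '
items <- c('A1', 'A2', 'A3', 'A4', 'A5')

## helper: fit a scalar model, optionally freeing one intercept
fitEq <- function(partial = "") {
  measEq.syntax(configural.model = mod.A, data = bfi, ordered = items,
                parameterization = "delta", ID.fac = "std.lv",
                ID.cat = "Wu.Estabrook.2016", group = "gender",
                group.equal = c("thresholds", "loadings", "intercepts"),
                group.partial = partial, return.fit = TRUE)
}

fit.full <- fitEq()  # all intercepts constrained

## one partial model per item, each freeing that item's intercept,
## collecting the chi-square difference against the full model
chisq.diff <- sapply(items, function(i) {
  fit.i <- fitEq(partial = paste(i, "~ 1"))
  lavTestLRT(fit.full, fit.i)[2, "Chisq diff"]
})

sort(chisq.diff, decreasing = TRUE)  # largest difference first
```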