How to interpret lack of metric invariance


Blair Burnette

Oct 10, 2018, 8:01:17 PM
to lavaan
Hello all,

I'm testing measurement invariance across two groups. I ran the configural model and then the weak (metric) invariance model in lavaan. The chi-square difference test was significant, suggesting I do not have metric invariance. Can anyone direct me to how I might locate the problematic factor(s)? I tried modification indices, but after doing some reading, I was under the impression that modification indices are more helpful for scalar invariance.

Is the answer in my output, or do I need to write specific code to find it out?

I am new to R, so I'm just bumbling my way through learning these steps. Sorry for the ignorance, but any help in understanding my findings is appreciated.

Thank you,

Blair

Blair Burnette

Oct 10, 2018, 8:05:08 PM
to lavaan
Perhaps I should paste my code. The items are on a 1-to-5 scale, so I am treating the data as ordinal. I recently added correlated factors because someone told me that lavaan assumes factors are orthogonal.

#setting up latent variables and running general CFA
sataq.model <- ' SATAQMI =~ sataq1 + sataq2 + sataq6 + sataq7 + sataq10
SATAQTI =~ sataq3 + sataq4 + sataq5 + sataq8 + sataq9
SATAQFP =~ sataq11 + sataq12 + sataq13 + sataq14
SATAQPP =~ sataq15 + sataq16 + sataq17 + sataq18
SATAQMP =~ sataq19 + sataq20 + sataq21 + sataq22 
SATAQTI ~~ SATAQMI + SATAQFP + SATAQMP + SATAQPP
SATAQMI ~~ SATAQFP + SATAQPP + SATAQMP
SATAQFP ~~ SATAQPP + SATAQMP
SATAQPP ~~ SATAQMP'
ord.items <- paste0("sataq", 1:22)  # sataq1 through sataq22
fit <- cfa(sataq.model, data = sataq4, ordered = ord.items)
summary(fit, standardized = TRUE, rsq = TRUE, fit.measures = TRUE)
#run CFA for both groups, using ordinal data
config <- cfa(sataq.model, data = sataq4, group = "racecat", ordered = ord.items)
summary(config, standardized = TRUE, rsq = TRUE, fit.measures = TRUE)
#weak invariance
weak.invariance <- cfa(sataq.model, data = sataq4, group = "racecat",
                       group.equal = "loadings", ordered = ord.items)
summary(weak.invariance, standardized = TRUE, rsq = TRUE, fit.measures = TRUE)
#chi-square difference test
anova(config, weak.invariance)
#modification indices
modindices(weak.invariance)

Terrence Jorgensen

Oct 11, 2018, 6:12:54 AM
to lavaan
I was under the impression that modification indices are more helpful for scalar invariance.

No, the problem is that modification indices are only available for fixed parameters, not for estimated parameters that are constrained in some way (here, to equality across groups).  Instead, use the more general score-test function, lavTestScore().
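With the object names from your earlier post, that call would look something like this (a sketch; it assumes the weak-invariance model has already been fitted):

```r
library(lavaan)

# Score (Lagrange multiplier) test of all equality constraints
# in the constrained model; returns total, univariate, and
# cumulative tests plus expected parameter changes.
lavTestScore(weak.invariance, epc = TRUE)
```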


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Blair Burnette

Oct 11, 2018, 4:49:50 PM
to lav...@googlegroups.com
Thank you, the thread you posted was enormously helpful. I printed my parameter table and ran lavTestScore(). I did not specify particular constraints to free on this first pass. It seems to me that .p9. is clearly the most problematic, although .p8., .p10., .p14., and .p16. are also higher than the others.

I am unsure what these acronyms mean, including lhs and rhs. Do you know of any resources for interpreting this output? 

I am looking at a widely used measure in my field. It was normed on an overwhelmingly White female sample, but it is used widely with racially diverse women. I am examining its invariance across Black and White women. I hypothesized that we would not achieve metric invariance: the constructs measured, particularly by one subscale, do not have the same salience for Black women as for White women. However, I'm new to R and this is my first invariance analysis, so I feel lost trying to track down the source of non-invariance and interpret it. Any help is appreciated! Sorry to need so much hand-holding, but I'm eager to learn! :)

total score test:

   test     X2 df p.value
1 score 96.103 17       0

$uni

univariate score tests:

     lhs op    rhs     X2 df p.value
1   .p2. == .p198.  0.024  1   0.876
2   .p3. == .p199.  0.369  1   0.543
3   .p4. == .p200.  5.275  1   0.022
4   .p5. == .p201.  0.756  1   0.385
5   .p7. == .p203.  0.818  1   0.366
6   .p8. == .p204.  7.558  1   0.006
7   .p9. == .p205. 40.175  1   0.000
8  .p10. == .p206.  6.746  1   0.009
9  .p12. == .p208.  2.714  1   0.099
10 .p13. == .p209.  4.530  1   0.033
11 .p14. == .p210. 16.196  1   0.000
12 .p16. == .p212. 14.424  1   0.000
13 .p17. == .p213.  0.141  1   0.707
14 .p18. == .p214.  1.105  1   0.293
15 .p20. == .p216.  1.476  1   0.224
16 .p21. == .p217.  0.223  1   0.636
17 .p22. == .p218.  1.948  1   0.163

$cumulative

cumulative score tests:

     lhs op    rhs     X2 df p.value
1   .p2. == .p198. 40.175  1       0
2   .p3. == .p199. 56.330  2       0
3   .p4. == .p200. 70.779  3       0
4   .p5. == .p201. 70.970  4       0
5   .p7. == .p203. 72.910  5       0
6   .p8. == .p204. 77.934  6       0
7   .p9. == .p205. 83.624  7       0
8  .p10. == .p206. 86.187  8       0
9  .p12. == .p208. 88.172  9       0
10 .p13. == .p209. 89.675 10       0
11 .p14. == .p210. 91.142 11       0
12 .p16. == .p212. 92.353 12       0
13 .p17. == .p213. 92.354 13       0
14 .p18. == .p214. 92.354 14       0
15 .p20. == .p216. 92.673 15       0
16 .p21. == .p217. 95.851 16       0
17 .p22. == .p218. 96.103 17       0

$epc

expected parameter changes (epc) and expected parameter values (epv):

        lhs  op     rhs group free label plabel    est    epc    epv
1   SATAQMI  =~  sataq1     1    0         .p1.     NA     NA     NA
2   SATAQMI  =~  sataq2     1    1  .p2.   .p2.  1.004 -0.007  0.997
3   SATAQMI  =~  sataq6     1    2  .p3.   .p3.  1.083 -0.003  1.080
4   SATAQMI  =~  sataq7     1    3  .p4.   .p4.  1.104 -0.025  1.079
5   SATAQMI  =~ sataq10     1    4  .p5.   .p5.  1.059 -0.005  1.055
6   SATAQTI  =~  sataq3     1    0         .p6.     NA     NA     NA
7   SATAQTI  =~  sataq4     1    5  .p7.   .p7.  0.603 -0.016  0.587
8   SATAQTI  =~  sataq5     1    6  .p8.   .p8.  1.227  0.007  1.234
9   SATAQTI  =~  sataq8     1    7  .p9.   .p9.  1.010 -0.093  0.917
10  SATAQTI  =~  sataq9     1    8 .p10.  .p10.  1.127  0.014  1.141
11  SATAQFP  =~ sataq11     1    0        .p11.     NA     NA     NA
12  SATAQFP  =~ sataq12     1    9 .p12.  .p12.  0.984 -0.020  0.964
13  SATAQFP  =~ sataq13     1   10 .p13.  .p13.  0.988 -0.028  0.961
14  SATAQFP  =~ sataq14     1   11 .p14.  .p14.  0.870 -0.044  0.826
15  SATAQPP  =~ sataq15     1    0        .p15.     NA     NA     NA
16  SATAQPP  =~ sataq16     1   12 .p16.  .p16.  1.117  0.041  1.158
17  SATAQPP  =~ sataq17     1   13 .p17.  .p17.  1.113  0.020  1.133
18  SATAQPP  =~ sataq18     1   14 .p18.  .p18.  1.105  0.023  1.128
19  SATAQMP  =~ sataq19     1    0        .p19.     NA     NA     NA
20  SATAQMP  =~ sataq20     1   15 .p20.  .p20.  1.064  0.005  1.069
21  SATAQMP  =~ sataq21     1   16 .p21.  .p21.  1.031  0.002  1.033
22  SATAQMP  =~ sataq22     1   17 .p22.  .p22.  1.047 -0.003  1.044


Terrence Jorgensen

Oct 18, 2018, 6:42:00 AM
to lavaan
I am unsure what these acronyms mean, including lhs and rhs. Do you know of any resources for interpreting this output? 

Look at your parameter table.  The last entries are these equality constraints, and you can look for the parameter labels in the "plabel" column to see which parameters they refer to.  If only the loadings are constrained in this model, these should correspond to the equality constraints of the loadings in the order they appear in your syntax (except for the first loadings, which are fixed to 1 and thus assumed to be equal).
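For example (a sketch using your object names), you can pull out the constraint rows and then look up any label that appears in them:

```r
pt <- parTable(weak.invariance)

# the equality constraints themselves (op == "==")
pt[pt$op == "==", ]

# which parameters does the constraint ".p9. == .p205." involve?
pt[pt$plabel %in% c(".p9.", ".p205."), c("lhs", "op", "rhs", "group", "plabel")]
```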

I am looking at a widely used measure in my field. It was normed on an overwhelmingly White female sample, but it is used widely with racially diverse women. I am examining its invariance across Black and White women. I hypothesized that we would not achieve metric invariance: the constructs measured, particularly by one subscale, do not have the same salience for Black women as for White women.

Then I would NOT recommend assuming that the first indicator of each construct is invariant.  Set std.lv=TRUE, and free the second group's latent variances when you constrain loadings, e.g., "SATAQTI ~~ c(1, NA)*SATAQTI".  (Likewise, free the second group's latent means when you constrain intercepts: "SATAQTI ~ c(0, NA)*1".  ONLY constrain intercepts for indicators whose loadings are equivalent.)  That way, you can check whether any of the loadings differ, including the first indicators' loadings.
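In lavaan syntax, that setup might be sketched like this, appending the variance lines to your existing model string (untested against your data, so treat it as a template):

```r
# std.lv = TRUE fixes latent variances to 1 (identifying the model)
# instead of fixing the first loading; c(1, NA) keeps group 1's
# variance at 1 but frees group 2's once loadings are constrained.
weak.model <- paste(sataq.model, '
  SATAQMI ~~ c(1, NA)*SATAQMI
  SATAQTI ~~ c(1, NA)*SATAQTI
  SATAQFP ~~ c(1, NA)*SATAQFP
  SATAQPP ~~ c(1, NA)*SATAQPP
  SATAQMP ~~ c(1, NA)*SATAQMP
')
weak.fit <- cfa(weak.model, data = sataq4, group = "racecat", std.lv = TRUE,
                group.equal = "loadings", ordered = paste0("sataq", 1:22))
```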

However, I'm new to R and this is my first invariance analysis, so I feel lost trying to track down the source of non-invariance and interpret it. Any help is appreciated! Sorry to need so much hand-holding, but I'm eager to learn! :)

The $uni table contains chi-squared test statistics that are asymptotically (i.e., as N approaches infinity) equivalent to the likelihood ratio (chi-squared difference) test between nested models with and without that constraint.  But this function does not provide a robust test statistic, which you need with DWLS estimation.  So to get a test statistic whose p value you can actually trust, you would need to actually fit a model with that constraint released and use anova() or lavTestLRT() to compare the models, so that the test statistic will be robust.

Actually, even if you had multivariate normal data, one invalid constraint will bias tests of the other constraints, so the Type I error rate would be inflated if you have more than one invalid constraint in the weak-invariance model.  So it would still be advisable to free one constraint at a time (the one with the largest X2 statistic), and compare that model to the configural model to see if that level of partial invariance is tenable.

Indeed, equality constraint ".p9. == .p205." has the highest test statistic, so try freeing that one first.  If lavTestLRT() says that the partial weak-invariance model still fits worse than the configural model, then run lavTestScore() on that partial model to find the constraint with the highest X2, and repeat the process until either
  1. the configural and partial invariance models have similar fit, or
  2. you no longer have at least 2 invariant indicators per factor.  If you only have 1 invariant indicator, it is equivalent to the configural model (for that factor), and you can no longer test or assume that even that indicator is invariant.
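Concretely, since your parameter table shows that .p9. is the loading SATAQTI =~ sataq8, the first iteration might look like this (a sketch using the object names from your earlier posts):

```r
# Free the worst-fitting equality constraint via group.partial
partial.weak <- cfa(sataq.model, data = sataq4, group = "racecat",
                    group.equal = "loadings",
                    group.partial = "SATAQTI =~ sataq8",
                    ordered = paste0("sataq", 1:22))

lavTestLRT(config, partial.weak)  # compare to the configural model
lavTestScore(partial.weak)        # if still worse, find the next-largest X2
```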
Good luck,

Blair Burnette

Aug 6, 2019, 1:56:41 PM
to lav...@googlegroups.com
Hi,

Thanks so much for your help. Your feedback helped us greatly as we developed this manuscript. We submitted it for publication and are now working on an R&R. One of our reviewers is curious about our decision not to choose a referent indicator. I took your suggestion and, rather than choose an RI, freed the second group's latent variances when constraining loadings. We found two factor loadings that were not invariant between groups. We freed those loadings (one at a time; our LRT was still significant after freeing the first) and then tested scalar invariance (constraining thresholds). As you suggested, we freed those two parameters and the second group's latent means. I attempted to explain how this worked to the reviewer, but they are requesting a citation. I have found a lot of literature on the problems associated with RIs and how incorrectly choosing an RI can have serious implications for results and conclusions. However, I am wondering if you have a reference for the steps you suggested? Most of the papers I am finding are almost a decade old, and still suggest the cumbersome steps recommended by Rensvold and Cheung.
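For reference, the scalar step looked roughly like this (the two loadings in "freed" are placeholders here, standing in for the non-invariant indicators we identified):

```r
freed <- c("SATAQTI =~ sataq8", "SATAQPP =~ sataq16")  # placeholder names
# note: the non-invariant items' thresholds also stay free, which
# would mean adding entries like "sataq8 | t1" to group.partial
scalar.model <- paste(sataq.model, '
  SATAQMI ~ c(0, NA)*1  # latent means: fixed to 0 in group 1, free in group 2
  SATAQTI ~ c(0, NA)*1
  SATAQFP ~ c(0, NA)*1
  SATAQPP ~ c(0, NA)*1
  SATAQMP ~ c(0, NA)*1
')
scalar.fit <- cfa(scalar.model, data = sataq4, group = "racecat", std.lv = TRUE,
                  group.equal = c("loadings", "thresholds"),
                  group.partial = freed,
                  ordered = paste0("sataq", 1:22))
```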
 
Thanks again for your help!

Blair

Terrence Jorgensen

Aug 7, 2019, 7:25:36 AM
to lavaan
I have found a lot of literature on the problems associated with RIs and how incorrectly choosing an RI can have serious implications for results and conclusions. However, I am wondering if you have a reference for the steps you suggested? Most of the papers I am finding are almost a decade old

Nothing wrong with a decade old, but here is one published this year:


See Footnote 7 for an explanation and many citations you could use.  In fact, Footnote 7 would serve as a good response to the reviewer.