Interpretation of the lavTestScore() and parTable() functions for partial measurement invariance


Amonet

May 12, 2018, 1:39:07 PM
to lavaan
Hi, 

Some context: I am doing a measurement invariance analysis and noticed that the strong invariance model is not fully supported by the chi-square difference test (p = .025), relative to the weak invariance model. Note that my sample size is quite small (roughly 120 individuals measured at 4 time points), so the test might lack power, and the chi-square statistic might even be on the large side here. Although the difference in CFI is < .01 (i.e., 0.0072), and the AIC actually improves (i.e., a decrease of 0.72 in the more restricted (strong) model), I thought it would be good to be on the safe side and release some of the constraints imposed in the strong invariance model. To this end I am using the lavTestScore() function. From what I understand, lavTestScore() applies a multivariate score test (or Lagrange multiplier test), and if this turns out non-significant, none of the constraints should be released. This is somewhat 'problematic' for me, as the objective is to study the latent means over time. 
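For concreteness, here is a minimal sketch of the comparisons described above, assuming hypothetical fitted lavaan objects fit.weak and fit.strong for the weak and strong invariance models (model syntax and data omitted):

library(lavaan)
lavTestLRT(fit.weak, fit.strong)          # chi-square difference test between the nested models
fitMeasures(fit.strong, c("cfi", "aic"))  # CFI and AIC of the strong model, for comparison
lavTestScore(fit.strong)                  # multivariate and univariate score tests of the equality constraints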

In my analysis this is the case: the multivariate score test is non-significant (p = .42), while the strong model may not be preferred over the weak model (see above). 
However, there are some univariate score tests that are borderline significant (e.g., p = .048). Am I conducting a 'proper statistical analysis' if I turn to these univariate tests while the multivariate test indicates non-significance? Then again, I might also lack power here due to the small sample size... this time "not in my favour". 

With this, I have a question on how to interpret the output. Please see the picture of the output below (red marks indicate the 3 borderline significant tests):

[Screenshot of lavTestScore() univariate score-test output; the 3 borderline-significant rows are marked in red.]
Now, when I use parTable() to find out which constraints these 3 tests are associated with, I don't understand what they represent: 

[Screenshot of parTable() output showing the parameter table for the fitted model.]
I didn't name anything .p##. myself, so I guess it's done internally.

Lastly, I had thought that if the test failed, at least some of the intercepts (which I constrained to be equal over time) would pop up in the univariate tests. After all, that's the only thing that changed between the weak and strong model. Where could I go from here to potentially show partial strong invariance?

Any help will be appreciated. 
Thank you, 
Amonet



Terrence Jorgensen

May 13, 2018, 10:44:05 AM
to lavaan
In my analysis this is the case: the multivariate score test is non-significant (p = .42), while the strong model may not be preferred over the weak model (see above). 

The multivariate score test is not comparing your weak and strong models.  It is testing all the constraints listed in the univariate tests output, which include both loadings and intercepts, so it is closer to comparing your configural and strong models.  But you already performed the omnibus test using a likelihood ratio test, so you don't need to pay attention to the multivariate score test (unless you have a hypothesis about freeing multiple particular constraints, in which case you should run lavTestScore() and specify only those constraints).  Just look at the univariate follow-up tests. 

However, there are some univariate score tests that are borderline significant (e.g., p = .048). Am I conducting a 'proper statistical analysis' if I turn to these univariate tests while the multivariate test indicates non-significance?

No, but you did not specify only the intercept constraints when you ran lavTestScore().  To get the correct omnibus test, you can run (assuming the intercept constraints are in rows 18 through 32 of the univariate tests):

lavTestScore(fit, release = 18:32)

But again, you already tested the omnibus null hypothesis that all intercept constraints hold in the population.  The score test is asymptotically equivalent to the LRT you already performed by comparing the weak and strong models.


Also, regarding the "proper" type of analysis you are hoping to perform: if you use alpha = 5% for your omnibus test and then conduct several follow-up tests, you should adjust for the number of tests to control the familywise Type I error rate (e.g., a Bonferroni adjustment, dividing alpha by the number of intercept constraints you test following your LRT).
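As a quick illustration of that adjustment in R (the three p values below are hypothetical placeholders standing in for the univariate score tests of one item's intercept constraints):

p_uni <- c(0.048, 0.052, 0.310)          # hypothetical univariate score-test p values
alpha_adj <- 0.05 / length(p_uni)        # Bonferroni-adjusted alpha: 0.05 / 3
p_uni < alpha_adj                        # which constraints would be flagged at the adjusted alpha
p.adjust(p_uni, method = "bonferroni")   # equivalently, adjust the p values and compare them to 0.05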

Now, when I use parTable() to find out which constraints these 3 tests are associated with, I don't understand what they represent: I didn't name anything .p##. myself, so I guess it's done internally.

Yes, those are the labels lavaan creates, using the parameter number in the "id" column.  .p1. == .p6. means those factor loadings are constrained to equality (evidenced by the .p1. label also appearing in the "label" column for each loading that is constrained to be equal to it).

So when you see the univariate score test labeled .p1. == .p6., that is a test of that particular constraint. 
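If it helps to see where those labels live, here is a small sketch (assuming fit is your fitted lavaan model object; the .p1./.p6. labels are taken from the example above):

PT <- parTable(fit)                      # full parameter table, including auto-generated equality constraints
PT[PT$op == "==", ]                      # the constraint rows; lhs and rhs hold the .p##. labels
PT[PT$plabel %in% c(".p1.", ".p6."), ]   # the two loading parameters involved in the constraint .p1. == .p6.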

Lastly, I had thought that if the test failed, at least some of the intercepts (which I constrained to be equal over time) would pop up in the univariate tests. After all, that's the only thing that changed between the weak and strong model. Where could I go from here to potentially show partial strong invariance?

Well, the only 2 significant univariate tests are both associated with the intercept labeled .p57., although neither of them would be significant if you took steps to control your Type I error rate.  To perform fewer tests (and have more power), you could run only 1 post-hoc test of that item's intercepts:

lavTestScore(fit, release = 18:20)

If that particular 3-df post-hoc test is significant, release those constraints and compare the partial-strong model to the weak model using a LRT.
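In code, assuming you refit the strong model without that item's intercept constraints and call it fit.partial.strong (a hypothetical name), that comparison is simply:

lavTestLRT(fit.weak, fit.partial.strong)   # partial strong vs. weak invariance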

Also see my post on using EPCs and EPC-interest (e.g., how much do latent means differ if those 3 constraints are freed?) as an effect-size/impact criterion rather than statistical significance:  https://groups.google.com/d/msg/lavaan/fzIg5jrxIAI/xoq5l803AAAJ

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

sg

May 19, 2018, 5:28:02 AM
to lavaan
Hello, I need help with understanding my output for measurement invariance.
Below is the output; could you please point out where things might be wrong?
Thank you.

Chi Square Difference Test

 

                Df    AIC    BIC   Chisq  Chisq diff  Df diff  Pr(>Chisq)
fit.configural   6  2660.6  2818.8  9.5528
fit.loadings    12  2651.7  2783.6 12.7212      3.1684        6      0.7874
fit.intercepts  18  2641.5  2747.0 14.5084      1.7872        6      0.9382
fit.means       20  2641.0  2737.8 18.0140      3.5056        2      0.1733

 

 

Fit measures:

 

                  cfi  rmsea  cfi.delta  rmsea.delta
fit.configural  0.978  0.054         NA           NA
fit.loadings    0.996  0.017      0.017        0.037
fit.intercepts  1.000  0.000      0.004        0.017
fit.means       1.000  0.000      0.000        0.000

Terrence Jorgensen

May 22, 2018, 12:08:54 AM
to lavaan
Hello, I need help with understanding my output for measurement invariance.

The far-right column contains the p values for the chi-squared difference tests, each comparing a more restricted model to the previous one.  None of the comparisons is significant, so you cannot reject the null hypothesis of invariance for any of the parameters tested.
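For reference, the same sequence of nested comparisons can be obtained directly with lavTestLRT(), assuming the four fitted lavaan objects shown in your output are available in your workspace:

lavTestLRT(fit.configural, fit.loadings, fit.intercepts, fit.means)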

Saima Ghazal

May 22, 2018, 2:26:12 PM
to lav...@googlegroups.com
Thank you, Terrence.
I am concerned about the fit indices: why is CFI 1.00 and RMSEA 0.00? I was wondering if I have done something wrong.
Actually, these are results from three parallel test forms, where I want to establish measurement invariance. I have the same sample of n = 200, and they have taken the three parallel test forms.
I considered 'Form type' as the group (so the categorical variable is form A, B, and C).
I appreciate your help in interpreting and justifying these values (is it due to the small df?).

Thank you so much!
Here is the output again (the same chi-square difference test and fit measures as posted above).





--
Saima Ghazal, PhD, Michigan Technological University, USA
Assistant Professor, Institute of Applied Psychology
University of the Punjab, Lahore, Pakistan 




Amonet

May 22, 2018, 2:30:28 PM
to lavaan
Hi Terrence,
Thanks for helping me out. 

Also, regarding the "proper" type of analysis you are hoping to perform: if you use alpha = 5% for your omnibus test and then conduct several follow-up tests, you should adjust for the number of tests to control the familywise Type I error rate (e.g., a Bonferroni adjustment, dividing alpha by the number of intercept constraints you test following your LRT).

Thus, if I were to test releasing all intercept constraints of one item (i.e., 3 in total), then p < (0.05/3) is needed? 
By the way, shouldn't I also correct for the multiple chi-square tests that I perform to test for measurement invariance? (E.g., configural -> weak -> strong would be 2 tests.) If so, does the p value then need to be greater than p = (0.05 * 2)? 

Lastly, I had thought that if the test failed, at least some of the intercepts (which I constrained to be equal over time) would pop up in the univariate tests. After all, that's the only thing that changed between the weak and strong model. Where could I go from here to potentially show partial strong invariance?

Well, the only 2 significant univariate tests are both associated with the intercept labeled .p57., although neither of them would be significant if you took steps to control your Type I error rate.  To perform fewer tests (and have more power), you could run only 1 post-hoc test of that item's intercepts:

lavTestScore(fit, release = 18:20)

If that particular 3-df post-hoc test is significant, release those constraints and compare the partial-strong model to the weak model using a LRT.


Shouldn't I also account for the first tests I did? That is, lavTestScore(fit) to identify which constraints needed to be released? Or how else would I know that I should release the intercepts associated with parameters 18:20, i.e., lavTestScore(fit, release = 18:20)? 

Also, should I in general always release all or none of the intercepts associated with one item? Or does it make sense to release only some of its intercepts? Here, since parameters 18 and 20 were significant (univariate score tests), I would then test releasing only those 2. 

Either way, the tests turn out non-significant: p = 0.09 for releasing 18:20 and p = 0.054 for releasing only 18 and 20. But let's say it was significant, and I followed your advice and released the intercepts to test for partial measurement invariance afterwards. I'd think this latter test (a chi-square difference test is what I would use) should also be corrected, right?

Kind regards,
Amonet 

Terrence Jorgensen

Jun 2, 2018, 9:46:28 AM
to lavaan
Thus, if I were to test releasing all intercept constraints of one item (i.e., 3 in total), then p < (0.05/3) is needed? 

Yes

By the way, shouldn't I also correct for the multiple chi-square tests that I perform to test for measurement invariance? (E.g., configural -> weak -> strong would be 2 tests.) If so, does the p value then need to be greater than p = (0.05 * 2)? 

It depends what error rate you are trying to control, just like in the exploratory ANOVA paradigm.  To control experimentwise error (super conservative), you would account for all tests in your report. It is more common to control the familywise error rate, and I would consider each type of measurement parameter to be a family of tests.  So equivalence of intercepts would be one null hypothesis at alpha = 5% (or whatever), with an associated family of post-hoc tests of 3 individual intercepts at alpha = 5% / 3. 

Shouldn't I also account for the first tests I did? That is, lavTestScore(fit) to identify which constraints needed to be released? Or how else would I know that I should release the intercepts associated with parameters 18:20, i.e., lavTestScore(fit, release = 18:20)? 

Right, that is the omnibus test at alpha = 5%.  If it is significant, you find out why by partitioning the omnibus null into smaller components (post-hoc tests).

Saima Ghazal

Jun 25, 2018, 5:21:38 PM
to lav...@googlegroups.com
Thank you, Terrence.
I am concerned about the fit indices: why is CFI 1.00 and RMSEA 0.00? I was wondering if I have done something wrong.
Actually, these are results from three parallel test forms, where I want to establish measurement invariance. I have the same sample of n = 200, and they have taken the three parallel test forms.
I considered 'Form type' as the group (so the categorical variable is form A, B, and C).
Could you please point out where things might be wrong? (The output is the same chi-square difference test and fit measures as posted above.)



How should I report the results? Can I say that measurement invariance holds across groups? 
I appreciate your help in interpreting and justifying these values (is it due to the small df?).
Thank you so much!



Terrence Jorgensen

Jun 29, 2018, 10:44:57 AM
to lavaan
I am concerned about the fit indices: why is CFI 1.00 and RMSEA 0.00?

If you study the formulas for these fit indices, you can see that this happens whenever your model's chi-squared is less than its df, which is the case for the intercepts- and means-invariant models.
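To illustrate with one common (single-group) form of the formulas, using the intercepts-invariant model's chi-squared and df from your output (N and the baseline-model values below are hypothetical placeholders, but the point holds regardless):

chisq <- 14.5084; df <- 18               # intercepts-invariant model, from the output above
N <- 200                                 # hypothetical total sample size
chisq_b <- 500; df_b <- 15               # hypothetical baseline (independence) model values

rmsea <- sqrt(max(chisq - df, 0) / (df * (N - 1)))
cfi   <- 1 - max(chisq - df, 0) / max(chisq - df, chisq_b - df_b, 0)
c(rmsea = rmsea, cfi = cfi)              # because chisq < df, max(chisq - df, 0) = 0, so RMSEA = 0 and CFI = 1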

I have the same sample of n = 200, and they have taken the three parallel test forms. I considered 'Form type' as the group (so the categorical variable is form A, B, and C).
 
This sounds like the same 200 people took each of the 3 parallel tests.  That would constitute repeated measures, not independent groups, so you should test "longitudinal" invariance, not multiple-group invariance.

But if they are parallel tests, does that mean the indicators in each test are actually different variables (parallel, but not the same)?  Unless each item has a parallel item in each test (i.e., question 1 in test 1 is parallel/analogous to question 1 in test 2), I don't see how you could test invariance across tests.  
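If each form does have analogous items, one possible starting point is sketched below. All variable names are hypothetical placeholders (items x1-x3 on form A, y1-y3 on form B, z1-z3 on form C, with x1/y1/z1 etc. being the analogous items), and this is only a rough illustration of a strong-invariance-style specification, not a complete invariance-testing routine:

library(lavaan)

mod.long <- '
  # one factor per form; analogous (non-marker) loadings share labels, so they are constrained equal
  fA =~ x1 + l2*x2 + l3*x3
  fB =~ y1 + l2*y2 + l3*y3
  fC =~ z1 + l2*z2 + l3*z3

  # analogous intercepts constrained equal
  x1 ~ i1*1 ; y1 ~ i1*1 ; z1 ~ i1*1
  x2 ~ i2*1 ; y2 ~ i2*1 ; z2 ~ i2*1
  x3 ~ i3*1 ; y3 ~ i3*1 ; z3 ~ i3*1

  # residual covariances between analogous items across forms (same people, repeated measures)
  x1 ~~ y1 + z1 ; y1 ~~ z1
  x2 ~~ y2 + z2 ; y2 ~~ z2
  x3 ~~ y3 + z3 ; y3 ~~ z3

  # fix the first form-factor mean to 0 and estimate the other two
  fA ~ 0*1
  fB ~ NA*1
  fC ~ NA*1
'
fit.long <- cfa(mod.long, data = mydata, meanstructure = TRUE)   # mydata is a hypothetical wide-format data frame (one row per person)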

Saima Ghazal

Jul 1, 2018, 10:25:44 PM
to lav...@googlegroups.com
Thank you so much Terrence, your help is much cheaper appreciated!
Best,
saima


Saima Ghazal

Jul 1, 2018, 10:27:48 PM
to lav...@googlegroups.com
Thank you so much Terrence, your help is much appreciated! (I am sorry for the autocorrect/prediction option in my last email, my apology.)
Best,
saima
