Testing measurement/structural invariance


Tino

Oct 22, 2015, 7:18:15 PM
to lavaan

Hello,


thanks a lot for this great package! I am kindly asking for help concerning two questions.

I am testing for measurement/structural invariance, aiming at a comparison of latent means between 4 groups and a test for moderation. My model is specified like this:

 

#SEMfinal

Model1  <- '

#define latent variables

Latent1 =~ x1+x2+x3

Latent2 =~ x4+x5+x6

Latent3 =~ x7+x8

Latent4 =~ x9+x10

#define structural relations

Latent1 ~ Latent2

Latent3 ~ Latent1 + Latent2

Latent4 ~ Latent1 + Latent2 + Latent3

x2 ~~ x3

'

Because most indicator variables are ordinal (4-point Likert scale) I proceed like this:

fitModel1 <- sem(Model1, data = Data1, ordered = c("x1","x2","x3","x4","x5","x6","x7"))
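A sketch of how the invariance sequence reported below could be fit (an assumption on my part: the grouping variable in Data1 is named "group" here, since the actual name is not shown in the thread):

```r
# Hypothetical sketch of the invariance sequence; "group" is a placeholder
# for the actual grouping variable in Data1.
ord <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7")

fitConfig <- sem(Model1, data = Data1, ordered = ord, group = "group")
fitWeak   <- update(fitConfig, group.equal = "loadings")
fitStrong <- update(fitConfig, group.equal = c("loadings", "intercepts"))
fitMean   <- update(fitConfig, group.equal = c("loadings", "intercepts", "means"))
fitStruct <- update(fitConfig, group.equal = c("loadings", "intercepts", "means",
                                               "regressions"))

# Fit indices as in the table below (scaled variants may be preferable with DWLS)
sapply(list(fitConfig, fitWeak, fitStrong, fitMean, fitStruct),
       fitMeasures, fit.measures = c("chisq", "df", "cfi", "rmsea"))
```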

 

These are the results for the measurement/structural invariance procedure:

 

Number of observations per group (used / total):

  3        2353 / 2605
  2        2422 / 2721
  4        3248 / 3524
  1        1194 / 1350

 

                   chisq       df     cfi    rmsea    group.equal=
1. Configural     764.749     112    0.992   0.050
2. Weak           919.505     130    0.990   0.051    ("loadings")
3. Strong        1055.989     172    0.989   0.047    ("loadings","intercepts")
4. Mean          1192.819     184    0.988   0.049    ("loadings","intercepts","means")
5. Structural    1537.189     202    0.984   0.054    ("loadings","intercepts","means","regressions")

 

To evaluate invariance I would primarily look at CFI values, because of the DWLS estimator and the rather large sample size. I would use the ΔCFI cutoff value of .002 from Meade et al. (2008). As I said before, I aim at a comparison of latent means and a test for moderation. This leads me to the following issues:


1. Is it defensible to report latent mean differences, even though there is no decrease in model fit (CFI in particular) between model 3 (Strong) and model 4 (Mean)?

2. Is it defensible to assume a moderation effect, given that the decrease in CFI between model 4 (Mean) and model 5 (Structural) exceeds Meade et al.'s (2008) ΔCFI cutoff value?

 

I look forward to your answers and will be pleased to provide you with more detailed information about the model.


Thank you!

 

Reference:

Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568-592.

Message has been deleted

Tino

Dec 4, 2015, 9:41:35 PM
to lavaan
Many thanks!


On Friday, 23 October 2015 at 10:22:58 UTC+2, Terrence Jorgensen wrote:

Model1  <- '

#define latent variables

Latent1 =~ x1+x2+x3

Latent2 =~ x4+x5+x6

Latent3 =~ x7+x8

Latent4 =~ x9+x10

#define structural relations

Latent1 ~ Latent2

Latent3 ~ Latent1 + Latent2

Latent4 ~ Latent1 + Latent2 + Latent3

x2 ~~ x3

'


I know this isn't your question, but... Your Latent1 factor is already just-identified with only 3 indicators.  Adding a residual correlation between two of the indicators might make it empirically under-identified.  I'm not sure whether embedding that measurement model within a larger model makes it identified, but it might be the case -- that is the case for the 2-indicator factors (whose parameters would be under-identified on their own, but are identified in a larger model with simple structure).  

Because most indicator variables are ordinal (4-point Likert scale) I proceed like this:


In order to establish strong invariance, all location parameters in the measurement model must be constant across groups.  The intercepts apply only to your continuous indicators (x8 - x10).  You also need to test the threshold constraints for x1 - x7.  You can do this at the same time as intercepts:

2. Weak                      ("loadings")

3. Strong                     ("loadings","intercepts","thresholds")


Or you can do so in separate steps (the order doesn't matter).

2. Weak                      ("loadings")

3. Strong(1)               ("loadings","intercepts")

4. Strong(2)               ("loadings","intercepts","thresholds")


 

5. Structural                      ("loadings","intercepts","means","regressions")


In order for structural regressions to be comparable across groups, the latent variables need to be on the same scale, so you need to first test those restrictions:

5. Structural scale                ("loadings","intercepts","thresholds","means","lv.variances")
6. Structural relations          ("loadings","intercepts","thresholds","means","lv.variances","regressions")

If you can't constrain the latent (residual) variances to equality, you can still compare structural regressions by using phantom variables.  Essentially, for each latent (residual) variance that can't be constrained, you define a second-order factor with variance fixed to 1, the residual variance of the first order factor fixed to zero, and freely estimate the "loading" (beta path), which will be the square root of your original latent (residual) variance.  Then, you estimate regressions among the phantom variables, which are on the same (standardized) scale.  If you need to implement this, here is a paper that employs that rather clever trick:


To evaluate invariance I would primarily look at CFI values, because of the DWLS estimator and the rather large sample size. I would use the ΔCFI cutoff value of .002 from Meade et al. (2008).


Careful, that study was based on continuous data.  The CFI is calculated from chi-squared, so I'm not sure why you think DWLS invalidates one but not the other.  Certainly the large sample sizes will make the chi-squared sensitive to trivial differences, but read this recent paper about using change in CFI with ordinal indicators:



1. Is it defensible to report latent mean differences, even though there is no decrease in model fit (CFI in particular) between model 3 (Strong) and model 4 (Mean)?


Not really, that's the point of testing the constraints.  The null hypothesis is that those parameters do not differ across groups, and you failed to reject that null hypothesis.  But remember, your comparison was not valid because you failed to constrain thresholds, so those differences in thresholds may have absorbed the misspecification of equal means if the null is really false.  So there is still hope for rejection once you update your method :-)

2. Is it defensible to assume a moderation effect, given that the decrease in CFI between model 4 (Mean) and model 5 (Structural) exceeds Meade et al.'s (2008) ΔCFI cutoff value?


See my note above about putting latent variables on the same scale across groups before making inferences about whether regression paths actually differ across groups.  But once you update your method, yes, rejecting the null hypothesis of equal regression slopes means that the magnitude of at least one slope depends on group (i.e., interaction / moderation).

Terry

Message has been deleted

Erin717

Nov 30, 2020, 3:34:56 AM
to lavaan

Hi Terrence,

Sorry I am digging up an old thread here. I have a similar question regarding testing latent factor regression invariance across groups.

You said in the previous thread that "
In order for structural regressions to be comparable across groups, the latent variables need to be on the same scale, so you need to first test those restrictions:

5. Structural scale                ("loadings","intercepts","thresholds","means","lv.variances")
6. Structural relations          ("loadings","intercepts","thresholds","means","lv.variances","regressions")
"

I was wondering if group.equal = "means" is necessary to compare structural relations if the objective is not to compare latent mean differences but just regression coefficients?
I also read somewhere that metric invariance is sufficient for examining latent regression coefficients. The literature on this seems a bit vague. Could I just clarify, please? If possible, could you direct me to some references?

If the objectives are both on examining latent mean differences and latent regression paths, does this work flow sound correct?

For individual measurement models
1. loadings
2. loadings + intercepts
3. loadings + intercepts + residual (not necessary)

If at least the first two hold, then link the measurement models in a structural model (sem) according to theory
4. compare latent means
5. loadings + intercepts + (potentially residuals) + lv.variances (is lv.variances necessary as a precondition for the next step?)
6. loadings + intercepts + (potentially residuals) + lv.variances(?) + regressions + lv.covariances

Thank you in advance!

Erin 


Terrence Jorgensen

Nov 30, 2020, 9:18:27 AM
to lavaan
Hi Erin,
 
Sorry I am digging up an old thread here.

I have just deleted my old post (although a record of it is still in each response to that message) because it contained incorrect advice about thresholds (I wrote this before Wu & Estabrook, 2016, was published).  I have also thought more about the issues above, and that was unnecessary advice, too (as well as incomplete, as your questions show).  
 
You said in the previous thread that "
In order for structural regressions to be comparable across groups, the latent variables need to be on the same scale, so you need to first test those restrictions:

5. Structural scale                ("loadings","intercepts","thresholds","means","lv.variances")
6. Structural relations          ("loadings","intercepts","thresholds","means","lv.variances","regressions")

Essentially, I was skeptical that latent regression slopes were really comparable across groups because the scale of each group's factors is arbitrarily set for identification.  But as long as metric-invariance constraints hold (which include loadings AND thresholds for ordinal variables), the latent/structural slopes are comparable.  Even though they will vary across different identification constraints (fixing 1 group's factor (residual) variance to 1 or any indicator's loading to 1), the ratio of slopes between groups should be constant across identification methods.  

I am still skeptical about Wald tests that compare the slopes, because they tend to be in terms of differences between groups, and I'm not yet sure the SEs are also proportionally equivalent across identification methods (there has been evidence to the contrary in the context of comparing factor loadings).  But comparisons using LRTs of equality constraints should still be valid. 

I was wondering if group.equal = "means" is necessary to compare structural relations if the objective is not to compare latent mean differences but just regression coefficients? 

No
 
I also read somewhere that metric invariance is sufficient for examining latent regression coefficients.

Correct
 
Literature on this seems a bit vague. Could I just clarify please? If possible, could you direct me to some references please? 

To clarify the advice, or proof showing why that is true?

The issue is that groups can have different amounts of common-factor variance and measurement error in the same item, which would account for why Y~X would yield different slopes across groups.  Common-factor components (let's call it "reliability" for short) of an indicator could differ across groups either because 
  • the factor loading differs
  • the factor variance differs
(or both differ).  Linking the common-factor scales by equating loadings across groups (factor variances can still differ) implies mathematically that group differences in covariances (X ~~ Y) or slopes (Y ~ X) between indicators of the common factors can only occur because of differences in the corresponding latent slopes.  
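The argument above can be sketched algebraically (a sketch, assuming simple structure and residuals uncorrelated across factors). The covariance between an indicator x_i of factor X and an indicator y_j of factor Y in group g is

```latex
% Simple structure, residuals uncorrelated across factors:
\operatorname{Cov}_g(x_i, y_j) \;=\; \lambda^{(g)}_{x_i}\,\lambda^{(g)}_{y_j}\,\operatorname{Cov}_g(X, Y)
```

With loadings constrained equal across groups (\(\lambda^{(g)} = \lambda\)), any group difference on the left-hand side can only arise from Cov_g(X, Y), which is why metric invariance suffices for comparing latent covariances and slopes.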


If the objectives are both on examining latent mean differences and latent regression paths, does this work flow sound correct?

For individual measurement models
1. loadings
2. loadings + intercepts
3. loadings + intercepts + residual (not necessary)

Okay so far...
 
If at least the first two hold, then link measurement models using sem pertaining to theory
4. compare latent means

Actually, this depends.  Latent means are not parameters; intercepts are.  For exogenous factors (i.e., no predictors), the intercept is coincidentally the grand mean.  So you could compare latent means of exogenous factors immediately after establishing Step 3 above.

However, it would only make sense to compare latent intercepts (i.e., a "main effect" of group on an endogenous factor Y) if there is no interaction between the grouping variable G and the latent predictor X.  When Y ~ X slopes differ across groups, that means the slope is moderated by group, so it is equivalent to estimating the X:G interaction in a more familiar regression model like this:

Y ~ 1 + X + G + X:G # where 1 is for the intercept (mean when G==0 & X==0)

Comparing that to a model in which the Y~X slope is constrained to equality across groups is like comparing the regression model above to one without the interaction term (only main effects of X and G).  If that H0 holds (i.e., failing to reject the H0 of equal group slopes using a LRT), then comparing the Y intercepts is meaningful.  This is the same as the homogeneity-of-slopes assumption in ANCOVA (X is the covariate, G is the grouping variable), and comparing latent intercepts is equivalent to comparing latent "adjusted means".

If you reject the H0 of equal slopes (i.e., there is an interaction), then continue letting the slopes differ across groups.  If you are still interested in comparing group intercepts, then you should compare intercepts at different values of X (i.e., probe the interaction), because the difference in Y intercepts compares means of Y only among people whose X == 0.  You can re-fit the model with X's mean constrained to different values (e.g., +/- 1 SD) to probe the X:G interaction.  If you fixed the first group's factor variance == 1 to identify the model, then that is as simple as running the model with X ~ 1*1 and X ~ (-1)*1 in the syntax to compare intercepts under different conditions.
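A sketch of that last step (hypothetical names throughout: a latent predictor X, outcome Y, grouping variable "group"; Var(X) fixed to 1 in the reference group):

```r
# Sketch: probe the X:G interaction by re-centering the latent predictor X
# at +1 SD and -1 SD before comparing Y intercepts across groups.
baseModel <- '
  X =~ x1 + x2 + x3
  Y =~ x4 + x5 + x6
  Y ~ X
'

# Compare Y intercepts at X == +1 SD ...
fitPlus  <- sem(paste(baseModel, 'X ~ 1*1'), data = Data1, group = "group",
                group.equal = c("loadings", "intercepts"))
# ... and at X == -1 SD
fitMinus <- sem(paste(baseModel, 'X ~ (-1)*1'), data = Data1, group = "group",
                group.equal = c("loadings", "intercepts"))
```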

Here is a primer on moderation, if you are unfamiliar with probing interactions.

 
5. loadings + intercepts + (potetially residual) + lv.variances

Sure, but you can compare latent variances immediately after verifying loadings can be constrained to equality.
 
(is lv.variances necessary as a precondition for the next step?)
6.  loadings + intercepts + (potetially residual) + lv.variances(?) + regression + lv.covariances

No, you can compare latent slopes immediately after loadings are constrained to equality.  

However, latent (residual) covariances are only comparable if the latent (residual) variances are equal.  If they are not, then covariances could differ just because heterogeneity differs, not because correlations differ.  If you cannot validly constrain latent (residual) variances to equality across groups, then you can still compare correlations using phantom constructs, as shown in the article I linked to in my OP:  http://agencylab.ku.edu/~agencylab/manuscripts/(Card%20and%20Little.in%20press.%20-%20SEM%20and%20agg).pdf
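A minimal sketch of the phantom-construct trick described above, for two hypothetical factors F1 and F2:

```r
# Phantom-construct sketch (hypothetical factor and indicator names).
# Each phantom factor P has variance fixed to 1, a free "loading" on the
# original factor (which equals the SD of that factor), and the original
# factor's (residual) variance is fixed to 0.
phantomModel <- '
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6

  P1 =~ NA*F1        # free loading = SD of F1
  P2 =~ NA*F2        # free loading = SD of F2
  P1 ~~ 1*P1         # phantom variances fixed to 1
  P2 ~~ 1*P2
  F1 ~~ 0*F1         # original variances fixed to 0
  F2 ~~ 0*F2

  P1 ~~ P2           # = correlation between F1 and F2
'
```

Constraining the P1 ~~ P2 parameter to equality across groups then tests equality of the F1-F2 correlation, even when the latent variances differ.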

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Erin717

Nov 30, 2020, 5:25:40 PM
to lavaan
Hi Terrence,

Thank you so much for your prompt and illuminating explanations! These truly have clarified my long time confusions! 

Can I double-check the code with you, with the different objectives specified? I am dividing this post into continuous / ordinal indicator portions.

For continuous indicators ---- objectives: compare latent means as well as latent regression weights:
For individual measurement models (I am omitting residuals for simplicity):
1. loadings (metric)
2. loadings + intercepts (scalar) 
3. if 1 and 2 hold, latent means can be compared for the individual measurement models as well as after linking them with sem.

4. linking measurement models with sem, and under sem ... 
5. loadings 
6. loadings + regressions (to test latent regression invariance, compared to step 5 fit)
7. loadings + lv.variance + lv.covariance (to test latent covariance invariance compared to step 5 fit.)
8. loadings + intercepts + regressions (only if step 6 holds; this step inspects whether the latent means of endogenous latent variables (and mediating variables?) are the same)

9. loadings + intercepts + regressions (if step 6 does not hold, use interaction probing to examine latent endogenous variable mean differences)

More on step 8: I noticed that (for the non-reference groups) the unstandardised latent-variable intercepts of exogenous variables shown in the summary output are the same as lavInspect(fit.model, "mean.lv"). However, for endogenous latent variables as well as mediating latent variables, the intercepts differ between the summary output and "mean.lv". Is that because one is the unadjusted mean (the "mean.lv" approach, not controlling for predictors) and the other is the adjusted mean (summary output, controlling for predictors)?


For categorical indicators -- objective: compare latent means for one measurement model (one latent factor with both binary and polytomous indicators):
Using the Wu and Estabrook (2016) approach and measEq.syntax function to generate correct lavaan model syntax (parameterization = "delta" ):
1. configural model: (ID.fac = "std.lv", ID.cat="Wu", group.equal="configural")
2. metric model:  (ID.fac = "std.lv", ID.cat="Wu", group.equal=c("thresholds","loadings")) -- this model constrains intercepts to zero in the reference group and frees them in the other groups.

3. scalar model: (ID.fac = "std.lv", ID.cat="Wu", group.equal=c("thresholds","loadings","intercepts")) -- this model essentially constrains intercepts to zero in all groups.
If the scalar model holds, it is possible to compare latent means across groups. 
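A sketch of how step 2 could be generated with semTools (data, item, and group names are hypothetical placeholders):

```r
library(semTools)  # for measEq.syntax()

ordVars <- c("y1", "y2", "y3", "y4")   # hypothetical binary/polytomous items

# Step 2 (metric): thresholds + loadings constrained across groups
metricSyntax <- measEq.syntax(configural.model = 'F =~ y1 + y2 + y3 + y4',
                              data = myData, ordered = ordVars,
                              parameterization = "delta",
                              ID.fac = "std.lv", ID.cat = "Wu",
                              group = "group",
                              group.equal = c("thresholds", "loadings"))
cat(as.character(metricSyntax))        # inspect the generated lavaan syntax
fitMetric <- cfa(as.character(metricSyntax), data = myData,
                 ordered = ordVars, group = "group")
```

The configural and scalar models follow the same pattern, changing only group.equal=.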


Thank you again for all of your help and the excellent packages! 

Best,
Erin 

Terrence Jorgensen

Dec 4, 2020, 8:43:10 AM
to lavaan
2. loadings + intercepts (scalar) 
...
8. loadings + intercepts + regressions 

As long as you mean indicator intercepts in Step 2 and intercepts of endogenous common factors in Step 8, then that looks fine to me.  

Further, if Step 7's H0 is rejected, then you can also separately test lv.variances without constraining lv.covariances.  Possibly homoskedasticity holds but groups have unequal correlations.  Or possibly homoskedasticity holds for some factors but not others, in which case you can test equality of covariances among homoskedastic factors, but would need phantom constructs to test correlations among (homo- and) heteroskedastic factors.
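Those separate tests could be run along these lines (a sketch; the fit-object and grouping-variable names are placeholders):

```r
# Sketch: test equal latent (residual) variances separately from equal
# latent covariances, via likelihood-ratio tests of nested models.
fitLoad   <- sem(Model1, data = Data1, group = "group",
                 group.equal = "loadings")
fitVar    <- sem(Model1, data = Data1, group = "group",
                 group.equal = c("loadings", "lv.variances"))
fitVarCov <- sem(Model1, data = Data1, group = "group",
                 group.equal = c("loadings", "lv.variances", "lv.covariances"))

lavTestLRT(fitLoad, fitVar)     # H0: homoskedasticity (equal latent variances)
lavTestLRT(fitVar, fitVarCov)   # H0: equal covariances, given equal variances
```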

More on step 8: I noticed that (for the non-reference groups) the unstandardised latent-variable intercepts of exogenous variables shown in the summary output are the same as lavInspect(fit.model, "mean.lv"). However, for endogenous latent variables as well as mediating latent variables, the intercepts differ between the summary output and "mean.lv". Is that because one is the unadjusted mean (the "mean.lv" approach, not controlling for predictors) and the other is the adjusted mean (summary output, controlling for predictors)?

Correct, "mean.lv" returns model-implied latent-variable grand means, not the intercepts (the latter of which are the model parameters). 

A variable's intercept is always interpreted as its expected value when its predictor(s) == 0, so an intercept is only an "adjusted mean" (as the term is used in texts about ANCOVA) if the predictors' GRAND means happen to be zero (i.e., in all groups).  Likewise, a variable's intercept is its grand mean (in any particular group) only when the variable has no predictor(s) OR its predictors' means all == 0 (in that group).

For categorical indicators -- objective: compare latent means for one measurement model (one latent factor with both binary and polytomous indicators):

How many categories do the polytomous indicators have?  If only 3, then your sequence below is fine, but your configural model can already constrain thresholds to equality (it is statistically equivalent to equating the intercepts and latent scales, which is the default).

Using the Wu and Estabrook (2016) approach and measEq.syntax function to generate correct lavaan model syntax (parameterization = "delta" ):
1. configural model: (ID.fac = "std.lv", ID.cat="Wu", group.equal="configural")

Don't set group.equal= to anything if you are not constraining anything.  "configural" is not a type of parameter.

2. metric model:  (ID.fac = "std.lv", ID.cat="Wu", group.equal=c("thresholds","loadings")) -- this model constrains intercept as zero for the reference group and free the other groups.

If your polytomous indicators have > 3 categories, you can separately test equality of thresholds first, then additionally constrain loadings (of items whose thresholds are invariant).  But other than being more informative about the type of DIF, I don't think there is a practical advantage to knowing whether there is DIF in thresholds vs loadings.  But the latter are only comparable when thresholds are equal.

3. scalar model: (ID.fac = "std.lv", ID.cat="Wu", group.equal=c("thresholds","loadings","intercepts")) -- this model essentially constrains intercepts as zero for all groups.
If the scalar model holds, it is possible to compare latent means across groups. 

Yes, although partial scalar invariance is also sufficient.


Erin717

Jan 27, 2021, 4:43:07 AM
to lav...@googlegroups.com
Dear Terrence,

Thank you very much for your advice! And I am sorry that I am back for more support. 

In the previous email, I asked about adjusted means of endogenous variables under the framework of SEM. You replied that in Step 8, the intercepts equality constraint should be the intercepts of endogenous common factors. I was wondering how it could be specified in lavaan? 
 
8. loadings + intercepts + regressions 
As long as you mean indicator intercepts in Step 2 and intercepts of endogenous common factors in Step 8, then that looks fine to me.  

I tried group.equal = c("loadings","intercepts","regressions","means"). The inclusion of "intercepts" only constrained the exogenous and endogenous latent variable indicators' intercepts to be equal across groups. And the inclusion of "means" essentially made all the latent factors (exogenous and endogenous) zero across all the groups. Is there a way to formally specify endogenous common factor intercepts to be equal in lavaan? Or did I misunderstand?

Thank you very much for your support all along!

Kindly,
Erin

Terrence Jorgensen

Jan 27, 2021, 10:58:12 AM
to lavaan
intercepts of endogenous common factors. I was wondering how it could be specified in lavaan? 
 ...
I tried group.equal = c("loadings","intercepts","regressions","means").

That should do it.  
  • "loadings" refers to loadings of observed indicators on latent common factors (i.e., the lambda matrix)
  • "intercepts" refers to intercepts of observed indicators (or their latent item responses, if the observations are ordered= categorical), i.e., the vector "nu"
  • "regressions" refers to any effects of a latent common factor on another latent common factor (i.e., the "beta" matrix), which therefore includes higher-order loadings
  • "means" refers to intercepts of latent variables (both endogenous common-factor intercepts and exogenous common-factor means), i.e., the vector "alpha"
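As a sketch (assuming a fitted multigroup object; the grouping-variable name is a placeholder), those matrices can be inspected directly:

```r
fitFull <- sem(Model1, data = Data1, group = "group",
               group.equal = c("loadings", "intercepts", "regressions", "means"))

est <- lavInspect(fitFull, "est")  # list of model matrices, one set per group
est[[1]]$lambda   # factor loadings in group 1
est[[1]]$nu       # indicator intercepts
est[[1]]$beta     # structural slopes
est[[1]]$alpha    # latent intercepts / means
```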