CFA measurement invariance longitudinal model with binary and orderd catagorical indicators

223 views
Skip to first unread message

Stanley Friedemann

unread,
Dec 7, 2021, 11:30:26 AM12/7/21
to lavaan
Hello,

I am examining measurement invariance of the health survey SF-12 (version 1) which has binary and ordered categorical items with 3 to 5 categories. The measure was used at three occasions within 10 years. Complete case N = 1000.  I used semTools to fit the configural model. 
The model is composed of two latent factors (mental and physical health). I am novice to this analysis and lavaan/semTools, but tried my best to write a correct code, hopefully. I do not consider measurement invariance between groups, but between the repeated measures.
Could you please take a look at the code and tell me if there is anything wrong?
The model converge, but the (residual?) covariances between H7 ~~ H1.7, H7 ~~ H2.7, and H1.7 ~~ H2.7 were negative (-.22 to -.55), although their polychoric correlations (lavCor) are positive. Do you have any suggestions why this happened and whether (and how) it should be fixed?

mod.cat <- ' 
# Latent common factor definitions
MH =~ H6 + H7 + H9 + H10 + H11 + H12
PH =~ H1 + H2 + H3 + H4 + H5 + H8
  H6 ~~ H7
  H4 ~~ H5
  H2 ~~ H3
  H9 ~~ H11
f1MH =~ H1.6 + H1.7 + H1.9 + H1.10 + H1.11 + H1.12
f1PH =~ H1.1 + H1.2 + H1.3 + H1.4 + H1.5 + H1.8
  H1.6 ~~ H1.7
  H1.4 ~~ H1.5
  H1.2 ~~ H1.3
  H1.9 ~~ H1.11
f2MH =~ H2.6 + H2.7 + H2.9 + H2.10 + H2.11 + H2.12
f2PH =~ H2.1 + H2.2 + H2.3 + H2.4 + H2.5 + H2.8 
  H2.6 ~~ H2.7
  H2.4 ~~ H2.5
  H2.2 ~~ H2.3
  H2.9 ~~ H2.11
'
syntax.config <- measEq.syntax(configural.model = mod.cat,
                               data = dat,
                               ordered = c("H6","H7","H9","H10","H11","H12",
                                           "H1","H2","H3","H4","H5","H8",
                                           "H1.6","H1.7","H1.9","H1.10","H1.11","H1.12",
                                           "H1.1","H1.2","H1.3","H1.4", "H1.5","H1.8",
                                           "H2.6","H2.7","H2.9","H2.10","H2.11","H2.12",
                                           "H2.1","H2.2","H2.3","H2.4", "H2.5","H2.8"),
                               parameterization = "theta",
                               longFacNames = list(MHtime = c("MH","f1MH","f2MH"),
                                                   PHtime = c("PH","f1PH","f2PH")),
                               longIndNames = list(L6 = c("H6","H1.6","H2.6"),
                                                   L7 = c("H7","H1.7","H2.7"),
                                                   L9 = c("H9","H1.9","H2.9"),
                                                   L10 = c("H10","H1.10","H2.10"),
                                                   L11 = c("H11","H1.11","H2.11"),
                                                   L12 = c("H12","H1.12","H2.12"),
                                                   L1 = c("H1","H1.1","H2.1"),
                                                   L2 = c("H2","H1.2","H2.2"),
                                                   L3 = c("H3","H1.3","H2.3"),
                                                   L4 = c("H4","H1.4","H2.4"),
                                                   L5 = c("H5","H1.5","H2.5"),
                                                   L8 = c("H8","H1.8","H2.8")),
                               ID.fac = "UV",                                                 # UV due to bifactor model
                               ID.cat = "Wu.Estabrook.2016",
                               auto = TRUE,
                               group = NULL,
                               return.fit = TRUE)

And: the next step would be to set constraints step by step, beginning with long.equal = "thresholds" ?

I'd be grateful for any hints. Thank you in advance!

Best,
Stanley

Terrence Jorgensen

unread,
Dec 8, 2021, 10:21:44 AM12/8/21
to lavaan
binary and ordered categorical items with 3 to 5 categories

Do you have a mix within each factor?  

I am novice to this analysis and lavaan/semTools, but tried my best to write a correct code, hopefully. I do not consider measurement invariance between groups, but between the repeated measures.  
Could you please take a look at the code and tell me if there is anything wrong?

Syntax looks fine, but you don't need to explicitly set ID.fac=, ID.cat=, or group= when you are using the defaults.

the (residual?) covariances between H7 ~~ H1.7, H7 ~~ H2.7, and H1.7 ~~ H2.7 were negative (-.22 to -.55), although their polychoric correlations (lavCor) are positive. Do you have any suggestions why this happened and whether (and how) it should be fixed?

Nothing to fix.   Residual correlations are partial correlations, given their common predictor(s)/factor(s).  The direction is not bound to match the corresponding zero-order correlation. 

                               return.fit = TRUE)

I would not recommend this.   Better to inspect the syntax object to ensure you are fitting the model you intend to, as the help-page examples show.

And: the next step would be to set constraints step by step, beginning with long.equal = "thresholds" ?

Yes, assuming you have at least one polytomous indicator per factor.  If one factor has only binary indicators, then you will need to simultaneously constrain thresholds, loadings, and intercepts because binary data cannot distinguish between sources of DIF.  If you have a mix, then all your df to test threshold invariance will simply come from the polytomous indicators.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Stanley Friedemann

unread,
Dec 8, 2021, 10:52:30 AM12/8/21
to lavaan
Dear Terrence,

thank you very much for your support!

Terrence Jorgensen schrieb am Mittwoch, 8. Dezember 2021 um 16:21:44 UTC+1:
binary and ordered categorical items with 3 to 5 categories

Do you have a mix within each factor?  

Here are the measurement scales of the indicators (bin = binary, xcat = number of categories; prior I mistakenly wrote max. no. of cat is 5, but its 6) per factor:

MH : H6(bin) + H7(bin) + H9(6cat) + H10(6cat) + H11(6cat) + H12(5cat)
PH : H1(5cat) + H2(3cat) + H3(3cat) + H4(bin) + H5(bin) + H8(5cat)

So, yes pretty mixed up. I have read a paper (don't remember which), which said number of categories of indicators should at most deviate by one category. I know according to Liu et al. (2017) such sparse data creates problems "in specifying invariance constraints on the threshold parameters." I examined indicators and each of their categories were used in the sample at the 3 occasions.
 

I am novice to this analysis and lavaan/semTools, but tried my best to write a correct code, hopefully. I do not consider measurement invariance between groups, but between the repeated measures.  
Could you please take a look at the code and tell me if there is anything wrong?

Syntax looks fine, but you don't need to explicitly set ID.fac=, ID.cat=, or group= when you are using the defaults.

the (residual?) covariances between H7 ~~ H1.7, H7 ~~ H2.7, and H1.7 ~~ H2.7 were negative (-.22 to -.55), although their polychoric correlations (lavCor) are positive. Do you have any suggestions why this happened and whether (and how) it should be fixed?

Nothing to fix.   Residual correlations are partial correlations, given their common predictor(s)/factor(s).  The direction is not bound to match the corresponding zero-order correlation. 

                               return.fit = TRUE)

I would not recommend this.   Better to inspect the syntax object to ensure you are fitting the model you intend to, as the help-page examples show.

And: the next step would be to set constraints step by step, beginning with long.equal = "thresholds" ?

Yes, assuming you have at least one polytomous indicator per factor.  If one factor has only binary indicators, then you will need to simultaneously constrain thresholds, loadings, and intercepts because binary data cannot distinguish between sources of DIF.  If you have a mix, then all your df to test threshold invariance will simply come from the polytomous indicators.

From your further response I understood, that I can proceed with the analysis (of threshold invariance; there is nothing to add to the code), although the binary indicators do not contribute this particular test. 

Thanks also for your other suggestions!

Kind regards,
Stanley

Stanley Friedemann

unread,
Dec 9, 2021, 7:53:27 AM12/9/21
to lavaan
Dear Terrence,
S
when examining threshold invariance (long.equal = "thresholds") and fitting the cfa model I receive the warning

In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= 3.424880e-17) is close to zero. This may be a symptom that the
    model is not identified.

I know this warning has been reported here many times, but potentially this may be related to the specific nature of the measurement scales of my indicators(?). The baseline invariance model converges with good fit. The fit of the threshold invariance model is also good. I noticed that the in output of the latter model there are a few "NA" for some indicators under "Variances:" column "Std.Err" (which do not appear in the output of the baseline invariance model test). I do not know whether this is related to the warning. Is there anything that should be done to fix the issue?

Another more general question is whether invariance of intercepts should also be tested with ordered categorical indicators? Svetina et al. (2020) just test baseline, thresholds, and thresholds&loadings invariance and mention that intercept invariance testing has been the last invariance test when indicators are continuous. In some propositions of Wu & Estabrook (2017) intercept invariance is stated to be an identification condition. I hope I do not mix things up here, but any clarifying response would be appreciated. Thank you again in advance!

Best,
Stanley

Terrence Jorgensen

unread,
Dec 19, 2021, 4:46:06 AM12/19/21
to lavaan
I do not know whether this is related to the warning

Nor do I, and there is not a lot of information in your post.  

 
whether invariance of intercepts should also be tested with ordered categorical indicators? 

Yes, "because threshold invariance (for polytomous data) effectively equates the scales of the latent responses, the general rules for continuous outcomes apply. For example, since for continuous outcomes the invariance of loadings and intercepts guarantees the comparison of both factor means and variances, for ordered polytomous data one can do the same comparison with invariant thresholds, loadings, and intercepts'' (Wu & Estabrook, 2016, p. 1033, section 5.8).

Stanley Friedemann

unread,
Jan 16, 2022, 11:01:45 AM1/16/22
to lavaan
Regrading the warning message.

For testing the configural model with threshold invariance I used:
syntax.config <- measEq.syntax(configural.model = mod.cat,
                               data = dat,
                               ordered = c("H6","H7","H9","H10","H11","H12",
                                           "H1","H2","H3","H4","H5","H8",
                                           "H1.6","H1.7","H1.9","H1.10","H1.11","H1.12",
                                           "H1.1","H1.2","H1.3","H1.4", "H1.5","H1.8",
                                           "H2.6","H2.7","H2.9","H2.10","H2.11","H2.12",
                                           "H2.1","H2.2","H2.3","H2.4", "H2.5","H2.8"),
                               parameterization = "theta",
                               longFacNames = list(MHtime = c("MH","f1MH","f2MH"),
                                                   PHtime = c("PH","f1PH","f2PH")),
                               longIndNames = list(L6 = c("H6","H1.6","H2.6"),
                                                   L7 = c("H7","H1.7","H2.7"),
                                                   L9 = c("H9","H1.9","H2.9"),
                                                   L10 = c("H10","H1.10","H2.10"),
                                                   L11 = c("H11","H1.11","H2.11"),
                                                   L12 = c("H12","H1.12","H2.12"),
                                                   L1 = c("H1","H1.1","H2.1"),
                                                   L2 = c("H2","H1.2","H2.2"),
                                                   L3 = c("H3","H1.3","H2.3"),
                                                   L4 = c("H4","H1.4","H2.4"),
                                                   L5 = c("H5","H1.5","H2.5"),
                                                   L8 = c("H8","H1.8","H2.8")),
                               ID.cat = "Wu.Estabrook.2016",
                               long.equal = "thresholds",
                               auto = TRUE)

cat(as.character(syntax.config))
summary(syntax.config)
model <- as.character(syntax.config)
fit.prop4 <- cfa(model, data=dat, ordered = c("H6","H7","H9","H10","H11","H12",

                                        "H1","H2","H3","H4","H5","H8",
                                        "H1.6","H1.7","H1.9","H1.10","H1.11","H1.12",
                                        "H1.1","H1.2","H1.3","H1.4", "H1.5","H1.8",
                                        "H2.6","H2.7","H2.9","H2.10","H2.11","H2.12",
                                        "H2.1","H2.2","H2.3","H2.4", "H2.5","H2.8"))

Then, the warning appears.

Due to a post of yours on Github I checked for linear dependencies, but  there is only one large correlation of .88 between two latent factors. But such a dependencies is partly expected from earlier studies.

Or - as mentioned before - the warning is related to "NA" in the column Std.Err of some of the variances of the indicators? e.g.,
                Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
.H1.9    (t.15)    0.557       NA                      0.557    0.557
.H1.3    (t.21)    0.432       NA                      0.432    0.432
.H2.10   (t.28)    0.377       NA                      0.377    0.377
.H2.11   (t.29)    0.503                               0.503    0.503
.H2.12   (t.30)    0.278       NA                      0.278    0.278
.H2.1    (t.31)    0.325       NA                      0.325    0.325
.H2.2    (t.32)    0.297       NA                      0.297    0.297

Due you have a clue? (in case you need more information please let me know what exactly).

Terrence Jorgensen

unread,
Jan 25, 2022, 5:57:38 PM1/25/22
to lavaan
I checked for linear dependencies, but  there is only one large correlation of .88 between two latent factors

The message is not about estimated correlations among variables.  It is about correlations among parameters' estimated sampling distributions (vcov).

Due you have a clue? (in case you need more information please let me know what exactly)

Posting a more complete script helped.  When you fit the model, you need to tell lavaan to use the same parameterization (theta) that measEq.syntax() assumed would be used when writing the model syntax.  I'm guessing that you are freeing residual variances on occasions 2 and 3 when thresholds are constrained, but those are not model parameters under the default delta parameterization.

Also, the generated syntax is exhaustive (specifies every nonzero parameter), so you should be able to use lavaan() rather than any potentially problematic defaults set by cfa().
Reply all
Reply to author
Forward
0 new messages