Testing longitudinal measurement invariance

Timothy Wong

Dec 4, 2022, 7:25:28 PM

to lavaan

Hello lavaan community,

I have a dataset of 781 participants measured on two occasions: 'baseline' and 'follow-up'. The participants responded to 32 categorical indicators, repeated at both timepoints. Because the indicators come from different questionnaires, and because sparse data led me to collapse the response categories of some items, the indicators vary in their number of categories (between 2 and 5; most items have four). An EFA on the baseline data alone suggested a four-factor solution.

I now want to establish measurement invariance: that is, to see whether this four-factor solution also fits the follow-up data. Following the rules of Liu et al. (2017, p. 11) for identifying a measurement invariance CFA with categorical indicators, I have written the following syntax to test configural invariance:

config.mod <- '

# BL_ = Baseline (1st measurement occasion)
# FU_ = Follow-up (2nd measurement occasion)

# Factor loadings - at all measurement occasions, the same observed measure is chosen as the marker variable (rule 3)

slp1 =~ NA*BL_s1 + 1*BL_s2 + BL_s3 + BL_s4 + BL_s5 + BL_s6 + BL_s7 + BL_s8
slp2 =~ NA*FU_s1 + 1*FU_s2 + FU_s3 + FU_s4 + FU_s5 + FU_s6 + FU_s7 + FU_s8

dep1 =~ NA*BL_d1 + 1*BL_d2 + BL_d3 + BL_d4 + BL_d5 + BL_d6 + BL_d7 + BL_d8
dep2 =~ NA*FU_d1 + 1*FU_d2 + FU_d3 + FU_d4 + FU_d5 + FU_d6 + FU_d7 + FU_d8

app1 =~ NA*BL_a1 + 1*BL_a2 + BL_a3 + BL_a4
app2 =~ NA*FU_a1 + 1*FU_a2 + FU_a3 + FU_a4

man1 =~ NA*BL_m1 + BL_m2 + 1*BL_m3 + BL_m4 + BL_m5 + BL_m6 + BL_m7 + BL_m8 + BL_m9 + BL_m10 + BL_m11 + BL_m12
man2 =~ NA*FU_m1 + FU_m2 + 1*FU_m3 + FU_m4 + FU_m5 + FU_m6 + FU_m7 + FU_m8 + FU_m9 + FU_m10 + FU_m11 + FU_m12

# Factor variances and covariances
slp1 ~~ slp1 + dep1 + app1 + man1 + slp2 + dep2 + app2 + man2
dep1 ~~ dep1 + app1 + man1 + slp2 + dep2 + app2 + man2
app1 ~~ app1 + man1 + slp2 + dep2 + app2 + man2
man1 ~~ man1 + slp2 + dep2 + app2 + man2
slp2 ~~ slp2 + dep2 + app2 + man2
dep2 ~~ dep2 + app2 + man2
app2 ~~ app2 + man2
man2 ~~ man2

# Factor means - at one measurement occasion, the factor mean is constrained to zero (rule 2)
slp1 ~ 0*1
dep1 ~ 0*1
app1 ~ 0*1
man1 ~ 0*1

slp2 ~ 1
dep2 ~ 1
app2 ~ 1
man2 ~ 1

# Thresholds - one threshold for each indicator (and a second, for the marker variable of each factor) is constrained to be equal across occasions (rule 4)
BL_s1 | BL_s1t1*t1 + s1t2*t2 + BL_s1t3*t3
FU_s1 | FU_s1t1*t1 + s1t2*t2 + FU_s1t3*t3
BL_s2 | s2t1*t1 + s2t2*t2 + BL_s2t3*t3
FU_s2 | s2t1*t1 + s2t2*t2 + FU_s2t3*t3
BL_s3 | BL_s3t1*t1 + s3t2*t2 + BL_s3t3*t3
FU_s3 | FU_s3t1*t1 + s3t2*t2 + FU_s3t3*t3
BL_s4 | s4t1*t1 + BL_s4t2*t2 + BL_s4t3*t3
FU_s4 | s4t1*t1 + FU_s4t2*t2 + FU_s4t3*t3
BL_s5 | BL_s5t1*t1 + s5t2*t2 + BL_s5t3*t3
FU_s5 | FU_s5t1*t1 + s5t2*t2 + FU_s5t3*t3
BL_s6 | BL_s6t1*t1 + s6t2*t2 + BL_s6t3*t3
FU_s6 | FU_s6t1*t1 + s6t2*t2 + FU_s6t3*t3
BL_s7 | BL_s7t1*t1 + s7t2*t2 + BL_s7t3*t3
FU_s7 | FU_s7t1*t1 + s7t2*t2 + FU_s7t3*t3
BL_s8 | BL_s8t1*t1 + s8t2*t2 + BL_s8t3*t3
FU_s8 | FU_s8t1*t1 + s8t2*t2 + FU_s8t3*t3

BL_d1 | BL_d1t1*t1 + d1t2*t2 + BL_d1t3*t3
FU_d1 | FU_d1t1*t1 + d1t2*t2 + FU_d1t3*t3
BL_d2 | BL_d2t1*t1 + d2t2*t2 + d2t3*t3
FU_d2 | FU_d2t1*t1 + d2t2*t2 + d2t3*t3
BL_d3 | BL_d3t1*t1 + d3t2*t2 + BL_d3t3*t3
FU_d3 | FU_d3t1*t1 + d3t2*t2 + FU_d3t3*t3
BL_d4 | BL_d4t1*t1 + d4t2*t2 + BL_d4t3*t3
FU_d4 | FU_d4t1*t1 + d4t2*t2 + FU_d4t3*t3
BL_d5 | BL_d5t1*t1 + d5t2*t2 + BL_d5t3*t3
FU_d5 | FU_d5t1*t1 + d5t2*t2 + FU_d5t3*t3
BL_d6 | BL_d6t1*t1 + d6t2*t2 + BL_d6t3*t3
FU_d6 | FU_d6t1*t1 + d6t2*t2 + FU_d6t3*t3
BL_d7 | BL_d7t1*t1 + d7t2*t2 + BL_d7t3*t3
FU_d7 | FU_d7t1*t1 + d7t2*t2 + FU_d7t3*t3
BL_d8 | BL_d8t1*t1 + d8t2*t2
FU_d8 | FU_d8t1*t1 + d8t2*t2

BL_a1 | BL_a1t1*t1 + a1t2*t2 + BL_a1t3*t3
FU_a1 | FU_a1t1*t1 + a1t2*t2 + FU_a1t3*t3
BL_a2 | a2t1*t1 + a2t2*t2 + BL_a2t3*t3
FU_a2 | a2t1*t1 + a2t2*t2 + FU_a2t3*t3
BL_a3 | BL_a3t1*t1 + a3t2*t2 + BL_a3t3*t3
FU_a3 | FU_a3t1*t1 + a3t2*t2 + FU_a3t3*t3
BL_a4 | BL_a4t1*t1 + a4t2*t2 + BL_a4t3*t3
FU_a4 | FU_a4t1*t1 + a4t2*t2 + FU_a4t3*t3

BL_m1 | BL_m1t1*t1 + m1t2*t2
FU_m1 | FU_m1t1*t1 + m1t2*t2
BL_m2 | BL_m2t1*t1 + m2t2*t2
FU_m2 | FU_m2t1*t1 + m2t2*t2
BL_m3 | m3t1*t1 + m3t2*t2
FU_m3 | m3t1*t1 + m3t2*t2
BL_m4 | BL_m4t1*t1 + m4t2*t2
FU_m4 | FU_m4t1*t1 + m4t2*t2
BL_m5 | BL_m5t1*t1 + m5t2*t2
FU_m5 | FU_m5t1*t1 + m5t2*t2
BL_m6 | BL_m6t1*t1 + m6t2*t2 + BL_m6t3*t3 + BL_m6t4*t4
FU_m6 | FU_m6t1*t1 + m6t2*t2 + FU_m6t3*t3 + FU_m6t4*t4
BL_m7 | BL_m7t1*t1 + m7t2*t2 + BL_m7t3*t3 + BL_m7t4*t4
FU_m7 | FU_m7t1*t1 + m7t2*t2 + FU_m7t3*t3 + FU_m7t4*t4
BL_m8 | BL_m8t1*t1 + m8t2*t2
FU_m8 | FU_m8t1*t1 + m8t2*t2
BL_m9 | BL_m9t1*t1 + m9t2*t2
FU_m9 | FU_m9t1*t1 + m9t2*t2
BL_m10 | BL_m10t1*t1 + m10t2*t2
FU_m10 | FU_m10t1*t1 + m10t2*t2
BL_m11 | m11t1*t1
FU_m11 | m11t1*t1
BL_m12 | m12t1*t1
FU_m12 | m12t1*t1

# Item intercepts - at all measurement occasions, the latent intercepts are fixed to zero (rule 1)
BL_s1 + FU_s1 ~ 0*1
BL_s2 + FU_s2 ~ 0*1
BL_s3 + FU_s3 ~ 0*1
BL_s4 + FU_s4 ~ 0*1
BL_s5 + FU_s5 ~ 0*1
BL_s6 + FU_s6 ~ 0*1
BL_s7 + FU_s7 ~ 0*1
BL_s8 + FU_s8 ~ 0*1

BL_d1 + FU_d1 ~ 0*1
BL_d2 + FU_d2 ~ 0*1
BL_d3 + FU_d3 ~ 0*1
BL_d4 + FU_d4 ~ 0*1
BL_d5 + FU_d5 ~ 0*1
BL_d6 + FU_d6 ~ 0*1
BL_d7 + FU_d7 ~ 0*1
BL_d8 + FU_d8 ~ 0*1

BL_a1 + FU_a1 ~ 0*1
BL_a2 + FU_a2 ~ 0*1
BL_a3 + FU_a3 ~ 0*1
BL_a4 + FU_a4 ~ 0*1

BL_m1 + FU_m1 ~ 0*1
BL_m2 + FU_m2 ~ 0*1
BL_m3 + FU_m3 ~ 0*1
BL_m4 + FU_m4 ~ 0*1
BL_m5 + FU_m5 ~ 0*1
BL_m6 + FU_m6 ~ 0*1
BL_m7 + FU_m7 ~ 0*1
BL_m8 + FU_m8 ~ 0*1
BL_m9 + FU_m9 ~ 0*1
BL_m10 + FU_m10 ~ 0*1
BL_m11 + FU_m11 ~ 0*1
BL_m12 + FU_m12 ~ 0*1

# Unique variances - at one measurement occasion, the unique factor covariance matrix is constrained to be the identity matrix (rule 2)
BL_s1 ~~ 1*BL_s1
BL_s2 ~~ 1*BL_s2
BL_s3 ~~ 1*BL_s3
BL_s4 ~~ 1*BL_s4
BL_s5 ~~ 1*BL_s5
BL_s6 ~~ 1*BL_s6
BL_s7 ~~ 1*BL_s7
BL_s8 ~~ 1*BL_s8
BL_d1 ~~ 1*BL_d1
BL_d2 ~~ 1*BL_d2
BL_d3 ~~ 1*BL_d3
BL_d4 ~~ 1*BL_d4
BL_d5 ~~ 1*BL_d5
BL_d6 ~~ 1*BL_d6
BL_d7 ~~ 1*BL_d7
BL_d8 ~~ 1*BL_d8
BL_a1 ~~ 1*BL_a1
BL_a2 ~~ 1*BL_a2
BL_a3 ~~ 1*BL_a3
BL_a4 ~~ 1*BL_a4
BL_m1 ~~ 1*BL_m1
BL_m2 ~~ 1*BL_m2
BL_m3 ~~ 1*BL_m3
BL_m4 ~~ 1*BL_m4
BL_m5 ~~ 1*BL_m5
BL_m6 ~~ 1*BL_m6
BL_m7 ~~ 1*BL_m7
BL_m8 ~~ 1*BL_m8
BL_m9 ~~ 1*BL_m9
BL_m10 ~~ 1*BL_m10
BL_m11 ~~ 1*BL_m11
BL_m12 ~~ 1*BL_m12

# Unique variances - at all other measurement occasions, the unique factor covariance matrix is a diagonal matrix with the diagonal elements freely estimated (rule 2)
# N.B. Additional constraints added due to dichotomous items (m11 and m12)
FU_s1 ~~ NA*FU_s1
FU_s2 ~~ 1*FU_s2
FU_s3 ~~ NA*FU_s3
FU_s4 ~~ NA*FU_s4
FU_s5 ~~ NA*FU_s5
FU_s6 ~~ NA*FU_s6
FU_s7 ~~ NA*FU_s7
FU_s8 ~~ NA*FU_s8
FU_d1 ~~ NA*FU_d1
FU_d2 ~~ 1*FU_d2
FU_d3 ~~ NA*FU_d3
FU_d4 ~~ NA*FU_d4
FU_d5 ~~ NA*FU_d5
FU_d6 ~~ NA*FU_d6
FU_d7 ~~ NA*FU_d7
FU_d8 ~~ NA*FU_d8
FU_a1 ~~ NA*FU_a1
FU_a2 ~~ 1*FU_a2
FU_a3 ~~ NA*FU_a3
FU_a4 ~~ NA*FU_a4
FU_m1 ~~ NA*FU_m1
FU_m2 ~~ NA*FU_m2
FU_m3 ~~ 1*FU_m3
FU_m4 ~~ NA*FU_m4
FU_m5 ~~ NA*FU_m5
FU_m6 ~~ NA*FU_m6
FU_m7 ~~ NA*FU_m7
FU_m8 ~~ NA*FU_m8
FU_m9 ~~ NA*FU_m9
FU_m10 ~~ NA*FU_m10
FU_m11 ~~ NA*FU_m11
FU_m12 ~~ NA*FU_m12

# Lagged unique covariances - each unique factor is allowed to freely correlate with itself over time
# but not with other unique factors at other measurement occasions
BL_s1 ~~ FU_s1
BL_s2 ~~ FU_s2
BL_s3 ~~ FU_s3
BL_s4 ~~ FU_s4
BL_s5 ~~ FU_s5
BL_s6 ~~ FU_s6
BL_s7 ~~ FU_s7
BL_s8 ~~ FU_s8
BL_d1 ~~ FU_d1
BL_d2 ~~ FU_d2
BL_d3 ~~ FU_d3
BL_d4 ~~ FU_d4
BL_d5 ~~ FU_d5
BL_d6 ~~ FU_d6
BL_d7 ~~ FU_d7
BL_d8 ~~ FU_d8
BL_a1 ~~ FU_a1
BL_a2 ~~ FU_a2
BL_a3 ~~ FU_a3
BL_a4 ~~ FU_a4
BL_m1 ~~ FU_m1
BL_m2 ~~ FU_m2
BL_m3 ~~ FU_m3
BL_m4 ~~ FU_m4
BL_m5 ~~ FU_m5
BL_m6 ~~ FU_m6
BL_m7 ~~ FU_m7
BL_m8 ~~ FU_m8
BL_m9 ~~ FU_m9
BL_m10 ~~ FU_m10
BL_m11 ~~ FU_m11
BL_m12 ~~ FU_m12
'

cfa1 <- cfa(config.mod, data = df1, ordered = TRUE)

Running this syntax I receive the warning:

Warning messages:
1: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= 2.203828e-24) is close to zero. This may be a symptom that the
    model is not identified.
2: In lav_object_post_check(object) :
lavaan WARNING: covariance matrix of latent variables
                is not positive definite;
                use lavInspect(fit, "cov.lv") to investigate.

I have searched for cases where others received the same warnings, and the problem appears to be caused by a sample size that is insufficient to estimate all the covariances and thresholds. Is this also the case for my model? If so, what could I do to make the model simpler and hence estimable? Thanks (I can post more of my results and model details upon request).

Timothy Wong

Dec 5, 2022, 6:47:24 PM

to lavaan

An update:

In the original syntax, I did not specify parameterization = "theta" in the call to cfa. Now, after retrying using

cfa1 <- cfa(config.mod, data = df1, ordered = TRUE, parameterization = "theta")

R seems to be 'stuck' trying to compute the solution; much longer than when parameterization is not specified (defaults to delta). I left R running for more than half an hour, yet I don't see any warnings or errors. Is this a sign of non-convergence?
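One way to tell a slow optimizer from a stalled one is to let lavaan report its progress and then query the convergence status. The following is only a minimal sketch: the data are a stand-in (three-category versions of lavaan's built-in HolzingerSwineford1939 variables, not df1), the model demo.mod is hypothetical, and the iteration cap of 500 is an arbitrary assumption.

```r
library(lavaan)

# Stand-in data (assumption): six HolzingerSwineford1939 variables cut into
# three ordered categories each. This is NOT the poster's df1.
dat <- HolzingerSwineford1939[, paste0("x", 1:6)]
dat[] <- lapply(dat, function(x) {
  cut(x, breaks = quantile(x, c(0, 1/3, 2/3, 1)),
      include.lowest = TRUE, labels = FALSE)
})

# Hypothetical two-factor model, used only for illustration
demo.mod <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# verbose = TRUE prints optimizer progress while the model is being fitted;
# control = list(iter.max = ...) caps the number of iterations, so the call
# returns (with a non-convergence warning) instead of running indefinitely.
fit <- cfa(demo.mod, data = dat, ordered = TRUE,
           parameterization = "theta",
           verbose = TRUE,
           control = list(iter.max = 500))

# Query the outcome after the call returns
converged <- lavInspect(fit, "converged")   # TRUE or FALSE
n_iter    <- lavInspect(fit, "iterations")  # iterations actually used
print(converged)
print(n_iter)
```

If the objective function value printed under verbose = TRUE keeps changing, the optimizer is still working; if the run stops at the iteration cap, lavInspect(fit, "converged") returns FALSE.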

Terrence Jorgensen

Dec 12, 2022, 8:12:14 AM

to lavaan

> R seems to be 'stuck' trying to compute the solution; much longer than when parameterization is not specified (defaults to delta). I left R running for more than half an hour, yet I don't see any warnings or errors. Is this a sign of non-convergence?

The theta parameterization can have trouble finding a solution when the delta parameterization would yield a negative residual variance (a Heywood case). In that case, fixing the residual variance to 1 just makes everything blow up.

> Following Liu et al. (2017)’s rules for identifying a measurement invariance CFA with categorical indicators

That article perpetuates the mistakes of Millsap & Tein (2004), which were pointed out by Wu & Estabrook (2016). Try using the semTools::measEq.syntax() function, which implements less problematic identification rules.

Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen
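For readers who have not used semTools::measEq.syntax(), a minimal sketch of generating longitudinal configural syntax for ordinal indicators follows. It uses datCat, the example data set shipped with semTools, as a stand-in for the poster's data; the model, the factor names FU1 and FU2, and the choice of ID.fac = "std.lv" are assumptions made for illustration, not the settings used in this thread.

```r
library(lavaan)
library(semTools)

# Stand-in data (assumption): semTools' example data set datCat, in which
# u1-u4 and u5-u8 are treated as the same four items measured twice.
mod.demo <- ' FU1 =~ u1 + u2 + u3 + u4
              FU2 =~ u5 + u6 + u7 + u8 '

# Tell measEq.syntax() that FU1 and FU2 are one factor on two occasions
longFacNames <- list(FU = c("FU1", "FU2"))

# Configural model identified with the Wu & Estabrook (2016) guidelines
syntax.config <- measEq.syntax(configural.model = mod.demo,
                               data = datCat,
                               ordered = paste0("u", 1:8),
                               parameterization = "theta",
                               ID.fac = "std.lv",
                               ID.cat = "Wu.Estabrook.2016",
                               longFacNames = longFacNames)

# Convert to lavaan model syntax, inspect it, then fit it
mod.config <- as.character(syntax.config)
cat(mod.config)
fit.config <- cfa(mod.config, data = datCat,
                  ordered = paste0("u", 1:8),
                  parameterization = "theta")

# Next step: constrain thresholds to equality across occasions
syntax.thresh <- measEq.syntax(configural.model = mod.demo,
                               data = datCat,
                               ordered = paste0("u", 1:8),
                               parameterization = "theta",
                               ID.fac = "std.lv",
                               ID.cat = "Wu.Estabrook.2016",
                               longFacNames = longFacNames,
                               long.equal = "thresholds")
mod.thresh <- as.character(syntax.thresh)
```

Printing the generated syntax with cat() before fitting makes it easy to compare the identification constraints with a hand-written specification.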

Timothy Wong

Dec 19, 2022, 7:10:15 PM

to lavaan

Hi Terrence (sorry for the double post, I clicked 'reply to author' accidentally, when I meant to post here),

> Try using the semTools::measEq.syntax() function, which implements less problematic identification rules.

As per your advice, I generated measurement invariance syntax using semTools::measEq.syntax(). Unlike the original model formulated with Liu et al.'s rules, this model is able to converge normally:

lavaan 0.6-12 ended normally after 98 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                       269

  Number of observations                           781

However, I am still receiving this warning after the model converges:

Warning message:

In lav_object_post_check(object) :
lavaan WARNING: covariance matrix of latent variables
is not positive definite;
use lavInspect(fit, "cov.lv") to investigate.

Is a non-positive-definite covariance matrix of the latent variables a concern if lavaan converges? If so, what steps should I take to diagnose and fix the problem?
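The warning itself points to lavInspect(fit, "cov.lv"). A minimal sketch of what such an inspection could look like is given below. It uses lavaan's built-in HolzingerSwineford1939 data and the usual three-factor example model as a stand-in (that fit is positive definite, so the checks come back clean), and the 0.95 cutoff for "suspiciously high" correlations is an arbitrary assumption.

```r
library(lavaan)

# Stand-in model and data (assumption): lavaan's textbook example,
# not the model discussed in this thread.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Model-implied covariance matrix of the latent variables
psi <- unclass(lavInspect(fit, "cov.lv"))

# A zero or negative eigenvalue means the matrix is not positive definite
ev <- eigen(psi, symmetric = TRUE, only.values = TRUE)$values
print(min(ev))

# Negative factor variances are one possible cause
neg_var <- diag(psi)[diag(psi) <= 0]
print(neg_var)

# Factor correlations near or beyond +/-1 are another; they suggest that two
# factors (for example, the same construct at two occasions) are nearly
# indistinguishable. The 0.95 cutoff is an arbitrary assumption.
cor_lv <- unclass(lavInspect(fit, "cor.lv"))
high <- which(abs(cor_lv) > 0.95 & upper.tri(cor_lv), arr.ind = TRUE)
print(high)
```

In a longitudinal model, it may be worth looking first at the correlation between each factor and its own follow-up counterpart, since an estimate above 1 there would produce exactly this warning.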

Daniel Gruehn

Dec 20, 2022, 8:37:53 AM

to lavaan

Hi,

I would assume that your model is far too complex for the given sample size. By the guideline of having 10 times (or even 5 times) as many people as estimated parameters, your 269 parameters would call for roughly 2,690 (or 1,345) participants, well above your 781. Add to that the categorical nature of your variables, which in my experience often causes estimation problems, and I am not surprised that you run into identification problems or non-positive-definite matrices. Some category combinations between variables often simply do not occur in the sample, which makes the corresponding associations difficult to estimate properly.

My suggestion would be to screen your data carefully. Is the baseline factor structure actually good? Can you remove problematic variables? If you had to combine categories for some variables, that sounds problematic; I would run some cross-tabulations to check the actual distributions. Can you combine variables, or collapse more categories? And then I would probably start with just one latent variable rather than all four.
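The cross-tabulation check suggested above could be sketched as follows. The data are simulated, and the item names, category probabilities, and the helper count_sparse() are all hypothetical; the idea is only to count, for every pair of items, how many category combinations are empty.

```r
# Simulated stand-in data (assumption): three skewed ordinal items
set.seed(123)
n <- 200
demo <- data.frame(
  BL_x1 = sample(1:4, n, replace = TRUE, prob = c(.60, .30, .08, .02)),
  FU_x1 = sample(1:4, n, replace = TRUE, prob = c(.55, .30, .10, .05)),
  BL_x2 = sample(1:3, n, replace = TRUE, prob = c(.70, .25, .05))
)

# Hypothetical helper: for each pair of items, cross-tabulate the observed
# categories and count the cells with fewer than min_count observations.
count_sparse <- function(data, min_count = 1) {
  pairs <- combn(names(data), 2)
  out <- data.frame(item1 = pairs[1, ], item2 = pairs[2, ],
                    n_cells = NA_integer_, n_sparse = NA_integer_)
  for (i in seq_len(ncol(pairs))) {
    tab <- table(data[[pairs[1, i]]], data[[pairs[2, i]]])
    out$n_cells[i]  <- length(tab)
    out$n_sparse[i] <- sum(tab < min_count)
  }
  out
}

sparse <- count_sparse(demo)
print(sparse)

# Inspect a single repeated item across the two occasions
print(table(baseline = demo$BL_x1, followup = demo$FU_x1))
```

Item pairs with many empty cells are natural candidates for collapsing categories or dropping an item before a single-factor model is fitted.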