RI-CLPM with binary endogenous variable


Felix B.

Oct 11, 2021, 12:23:00 PM
to lavaan

Dear community,
I would like to estimate a RI-CLPM with one continuous and one binary variable across 3 waves. Since using a binary variable in this model is apparently rather problematic (see also https://jeroendmulder.github.io/RI-CLPM/faq.html), I wonder whether anyone here has tried it before and can give some advice. Will this give sensible estimates? I am currently trying to compare the performance of the RI-CLPM with that of the CLPM. Any ideas are very welcome, including which type of estimator might work best. My code currently looks like this (R 4.1.1 and lavaan 0.6-9).


library(lavaan)
RICLPM <- '# Create between components (random intercepts)
            RImath =~ 1*math1 + 1*math2 + 1*math4
            RIaspi =~ 1*aspi1 + 1*aspi2 + 1*aspi4
            
            # Create within-person centered variables
            wmath1 =~ 1*math1
            wmath2 =~ 1*math2
            wmath4 =~ 1*math4
            waspi1 =~ 1*aspi1
            waspi2 =~ 1*aspi2
            waspi4 =~ 1*aspi4
            
            # Estimate the lagged effects between the within-person centered variables
            wmath2 + waspi2 ~ wmath1 + waspi1
            wmath4 + waspi4 ~ wmath2 + waspi2
            
            # Adding control variables (effects may vary by wave)
            math1 + math2 + math4 ~ logincome
            aspi1 + aspi2 + aspi4 ~ logincome
            
            # Estimate the covariance between the within-person centered variables at the first wave
            wmath1 ~~ waspi1
            
            # Estimate the covariances between the residuals of the within-person centered variables
            wmath2 ~~ waspi2
            wmath4 ~~ waspi4
            
            # Estimate the variance and covariance of the random intercepts
            RImath ~~ RImath
            RIaspi ~~ RIaspi
            RIaspi ~~ RImath
            
            # Estimate the (residual) variance of the within-person centered variables
            wmath1 ~~ wmath1
            wmath2 ~~ wmath2
            wmath4 ~~ wmath4
            waspi1 ~~ waspi1
            waspi2 ~~ waspi2
            waspi4 ~~ waspi4'

RICLPM.fit <- lavaan(RICLPM, data = testdata, meanstructure = TRUE, int.ov.free = TRUE)
summary(RICLPM.fit, standardized = FALSE, fit.measures = TRUE)

aspi is the binary variable, math the continuous one. Note that math is already standardized by wave 1.

Thanks a lot. Best wishes

Terrence Jorgensen

Oct 13, 2021, 5:04:13 AM
to lavaan
> aspi is the binary variable

Then those should be declared in the ordered= argument (so use estimator = "WLSMV").  At time 1, there is nothing to identify the latent (residual) variance, so it has to be fixed to an arbitrary value.  For example,

waspi1 =~ 1*aspi1
aspi1 ~~ 0*aspi1
waspi1 ~~ 1*waspi1

You could freely estimate variances at later time points if you link the latent-response scales by assuming threshold invariance:

aspi1 | thr.asp*t1
aspi2 | thr.asp*t1
aspi4 | thr.asp*t1
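
As a minimal sketch of the two ingredients mentioned above (declaring the binary variables via ordered= and tying their thresholds together with a shared label), the following fits a deliberately small model, not the RI-CLPM itself. The data are simulated and the model structure (two binary outcomes regressed on one covariate) is an assumption made only to keep the example self-contained; only the names aspi1, aspi2, math1 and the label thr.asp follow the thread.

```r
library(lavaan)

# Simulated stand-in data (invented for illustration): one continuous
# covariate and two binary outcomes cut from correlated latent responses.
set.seed(123)
n <- 1000
math1 <- rnorm(n)
e1 <- rnorm(n)
e2 <- 0.4 * e1 + sqrt(1 - 0.4^2) * rnorm(n)
aspi1 <- as.integer(0.5 * math1 + e1 > 0.3)
aspi2 <- as.integer(0.5 * math1 + e2 > 0.3)
d <- data.frame(math1, aspi1, aspi2)

model <- '
  # The shared label thr.asp constrains the thresholds to equality
  aspi1 | thr.asp*t1
  aspi2 | thr.asp*t1

  # Probit regressions on the covariate, plus a residual covariance
  aspi1 + aspi2 ~ math1
  aspi1 ~~ aspi2
'

# Declare the binary variables as ordered and request WLSMV
fit <- sem(model, data = d,
           ordered = c("aspi1", "aspi2"),
           estimator = "WLSMV")

# The two threshold rows should carry the same label and estimate
pe <- parameterEstimates(fit)
print(pe[pe$op == "|", c("lhs", "op", "rhs", "label", "est")])
```

In the full RI-CLPM the same threshold lines would sit alongside the random-intercept and within-person parts of the syntax; whether that larger model is identified for a given dataset still has to be checked separately.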

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Felix B.

Oct 14, 2021, 7:30:31 AM
to lavaan
Thanks a lot for your reply!
First, I found a neat diagram that shows the model I have in mind (https://eprints.whiterose.ac.uk/115142/1/Praetorius_et_al._29.03.17.pdf, page 58).

Second, you are right, I did not declare the binary variables as ordered, as one should. The reason is that the model from my first post runs and produces results, but as soon as I declare the binary variables, the model breaks and no sensible output is produced. I am not sure why this happens.

Third, maybe this relates to the last aspect, the fixing of the values. Following your suggestions, I changed my code as follows.

RICLPM <- '# Create between components (random intercepts)
            RImath =~ 1*math1 + 1*math2 + 1*math4
            RIaspi =~ 1*aspi1 + 1*aspi2 + 1*aspi4
            
            # Create within-person centered variables
            #wmath1 =~ math1        line commented out
            wmath2 =~ math2
            wmath4 =~ math4
            #waspi1 =~ aspi1        line commented out
            waspi2 =~ aspi2
            waspi4 =~ aspi4
            
            #Changes

            waspi1 =~ 1*aspi1
            aspi1 ~~ 0*aspi1
            waspi1 ~~ 1*waspi1
            
            wmath1 =~ 1*math1
            math1 ~~ 0*math1
            wmath1 ~~ 1*wmath1

            
            # Estimate the lagged effects between the within-person centered variables
            wmath2 + waspi2 ~ wmath1 + waspi1
            wmath4 + waspi4 ~ wmath2 + waspi2
            
            # Adding control variables (effects may vary by wave)
            math1 + math2 + math4 ~ logincome
            aspi1 + aspi2 + aspi4 ~ logincome
            
            # Estimate the covariance between the within-person centered variables at the first wave
            wmath1 ~~ waspi1
            
            # Estimate the covariances between the residuals of the within-person centered variables
            wmath2 ~~ waspi2
            wmath4 ~~ waspi4
            
            # Estimate the variance and covariance of the random intercepts
            RImath ~~ RImath
            RIaspi ~~ RIaspi
            RIaspi ~~ RImath
            
            # Estimate the (residual) variance of the within-person centered variables
            #wmath1 ~~ wmath1

            wmath2 ~~ wmath2
            wmath4 ~~ wmath4
            #waspi1 ~~ waspi1

            waspi2 ~~ waspi2
            waspi4 ~~ waspi4'


#Test first without imputed data
testdata <- datalist[["imp28pv3"]]    #Select one imputed dataset for testing the model
RICLPM.fit <- lavaan(RICLPM, data = testdata, meanstructure = TRUE, int.ov.free = TRUE, ordered = c("aspi1", "aspi2", "aspi4"))

summary(RICLPM.fit, standardized = FALSE, fit.measures = TRUE)

After estimation I receive the following errors:

1: In lav_samplestats_step2(UNI = FIT, wt = wt, ov.names = ov.names,  :
  lavaan WARNING: correlation between variables aspi2 and math4 is (nearly) 1.0
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
3: In lav_object_post_check(object) :
  lavaan WARNING: some estimated ov variances are negative
4: In lav_object_post_check(object) :
  lavaan WARNING: some estimated lv variances are negative


Did I make an error when adapting my model, or is there something wrong with the estimation? Also, maybe it is a problem that the grades are not equally spaced (so t1, t2 and t4 with t3 missing). Any ideas on how to proceed are much appreciated, thanks.

Terrence Jorgensen

Oct 15, 2021, 10:05:18 AM
to lavaan
> After estimation I receive the following errors:

There were no errors posted, only warnings.

 
> 1: In lav_samplestats_step2(UNI = FIT, wt = wt, ov.names = ov.names,  :
>   lavaan WARNING: correlation between variables aspi2 and math4 is (nearly) 1.0

Looks like (nearly) perfect separation; barely any overlap in the math4 distributions between aspi2 groups.  That means the sample covariance matrix has (near) linear dependency, so it could cause estimation issues, like you see in the other warnings.  

You can see how this causes problems in logistic/probit regression as well.  Try regressing aspi2 on math4 using glm(aspi2 ~ math4, family = binomial(link = "logit")), and you might see a problem there, too.
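
A minimal sketch of such a check, using base R only. The data below are simulated stand-ins (the names aspi2 and math4 mirror the thread, the values are invented), and the binary variable is used as the outcome, which is what a binomial family expects.

```r
# Simulated stand-in data; replace d with the real data set.
set.seed(1)
n <- 500
math4 <- rnorm(n)
aspi2 <- rbinom(n, size = 1, prob = plogis(0.8 * math4))
d <- data.frame(aspi2, math4)

# Range of the continuous variable within each group of the binary one.
# Little or no overlap between the two ranges would suggest (near) separation.
print(tapply(d$math4, d$aspi2, range))

# Logistic regression with the binary variable as the outcome.
# Under (near) separation, glm() typically warns that fitted probabilities
# numerically 0 or 1 occurred, and the slope and its standard error blow up.
fit <- glm(aspi2 ~ math4, data = d, family = binomial(link = "logit"))
print(summary(fit)$coefficients)
```

With the simulated data the two groups overlap heavily, so the fit is unremarkable; on real data, a huge slope or the warning mentioned in the comment would point toward the separation problem described above.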
 
> 2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
>   lavaan WARNING:
>     Could not compute standard errors! The information matrix could
>     not be inverted. This may be a symptom that the model is not
>     identified.

It could also be a sign of something else, like collinearity.
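
One rough way to look for collinearity in the input data is to inspect the eigenvalues of the correlation matrix. The sketch below uses base R and invented variables (x1, x2, x3 are hypothetical), and it works with Pearson correlations; with ordered variables lavaan analyses polychoric/polyserial correlations instead, so this is only a first screening step under that assumption.

```r
# Invented example data in which x3 is almost an exact linear
# combination of x1 and x2.
set.seed(2)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)
d <- data.frame(x1, x2, x3)

# Eigenvalues of the correlation matrix, largest first.
R <- cor(d)
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
print(round(ev, 5))

# A smallest eigenvalue close to zero (relative to the largest) signals
# a near-linear dependency among the variables.
print(min(ev) / max(ev))
```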
  
> maybe it is a problem that the grades are not equally spaced (so t1, t2 and t4 with t3 missing).

I doubt that is a problem.  You aren't constraining AR/CL effects to equality over time.  As far as the software is concerned, these are just different variables, except that their residuals also have common factors (random intercepts).  Have you tried running it without the random intercepts?  I expect the problem will persist because it is a problem with the input data.

Felix B.

Oct 18, 2021, 4:33:55 AM
to lavaan
Thanks a lot for the explanation. I ran some tests but cannot trace the problems. For example, perfect separation is not present, as this graph of math4 clearly shows.
[Attachment: graph.png]
When I estimate the "normal" CLPM, it works fine, without errors or warnings, and the results also make sense to me from a theoretical point of view (as long as I do not declare the binary variables as ordered). So maybe the problem is indeed the RIs with binary variables.

I am now looking into alternatives. I would like to hear your opinion about the CL2PM model if you have any (a CLPM in which t1 variables can also influence t3 variables directly). See https://psyarxiv.com/6f85c/ page 8. The cited article is, however, not able to give a general recommendation (CLPM vs CL2PM vs RI-CLPM). My question is: from a theoretical point of view, how would one best interpret the effect of math1 on math4 that is, apparently, independent of math2?

Best wishes