Cross-lagged Mediation Model using Lavaan

1,245 views
Skip to first unread message

David Freire

unread,
Apr 6, 2020, 10:52:16 PM4/6/20
to lavaan
Hi everyone!

I had a question regarding how to exactly construct a cross-lagged mediation model using lavaan. My current project I am working on involves 3 variables measured across 4 waves of data. I am attempting to create a model that accounts for each of the variables' correlations across waves (e.g. x1->x2->x3->x4) as well as 2 cross lagged mediations (x1->y2->z3 and x2->y3->z4) and mediations within each wave (e.g. x1->y1->z1). The following code is what I have so far:

longForm =
#Regressions
'w2DEPRESS ~ w2DISCRIM + w2DEPRESS + w2PSSM
 w3DEPRESS ~ w3DISCRIM + w3DEPRESS + w3PSSM
 w4DEPRESS ~ w4DISCRIM + w4DEPRESS + w4PSSM
 w5DEPRESS ~ w5DISCRIM + w5DEPRESS + w5PSSM
 
 #Covariances
 w5DISCRIM ~~ w2DISCRIM + w3DISCRIM + w4DISCRIM
 w5DEPRESS ~~ w2DEPRESS + w3DEPRESS + w4DEPRESS
 w5PSSM ~~ w2PSSM + w3PSSM + w4PSSM
 
 #Direct Effects
 w4DEPRESS ~ e*w2DISCRIM
 w5DEPRESS ~ f*w2DISCRIM
 
 #Mediators
 w3PSSM ~ a*w2DISCRIM
 w4DEPRESS ~ b*w3PSSM
 w4PSSM ~ c*w3DISCRIM
 w5DEPRESS ~ d*w4PSSM
 
 #Mediations
 med1 := a*b
 med2 := c*d
 mediated := med1 + med2
 
 #Total Effect
 total := e + f + a*b + c*d'
longModel = lavaan::sem(longForm, data=seasons.data, missing = 'fiml')
summary(longModel, fit.measures=TRUE)

Anyone have any suggestions?
I tried going off this website's code (https://jflournoy.github.io/2017/10/20/riclpm-lavaan-demo/) but couldn't figure out how to translate that into my syntax.

Any help is appreciated!

Nickname

unread,
Apr 7, 2020, 9:39:35 PM4/7/20
to lavaan
David,
  It is hard to intuit exactly what model you want to specify.  The blog post you linked to does not present a cross-lagged mediation model but rather an alternative to it.  Your model requires the strong assumptions described therein and in the article cited by the blog post.  Also, I cannot help you with the sem() function because I cannot keep all the defaults straight in my head.  However, here are some suggestions, working from your model description.

  If I follow your description, you want a model that contains the following 13 effects:
x1->x2->x3->x4
y1->y2->y3->y4
z1->z2->z3->z4
x1->y2->z3
x2->y3->z4

It seems natural to add two more effects:
y1 -> z2
x3 -> y4

You can specify the above 15 effects as follows:

x2 ~ x1
y2 ~ y1 + x1
z2 ~ z1 + y1
x3 ~ x2
y3 ~ y2 + x2
z3 ~ z2 + y2
x4 ~ x3
y4 ~ y3 + x3
z4 ~ z3 + y3

In your model syntax, you had each variable affecting itself at the same time.  Leslie Hayduk has advocated such reflexive effects but it is not clear that this is what you want here.  There are no such effects in your description.  I omit them, but you can add them if intended.

Your specification of the covariances also looks odd.  You correlate the residual of the time 4 variables with the residuals of the same variable at each of the prior times.  However, you do not covary any of the other pairs of residuals.  I believe that your stability coefficients (effects from a variable at one time to the same variable at the next time) take care of the covariance from time to time within the same variable.  So, I suspect that what you actually want is to covary the residuals of different variables at the same time.  This accounts for time-specific shocks that might affect all the variables for the same person.

x1 ~~ y1 + z1
y1 ~~ z1
x2 ~~ y2 + z2
y2 ~~ z2
x3 ~~ y3 + z3
y3 ~~ z3
x4 ~~ y4 + z4
y4 ~~ z4

That gives you 12 covariances.  I assume that the sem() function defaults take care of the 12 variances.  You can then specify indirect effects and total effects as desired.

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/



Gavin T. L. Brown

unread,
Apr 7, 2020, 9:48:47 PM4/7/20
to lavaan
Hi Keith
thanks for tidying up the model with lavaan-style syntax; i prefer to see the model as a drawing, because it seems easier to interpret what has been specified.
it seems to me that what you have set up is a trivariate, auto-regressive model with cross-lagging
this chapter gives a nice explanation of how these can be done
Curran, P. J., & Bollen, K. A. (2001). The best of both worlds: Combining autoregressive and latent curve models. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 107-135). Washington, DC: APA.
i agree with your strategy of correlating residuals within each time period to ensure the correlations between x, y, and z are captured. I wouldn't want to correlate residuals across time when the time element is expressed by the auto-regressive and cross-lagged elements.

Prof. Gavin T L Brown, PhD

The University of Auckland
Honorary Professor, Dept. of Curriculum & Instruction, Education University of Hong Kong

Affiliated Professor, Dept. of Applied Educational Sciences, University of Umea, Sweden

Frontiers in Education Chief Section Editor: Assessment, Testing, and Applied Measurement

Routledge author: https://www.routledge.com/authors/i14875-gavin-brown

David Freire

unread,
Apr 8, 2020, 1:03:05 PM4/8/20
to lavaan
Thank you so much for answering my question. Your edits all made sense! Here is what I have so far but I am struggling with how to use the coefficients to fix the error of "duplicate model elements". This is what I have from the edited code:

practicemodel3 =
  '
#Regressions  
 w3DISCRIM ~ w2DISCRIM
 w3PSSM ~ w2PSSM + w2DISCRIM
 w3DEPRESS ~ w2DEPRESS + w2PSSM
 w4DISCRIM ~ w3DISCRIM
 w4PSSM ~ w3PSSM + w3DISCRIM
 w4DEPRESS ~ w3DEPRESS + w3PSSM
 w5DISCRIM ~ w4DISCRIM
 w5PSSM ~ w4PSSM + w4DISCRIM
 w5DEPRESS ~ w4DEPRESS + w4PSSM

 #Direct Effects
 w4DEPRESS ~ e*w2DISCRIM
 w5DEPRESS ~ f*w3DISCRIM
 w2DEPRESS ~ o*w2DISCRIM
 w3DEPRESS ~ p*w3DISCRIM
 w4DEPRESS ~ q*w4DISCRIM
 w5DEPRESS ~ r*w5DISCRIM
 direct := e*f*o*p*q*r
 
 #Mediators
 w3PSSM ~ a*w2DISCRIM
 w4DEPRESS ~ b*w3PSSM
 w4PSSM ~ c*w3DISCRIM
 w5DEPRESS ~ d*w4PSSM
 w2PSSM ~ g*w2DISCRIM
 w2DEPRESS ~ h*w2PSSM
 w3PSSM ~ i*w3DISCRIM
 w3DEPRESS ~ j*w3PSSM
 w4PSSM ~ k*w4DISCRIM
 w4DEPRESS ~ l*w4PSSM
 w45PSSM ~ m*w5DISCRIM
 w5DEPRESS ~ n*w5PSSM
 
 #Mediations
 med1 := a*b
 med2 := c*d
 med3 := g*h
 med4 := i*j
 med5 := k*l
 med6 := m*n
 mediated := med1 + med2 + med3 + med4 + med5 + med6
 
 #Covariances
 w2DISCRIM ~~ w2PSSM + w2DEPRESS
 w2PSSM ~~ w2DEPRESS
 w3DISCRIM ~~ w3PSSM + w3DEPRESS
 w3PSSM ~~ w3DEPRESS
 w4DISCRIM ~~ w4PSSM + w4DEPRESS
 w4PSSM ~~ w4DEPRESS
 w5DISCRIM ~~ w5PSSM + w5DEPRESS
 w5PSSM ~~ w5DEPRESS
  
 #Total Effect
 total := direct + mediated'

Should I move the coefficients to the first regressions? Thank you again for all of your help!

David Freire

unread,
Apr 8, 2020, 1:15:02 PM4/8/20
to lavaan
Additionally, when I tried to move a-d coefficients to the top regressions I received the following error:

2: In lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -7.790405e+09) is smaller than zero. This may be a symptom that
    the model is not identified.

Nickname

unread,
Apr 8, 2020, 11:52:40 PM4/8/20
to lavaan
David,
Your first error is a model syntax error.

+ lavaanify(practicemodel3)
Error in lavParseModelString(model.syntax = model, warn = warn, debug = FALSE) :
  lavaan ERROR
: duplicate model element in: w3PSSM~a*w2DISCRIM

You can only specify a parameter once in lavaan model syntax.  This seems like a good idea because if users could specify the same parameter over and over the result would be scripts that were very hard to read both by other users and by lavaan.  The error is telling you that you have specified the same parameter twice, which indeed you have.


w3PSSM
~ w2PSSM + w2DISCRIM
...

w3PSSM
~ a*w2DISCRIM

There is no need to include the second line once you have written the first.  The parameter label can be added in the first line.

Your second error is not a model syntax error, it is a convergence error.  This error can occur due to a problem in model specification, such as under-identification, as lavaan suggested in the error itself.  It can also occur as an empirical mater specific to your data if the model is very badly misspecified.

It is very hard work to specify a recursive path model that is not identified, but not impossible.  My suggestion would be to use the lavaanify() function with your revised model syntax and then check the list of parameters for any surprises.  If you are using any convenience parameters with the function that you are using to fit the model (e.g., lavaan(), sem()), then you need to include that with lavaanify too.  One possibility is that you are not using a convenience feature to specify the residual variances that are omitted from your syntax, and the absence of residual variances is causing the model to fail.  (I can only guess based on the information provided.)

It can be hard to debug a complex model.  If the above does not reveal the problem, my suggestion would be to trim the model little by little (perhaps removing one time point at a time) until you arrive at the minimal model which reproduces the error.  See if you can spot the problem in the simplified model.  Then scale back up once you get the minimal model to run.

David Freire

unread,
Apr 9, 2020, 12:37:55 AM4/9/20
to lavaan
Hi Keith!

Thank you so much for your help with this. You are truly a life saver. I have some beginner level questions so I apologize in advance but: (1) I used the lavaanify() function and did not see any parameters that were out of place (I'm assuming that is all the lavaanify function does) so I am probably gonna move to your other suggestions. (2) What specific convenience features are you referencing that lavaan uses to control the unspecified residuals? Thank you again for all of your help!

Sincerely,
David Freire

Nickname

unread,
Apr 10, 2020, 12:15:28 PM4/10/20
to lavaan
David,
  Consider the following script to which I will refer in answering both questions.  This is a modification of the example in the lavaan() help file and is thus self-contained.

# The Holzinger and Swineford (1939) example
HS
.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '



lavaanify
(HS.model,
             
auto.var=TRUE, auto.fix.first=TRUE,
             
auto.cov.lv.x=TRUE)
             
lavaanify
(HS.model,
             
auto.var=FALSE, auto.fix.first=TRUE,
             
auto.cov.lv.x=TRUE)
             
fit
<- lavaan(HS.model, data=HolzingerSwineford1939,
             
auto.var=FALSE, auto.fix.first=TRUE,
             
auto.cov.lv.x=TRUE)


Note that the unique variances of the observed variables are not explicitly included in the model syntax.  I ran lavaanify() twice, once with the autol.var convenience parameter left on, and once with it shut off.  If you compare the output to the screen from the two calls of the lavaanify() function, you will see that these parameters are free in one but not the other.  You should see similar rows with your model and can inspect these to see if the residual variances (in your case) are free in your model despite being absent from your model syntax.

Free:
   id     lhs op     rhs user block
group free ustart exo label plabel
10 10      x1 ~~      x1    0     1     1    7     NA   0        .p10.

Not Free:
10 10      x1 ~~      x1    0     1     1    0      0   0        .p10.


In some ways I am the worst person to answer questions about convenience parameters because I generally try to avoid using them.  As a result, I am not fluent with them.  These are described in the lavOptions help file.  However, they require some practice using them.  For instance, based on the description in the help file which refers to latent variables, one might not have expected the above behavior with respect to observed variables.  These sorts of things only become second nature with experience using the parameters.  The lavaanify() function can be helpful in confirming that lavaan is parsing your model description you intended.  In the case of a discrepancy, lavaanify is more or less the gold standard for correct interpretation and the user needs to adjust to adopt that interpretation.

Finally, if you try to fit the model with the unique variances not free, you get the following error.

Error in lav_model_estimate(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan ERROR
: initial model-implied matrix (Sigma) is not positive definite;
  check your model
and/or starting parameters.

This is not surprising because the specified model implies that the common factor completely reproduces the observed variables with no error, and do so with no variance.  This is not the same error that you produced, but it may reflect a similar underlying issue.  This is just a hypothesis but one that you can check using lavaanify().

David Freire

unread,
Apr 13, 2020, 4:21:47 PM4/13/20
to lavaan
Hi Keith!

Thank you so much for your help! Quick question, if I wanted to rerun the model with only the vertical paths (i.e. just have the cross lagged mediations and the vertical mediations within the concurrent waves of data) would I need to change the covariances at all? I am attempting to run it with the aforementioned covariances like so but it is not converging:

practicemodel4<-
  '
 #Direct Effects
 w4DEPRESS ~ e*w2DISCRIM
 w5DEPRESS ~ f*w3DISCRIM
 w2DEPRESS ~ o*w2DISCRIM
 w3DEPRESS ~ p*w3DISCRIM
 w4DEPRESS ~ q*w4DISCRIM
 w5DEPRESS ~ r*w5DISCRIM
 direct := e + f + o + p + q + r
 
 #Mediators
 w3PSSM ~ a*w2DISCRIM
 w4DEPRESS ~ b*w3PSSM 
 w4PSSM ~ c*w3DISCRIM
 w5DEPRESS ~ d*w4PSSM 
 w2PSSM ~ g*w2DISCRIM
 w2DEPRESS ~ h*w2PSSM
 w3PSSM ~ i*w3DISCRIM
 w3DEPRESS ~ j*w3PSSM
 w4PSSM ~ k*w4DISCRIM
 w4DEPRESS ~ l*w4PSSM
 w5PSSM ~ m*w5DISCRIM
 w5DEPRESS ~ n*w5PSSM
 
 #Mediations
 med1 := a*b
 med2 := c*d
 med3 := g*h
 med4 := i*j
 med5 := k*l
 med6 := m*n
 mediated := med1 + med2 + med3 + med4 + med5 + med6

 #Covariances
 w2DISCRIM ~~ w2PSSM + w2DEPRESS
 w2PSSM ~~ w2DEPRESS
 w3DISCRIM ~~ w3PSSM + w3DEPRESS
 w3PSSM ~~ w3DEPRESS
 w4DISCRIM ~~ w4PSSM + w4DEPRESS
 w4PSSM ~~ w4DEPRESS
 w5DISCRIM ~~ w5PSSM + w5DEPRESS
 w5PSSM ~~ w5DEPRESS
 
 #Total Effect
 total := direct + mediated'
practicemodelrun4 <- lavaan::sem(practicemodel4, data=seasons.edited, missing = 'fiml', fixed.x = FALSE)
summary(practicemodelrun4, fit.measures = TRUE)

Thank you again for all of your help!

Sincerely,
David 

Nickname

unread,
Apr 14, 2020, 11:19:17 AM4/14/20
to lavaan
David,
Three things caught my eye:

First, the first few effects in you code leap frog time points.  Are these theoretically motivated or empirically driven?  This may or may not be the best way to adjust the model if the preceding state of DISCRIM fails to screen off prior states of DISCRIM.  However, if this were the case that prior state fails to screen off earlier states, then one might expect that to be more consistent across time points.  (I have to wonder whether these effects were added to address misfit introduced by the removal of stability coefficients discussed below.)

Second, it is not immediately obvious to me that it makes sense to sum effects across different dependent variables, even if these are the same measurement at different times.  Again, I could be missing the larger picture.  However, effects on different variables need not be on comparable scales.  If you want a combined estimate across time points, perhaps what you want is to compute the indirect effect separately for each time point but to impose equality constraints on the direct effects across time points.

Third, if you remove the stability coefficients as you have done here, then you encode the following assumption.  The value of DEPRESS at time t+1 is independent of the value of DEPRESS at time t controlling for the values of DISCRIM and PSSM at time t.  This suggests the thought experiment that if you were to take two people with the same values of DISCRIM and PSSM at time t but very different values of DEPRESS at time t, then you would expect each of them to have the same probability distribution for DEPRESS at time t+1.  Is that plausible?  Is it plausible that DISCRIM and PSSM fully account for any continuity in DEPRESS over time?  If this is implausible, then you might not expect the resulting model to fit very well.

I hope that helps.  It is hard to intuit what you are trying to accomplish from what you posted.

David Freire

unread,
Apr 14, 2020, 11:57:17 AM4/14/20
to lavaan
Hi Keith! 

Thank you so much for your response! The first few effects leap time points because the first 2 direct effects are the cross-lagged direct effects from wave 2 to wave 4/wave3 to wave5. I was confused on your second point as I agree with your statements but am unsure how to "code my model to compute the indirect effects separately but impose equality constraints on the direct effects across time points". Would this be the best way to interpret hypotheses about these paths in which I want to see if the cross lagged mediations (wave2>3->4) are significant while accounting for the concurrent mediations (wave2DISCRIM-> wave2PSSM -> wave2DEPRESS) within each wave of data? The only reason I did removed the stability coefficients was because I did not want to include the horizontal paths of the model. Thank you in advance again for all of your help!

Sincerely,
David 

Nickname

unread,
Apr 15, 2020, 10:39:20 AM4/15/20
to lavaan
David,
I am not familiar with the use of 'cross lagged' in that way.  A single lag requires that variables at time t-1 affect variables at time t.  There is nothing inherently wrong with a model that assumes that every variable depends upon the variables at the two previous times, but one might expect that to be consistent across variables and times.

For simplicity, let us call your three variables x, m, and y with time indicies as suffixes.  We can label the effects a-d, with the numeric indices representing iterations, but not times.

x0 has two indirect effects on y2:
x0 -a0-> x1 -b0-> y2
x0 -c0-> m1 -d0-> y2

So the total effect is (a0 * b0) + (c0 * d0).  The mediated effect is c0 * d0.

You can then repeat the whole thing increasing each time index by 1.  Now you have four defined parameters.

Tot1 := (a0 * b0) + (c0 * d0)
Tot2 := (a1 * b1) + (c1 * d1)
Med1 := c0 * d0
Med2 := c1 * d1

To obtain a single estimate, you can constrain Tot1 == Tot2 and Med1 == Med2.  Of course, you could do the same for a * b which is the unmediated effect.  It seems potentially misleading to call it a direct effect in this context.

It only makes sense to consider, e.g., Tot1 and Tot2 as two estimates of the same parameter if the variables are scaled comparably.  If they are observed variables, they should be measured and coded the same way.  It is okay if there is empirical variation in the variances.  (This characterization assumes a framework that takes effects as basic and variances as derived phenomena in nature.)

I hope that helps,
Reply all
Reply to author
Forward
0 new messages