lavaan syntax for control variables in mediation


AvidLavaanUser

Jul 15, 2019, 6:25:33 PM
to lavaan
Hi lavaan developer and community,

I am not sure whether control variables need to be listed only on the line for the mediator, or on the line for the outcome (direct effect) as well. I could not find this information after searching here and Googling for hours. I would really appreciate clarification on whether this syntax is correct for adding two control variables to a mediation model.

X: predictor
M: mediator
Y: outcome
CV1: control variable 1
CV2: control variable 2

1) adding controls to only the mediator

# direct effect
Y ~ c*X
# mediator
M ~ a*X + c1*CV1 + c2*CV2 
Y ~ b*M
# indirect effect (a*b)
ab := a*b  # do I need to define any indirect effect from the controls?
# total effect
total := c + (a*b)

2) adding controls to the mediator and the direct effect

# direct effect
Y ~ c*X + c1*CV1 + c2*CV2
# mediator
M ~ a*X + c1*CV1 + c2*CV2 
Y ~ b*M
# indirect effect (a*b)
ab := a*b
ac1 := a*c1 # Do I need to define indirect effect terms by controls and b as well?
ac2 := a*c2 
# total effect
total := c + (a*b) + ac1 + ac2  # Do I need to define total effect terms with controls and b as well?

Thanks in advance,
Avid lavaan user

Terrence Jorgensen

Jul 18, 2019, 12:00:37 AM
to lavaan
> I am not sure if control variables need to be listed only to the line for a mediator or to the outcome (direct effect) as well.

I think the interpretation would be confusing if only one of the paths in an indirect effect were estimated controlling for covariates.

> do I need to define any indirect effect from the controls?

No.

> # direct effect
> Y ~ c*X + c1*CV1 + c2*CV2
> # mediator
> M ~ a*X + c1*CV1 + c2*CV2

Don't give them the same labels. Giving two parameters the same label constrains them to be equal, and there is no reason CV1 should have the same slope (effect) on M as on Y.
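For reference, here is a minimal sketch of the model from the first post with the two controls in both equations and a distinct label for every slope. The labels `a1`, `a2`, `b1`, `b2` are illustrative choices, not taken from the thread. In line with the answer above, no indirect or total-effect terms are defined for the controls. The sketch assumes the lavaan package is installed; `lavaan::lavaanify()` parses the syntax without any data, so each free slope can be seen on its own row.

```r
# Mediation with two controls entering both the M and the Y equations.
# Each slope gets its own label, so no equality constraints are implied.
model <- '
  # mediator equation
  M ~ a*X + a1*CV1 + a2*CV2
  # outcome equation: direct effect (c), b path, and the same two controls
  Y ~ c*X + b*M + b1*CV1 + b2*CV2
  # indirect and total effects of X only; none are defined for the controls
  ab    := a*b
  total := c + a*b
'

# Parse the syntax into a parameter table (no data needed), if lavaan is available
if (requireNamespace("lavaan", quietly = TRUE)) {
  pt <- lavaan::lavaanify(model)
  print(pt[pt$op %in% c("~", ":="), c("lhs", "op", "rhs", "label")])
}
```

To fit it, the same `model` string would be passed together with the data frame to `lavaan::sem()`.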

> Avid lavaan user

Not sure why so many people post on this forum without identifying themselves...

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

AvidLavaanUser

Jul 19, 2019, 8:20:09 PM
to lavaan
Thank you so much Terrence, I really appreciate it.

I suspect many people might want to ask questions anonymously because people are generally worried that their stat questions will make them look stupid. Thanks for understanding.

AvidLavaanUser

Jul 26, 2019, 5:50:11 PM
to lavaan
Hi lavaan developer and community,

Now lavaan gives me an error when I add control variables to my mediation paths: "lavaan ERROR: duplicate model element in:". I learned that I can list the same controls on either the b path or the c path, but not on both. However, I still do not know which path is statistically better for the controls: b or c. A statistician told me that one needs to know how lavaan configures controls in a mediation model to know which path is statistically better for them. I searched this group's archive and other R websites, but I still cannot find how lavaan handles controls in mediation. Could anyone explain how lavaan configures controls in mediation, and therefore whether the b or the c path is better for listing the controls?

X: predictor
M: mediator
Y: outcome
CV1: control variable 1
CV2: control variable 2

0) the syntax that received an error: b and c paths have the same controls
# direct effect
Y ~ c*X + c3*CV1 + c4*CV2 
# mediator
M ~ a*X + c1*CV1 + c2*CV2 
Y ~ b*M + c5*CV1 + c6*CV2 

1) adding controls in a and b paths, not c path
# direct effect
Y ~ c*X
# mediator
M ~ a*X + c1*CV1 + c2*CV2 
Y ~ b*M + c3*CV1 + c4*CV2 

2) adding controls in a and c path, not b path
# direct effect
Y ~ c*X + c3*CV1 + c4*CV2 
# mediator
M ~ a*X + c1*CV1 + c2*CV2 
Y ~ b*M 

I would really appreciate your guidance.

Thanks,
AvidLavaanUser

Alex Schoemann

Jul 26, 2019, 7:31:03 PM
to lavaan
Any controls on the "b" or "c" path (really it's c' in this model) are in the same regression equation. Both b and c' are estimated in the same equation (they both have Y as the outcome variable), so you only need to list the covariates once. In fact, you could simplify your syntax to:

M ~ a*X + c1*CV1 + c2*CV2 
Y ~ c*X + b*M + c5*CV1 + c6*CV2 

Alex

AvidLavaanUser

Jul 26, 2019, 8:38:53 PM
to lavaan
Thanks Alex, but I have a follow-up question.

In the syntax you wrote, doesn't the Y line include the two controls twice, since M is already regressed on the two controls? M ~ a*X + c1*CV1 + c2*CV2

Y ~ c*X + b*(a*X + c1*CV1 + c2*CV2) + c5*CV1 + c6*CV2 

If this is true, do I need to list the two controls only once, on either M or Y?

M ~ a*X + c1*CV1 + c2*CV2 
Y ~ c*X + b*M

Thanks,
AvidLavaanUser

Alex Schoemann

Jul 29, 2019, 9:58:04 AM
to lavaan
Okay, it seems like there's a bit of a misunderstanding here. A basic mediation model (like this one) is simply two regression equations (with intercepts removed for clarity):

M = a*X
Y = c'*X + b*M

You can introduce covariates on either M or Y or both. If you want to control for covariates when predicting both M and Y, then you need to include them in both regression equations (which is what the syntax I posted earlier does). I'd recommend reading up on mediation in general (Andrew Hayes's books/articles are a nice place to start) to gain a deeper understanding of how these work.
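The two-equation view can be checked with ordinary least squares in base R. This is a sketch on simulated data: the variable names match the thread, but the data and coefficients are made up for illustration. With the same covariates in both equations, the total effect c' + a*b equals the slope of X in a single regression of Y on X and the covariates, so the controls' paths never enter the total effect of X.

```r
set.seed(2019)
n   <- 500
CV1 <- rnorm(n)
CV2 <- rnorm(n)
X   <- 0.3 * CV1 + rnorm(n)                                  # X correlated with a control
M   <- 0.5 * X + 0.2 * CV1 - 0.1 * CV2 + rnorm(n)            # mediator
Y   <- 0.2 * X + 0.4 * M + 0.3 * CV1 + 0.1 * CV2 + rnorm(n)  # outcome
dat <- data.frame(X, M, Y, CV1, CV2)

# Equation 1: M on X plus the controls -> a path
fit_m <- lm(M ~ X + CV1 + CV2, data = dat)
# Equation 2: Y on X and M plus the same controls -> c' and b paths
fit_y <- lm(Y ~ X + M + CV1 + CV2, data = dat)

a       <- coef(fit_m)[["X"]]
b       <- coef(fit_y)[["M"]]
c_prime <- coef(fit_y)[["X"]]
ab      <- a * b            # indirect effect of X
total   <- c_prime + ab     # total effect of X

# Total effect from one equation that omits M but keeps the controls
c_total <- coef(lm(Y ~ X + CV1 + CV2, data = dat))[["X"]]
all.equal(total, c_total)   # TRUE: the OLS decomposition is exact
```

The identity holds exactly only when the same covariates appear in both equations, which is one way to see why listing them in only one equation makes the decomposition harder to interpret.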

Alex

AvidLavaanUser

Jul 29, 2019, 6:09:45 PM
to lavaan
Alex, thanks for your reply. But that's not what I asked.


Yves (or anyone who knows lavaan's logic),

Could you please explain how lavaan configures control variables in a mediation model, such that it shows the error "lavaan ERROR: duplicate model element in:" when the control variables are listed on both the b and c paths? I need this information to report the results confidently in my manuscript and to decide which path is statistically better for the control variables, since lavaan allows the same control variables on only the b or the c path (which differs from other programs).

Nickname

Jul 30, 2019, 8:30:25 AM
to lavaan
Avid,
  Variables are added to equations, not to effects.  It is just a syntactic convenience that you are allowed to have multiple lines for the same dependent variable.  Separate lines with the same dependent variable all refer to the same equation.  To see this, try the following.

library(lavaan)

foo <- '
  #1) adding controls in a and b paths, not c path

  # direct effect
  Y ~ c*X
  # mediator
  M ~ a*X + c1*CV1 + c2*CV2
  Y ~ b*M + c3*CV1 + c4*CV2
' # end foo


bar <- '
  #2) adding controls in a and c path, not b path

  # direct effect
  Y ~ c*X + c3*CV1 + c4*CV2
  # mediator
  M ~ a*X + c1*CV1 + c2*CV2
  Y ~ b*M
' # end bar

# Compare parameter tables
fooTable <- lavaanify(foo)
barTable <- lavaanify(bar)
fooTable
barTable


# Compare matrices of free effect coefficients (models are set up but not fitted)
fooNoFit <- lavaan(foo, do.fit = FALSE)
barNoFit <- lavaan(bar, do.fit = FALSE)

lavInspect(fooNoFit, 'free')$beta
lavInspect(barNoFit, 'free')$beta


Notice that by moving the control variables in your model syntax, all you have done is permute the order of the parameters. Your first try attempted to add the same two variables to the same equation (the equation for Y) in two different places, giving each of the two parameters two distinct labels. To see this, consider where you might place all four parameters in the beta matrix (each row represents an equation). To see the full parameterization, drop the "$beta" from the end of the last two lines.

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/


AvidLavaanUser

Aug 14, 2019, 4:36:08 PM
to lavaan
Thanks Keith. Some of the code did not work, but I got the idea and tested it.


lavaan users,

Is there a way to add a nominal variable with three levels to the mediation syntax? I am trying to add another control variable (nominal: site, with three sites) to my mediation path models. I checked lavaan's website about categorical variables, but I don't see any information on what to do with nominal variables (the website discusses ordinal/ordered variables). My variable (site.f) codes the three sites with letters, and when I added it to the syntax below, I got the following error:

Error in lav_data_full(data = data, group = group, cluster = cluster,  : 
  lavaan ERROR: unordered factor(s) with more than 2 levels detected as exogenous covariate(s): site.f

X: predictor
M: mediator
Y: outcome
CV1: control variable 1
CV2: control variable 2
site.f: control variable I am adding

# direct effect
Y ~ c*X
# mediator
M ~ a*X + za*CV1 + ya*CV2 + wa*site.f
Y ~ b*M + zb*CV1 + yb4*CV2 + wb*site.f


If lavaan does not support nominal variables, is there another way to control for site in a mediation model?
I noticed a significant difference in missing data across sites: more participants from one site completed the measures than from the others. I need to control for site to proceed with my mediation path models. Below is the chi-square test comparing two groups: those who answered all measures (Complete) vs. those who did not answer any measure (Missing).
  • More Bloomington residents and fewer Nashville residents completed all measures than expected (Knoxville is almost exactly at its expected count).

    > chisq.test(cm.st) # sig

        Pearson's Chi-squared test

    data:  cm.st
    X-squared = 8.506, df = 2, p-value = 0.01422

    > csq.cm.st <- chisq.test(cm.st)
    > csq.cm.st$expected
                    Complete  Missing
    Bloomington, IN 98.92857 26.07143
    Knoxville, TN   91.01429 23.98571
    Nashville, TN   87.05714 22.94286
    > csq.cm.st$observed
                    Complete Missing
    Bloomington, IN      108      17
    Knoxville, TN         91      24
    Nashville, TN         78      32
    > round(csq.cm.st$residuals, 3)
                    Complete Missing
    Bloomington, IN    0.912  -1.777
    Knoxville, TN     -0.001   0.003
    Nashville, TN     -0.971   1.891
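The test above can be reproduced from the observed counts alone. This is a minimal base-R sketch; the object names `cm.st` and `csq.cm.st` come from the post, but how `cm.st` was originally built is not shown, so the matrix below is a reconstruction from the printed observed counts.

```r
# Observed counts by site (rows) and completion status (columns), from the output above
cm.st <- matrix(c(108, 17,
                   91, 24,
                   78, 32),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Bloomington, IN", "Knoxville, TN", "Nashville, TN"),
                                c("Complete", "Missing")))

csq.cm.st <- chisq.test(cm.st)   # no continuity correction is applied to a 3 x 2 table
csq.cm.st                        # should match the posted X-squared = 8.506, df = 2, p-value = 0.01422
round(csq.cm.st$residuals, 3)    # Pearson residuals: Nashville, TN has fewer completers than expected
```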
I would really appreciate any guidance you can provide.

AvidLavaanUser

Terrence Jorgensen

Aug 17, 2019, 9:37:43 AM
to lavaan
> I checked lavaan's website about categorical variables, but I don't see any information on what to do for nominal variables

The very top section of that page tells you to make dummy codes for k-1 of the k categories to use as predictors. With three sites that is two dummy codes; you would include both in your syntax and define indirect effects separately from each of them through the mediator (each interpreted as the effect of being in that group, relative to your chosen reference category).
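A minimal sketch of that approach, assuming the data frame holds `site.f` with the three site labels shown in the chi-square output. The dummy names `siteKnox` and `siteNash`, the labels `ka`, `na`, `kb`, `nb`, and the defined effects `indKnox` and `indNash` are hypothetical; Bloomington, IN is used as the reference category. The numeric 0/1 columns replace the unordered factor in the lavaan syntax.

```r
# Hypothetical example rows; in practice use the study's own data frame
dat <- data.frame(site.f = factor(c("Bloomington, IN", "Knoxville, TN",
                                    "Nashville, TN",   "Knoxville, TN")))

# k - 1 = 2 dummy codes; Bloomington, IN (coded 0 on both) is the reference
dat$siteKnox <- as.integer(dat$site.f == "Knoxville, TN")
dat$siteNash <- as.integer(dat$site.f == "Nashville, TN")

# Both dummies enter both equations; indirect effects are defined per dummy via M
model <- '
  M ~ a*X + za*CV1 + ya*CV2 + ka*siteKnox + na*siteNash
  Y ~ c*X + b*M + zb*CV1 + yb*CV2 + kb*siteKnox + nb*siteNash
  ab      := a*b    # indirect effect of X
  indKnox := ka*b   # Knoxville vs. Bloomington, through M
  indNash := na*b   # Nashville vs. Bloomington, through M
'
```

With a data frame that also contains X, M, Y, CV1 and CV2, the model could then be fitted with `lavaan::sem(model, data = dat)`.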