Path analysis with observed variables: Covariance estimations for exogenous variables

Nick Rosemarino

Jun 8, 2023, 6:33:12 PM
to lavaan
Hi all,

My question concerns the estimation of covariances between exogenous variables in a path model. I am trying to estimate the model below:

[Attached image: Screen Shot 2023-06-08 at 2.16.20 PM.png]

I noticed that unlike an SEM with latent variables, lavaan does not appear to automatically estimate a covariance between my exogenous variables when running a path analysis with observed variables. Since I normally see path diagrams include covariances between exogenous variables, I went ahead and wrote that into the code as can be seen below (NM_Density~~NR_Density): 

DNT.second.order.PATH <- '
#Regressions
Per_org_supp ~ a1*NM_Density + Gender + Education + Age
Job_Satisfaction ~ b1*NM_Density + Gender + Education + Age
Work_Goal_Satisfaction ~  c1*NM_Density + Gender + Education + Age
Burnout ~  d1*NR_Density + Gender + Education + Age
Turnover_Intentions ~  e1*NR_Density + Gender + Education + Age
Conflict_Culture ~ f1*NR_Density + Gender + Education + Age
NM_Density~~NR_Density'

set.seed(62973)
fDNT.second.order.PATH <- cfa(DNT.second.order.PATH,data=d_,se="robust.sem", estimator = 'MLM', bootstrap=2000)

To my surprise, I received the following warning message. Yet when I deleted the code for the covariance (NM_Density~~NR_Density), my model ran just fine.
Warning message:
In lav_partable_vnames(FLAT, "ov.x", warn = TRUE) : lavaan WARNING:
    model syntax contains variance/covariance/intercept formulas
    involving (an) exogenous variable(s): [NM_Density NR_Density];
    These variables will now be treated as random introducing
    additional free parameters. If you wish to treat those variables
    as fixed, remove these formulas from the model syntax. Otherwise,
    consider adding the fixed.x = FALSE option.



However, when I ran the model again, this time without the control variables (code below), I did not receive this warning, and lavaan appeared to be able to estimate the covariance parameter. Does anyone have any insight that would help me make sense of this? Ideally, I'd like to run the model with my control variables while also estimating the covariance between the exogenous variables, yet that currently does not seem possible. Thank you!!!

DNT.second.order.PATH.2 <- '
#Regressions
Per_org_supp ~ a1*NM_Density
Job_Satisfaction ~ b1*NM_Density
Work_Goal_Satisfaction ~  c1*NM_Density
Burnout ~  d1*NR_Density
Turnover_Intentions ~  e1*NR_Density
Conflict_Culture ~ f1*NR_Density
NM_Density~~NR_Density'

set.seed(62973)
fDNT.second.order.PATH.2 <- cfa(DNT.second.order.PATH.2,data=d_,se="robust.sem", estimator = 'MLM', bootstrap=2000)





Shu Fai Cheung (張樹輝)

Jun 8, 2023, 9:59:35 PM
to lavaan
> Since I normally see path diagrams include covariances between exogenous variables

If your goal is to have the covariances between exogenous variables, these are two possible options.

First, as suggested by the warning message, add fixed.x = FALSE.

Some of the results may change, because the exogenous variables are now treated as random (their values may change if a new sample is drawn, like those of the other observed variables in the model). Incremental fit measures may also change because the baseline model changes. Adding fixed.x = FALSE also means that the distributional assumptions now apply to all exogenous variables too. This may or may not be a serious problem given that you used MLM and bootstrap SEs (I am not sure, as there are many possible forms of nonnormality). If these consequences are OK with you, then adding fixed.x = FALSE is the simplest solution. You will see the variances and covariances of all exogenous variables, as well as their p-values and CIs.
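A minimal sketch of this first option, under stated assumptions: the data frame `sim` is simulated here as a stand-in (the original `d_` is not available), the model is cut down to two outcomes and one control, and `sem()` is used in place of `cfa()`. The explicit `NM_Density ~~ NR_Density` line is left out, so that `fixed.x = FALSE` alone frees the variances and covariances of the exogenous variables.

```r
library(lavaan)

# Simulated stand-in data (hypothetical; not the original d_)
set.seed(62973)
n <- 300
NM_Density <- rnorm(n)
NR_Density <- 0.4 * NM_Density + rnorm(n)
Age <- rnorm(n)
sim <- data.frame(
  NM_Density, NR_Density, Age,
  Job_Satisfaction = 0.5 * NM_Density + 0.2 * Age + rnorm(n),
  Burnout = 0.5 * NR_Density - 0.1 * Age + rnorm(n)
)

# Reduced version of the path model, without an explicit ~~ line
model <- '
  Job_Satisfaction ~ b1*NM_Density + Age
  Burnout ~ d1*NR_Density + Age
'

# fixed.x = FALSE treats the exogenous variables as random
fit_random <- sem(model, data = sim, estimator = "MLM", fixed.x = FALSE)

# Keep only the variances/covariances among the exogenous variables
pe <- parameterEstimates(fit_random)
exo <- c("NM_Density", "NR_Density", "Age")
exo_rows <- pe[pe$op == "~~" & pe$lhs %in% exo & pe$rhs %in% exo, ]
print(exo_rows)
```

The printed rows include standard errors, p-values, and CIs for each exogenous variance and covariance, which is what this option adds over the default.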

Second, you can run lavInspect(fDNT.second.order.PATH, "cov.all") to get the implied variances and covariances of all variables, including the exogenous variables. (Use "cor.all" in place of "cov.all" to get the correlations. You can learn more about lavInspect() from its help page.)

This works even with fixed.x = TRUE, the default. You will not see p-values or CIs for the variances and covariances of the exogenous observed variables because they are not random variables and so their variances and covariances are fixed to their sample values (as computed by lavaan). However, you do not need to run the analysis again. If you only need the values of the variances and covariances, not their p-values and CIs, want to keep your original results, and are OK with the default settings, this option is what you need.
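A minimal sketch of this second option, again under stated assumptions: `sim` is simulated stand-in data (not the original `d_`), the model is a reduced version of the one in the question, and `sem()` is used in place of `cfa()`. The fit keeps the default fixed.x = TRUE.

```r
library(lavaan)

# Simulated stand-in data (hypothetical; not the original d_)
set.seed(62973)
n <- 300
NM_Density <- rnorm(n)
NR_Density <- 0.4 * NM_Density + rnorm(n)
Age <- rnorm(n)
sim <- data.frame(
  NM_Density, NR_Density, Age,
  Job_Satisfaction = 0.5 * NM_Density + 0.2 * Age + rnorm(n),
  Burnout = 0.5 * NR_Density - 0.1 * Age + rnorm(n)
)

model <- '
  Job_Satisfaction ~ b1*NM_Density + Age
  Burnout ~ d1*NR_Density + Age
'

# Default fixed.x = TRUE: exogenous (co)variances are fixed, not estimated
fit_fixed <- sem(model, data = sim, estimator = "MLM")

# Model-implied covariances and correlations of all variables
implied_cov <- lavInspect(fit_fixed, "cov.all")
implied_cor <- lavInspect(fit_fixed, "cor.all")

# Block for the two exogenous predictors of interest
exo <- c("NM_Density", "NR_Density")
print(implied_cov[exo, exo])
print(implied_cor[exo, exo])
```

No refit is needed: the same fitted object that produced the regression results supplies these values, though without p-values or CIs for the exogenous part.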

A few remarks:
- This behavior is not unique to lavaan. Mplus, by default, also uses the fixed.x = TRUE approach (except for some conditions) for observed exogenous variables.
- You can search this group to learn more about the discussion on the fixed.x issue.
- (Not related to fixed.x) It is a good practice to always set the argument iseed to an arbitrary positive integer when doing bootstrapping (see the help page of lavOptions on this argument), to ensure that you can reproduce the results. Otherwise, every time you fit the model again, you may get different bootstrapping results.

My two cents.

-- Shu Fai

Shu Fai Cheung (張樹輝)

Jun 8, 2023, 10:00:57 PM
to lavaan
Oh ... I am really, really sorry for my oversight. You did use `set.seed()` before calling cfa() to make the results reproducible. Please accept my apology for the last remark.

-- Shu Fai

Nick Rosemarino

Jun 8, 2023, 10:26:44 PM
to lavaan
Hi Shu Fai, 

This was extremely helpful. Thank you very much for your well-thought-out response!!
