Regression results from SEM and fit measures

Tabea Schoeler

Feb 17, 2016, 6:38:24 AM

to lavaan

Dear all,

I have the following cross-lagged panel model, in which I want to control for 3 covariates (meds + drugs + can_premor) in the regression paths for both outcomes (can_Y2 and relapse_Y2).

I ran the following model:

M1 <- '

# Lagged path

can_Y2 ~ relapse_Y1

relapse_Y2 ~ can_Y1

# Autoregressive path

can_Y2 ~ can_Y1

relapse_Y2 ~ relapse_Y1

# Covariates

can_Y2 ~ meds + drugs + can_premor

relapse_Y2 ~ meds + drugs + can_premor'

Fit measures:

chisq pvalue cfi tli rmsea

0 NA 1 1 0

Based on the fit estimates, this model doesn’t seem appropriate to me. I also ran one model that included the covariates only for one of the main outcomes and this model seems to fit better:

M2 <- '

# Lagged path

can_Y2 ~ relapse_Y1

relapse_Y2 ~ can_Y1

# Autoregressive path

can_Y2 ~ can_Y1

relapse_Y2 ~ relapse_Y1

# Covariates

relapse_Y2 ~ meds + drugs + can_premor'

Fit measures:

chisq pvalue cfi tli rmsea

1.750 0.626 1.000 1.054 0.000

The two models give me the same regression estimates for the variables included, but the model fits are different. So I was wondering whether I can include my covariates in only one of the regression paths, or am I missing something in my initial model (M1)?

Any advice in this regard would be much appreciated!

Kind regards,

Tabea

Terrence Jorgensen

Feb 18, 2016, 3:48:22 AM

to lavaan

> Fit measures:
>
> chisq pvalue cfi tli rmsea
>
> 0 NA 1 1 0
>
> Based on the fit estimates, this model doesn’t seem appropriate to me.

The fit measures indicate that your model fits perfectly, and the p value is missing because you have no degrees of freedom. This is because your model is saturated (you are estimating as many covariance-structure parameters as you have observed (co)variances).

> I also ran one model that included the covariates only for one of the main outcomes and this model seems to fit better:
>
> Fit measures:
>
> chisq pvalue cfi tli rmsea
>
> 1.750 0.626 1.000 1.054 0.000

No, the model fits worse, because the chi-squared is now greater than zero. But the p value says the fit is not significantly worse, and the other fit indices say your model still fits approximately the same (i.e., as well as can be expected). This model has 3 df because you estimate the effects of the 3 covariates on relapse_Y2 but omit the corresponding paths (i.e., fix them to zero) for can_Y2.
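The degrees of freedom in both models can be checked by hand. The count below is a sketch under stated assumptions: continuous-variable bookkeeping, the sem() defaults (exogenous predictors treated as fixed, residual covariance between the two Y2 variables estimated), and no mean structure. With categorical outcomes the individual items differ, but the 3-df difference between M1 and M2 comes only from the three omitted paths.

```latex
\begin{align*}
\text{observed variables} &: p = 7 \quad (2 \text{ outcomes} + 5 \text{ predictors}) \\
\text{unique (co)variances} &: \tfrac{p(p+1)}{2} = 28 \\
\text{fixed to sample values (predictors only)} &: \tfrac{5 \cdot 6}{2} = 15 \\
\text{moments left to reproduce} &: 28 - 15 = 13 \\[4pt]
\text{M1 free parameters} &: 10 \text{ slopes} + 2 \text{ residual variances} + 1 \text{ residual covariance} = 13 \\
df_{\text{M1}} &= 13 - 13 = 0 \\[4pt]
\text{M2 free parameters} &: 7 \text{ slopes} + 2 \text{ residual variances} + 1 \text{ residual covariance} = 10 \\
df_{\text{M2}} &= 13 - 10 = 3
\end{align*}
```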

> The two models give me the same regression estimates for the variables included, but the model fits are different.

If all you are interested in are the regression estimates for the sake of making predictions, then it is not necessary to be concerned with model fit. These fit measures are for using the data to test whether your theory is compatible with what you observe. If you have a theory that suggests certain paths can be constrained to zero, then adding those constraints will make your model over-identified instead of just-identified. You can find more information in path analysis chapters of introductory SEM textbooks -- the folks on SEMNET can volunteer good suggestions, if you don't already have one. SEMNET is a great resource for general questions about SEM.

Regarding lavaan specifically, it is worth noting that you are probably estimating more parameters than you explicitly put in your model syntax. If you use the sem() function, then you are automatically estimating the residual covariance between the two Y2 variables. In the lavaan() function, this is turned off by default (auto.cov.y = FALSE), but you can set it to TRUE to get the same results as from sem(). There is no real reason to fix that residual covariance to zero -- saying your set of predictors completely explains the relationship between the two Y2 variables is a pretty big claim. You can always see which parameters are being freely estimated by checking the parameter table:

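example(cfa)    # runs the cfa() help-page example, which creates an object named fit
parTable(fit)   # one row per parameter; free > 0 marks a freely estimated parameter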

or by checking which elements of each parameter matrix are free (the ones that are not 0):

lavInspect(fit, "free")
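
To make the points above concrete for the two models in this thread, here is a minimal sketch. It assumes simulated, purely illustrative data with all variables treated as continuous and the default ML estimator, which is a simplification of the WLSMV setup described in the question. The object names dat, fit1, fit2, and pt are hypothetical.

```r
library(lavaan)

# Simulated, purely illustrative data (NOT the original data set);
# every variable is continuous here for simplicity.
set.seed(123)
n <- 500
dat <- data.frame(
  can_Y1 = rnorm(n), relapse_Y1 = rnorm(n),
  meds = rnorm(n), drugs = rnorm(n), can_premor = rnorm(n)
)
dat$can_Y2     <- 0.5 * dat$can_Y1 + 0.2 * dat$relapse_Y1 + rnorm(n)
dat$relapse_Y2 <- 0.3 * dat$can_Y1 + 0.4 * dat$relapse_Y1 +
                  0.3 * dat$meds + rnorm(n)

# M1: covariates predict both outcomes (saturated)
M1 <- '
  can_Y2     ~ relapse_Y1 + can_Y1 + meds + drugs + can_premor
  relapse_Y2 ~ can_Y1 + relapse_Y1 + meds + drugs + can_premor
'
# M2: covariate paths to can_Y2 omitted (i.e., fixed to zero)
M2 <- '
  can_Y2     ~ relapse_Y1 + can_Y1
  relapse_Y2 ~ can_Y1 + relapse_Y1 + meds + drugs + can_premor
'

fit1 <- sem(M1, data = dat)
fit2 <- sem(M2, data = dat)

# Degrees of freedom of each model
fitMeasures(fit1, "df")
fitMeasures(fit2, "df")

# Chi-squared difference test: do the 3 zero constraints worsen fit?
lavTestLRT(fit1, fit2)

# Rows that sem() added automatically have user == 0; the residual
# covariance can_Y2 ~~ relapse_Y2 is among them.
pt <- parTable(fit1)
pt[pt$op == "~~" & pt$exo == 0, c("lhs", "op", "rhs", "user", "free")]
```

The last line lists the two residual variances and the residual covariance between can_Y2 and relapse_Y2, none of which appear in the model syntax.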

Terry

Tabea Schoeler

Feb 22, 2016, 11:24:41 AM

to lavaan

Dear Terry,

Thank you very much indeed for your very helpful answer; that made a lot of sense to me! I have now run the saturated model and then one model in which I dropped all paths that were not significant in the initial one.

I have one more question, which already came up in this group but I haven’t yet managed to solve it: When I include the correlation term (can_Y1 ~~ relapse_Y1),

M1 <- '

# Lagged path

can_Y2 ~ relapse_Y1

relapse_Y2 ~ can_Y1

# Autoregressive path

can_Y2 ~ can_Y1

relapse_Y2 ~ relapse_Y1

# Correlation

can_Y1 ~~ relapse_Y1

can_Y2 ~~ relapse_Y2'

I get the following error message:

Error in vnames(FLAT, type = "ov.x", ov.x.fatal = TRUE) :

lavaan ERROR: model syntax contains variance/covariance/intercept formulas

involving (an) exogenous variable(s): [relapse_Y1 can_Y1];

Please remove them and try again.

I can’t use fixed.x=FALSE, since I am using estimator="WLSMV".

I was just wondering whether there is a way of including the correlation between can_Y1 and relapse_Y1 in the model?

Many thanks again for your advice.

Kind regards,

Tabea

Terrence Jorgensen

Feb 23, 2016, 3:44:54 AM

to lavaan

> I can’t use fixed.x=FALSE, since I am using estimator="WLSMV".
>
> I was just wondering whether there is a way of including the correlation between can_Y1 and relapse_Y1 in the model?

You don't need to include it in the model. The two variables are exogenous, so their correlation is assumed to be a known value (the sample correlation). The fact that it isn't estimated does not mean that the correlation is fixed to zero. It is just that the effects of the exogenous variables are partialed out. The WLS criterion is effectively minimizing the error variance of the endogenous variables only (just like in regular regression, except OLS regression is unweighted least squares).
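
As a rough illustration of "known value, not zero", here is a minimal sketch. It assumes simulated continuous data and the default ML estimator with fixed.x = TRUE. lavaan handles exogenous predictors somewhat differently under WLSMV with categorical outcomes, so this only illustrates the idea. The names dat, fit, pt, and exo_cov are hypothetical.

```r
library(lavaan)

# Simulated, purely illustrative data with correlated Y1 predictors
set.seed(42)
n <- 500
can_Y1     <- rnorm(n)
relapse_Y1 <- 0.5 * can_Y1 + rnorm(n)
dat <- data.frame(can_Y1, relapse_Y1)
dat$can_Y2     <- 0.5 * dat$can_Y1 + 0.2 * dat$relapse_Y1 + rnorm(n)
dat$relapse_Y2 <- 0.3 * dat$can_Y1 + 0.4 * dat$relapse_Y1 + rnorm(n)

model <- '
  can_Y2     ~ can_Y1 + relapse_Y1
  relapse_Y2 ~ can_Y1 + relapse_Y1
'
fit <- sem(model, data = dat, fixed.x = TRUE)

# Locate the covariance between the two exogenous Y1 variables
pt <- parTable(fit)
exo_cov <- pt[pt$op == "~~" &
              ((pt$lhs == "can_Y1" & pt$rhs == "relapse_Y1") |
               (pt$lhs == "relapse_Y1" & pt$rhs == "can_Y1")), ]

# free == 0 means "not estimated", yet est is not zero:
# it is held at the sample covariance.
exo_cov[, c("lhs", "op", "rhs", "free", "exo", "est")]
cov(dat$can_Y1, dat$relapse_Y1)  # compare with est above
```

The two values may differ very slightly, because lavaan by default divides by N rather than N - 1 when computing sample covariances.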

Terry

Tabea Schoeler

Feb 26, 2016, 1:38:09 PM

to lavaan

Dear Terry,

Again, many thanks for your very helpful response; it is very much appreciated!