Correlated exogenous variables

Erica Rievrs

Jan 29, 2023, 6:45:36 AM

to lavaan

I have a theoretical question about correlated exogenous variables. My model has two exogenous and three endogenous variables, and each endogenous variable is regressed on both exogenous variables. The exogenous variables are highly correlated (> 0.7), but I cannot choose just one of them because they represent different facets of what I'm trying to capture. I therefore wanted to tell the model that these variables are correlated. But I realized that adding the covariance (~~) between them didn't change any of the coefficients, so I suspect that the model is not taking that correlation into account.

How should I approach the correlation of these exogenous variables? Are there specific guidelines or rationale that I should consider?

Shu Fai Cheung (張樹輝)

Jan 29, 2023, 7:05:55 AM

to lavaan

This may be a lavaan question. Perhaps the covariance is already in your model, though not displayed because by default the exogenous variables are treated as fixed (fixed.x = TRUE) and so the covariance is not a free parameter? This is an example:

library(lavaan)
#> This is lavaan 0.6-13
#> lavaan is FREE software! Please report any bugs.
set.seed(68942)
n <- 100
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .7, .7, 1), 2, 2))
y <- .3 * x[, 1] + .4 * x[, 2] + rnorm(n, 0, .8)
dat <- data.frame(x, y)
model <-
"
y ~ x1 + x2
"
fit <- sem(model, data = dat)
summary(fit)
#> lavaan 0.6.13 ended normally after 1 iteration
#>
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         3
#>
#>   Number of observations                           100
#>
#> Model Test User Model:
#>
#>   Test statistic                                 0.000
#>   Degrees of freedom                                 0
#>
#> Parameter Estimates:
#>
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#>
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   y ~
#>     x1                0.171    0.104    1.645    0.100
#>     x2                0.360    0.110    3.278    0.001
#>
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .y                 0.623    0.088    7.071    0.000

parameterEstimates(fit)
#>   lhs op rhs   est    se     z pvalue ci.lower ci.upper
#> 1   y  ~  x1 0.171 0.104 1.645  0.100   -0.033    0.375
#> 2   y  ~  x2 0.360 0.110 3.278  0.001    0.145    0.575
#> 3   y ~~   y 0.623 0.088 7.071  0.000    0.450    0.795
#> 4  x1 ~~  x1 1.025 0.000    NA     NA    1.025    1.025
#> 5  x1 ~~  x2 0.642 0.000    NA     NA    0.642    0.642
#> 6  x2 ~~  x2 0.919 0.000    NA     NA    0.919    0.919

The covariance, x1 ~~ x2, is in the model but is fixed to its sample value (so are the variances of x1 and x2). It is not displayed by summary() but can be found in the output of parameterEstimates().
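
One way to double-check this on a fitted object is sketched below. It rebuilds the same simulated data so that it runs on its own, then asks lavaan which options were in effect and which parameter-table rows belong to the exogenous covariates. The object names fixed_x_used and exo_rows are only illustrative.

```r
library(lavaan)

# Recreate the simulated data from the example above
set.seed(68942)
n <- 100
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .7, .7, 1), 2, 2))
y <- .3 * x[, 1] + .4 * x[, 2] + rnorm(n, 0, .8)
dat <- data.frame(x, y)

fit <- sem("y ~ x1 + x2", data = dat)

# Was fixed.x in effect for this fit?
fixed_x_used <- lavInspect(fit, "options")$fixed.x

# Parameter-table rows that involve only exogenous covariates:
# exo == 1 marks them, and free == 0 means they are not estimated
pt <- parTable(fit)
exo_rows <- pt[pt$exo == 1, c("lhs", "op", "rhs", "free", "exo")]

print(fixed_x_used)
print(exo_rows)
```

If fixed.x is in effect, the rows for x1 ~~ x1, x1 ~~ x2 and x2 ~~ x2 should all show free equal to 0.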

If you manually add the covariance, e.g., x1 ~~ x2 in the example above, fixed.x will be set to FALSE. The covariance will then be displayed because it is a free parameter:

model2 <-
"
y ~ x1 + x2
x1 ~~ x2
"
fit2 <- sem(model2, data = dat)
summary(fit2)
#> lavaan 0.6.13 ended normally after 15 iterations
#>
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         6
#>
#>   Number of observations                           100
#>
#> Model Test User Model:
#>
#>   Test statistic                                 0.000
#>   Degrees of freedom                                 0
#>
#> Parameter Estimates:
#>
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#>
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   y ~
#>     x1                0.171    0.104    1.645    0.100
#>     x2                0.360    0.110    3.278    0.001
#>
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   x1 ~~
#>     x2                0.642    0.116    5.516    0.000
#>
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .y                 0.623    0.088    7.071    0.000
#>     x1                1.025    0.145    7.071    0.000
#>     x2                0.919    0.130    7.071    0.000

parameterEstimates(fit2)
#>   lhs op rhs   est    se     z pvalue ci.lower ci.upper
#> 1   y  ~  x1 0.171 0.104 1.645  0.100   -0.033    0.375
#> 2   y  ~  x2 0.360 0.110 3.278  0.001    0.145    0.575
#> 3  x1 ~~  x2 0.642 0.116 5.516  0.000    0.414    0.870
#> 4   y ~~   y 0.623 0.088 7.071  0.000    0.450    0.795
#> 5  x1 ~~  x1 1.025 0.145 7.071  0.000    0.741    1.309
#> 6  x2 ~~  x2 0.919 0.130 7.071  0.000    0.665    1.174

Maybe that is why the results are similar for other parameters in your model?
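
To compare the two fits directly rather than by eye, something like the following sketch may help. It refits both models on the same simulated data and lines up the parameters they share; shared, est_fixed, est_free and max_diff are illustrative names, not lavaan objects.

```r
library(lavaan)

# Same simulated data as in the example above
set.seed(68942)
n <- 100
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .7, .7, 1), 2, 2))
y <- .3 * x[, 1] + .4 * x[, 2] + rnorm(n, 0, .8)
dat <- data.frame(x, y)

# Model without and with the explicit covariance
fit  <- sem("y ~ x1 + x2", data = dat)
fit2 <- sem("y ~ x1 + x2\nx1 ~~ x2", data = dat)

# Parameters that are free in both fits
shared <- c("y~x1", "y~x2", "y~~y")
est_fixed <- as.numeric(coef(fit)[shared])
est_free  <- as.numeric(coef(fit2)[shared])

comparison <- data.frame(parameter = shared,
                         covariance_fixed = est_fixed,
                         covariance_free = est_free)
print(comparison)

# Largest absolute difference across the shared parameters
max_diff <- max(abs(est_fixed - est_free))
print(max_diff)
```

Any difference should be down to the optimizer's convergence tolerance, in line with the identical three-decimal estimates shown above.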

-- Shu Fai

Erica Rievrs

Jan 29, 2023, 12:24:15 PM

to lavaan

Thank you very much for your clear answer! Yes, they are present in the model's parameter estimates; I had not realized that before.

Does that mean then that lavaan automatically deals with multicollinearity issues in the model?

I had the idea of running a separate lm() between those two exogenous variables, replacing one of them with the model residuals, and rerunning the models (just to be sure I'd get the same results).
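
A minimal sketch of that check is given below, using simulated data in the style of the earlier example as a stand-in for the real variables (x2_res, fit_orig and fit_res are illustrative names). By the usual least-squares algebra (the Frisch-Waugh-Lovell result), the coefficient of the residualized variable should match the original x2 coefficient, while the x1 coefficient should become the slope from regressing y on x1 alone, so the two runs are not expected to agree on x1.

```r
library(lavaan)

# Simulated stand-in data with two correlated exogenous variables
set.seed(68942)
n <- 100
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .7, .7, 1), 2, 2))
y <- .3 * x[, 1] + .4 * x[, 2] + rnorm(n, 0, .8)
dat <- data.frame(x, y)

# Replace x2 by the part of x2 that x1 does not predict
dat$x2_res <- resid(lm(x2 ~ x1, data = dat))

fit_orig <- sem("y ~ x1 + x2", data = dat)
fit_res  <- sem("y ~ x1 + x2_res", data = dat)

b_orig <- coef(fit_orig)
b_res  <- coef(fit_res)

# Slope of y on x1 alone, for comparison with the residualized model
b_simple <- unname(coef(lm(y ~ x1, data = dat))["x1"])

print(b_orig)
print(b_res)
print(b_simple)
```

In other words, residualizing changes what the x1 coefficient means rather than removing the overlap between the two predictors.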

Do you by any chance have any literature on that subject to recommend?

Erica

Yves Rosseel

Jan 30, 2023, 9:57:07 AM

to lav...@googlegroups.com

On 1/29/23 18:24, Erica Rievrs wrote:
> Thank you very much for your clear answer! Yes, they are present in the
> model's parameter estimates; I had not realized that before.
> Does that mean then that lavaan automatically deals with
> multicollinearity issues in the model?

Not quite. lavaan (by default) treats exogenous (observed) 'x'
covariates as fixed. As a result, you don't need to worry about their
correlation structure. It is just not part of the model.

> Do you by any chance have any literature on that subject to recommend?

Think of a regression model Y ~ X1 + X2 + X3. Usually, we treat X1/X2/X3
as 'fixed', and we don't care if they are correlated or not. We just
want to estimate the regression coefficients (and the residual
variance). This corresponds to lavaan's fixed.x = TRUE.
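
The regression analogy can be checked numerically with a small sketch like the one below, which uses simulated data (two predictors instead of three) and the illustrative names ols, slopes, ml_resvar and lav_resvar. It assumes lavaan's default ML estimator, under which the slopes should agree with lm() and the residual variance should equal the residual sum of squares divided by n rather than by the residual degrees of freedom.

```r
library(lavaan)

# Simulated data with two correlated predictors
set.seed(68942)
n <- 100
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .7, .7, 1), 2, 2))
y <- .3 * x[, 1] + .4 * x[, 2] + rnorm(n, 0, .8)
dat <- data.frame(x, y)

# Ordinary regression and the same model in lavaan (default fixed.x)
ols <- lm(y ~ x1 + x2, data = dat)
fit <- sem("y ~ x1 + x2", data = dat)

# Slopes side by side
slopes <- cbind(lm = as.numeric(coef(ols)[c("x1", "x2")]),
                lavaan = as.numeric(coef(fit)[c("y~x1", "y~x2")]))
rownames(slopes) <- c("x1", "x2")
print(slopes)

# ML residual variance: residual sum of squares divided by n
ml_resvar  <- sum(resid(ols)^2) / n
lav_resvar <- as.numeric(coef(fit)["y~~y"])
print(c(ml_resvar = ml_resvar, lav_resvar = lav_resvar))
```

The correlation between x1 and x2 enters both analyses only through the data, not as an estimated parameter.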

Yves.

Alison Wu

Aug 21, 2023, 7:26:18 AM

to lavaan

Hi,

I also have a similar question, so I am really happy to see this conversation.

If I explicitly specify the exogenous variables as correlated, what problems might that cause?

My situation is this: I have 3 exogenous variables that are correlated (rs around .5), 3 outcomes, 1 mediator, and some control variables (gender, ethnicity, education, etc.). Originally I did not let the exogenous variables correlate, and the model fit was just okay (CFI = .899, TLI under .90, RMSEA and SRMR both under .05). However, if I set fixed.x = FALSE and let the exogenous variables correlate, the fit measures improve (CFI = .93, TLI = .90, RMSEA and SRMR both under .05). I am not sure whether this is okay.

BW,

Alison

Keith Markus

Aug 22, 2023, 8:26:21 AM

to lavaan

Alison,

I believe that you are describing the phenomenon that serves as the motivation for fixed.x.

Let's consider each of the four possibilities for your analysis:

Combination 1: fixed.x with no covariance between exogenous variables in the model.

This model is properly specified with respect to the exogenous variables: although you do not explicitly include the covariance, lavaan uses the covariance from the sample and does not treat it as a parameter to be estimated. Your fit indices do not depend on the covariance between the exogenous variables.

Combination 2: no fixed.x with no covariance between exogenous variables in the model.

This model is potentially misspecified with respect to the covariances between exogenous variables. There are exceptions, such as balanced factorial experiments in which exogenous variables are uncorrelated by design. However, if there are correlations, this model will be misspecified and may produce biased estimates of the parameters.

Combination 3: fixed.x with covariance between exogenous variables in the model.

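This combination is contradictory: the covariance cannot be both included in and excluded from the model. As noted earlier in this thread, manually adding the covariance makes lavaan set fixed.x to FALSE, which turns this into Combination 4.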

Combination 4: no fixed.x with covariance between exogenous variables in the model.

This model represents the traditional analysis and is not misspecified in relation to the covariance between exogenous variables. However, one can argue that this traditional analysis can overstate the fit of the model to the data, because the fit is calculated using moments that will generally be fit perfectly owing to saturation of that part of the model. As such, the fixed.x option is an attempt to remove these moments from the evaluation of fit and to focus on the parts of the model that pose a meaningful test. The extent to which fit will be overstated depends on (a) how well specified the model is, (b) the proportion of parameters involving covariances between exogenous variables, and (c) the size of those covariances.
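
The combinations above can be tried side by side with a sketch like the following. The data are simulated and every name in it (x1, x2, m, y, base, fits, fit_table) is hypothetical, so it only stands in for a real mediation model. Combination 2 is written with an explicit x1 ~~ 0 * x2 so that the zero covariance does not depend on lavaan's defaults, and Combination 3 is left out. One assumption worth checking against the printed table: Combinations 1 and 4 should share the same chi-square and degrees of freedom, while CFI and TLI may differ because the baseline model is not the same under the two settings.

```r
library(lavaan)

# Simulated stand-in data: two correlated exogenous variables,
# one mediator and one outcome (all names are hypothetical)
set.seed(1234)
n <- 300
x <- MASS::mvrnorm(n,
                   mu = c(x1 = 0, x2 = 0),
                   Sigma = matrix(c(1, .5, .5, 1), 2, 2))
m <- .4 * x[, 1] + .3 * x[, 2] + rnorm(n)
y <- .5 * m + .2 * x[, 1] + rnorm(n)
dat <- data.frame(x, m, y)

# The direct path from x1 to y is deliberately omitted,
# so the model has some misfit to evaluate
base <- "m ~ x1 + x2\ny ~ m"

fits <- list(
  # Combination 1: fixed.x, covariance not written in the syntax
  comb1 = sem(base, data = dat, fixed.x = TRUE),
  # Combination 2: random x, covariance explicitly fixed to zero
  comb2 = sem(paste(base, "x1 ~~ 0 * x2", sep = "\n"),
              data = dat, fixed.x = FALSE),
  # Combination 4: random x, covariance freely estimated
  comb4 = sem(paste(base, "x1 ~~ x2", sep = "\n"),
              data = dat, fixed.x = FALSE)
)

# Collect a few fit measures into one table
idx <- c("chisq", "df", "cfi", "tli", "rmsea", "srmr")
fit_table <- sapply(fits, function(f) as.numeric(fitMeasures(f, idx)))
rownames(fit_table) <- idx
print(round(fit_table, 3))
```

Comparing the comb1 and comb4 columns shows how much of any change in the incremental fit indices comes from the treatment of the exogenous covariances rather than from the structural part of the model.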

That is how I understand it,

Keith

------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

roger akanji

Aug 22, 2023, 9:14:31 AM

to lav...@googlegroups.com

Hi Keith

Thank you very much for the detailed write up.

I do appreciate it and will follow through on your suggestions.
