Multi-collinearity and Confirmatory Factor Analysis

Luan Chau

Jun 14, 2023, 5:09:08 AM

to lavaan

Dear all,

I hope you all are having a good day. I am running a CFA-SEM model with 3 latent variables (with 2 observed variables in 1, 3 observed variables in another, and 2 observed variables in the last one).

Because one indicator of one latent variable is highly correlated with two indicators of another latent variable, I ran into an error message from R: "Warning messages: 1: In sqrt(var.lhs.value * var.rhs.value) : NaNs produced 2: In lav_start_check_cov(lavpartable = lavpartable, start = START) : lavaan WARNING: starting values imply NaN for a correlation value; variables involved are: Variance ReadingTime"

My correlations are not too high (.603 and -.669 respectively, both significant at the 0.01 level).

Is there an option for me to run the CFA anyway despite multi-collinearity?

Thank you for any opinions.

Blessings

Luan Chau

Faith Millongo

Jun 14, 2023, 5:12:45 AM

to lav...@googlegroups.com

I am not sure I understood you correctly.

But if the model is converging, consider requesting robust standard errors when you fit the model.

Alternatively, if the model is not converging, try estimating each latent variable and its indicators separately.
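A minimal sketch of both suggestions, using lavaan's built-in HolzingerSwineford1939 data rather than the poster's data (the model below is an assumption for illustration). It requests robust standard errors through `estimator = "MLR"`, which is one way to do so in lavaan, and then fits a single factor on its own. Note that a factor with only two indicators is not identified by itself without extra constraints, so the per-factor approach is shown for a three-indicator factor.

``` r
library(lavaan)

# Illustrative two-factor model (not the poster's model)
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6 '

# Robust (Huber-White) standard errors via the MLR estimator
fit_mlr <- cfa(model, data = HolzingerSwineford1939, estimator = "MLR")
parameterEstimates(fit_mlr)[, c("lhs", "op", "rhs", "est", "se")]

# One latent variable at a time: three indicators give a
# just-identified model (0 degrees of freedom)
fit_visual <- cfa(' visual =~ x1 + x2 + x3 ', data = HolzingerSwineford1939)
lavInspect(fit_visual, "converged")
fitMeasures(fit_visual, "df")
```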

Best

Faith

Edward Rigdon

Jun 14, 2023, 6:34:46 AM

to lav...@googlegroups.com

These are only warnings, not errors, so the problem probably did run. Apply the lavaan summary() function to your output object to see the results. It *may* be possible that the warnings apply to an intermediate result, not the final results--I'm not a programmer, so I don't know if this is true of lavaan, but it may be true of some packages with iterative estimation methods.

The issue is not that correlations are large, but that the correlations are disproportionate--they do not honor the proportionality constraints implied by your factor model. In a model without additional paths or residual covariances, each correlation for indicators of different factors is a product of the two variables' loadings times the correlation of the two factors they load on. (For two indicators of the same factor, the correlation is equal to the product of the two loadings times the factor's variance.)

So if you have these high correlations, then either the loadings are very high and the factor covariance is moderate, or else the loadings are not large and the factor covariance must be extreme. The warning, I think, indicates that lavaan's solution is of the second type. This is a rejection of the model that you have specified--assuming that all data handling and syntax are correct.
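The trade-off described above can be sketched numerically. Assuming standardized loadings, the implied correlation between two indicators of different factors is loading × loading × factor correlation, so the factor correlation needed to reproduce an observed correlation is the observed value divided by the product of the loadings. The loading values 0.9 and 0.7 below are hypothetical, chosen only for illustration; only the .603 comes from the original post.

``` r
# Factor correlation required to reproduce an observed correlation r,
# given two standardized loadings (hypothetical helper for illustration)
required_factor_cor <- function(r, loading_a, loading_b) {
  r / (loading_a * loading_b)
}

r_obs <- 0.603  # observed correlation reported in the original post

# Case 1: high loadings (assumed 0.9), so a moderate factor correlation suffices
phi_high_loadings <- required_factor_cor(r_obs, 0.9, 0.9)
phi_high_loadings

# Case 2: modest loadings (assumed 0.7), so the required factor
# correlation exceeds 1, which is inadmissible
phi_low_loadings <- required_factor_cor(r_obs, 0.7, 0.7)
phi_low_loadings
```

With the assumed loadings of 0.7, the required factor correlation is larger than 1, which is the kind of extreme value the second case refers to.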

If you add residual covariances between the variables involved, this provides a different way for the high correlations to express themselves, but two of those in your model may make the model not "identified," which will make any results meaningless.
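For reference, a residual covariance is written in lavaan syntax with the `~~` operator between two observed variables. The sketch below uses the built-in HolzingerSwineford1939 data, and the choice of `x3` and `x6` is arbitrary; it is not the poster's model. Each added residual covariance uses up one degree of freedom, which is why adding several to a small model can threaten identification.

``` r
library(lavaan)

# Baseline two-factor model
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6 '

# Same model plus one residual covariance between indicators of
# different factors (x3 and x6 chosen arbitrarily for illustration)
HS.model.rescov <- ' visual  =~ x1 + x2 + x3
                     textual =~ x4 + x5 + x6
                     x3 ~~ x6 '

fit_base   <- cfa(HS.model, data = HolzingerSwineford1939)
fit_rescov <- cfa(HS.model.rescov, data = HolzingerSwineford1939)

# The residual covariance costs one degree of freedom
fitMeasures(fit_base, "df")
fitMeasures(fit_rescov, "df")

# Likelihood-ratio comparison of the two nested models
anova(fit_base, fit_rescov)
```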

If your loadings are also large, so that maybe the high correlation is reasonable after all, then maybe lavaan needs better starting values. It could be that if you specify starting values, then estimation will proceed more smoothly.

Shu Fai Cheung (張樹輝)

Jun 14, 2023, 7:28:24 AM

to lavaan

I do not want to imply that the warning can be ignored. I just want to share something that may help diagnose the problem (if there is a problem).

Based on the provided information, the warning is about the starting values, not about the solution. Did lavaan give any warning about the solution? It automatically runs some checks on the solution. Bad starting values are, well, bad. They may lead to "bad" solutions (e.g., an inadmissible solution). However, they may also lead to the optimal solution, though it may take longer. That is, even if we start far, far away from the solution, we may still get there.

This is an illustration. I arbitrarily set some starting values so bad that lavaan gave a warning. However, the solution is virtually identical to the one based on the default starting values.

``` r
# From the example of cfa()
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.

# Starting values so bad that a warning is issued
# But there is no warning regarding convergence and the solution
HS.model.bad_start <- ' visual =~ x1 + start(-10) * x2 + start(10) * x3
textual =~ x4 + start(10) * x5 + start(-10) * x6
visual ~~ start(2) * textual'
fit_bad_start <- cfa(HS.model.bad_start, data = HolzingerSwineford1939)
#> Warning in lav_start_check_cov(lavpartable = lavpartable, start = START): lavaan WARNING: starting values imply a correlation larger than 1;
#> variables involved are: visual textual
lavInspect(fit_bad_start, "post.check")
#> [1] TRUE
lavInspect(fit_bad_start, "converged")
#> [1] TRUE

# Default starting values
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
visual ~~ textual'
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# The parameter estimates are virtually the same
all.equal(coef(fit_bad_start), coef(fit), tolerance = 1e-6)
#> [1] TRUE
coef(fit_bad_start) / coef(fit)
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6
#> 1 1 1 1
#> visual~~textual x1~~x1 x2~~x2 x3~~x3
#> 1 1 1 1
#> x4~~x4 x5~~x5 x6~~x6 visual~~visual
#> 1 1 1 1
#> textual~~textual
#> 1
round(cbind(bad = coef(fit_bad_start), default = coef(fit)), 5)
#> bad default
#> visual=~x2 0.55895 0.55895
#> visual=~x3 0.70794 0.70794
#> textual=~x5 1.11097 1.11097
#> textual=~x6 0.92538 0.92538
#> visual~~textual 0.41368 0.41368
#> x1~~x1 0.53641 0.53641
#> x2~~x2 1.12498 1.12498
#> x3~~x3 0.86291 0.86292
#> x4~~x4 0.36942 0.36942
#> x5~~x5 0.44868 0.44868
#> x6~~x6 0.35609 0.35609
#> visual~~visual 0.82196 0.82196
#> textual~~textual 0.98124 0.98124
```

<sup>Created on 2023-06-14 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

So, in the artificial example above, the warning on starting values can be ignored.

That said, given that there is a warning on starting values and you do not have a better solution to compare against, as Ed suggested, you may try other starting values to see whether lavaan arrives at the same solution.

First, if you somehow specified the starting values yourself, then you can try changing them and fit the model again to compare the results.

Second, if you just used the default, then you can change the default. From the help page of lavOptions (https://rdrr.io/cran/lavaan/man/lavOptions.html), there are two options, "simple" and "Mplus". I don't know which one is the default. You can try both.

Third, you may use the solution from the first run with the warning, randomly change the estimates (multiply each of them by a random number from .5 to 1.5), use them as starting values and fit the model again. See whether the solution is nearly the same.

This is an illustration. Not elegant but should be enough to illustrate the idea:

``` r
# Get the parameter estimates
est <- parameterEstimates(fit_bad_start)

# Randomly change them. (It's OK to change the fixed parameters. They will be ignored.)
set.seed(870591)
est$est <- est$est * runif(nrow(est), .5, 1.5)

# Fit the model again, set start to the parameter estimate data frame.
# The column 'est' will be used as starting values.
fit_again <- cfa(HS.model.bad_start, data = HolzingerSwineford1939,
                 start = est)

# Compare the results.
all.equal(coef(fit_bad_start), coef(fit_again), tolerance = 1e-6)
round(cbind(bad = coef(fit_bad_start), again = coef(fit_again)), 5)
```
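The second suggestion, switching the starting-value scheme, could be tried along these lines. This is only a sketch: it assumes the `start` argument accepts the strings "simple" and "Mplus" as described on the lavOptions help page, and it uses the built-in example data rather than the poster's model.

``` r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              visual ~~ textual '

# Fit the same model under each documented starting-value scheme
fit_simple <- cfa(HS.model, data = HolzingerSwineford1939, start = "simple")
fit_mplus  <- cfa(HS.model, data = HolzingerSwineford1939, start = "Mplus")

# Check convergence and the automatic post-estimation checks
lavInspect(fit_simple, "converged")
lavInspect(fit_mplus, "converged")
lavInspect(fit_simple, "post.check")
lavInspect(fit_mplus, "post.check")

# If the solution does not depend on the starting values, the
# estimates should agree up to numerical tolerance
all.equal(coef(fit_simple), coef(fit_mplus), tolerance = 1e-4)
round(cbind(simple = coef(fit_simple), Mplus = coef(fit_mplus)), 5)
```

If the two runs disagree noticeably, that would be a reason to look more closely at the model rather than at the starting values alone.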

Certainly, if there are warnings other than the one on starting values, you may need to address them. The problem may not be due only to the starting values.