lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others


Thalia Theodoraki

Oct 17, 2017, 4:44:13 PM
to lavaan
Hi everyone 
I have been using the lavaan package to apply FIML for my correlational and regression analyses. 

When running my models I sometimes get the warning:
In lav_data_full(data = data, group = group, group.label = group.label,  :
  lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate

I understand that it is warning me that some variances in my model are much larger than others, but I was wondering what this practically means for my model. Does this mean I should not trust this particular model because these unequal variances make it unreliable? 

Presumably, the variances are so large due to the values of the variables so should I transform my variables in some way? Could it also be a result of some of my variables not being normally distributed? 

Which brings me to my second, more general question, about how non-normally distributed variables affect regression done using FIML. 
I do have variables that are non-normally distributed: specifically, my outcomes and one of my predictors are not normal. I know that in ordinary regression non-normality of the variables is not a serious concern, especially with my sample sizes of about 64 to 114 individuals (depending on the analysis), but what happens when using FIML? I have read that non-normality can affect the standard errors and the likelihood ratio test when using FIML, rather than the parameter estimates themselves. I am not actually using the standard errors, and I am using the Wald test to investigate the fit of the model and the predictors, so how concerned should I be about non-normality in my case?  

Any help and advice would be much appreciated. 

Many thanks 
Thalia 

Terrence Jorgensen

Oct 18, 2017, 4:10:53 AM
to lavaan
what this practically means for my model. Does this mean I should not trust this particular model because these unequal variances make it unreliable? 

The message is a warning because it might be a problem.  Estimators try to minimize the discrepancy between the observed and expected (i.e., model-implied) means and (co)variances.  If a variable has an observed variance == 1 and the model estimates imply it is only half as large, the discrepancy (residual) is only 1 - 0.5 = 0.5, but if a variable has an observed variance of 100, the same relative discrepancy (half as large: 50) is much larger in absolute magnitude (100-50=50).  So variables with much larger variances can dominate the estimation algorithm's search for the best-fitting parameter estimates, paying less attention to important discrepancies among variables with less variance.  Again, that is only a possibility, not a guarantee.
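The arithmetic in that example can be spelled out in a couple of lines of base R (a toy illustration, not lavaan output):

```r
# Two variables with the same *relative* misfit (the model-implied variance
# is half the observed variance) but very different absolute residuals.
obs_small <- 1;   implied_small <- 0.5
obs_large <- 100; implied_large <- 50

resid_small <- obs_small - implied_small   # 0.5
resid_large <- obs_large - implied_large   # 50

# A discrepancy function built from raw residuals is dominated by the
# large-variance variable: its contribution is 100 times larger.
resid_large / resid_small                  # 100
```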

Presumably, the variances are so large due to the values of the variables so should I transform my variables in some way?

Sure, you can divide the variable with the large variance by 10 to express it in larger units, so that the variance is 10^2 = 100 times smaller, making it closer in magnitude to the other observed variances.  Assuming the estimator was not actually running into problems with the original data, you should get identical model fit, and your point and SE estimates involving the transformed variable should be the same except for a change in decimal place.
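For example (base R, with a made-up variable name):

```r
set.seed(1)
# Hypothetical large-variance variable, e.g. reaction times in milliseconds
dat <- data.frame(reaction_ms = rnorm(100, mean = 500, sd = 80))

# Dividing by 10 shrinks the variance by a factor of 10^2 = 100
dat$reaction_cs <- dat$reaction_ms / 10

var(dat$reaction_ms) / var(dat$reaction_cs)   # 100 (up to floating point)
```

Point estimates and SEs involving the rescaled variable shift by the same factor of 10, so substantive conclusions are unchanged.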

Could it also be a result of some of my variables not being normally distributed? 

No, nonnormal variables can also have either large or small variance.

I am not actually using the errors and I am using the Wald test to investigate the fit of the model and the predictors so how concerned should I be about non-normality in my case?  

Well, the Wald test for parameters also assumes normality of the sampling distribution with the estimated SE, which in turn assumes normal data.  As with OLS regression, robustness to nonnormality is a matter of degree, so it depends how much excess kurtosis the variables have.  To be conservative, you can use robust ML to adjust SEs and test statistics for nonnormality.  Some robust corrections are available even with missing data when using FIML.

fit <- sem(model, data, missing = "FIML", estimator = "MLR")


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Thalia Theodoraki

Oct 23, 2017, 3:14:41 PM
to lavaan
Thanks so much Terrence!!! 
I fitted the models the other day with transformed variables (divided by 10), and the fit and parameters were the same, meaning that the larger variances didn't really affect my model's fit. However, I decided to keep the model with the transformed variables because they had smaller standard errors that were on a similar scale for all variables. 

I also took your advice and applied MLR to my models, and sure enough nothing changed other than the standard errors. But obviously I will use the MLR models since, as you said, the Wald test will be more reliable that way and the standard errors will be robust. 

The final thing I would like to ask is whether in lavaan there is a way to check the regression assumptions: namely, whether I can produce plots of the residuals against the fitted values or leverage points, similar to those produced by the plot() function when conducting ordinary multiple regression with lm. 

Thanks 
Thalia  

Terrence Jorgensen

Oct 25, 2017, 4:56:37 AM
to lavaan
The final thing I would like to ask is whether in lavaan there is a way to check the regression assumptions: namely if I can produce plots of the residuals with the fitted values or with leverage points similar to those produced with the plot(model)  function when conducting normal multiple regression with lm. 

lavaan is a covariance-structure analysis program, so residuals in this context are the differences between the observed and expected (model-implied) covariance matrices, not between observed and expected (predicted) values for individual rows of data.  You can use the estimated regression slopes to calculate predicted values manually, then calculate residuals manually (although you will only get them for rows with complete data on the predictors).
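A minimal sketch of that manual calculation in base R, assuming a simple model like y ~ x1 + x2 and slope estimates already read off coef(fit) or parameterEstimates(fit) — the variable names and coefficient values below are invented for illustration:

```r
set.seed(42)
# Simulated data standing in for your own data frame
dat <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
dat$y <- 1.2 + 0.5 * dat$x1 - 0.3 * dat$x2 + rnorm(80)
dat$x1[sample(80, 5)] <- NA                    # some missing predictor values

# Pretend these estimates came from the fitted lavaan model:
b0 <- 1.2; b1 <- 0.5; b2 <- -0.3

ok   <- complete.cases(dat[, c("x1", "x2")])   # rows with all predictors observed
yhat <- b0 + b1 * dat$x1[ok] + b2 * dat$x2[ok] # predicted values
res  <- dat$y[ok] - yhat                       # case-level residuals

# Diagnostic plot analogous to the residuals-vs-fitted panel of plot(lm_fit)
plot(yhat, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```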