lavaan WARNING: some estimated ov variances are negative

Tom Carpenter

Aug 25, 2017, 7:50:29 PM

to lavaan

Greetings,

I have a very basic latent growth model.

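library(lavaan)

# Linear latent growth model: intercept loadings fixed to 1, slope loadings to 0, 1, 2
lg.drinks <- ' i =~ 1*drinks_a1 + 1*drinks_a2 + 1*drinks_a3
               s =~ 0*drinks_a1 + 1*drinks_a2 + 2*drinks_a3 '

fit.lg.drinks <- growth(lg.drinks, data = dat)
summary(fit.lg.drinks)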

In short, I get an error and negative variances. How do I know if this is a problem with (1) my model or (2) my data, and second, how do I fix this? Is my output valid? Help?

lavaan (0.5-23.1097) converged normally after 74 iterations

                                                  Used       Total
  Number of observations                           147         322

  Estimator                                         ML
  Minimum Function Test Statistic                2.906
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.088

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  i =~
    drinks_a1         1.000
    drinks_a2         1.000
    drinks_a3         1.000
  s =~
    drinks_a1         0.000
    drinks_a2         1.000
    drinks_a3         2.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  i ~~
    s               -28.197    6.545   -4.308    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .drinks_a1         0.000
   .drinks_a2         0.000
   .drinks_a3         0.000
    i                 5.228    0.610    8.575    0.000
    s                 0.621    0.262    2.373    0.018

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .drinks_a1       -42.452   10.442   -4.065    0.000
   .drinks_a2        68.382    9.276    7.372    0.000
   .drinks_a3        -3.701   11.236   -0.329    0.742
    i               105.024   15.316    6.857    0.000
    s                23.257    4.986    4.665    0.000

Terrence Jorgensen

Aug 26, 2017, 7:08:25 AM

to lavaan

> I get an error and negative variances.

If there was an error, you would not be able to inspect model output because the model fitting would have failed. Do you mean there is a warning message, simply letting you know there are negative variances?

> How do I know if this is a problem with (1) my model or (2) my data, and second, how do I fix this?

I have responded to many posts about investigating non-positive-definite (NPD) matrices and Heywood cases; you can search the lavaan forum for them. In short, read the following article to learn how to test whether your out-of-bounds estimate is "significantly" out of bounds, or whether its confidence interval also includes plausible values (which would indicate that you cannot rule out sampling error as the cause):

http://journals.sagepub.com/doi/abs/10.1177/0049124112442138

If you can rule out sampling error as the cause, and your chi-squared is significant, then model misfit is probably the cause. If you cannot rule out sampling error as the cause, and your model fits fine, then it is most certainly just sampling error. If you cannot rule out sampling error as the cause, but your model also does not fit, then the results are ambiguous: either sampling error or misfit could have caused the inadmissible value.
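As a rough first pass at the check described above, one can form Wald-type 95% intervals for the residual variances from the estimates and standard errors printed in the summary() output earlier in the thread. This is only a sketch: it assumes normal-theory intervals are adequate, and it is not necessarily one of the tests recommended in the linked article. On the fitted object itself, parameterEstimates(fit.lg.drinks, ci = TRUE) reports intervals of the same kind.

```r
# Residual variance estimates and standard errors copied from the summary() output
resid_var <- data.frame(
  indicator = c("drinks_a1", "drinks_a2", "drinks_a3"),
  est       = c(-42.452, 68.382, -3.701),
  se        = c(10.442, 9.276, 11.236)
)

# Wald-type 95% confidence limits (normal approximation, an assumption of this sketch)
z <- qnorm(0.975)
resid_var$ci.lower <- resid_var$est - z * resid_var$se
resid_var$ci.upper <- resid_var$est + z * resid_var$se

# TRUE when the whole interval lies below zero, i.e. the estimate is
# "significantly" out of bounds by this criterion
resid_var$sig_negative <- resid_var$ci.upper < 0

print(resid_var)
```

By this criterion the interval for drinks_a1 lies entirely below zero, whereas the interval for drinks_a3 includes positive values.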

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

Terrence Jorgensen

Aug 26, 2017, 7:09:13 AM

to lavaan

> Are you using the same two variables (drinks_a2, drinks_a3) as indicators of two factors (factor I and factor S),

No, he is not fitting a factor model. He is fitting a growth model.

Tom Carpenter

Aug 28, 2017, 1:25:05 PM

to lavaan

Correct. This is a very standard latent growth model in which, as I understand it, the specification is not really optional: both the slope and the intercept factors have loadings fixed in a predetermined pattern.

For the record, the model more or less made sense given the pattern of the data (sig positive latent slope, which matches overwhelmingly with the visual inspection of the data).

I did try constraining the residual variances to be equal, but that now forces a non-positive-definite latent variable covariance matrix (slope-intercept correlation @ 1.37) and negative eigenvalues. Further, the pattern of the results seems strange.

Thus, I'm at a loss. The model is predetermined, so there is not much else I can do. Is it possible to identify the source of the problem?
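To illustrate why a slope-intercept correlation of 1.37 goes together with negative eigenvalues, here is a small sketch. The unit variances are an assumption made purely for illustration; only the value 1.37 comes from the post. On a fitted lavaan object, the estimated latent covariance matrix can be pulled out with lavInspect(fit, "cov.lv") and checked the same way.

```r
# Hypothetical 2 x 2 latent covariance matrix: the unit variances are assumed,
# only the correlation of 1.37 is taken from the post
r <- 1.37
psi <- matrix(c(1, r,
                r, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("i", "s"), c("i", "s")))

# A covariance matrix is positive definite only if all eigenvalues are positive
ev <- eigen(psi, symmetric = TRUE)$values
print(ev)

is_pd <- all(ev > 0)
print(is_pd)
```

For a 2 x 2 correlation matrix the eigenvalues are 1 + r and 1 - r, so any |r| above 1 forces one of them below zero.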

ONE other thing: the observed variable is non-normal, but that shouldn't matter? (Count data).

Tom Carpenter

Aug 28, 2017, 1:27:10 PM

to lavaan

I can also confirm there are no data errors. The model is correctly specified.

Possible to know if the latent variable means and sig-tests are valid?

On Saturday, August 26, 2017 at 4:09:13 AM UTC-7, Terrence Jorgensen wrote:

Terrence Jorgensen

Aug 29, 2017, 5:14:17 AM

to lavaan

> For the record, the model more or less made sense given the pattern of the data (sig positive latent slope, which matches overwhelmingly with the visual inspection of the data).

Did you look at individual growth plots for each individual, with either splines/lowess curves or linear and quadratic fitted lines, to see whether the linearity assumption was appropriate? Model fit might not be your concern here, but estimation problems are more likely to occur when models do not fit well, so it is still worth investigating. Here is a textbook example for generating empirical growth plots in R (see Figure 2.3 for a useful one to check the linearity assumption):
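A minimal base-R sketch of such empirical growth plots follows. The data are simulated placeholders in the same wide layout as the original post (not the real data), and the object names wide, waves, and second_diff are made up for the example. With only three waves a quadratic would fit each person perfectly, so the sketch simply overlays the raw trajectories and adds a numeric check of curvature.

```r
# Simulated placeholder data in the wide layout used in the thread (not the real data)
set.seed(123)
n <- 30
wide <- data.frame(
  drinks_a1 = rpois(n, 5),
  drinks_a2 = rpois(n, 6),
  drinks_a3 = rpois(n, 7)
)
waves <- c("drinks_a1", "drinks_a2", "drinks_a3")
time  <- 0:2

# One line per person: rows of y are time points, columns are individuals
y <- t(as.matrix(wide[, waves]))
matplot(time, y, type = "b", lty = 1, pch = 1, col = "grey40",
        xlab = "Wave (coded 0, 1, 2)", ylab = "Drinks",
        main = "Individual trajectories")
lines(time, rowMeans(y), lwd = 3)  # mean trajectory across persons

# With three equally spaced waves, the second difference y1 - 2*y2 + y3 is zero
# exactly when a person's three points fall on a straight line
second_diff <- wide$drinks_a1 - 2 * wide$drinks_a2 + wide$drinks_a3
summary(second_diff)
```

Second differences that are consistently positive or consistently negative across people would hint at systematic curvature that a linear slope factor cannot capture.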

https://stats.idre.ucla.edu/r/examples/alda/r-applied-longitudinal-data-analysis-ch-2/

> I did try constraining the residual variances to be equal, but that now forces a non-positive-definite latent variable covariance matrix (slope-intercept correlation @ 1.37) and negative eigenvalues. Further, the pattern of the results seems strange.
>
> Thus, I'm at a loss. The model is predetermined, so there is not much else I can do. Is it possible to identify the source of the problem?

If you already had NPD issues, fitting a more-constrained error structure will probably not help, as you saw. Did you inspect correlation residuals to see whether the model fails to capture some observed relationships?

resid(fit, type = "cor")

If some variables are more correlated than the model accounts for, you may need to specify an error covariance structure instead of assuming independent errors by default (one of the big criticisms of the SEM approach to growth modeling). For example, an autoregressive structure allows the errors at adjacent time points to be correlated, and you can constrain those covariances to equality across all adjacent pairs if the time points are equally spaced. Curran & Bollen's autoregressive latent trajectory (ALT) model attempts to address this problem, although in my experience it often has estimation difficulties, perhaps because a simple autoregressive structure won't always be appropriate.
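A sketch of what correlated adjacent errors could look like in lavaan syntax for the three-wave model from the original post. The label rc and the object names lg.drinks.ar and fit.ar are arbitrary choices for this example, and the fit is only attempted if lavaan is installed and a data frame called dat (as in the post) exists.

```r
# Growth model from the original post plus equality-constrained covariances
# between adjacent residuals; the shared label "rc" imposes the equality
lg.drinks.ar <- '
  i =~ 1*drinks_a1 + 1*drinks_a2 + 1*drinks_a3
  s =~ 0*drinks_a1 + 1*drinks_a2 + 2*drinks_a3

  drinks_a1 ~~ rc*drinks_a2
  drinks_a2 ~~ rc*drinks_a3
'

# Fit only when lavaan is available and the data frame "dat" from the post exists
if (requireNamespace("lavaan", quietly = TRUE) && exists("dat")) {
  library(lavaan)
  fit.ar <- growth(lg.drinks.ar, data = dat)
  summary(fit.ar)
  print(resid(fit.ar, type = "cor"))
}
```

Note that the original model has only 1 degree of freedom, so this single extra parameter leaves 0 degrees of freedom: overall fit can no longer be tested, and estimation may remain difficult with just three waves.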

> ONE other thing: the observed variable is non-normal, but that shouldn't matter? (Count data).

You can adjust SEs and test statistics for nonnormality by requesting a robust estimator (see the ?lavOptions help page, in particular the se, test, and estimator arguments). If your mean count is at least 20, the normal distribution is often a good approximation of the Poisson. If your counts are lower (e.g., a mean of 10 or lower), then you probably have predicted values that are negative, which makes no sense for counts. Since this is such a basic univariate growth curve, you might want to consider using the multilevel modeling (MLM) framework instead of SEM. In the lme4 package, the glmer() function allows you to specify family = poisson(link = "log"), a more appropriate model for counts that are not large enough to be approximately normal.
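Both suggestions can be sketched as follows. The data are simulated placeholders (not the poster's data), the id column is a hypothetical person identifier, and estimator = "MLR" is just one of the robust options listed under ?lavOptions. The glmer() model here uses a random intercept only, to keep the toy example stable; (1 + time | id) would add the random slope that corresponds to the latent slope factor.

```r
# Simulated placeholder data (not the poster's data); "id" is a hypothetical identifier
set.seed(42)
n <- 200
u <- rnorm(n, sd = 0.4)  # person-specific deviation on the log scale
wide <- data.frame(
  id        = seq_len(n),
  drinks_a1 = rpois(n, exp(1.5 + u)),
  drinks_a2 = rpois(n, exp(1.6 + u)),
  drinks_a3 = rpois(n, exp(1.7 + u))
)

# Option 1: keep the SEM, but request robust SEs and a scaled test statistic
if (requireNamespace("lavaan", quietly = TRUE)) {
  lg.drinks <- ' i =~ 1*drinks_a1 + 1*drinks_a2 + 1*drinks_a3
                 s =~ 0*drinks_a1 + 1*drinks_a2 + 2*drinks_a3 '
  fit.mlr <- lavaan::growth(lg.drinks, data = wide, estimator = "MLR")
}

# Option 2: Poisson multilevel growth model, which needs long-format data
long <- reshape(wide, direction = "long",
                varying = c("drinks_a1", "drinks_a2", "drinks_a3"),
                v.names = "drinks", timevar = "time", times = 0:2,
                idvar = "id")

if (requireNamespace("lme4", quietly = TRUE)) {
  fit.pois <- lme4::glmer(drinks ~ time + (1 | id), data = long,
                          family = poisson(link = "log"))
  print(lme4::fixef(fit.pois))  # fixed effects are on the log scale
}
```

Because of the log link, exp() of the time coefficient is the multiplicative change in the expected count per wave, and predicted counts can never be negative.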

Terrence Jorgensen

Aug 29, 2017, 5:15:24 AM

to lavaan

> Possible to know if the latent variable means and sig-tests are valid?

Technically, they are only valid if you are fitting the correct model. See my previous post about assuming linearity and accounting for nonnormality.
