# Warning message: "some estimated ov variances are negative"


Jul 27, 2018, 10:09:48 AM
to lavaan
Hi,

I am currently trying to fit a latent growth model with four time points. The model is pretty simple:

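```r
library(lavaan)

model1 <- '
  # intercept factor: all loadings fixed to 1
  i =~ 1*t5 + 1*t6 + 1*t7 + 1*t8
  # linear slope factor: loadings 0, 1, 2, 3 for equally spaced occasions
  s =~ 0*t5 + 1*t6 + 2*t7 + 3*t8
'

# full-information ML uses all available observations when values are missing
fit <- growth(model1, data = data, estimator = "ml", missing = "fiml")
summary(fit, fit.measures = TRUE, standardized = TRUE)
```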

When I run the growth() command, I always get the warning that some estimated variances are negative.
For the past few hours I have tried to solve the problem and searched the internet for a solution, but I couldn't find anything that helped.

My data look much like this simulated data set:

```r
# 1000 persons x 4 occasions, initialised to 0
data2 <- array(0, dim = c(1000, 4))
data2 <- as.data.frame(data2)
colnames(data2) <- c("t5", "t6", "t7", "t8")

# each value is drawn independently from an increasing range per occasion
for (k in 1:1000) {
  data2[k, 1] <- sample(seq(0, 0.1, by = 0.0001), 1)
  data2[k, 2] <- sample(seq(0.1001, 0.2, by = 0.0001), 1)
  data2[k, 3] <- sample(seq(0.2001, 0.5, by = 0.0001), 1)
  data2[k, 4] <- sample(seq(0.5001, 1, by = 0.0001), 1)
}
```

Fitting the model to the data2 data frame produces the same warning.

I hope someone can help me.

Yours,
Jan

### Terrence Jorgensen

Jul 29, 2018, 7:10:27 PM
> When I run the growth() command, I always get the warning that some estimated variances are negative.
> For the past few hours I have tried to solve the problem and searched the internet for a solution, but I couldn't find anything that helped.

Have you tested the null hypothesis that sampling error can account for how far the estimate(s) fall below zero?

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

### Jan

Jul 30, 2018, 4:42:42 AM
Hi Terrence,

Thanks for the response! I haven't tested this null hypothesis yet. How do I do it? From the article you sent, I understand that I take the variance estimate and its standard error and divide the estimate by the standard error to get a z-value.
Is this the right approach? Can I use the estimates from the lv fit output? Sorry for asking, but I have only just started working with lavaan, so a lot of questions come up :)

Thank you and greetings from Hamburg (Germany)
Jan

### Terrence Jorgensen

Jul 31, 2018, 6:23:15 AM
> I take the variance estimate and its standard error and divide the estimate by the standard error to get a z-value.
> Is this the right approach?

Yes, you can do that. I am a fan of looking at the CI, though, because it shows the range of null hypotheses that the data cannot reject. It also generalizes better to situations such as bootstrapping for robust tests, when the sampling distribution of the statistic is not normal. If the CI contains plausible (i.e., positive) values, then sampling error cannot be ruled out as the cause.

It is not uncommon for estimates of small population residual variances to be negative in small samples. But consistency with sampling error does not rule out the possibility that model misspecification caused the Heywood case, so assess the fit of your model, too.

> Can I use the estimates from the lv fit output?

Yes:

`summary(fit, ci = TRUE)`
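Jan's z-value approach and the CI check Terrence prefers can be sketched in base R. This is a minimal sketch, assuming a data frame shaped like the output of lavaan's `parameterEstimates(fit)` (columns `lhs`, `op`, `rhs`, `est`, `se`); `check_variances()` is a hypothetical helper, it uses a normal-theory (Wald) interval est ± z_crit × se, and the toy parameter table holds placeholder numbers, not Jan's results.

```r
# Wald z-value and normal-theory CI for each variance in a lavaan-style
# parameter table. `pe` is assumed to contain the columns lhs, op, rhs,
# est and se, as in the output of lavaan::parameterEstimates(fit).
check_variances <- function(pe, level = 0.95) {
  # variances are the "~~" rows whose left- and right-hand sides match
  v <- pe[pe$op == "~~" & pe$lhs == pe$rhs, c("lhs", "est", "se")]
  crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% CI
  v$z <- v$est / v$se                  # estimate divided by its SE
  v$ci.lower <- v$est - crit * v$se
  v$ci.upper <- v$est + crit * v$se
  # A negative estimate whose CI still reaches positive (admissible) values
  # is consistent with sampling error; one whose whole CI is below 0 is not.
  v$consistent_with_sampling_error <- (v$est >= 0) | (v$ci.upper > 0)
  rownames(v) <- NULL
  v
}

# Hypothetical parameter table: placeholder numbers, not Jan's output.
pe <- data.frame(
  lhs = c("i",  "t5",   "t6",  "s"),
  op  = c("=~", "~~",   "~~",  "~~"),
  rhs = c("t5", "t5",   "t6",  "s"),
  est = c(1,    -0.002, -0.05, 0.01),
  se  = c(0,    0.004,  0.01,  0.002),
  stringsAsFactors = FALSE
)

check_variances(pe)
# t5: CI crosses 0 (consistent with sampling error)
# t6: whole CI below 0 (not consistent with sampling error)
```

With a fitted model the call would be `check_variances(parameterEstimates(fit))`. lavaan's `parameterEstimates()` output already reports `z`, `pvalue`, `ci.lower` and `ci.upper`, so the helper mainly makes the arithmetic explicit; a bootstrap interval, as Terrence mentions, would replace the normal-theory one when the sampling distribution is not normal.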

### Jan

Aug 1, 2018, 6:59:39 AM
Hi Terrence,

Thanks again for answering my question. What should I do when the CIs do not include zero? I assume that my model is wrong and that I have to respecify it.
Is there a standard way to construct a better-fitting model? I designed my model on the basis of the data themselves (which I had simulated according to a theoretical model), and now I am wondering why the model does not seem to fit.

Sorry for all the questions, but at the moment I am quite confused by these results.

Thank you in advance,
Jan

### Terrence Jorgensen

Aug 1, 2018, 8:05:58 AM
> Is there a standard way to construct a better-fitting model?

I would recommend looking at the correlation and mean residuals to see which relationships (correlations) and means the model reproduces poorly.

`resid(fit, type = "cor")`

Hopefully that will provide clues about how the model could be improved. Because it is such a simple growth model with only 4 observed variables, there is probably little to do other than free residual correlations if the model-implied correlations do not reproduce the observed ones adequately (e.g., to within 0.1). But if the mean residuals are large, then the linear growth function is too simple to represent the data.
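The 0.1 rule of thumb for residual correlations can be applied mechanically. This is a minimal sketch: `flag_residual_cors()` is a hypothetical helper, and it assumes the residual correlations arrive as a named, symmetric matrix, such as the `$cov` element that recent lavaan versions return from `resid(fit, type = "cor")` (element names may differ across versions, so check `str()` on your own output). The toy matrix is made up for illustration.

```r
# List pairs of variables whose residual correlation exceeds a cutoff.
# `res_cor` is assumed to be a named, symmetric matrix of residual
# correlations (observed minus model-implied).
flag_residual_cors <- function(res_cor, cutoff = 0.1) {
  # look only above the diagonal so each pair is reported once
  big <- abs(res_cor) > cutoff & upper.tri(res_cor)
  idx <- which(big, arr.ind = TRUE)
  data.frame(
    var1 = rownames(res_cor)[idx[, "row"]],
    var2 = colnames(res_cor)[idx[, "col"]],
    residual = res_cor[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up residual correlation matrix for illustration only.
vars <- c("t5", "t6", "t7", "t8")
toy <- matrix(c( 0.00, 0.02, -0.15, 0.04,
                 0.02, 0.00,  0.03, 0.12,
                -0.15, 0.03,  0.00, 0.01,
                 0.04, 0.12,  0.01, 0.00),
              nrow = 4, dimnames = list(vars, vars))

flag_residual_cors(toy)  # flags t5-t7 (-0.15) and t6-t8 (0.12)
```

Applied to a fitted model, `flag_residual_cors(resid(fit, type = "cor")$cov)` would list the pairs of occasions that are candidates for a freed residual correlation; large entries in the `$mean` element instead point at the linear growth function, as described above.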

> I designed my model on the basis of the data themselves (which I had simulated according to a theoretical model), and now I am wondering why the model does not seem to fit.

Your simulated data were not generated from the model you are fitting to them, so it is not straightforward to explain why the model does not capture them well. For instance, you are randomly sampling values within a particular range, independently at each time for each person, so your simulated data have none of the autocorrelation you would expect in real longitudinal data. The growth model posits that person i's score at t1 is related to their score at t2, but your simulated data contain no such relationship. Thus, the reason your simulated data produce the warning might be completely unrelated to why your real data produce it.
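To give the growth model a fair test, the scores can instead be generated from person-specific intercepts and slopes, which builds in the autocorrelation described above. This is a minimal sketch under assumed population values (intercept mean 0.05, SD 0.05; slope mean 0.30, SD 0.05; residual SD 0.02; uncorrelated intercept and slope), chosen only for illustration; `sim_growth()` and `sim_independent()` are hypothetical helpers. For contrast, `sim_independent()` redraws data the way the loop in the first post does, vectorised with `sample(..., replace = TRUE)`.

```r
# Generate y[p, t] = i[p] + s[p] * time[t] + e[p, t] for n persons.
# All default parameter values are illustrative assumptions.
sim_growth <- function(n = 1000, times = 0:3,
                       mean_i = 0.05, sd_i = 0.05,
                       mean_s = 0.30, sd_s = 0.05,
                       sd_e = 0.02) {
  i <- rnorm(n, mean_i, sd_i)                       # person-specific intercepts
  s <- rnorm(n, mean_s, sd_s)                       # person-specific slopes
  e <- matrix(rnorm(n * length(times), 0, sd_e), nrow = n)  # residuals
  y <- i + outer(s, times) + e                      # n x T matrix of scores
  y <- as.data.frame(y)
  colnames(y) <- paste0("t", 4 + seq_along(times))  # t5, t6, t7, t8
  y
}

# The original scheme: every value drawn independently, so a person's
# score at one occasion carries no information about the next.
sim_independent <- function(n = 1000) {
  data.frame(
    t5 = sample(seq(0, 0.1, by = 0.0001), n, replace = TRUE),
    t6 = sample(seq(0.1001, 0.2, by = 0.0001), n, replace = TRUE),
    t7 = sample(seq(0.2001, 0.5, by = 0.0001), n, replace = TRUE),
    t8 = sample(seq(0.5001, 1, by = 0.0001), n, replace = TRUE)
  )
}

set.seed(1)
growth_data <- sim_growth()
indep_data <- sim_independent()

round(cor(growth_data), 2)  # clearly positive correlations across occasions
round(cor(indep_data), 2)   # off-diagonal correlations near 0
```

Fitting `model1` to `growth_data` with `growth()` tests the model on data whose generating process matches it (the population intercept-slope covariance is zero here, which the fitted model is free to estimate), so any remaining problems there would point at estimation rather than misspecification.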