Nodes produced errors

Yago Luksevicius de Moraes

May 29, 2023, 4:24:00 PM
to lavaan
Hi all,

I'm exploring how some fit indices are affected by model structure, estimator and misspecification, but sometimes an error occurs and no SimResult is created. The following syntax is one that created this error.

```
Populacao<-'#specific factors
F1=~0.742*I1+0.310*I2+0.600*I3
F2=~0.330*I4+0.686*I5+0.350*I6

#General Factor
G=~0.300*I1+.671*I2+.320*I3+.525*I4+.340*I5+.357*I6

#Factor correlation
F1~~0*F2+0*G; F2~~0*G

#thresholds
I1|1.028*t1; I2|0.539*t1; I3|0.156*t1; I4|(-0.179)*t1; I5|(-0.565)*t1; I6|(-1.067)*t1

#latent mean
F1~0*1; F2~0*1; G~0*1

#latent  variance
F1~~1*F1; F2~~1*F2; G~~1*G'

#Tested Model
bifatorial<-'F1=~I1+I2+I3
F2=~I4+I5+I6
G=~I1+I2+I3+I4+I5+I6'

sim.n1400b<-simsem::sim(nRep = 10, model = bifatorial, n = 1400, generate = Populacao, lavaanfun = "cfa", seed = 123, multicore = TRUE, estimator = "WLSMV", std.lv = TRUE, orthogonal = TRUE, ordered = TRUE)
```
This results in the error:
Error in checkForRemoteErrors(val) :
  10 nodes produced an error: $ operator is invalid for atomic vectors


If I change the multicore parameter to false, the simulation stops in the first replication with the error:
Warning: lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
Warning: lavaan WARNING: could not invert information matrix needed for robust test statistic
Warning: lavaan WARNING: covariance matrix of latent variables
                is not positive definite;
                use lavInspect(fit, "cov.lv") to investigate.
Error in TEST[[test.idx]]$stat : $ operator is invalid for atomic vectors


Dr. Terrence Jorgensen told me this could be solved by downloading a development version of lavaan through the command `remotes::install_github("yrosseel/lavaan")` and it indeed solved the problem for this population syntax.

However, the problem still occurs when I change the syntax. Below are two examples that give the same error.

Example 1:

```
Populacao<-'#specific factors
F1=~.742*I1+.310*I2+.623*I3+.330*I4
F2=~.494*I5+.35*I6+.347*I7

#General Factor
G=~0.300*I1+.683*I2+.320*I3+.560*I4+.340*I5+.424*I6+.36*I7

#Factor correlation
F1~~0*F2+0*G
F2~~0*G

#thresholds
I7|1.150*t1
I6|0.674*t1
I5|0.319*t1
I4|0.000*t1
I3|(-0.319)*t1
I2|(-0.674)*t1
I1|(-1.150)*t1

#latent mean
F1~0*1; F2~0*1; G~0*1

#latent variance
F1~~1*F1; F2~~1*F2; G~~1*G'

#Tested Model
bifatorial<-'#structure
F1=~I1+I2+I3+I4
F2=~I5+I6+I7
G=~I1+I2+I3+I4+I5+I6+I7'

sim.ord.050<-simsem::sim(nRep = 100, model = bifatorial, n = 50, generate = Populacao,
                         lavaanfun = "cfa", seed = 123, multicore = TRUE,
                         estimator = "WLSMV", std.lv = TRUE, orthogonal = TRUE)
```
Example 2:

```
Populacao<-'#specific factors
F1=~0.742*I1+0.310*I2+0.600*I3
F2=~0.330*I4+0.686*I5+0.350*I6

#General Factor
G=~0.300*I1+.671*I2+.320*I3+.525*I4+.340*I5+.357*I6

#Factor correlation
F1~~0*F2+0*G
F2~~0*G

#thresholds
I6|(-0.130)*t1+(-0.009)*t2+0.292*t3+0.910*t4
I5|(-1.176)*t1+(-0.640)*t2+0.657*t3+1.613*t4
I4|(-1.687)*t1+(-0.970)*t2+(-0.778)*t3+1.267*t4
I3|(-1.237)*t1+(-0.288)*t2+(-0.230)*t3+0.409*t4
I2|(-0.947)*t1+(-0.409)*t2+(-0.286)*t3+(-0.064)*t4
I1|(-0.589)*t1+0.443*t2+0.950*t3+1.760*t4

#latent mean
F1~0*1; F2~0*1; G~0*1

#latent variance
F1~~1*F1; F2~~1*F2; G~~1*G'

#Tested Model
bifatorial<-'#structure
F1=~I1+I2+I3
F2=~I4+I5+I6
G=~I1+I2+I3+I4+I5+I6'

sim.ord.050<-simsem::sim(nRep = 1e4, model = bifatorial, n = 50, generate = Populacao,
                         lavaanfun = "cfa", seed = 123, multicore = TRUE,
                         estimator = "WLSMV", std.lv = TRUE, orthogonal = TRUE)

```

The same error occurs with MLR and/or first-order bidimensional models too, although it seems much rarer.

Best,
Yago

Shu Fai Cheung

May 30, 2023, 5:47:02 AM
to lav...@googlegroups.com
I have no experience with simsem::sim(). However, similar errors/warnings occur when lavaan::simulateData() and lavaan::cfa() are used directly:

> dat <- simulateData(Populacao, sample.nobs = 50, seed = 1234)
> fit <- cfa(bifatorial, dat, estimator = "WLSMV")
Warning messages:
1: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :

  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
2: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats,  :

  lavaan WARNING: could not invert information matrix needed for robust test statistic

3: In lav_object_post_check(object) :
  lavaan WARNING: some estimated ov variances are negative

It may be easier to identify the problem with the population and/or the model using lavaan::simulateData() and lavaan::cfa() this way first.

My two cents.

Regards,
Shu Fai Cheung (張樹輝)


--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/b4aba431-f762-4300-8b5e-cc77d44d3d1fn%40googlegroups.com.

Yago Luksevicius de Moraes

May 30, 2023, 11:31:17 AM
to lav...@googlegroups.com

Yeah, from what I understand, it’s not an error in simsem but in lavaan, and when it occurs, it stops simsem.

There are two problems with “identifying the problem”:
1) I’ve tried more than 5 different models (all bifactor, just changing the loadings and intercepts/thresholds) and 10 seeds, and this error seems to be the norm rather than the exception unless I use a sample size higher than 1000 observations, which leads me to the second problem;
2) My goal is actually to do 1e4 replications of each condition and knowing how often errors occur is one of the things I’m investigating. For instance, the first model I mentioned in my first post (6 dichotomous items) failed to converge 100% of the time with WLSMV, but this error happens only after 6000 simulations in another condition. I don’t know if it would happen again if simsem completed all the 1e4, because it stops after this one occurrence.

Altogether, I have 128 conditions (3 population models x 3 tested models x 2 estimators x 7 sample sizes), so I can’t just keep changing the model whenever I fail to complete all 1e4 replications.

Best,

Yago




--
Yago Luksevicius de Moraes
Master in Experimental Psychology
Bachelor in Psychology

Shu Fai Cheung (張樹輝)

May 30, 2023, 10:28:26 PM
to lavaan
I am not familiar with bifactor models. Just to share my experience in a simulation I did a while ago. Not a formal one but a quick one to test some ideas I was thinking about then.

I noticed that, in some cases, a model like this one from your example can fail to converge or yield an inadmissible solution even if this model is the "true" model (the data generation model):

bifatorial<-'#structure
F1=~I1+I2+I3
F2=~I4+I5+I6
G=~I1+I2+I3+I4+I5+I6'

I used only continuous variables and ML, so the problem was not due to categorical variables or WLSMV. Then I learned that, due to the combination of population factor loadings I used, this model has an identification problem. This is not easy to spot because we do not have this problem in a typical CFA model without a general factor. Therefore, the failure to converge is actually normal and expected in that case. If the combination of population factor loadings is similar in proportion to that particular combination, the chance of nonconvergence will not be 1 across replications but can still be high.

How about fitting the example with 6 items in Mplus and seeing whether the same problem occurs? The demo version can fit a 6-item, 2-factor model, I believe.

-- Shu Fai

Keith Markus

May 31, 2023, 9:11:35 AM
to lavaan
Yago,
I have never tried anything like this before.  So take this as speculative but here are two ideas.

Could you write a wrapper function for the lavaan fit function that you are using and use your custom wrapper in simsem as the fit function?  Inside your wrapper function, could you call the lavaan fit function inside a try() function?  Then use an if ... else ... structure.  If there is no error, fit the model and return the lavaan-class object as usual.  If there is an error, then create an empty lavaan-class object and set @Fit@converged to FALSE.
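
A minimal sketch of the try() plus if ... else ... idea, in base R only so that it runs without lavaan. The names `safe_fit`, `fit_fun`, `fallback_fun`, `toy_fit` and `toy_fallback` are hypothetical placeholders, not part of lavaan or simsem, and whether sim() accepts such a wrapper as its fit function is not tested here.

```r
# Hypothetical helper: fit_fun and fallback_fun are placeholders supplied by the caller.
safe_fit <- function(data, fit_fun, fallback_fun) {
  # try() returns an object of class "try-error" instead of stopping the run
  result <- try(fit_fun(data), silent = TRUE)
  if (inherits(result, "try-error")) {
    # Fitting failed: return the fallback object and flag the failure
    list(fit = fallback_fun(data), ok = FALSE,
         message = conditionMessage(attr(result, "condition")))
  } else {
    # Fitting succeeded: return the fitted object as usual
    list(fit = result, ok = TRUE, message = NA_character_)
  }
}

# Toy stand-ins (assumptions) so the sketch is self-contained
toy_fit <- function(data) {
  if (nrow(data) < 5) stop("toy fit failed")
  colMeans(data)
}
toy_fallback <- function(data) NULL

good <- safe_fit(data.frame(x = 1:10), toy_fit, toy_fallback)
bad  <- safe_fit(data.frame(x = 1:3),  toy_fit, toy_fallback)
```

With lavaan, `fit_fun` could presumably wrap the usual cfa() call and `fallback_fun` the same call with `do.fit = FALSE`, which returns a lavaan-class object without fitting the model.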

Alternatively, if you created your data as a list of data sets in a separate step and then fit the models as a second step, perhaps you could break the second step into two parts.  First, you could write a loop to go through the list and record the result of the fit inside try() for each data set.  You could do that outside of simsem.  Then use the results from the first step to run the simulation through simsem using only the data sets that do not produce the error.

Bi-factor models are notoriously finicky.  So, I am less surprised that they sometimes fail to converge than I am that this causes an error rather than a warning in lavaan.  Of the two suggestions above, I suspect that the second offers the more direct route with fewer potential pitfalls.

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Yago Luksevicius de Moraes

May 31, 2023, 8:13:54 PM
to lav...@googlegroups.com
I'm not sure I understood it right. I'm better at using packages than at writing functions.

Feel free to correct me if I'm wrong, but, from what I understand, both alternatives would basically consist of writing a simsem-like function to find the samples that simsem can analyze without crashing and then having simsem analyze these samples again to get information about fit indices, number of convergences and so on.

If I can do that (or rather, find someone who can do it), wouldn't it be better to just program this new function to get all the information that simsem would give me?

Best,
Yago


Keith Markus

Jun 1, 2023, 10:30:38 AM
to lavaan
Yago,
As I understand it, you are not encountering any problem with simsem; you are encountering a problem with lavaan when the model does not converge.  (One way to create that error is to apply the $ operator to NA.)  One thing that might have been helpful to mention in my previous post is that you can obtain a lavaan-class object without fitting a model in lavaan by using the do.fit = FALSE argument.  So, one option might be to toggle this argument between TRUE and FALSE depending on whether try(...) produces an error.  The idea is that you would write a custom function to be passed to sim() as the function used to fit the models using the lavaanfun argument.
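
To illustrate the remark about the $ operator, here is a small base R sketch. The variable name `stat_slot` is made up for the example; it stands in for an element that holds NA where a list was expected.

```r
# An atomic NA where code expects a list with a `stat` element
stat_slot <- NA

# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(stat_slot$stat, error = function(e) conditionMessage(e))
print(msg)
```

In an English locale the captured message should match the "$ operator is invalid for atomic vectors" text reported at the start of the thread.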

However, if you are new to writing functions in R, my advice would be to go with the second option.  The use of sim() with a list of data sets is well explained in the simsem documentation, including examples.  I think that you could do the whole thing with only procedural code, avoiding the need to write any functions.  When you loop through with try(), see if you can save the result as a logical vector (or convert it after the fact).  If so, then you could just index your data list using this vector.

Here is an illustration of what I mean.  You would create your list of data sets using your simulation model; here I make a simple example manually just for convenience.  Likewise, I manually create the index vector, but yours would come from looping through the list of data sets with try(...) and appending TRUE or FALSE to the vector.  (Tip: You can initialize the vector as NULL before the loop, then use c(indexVector, newValue) inside the loop so that the order of the vector matches the order of the list.)  The result of the last line below is to print only the first and third data sets.

myDataList <- list(set1 = data.frame(1:5),
                   set2 = data.frame(3:8),
                   set3 = data.frame(2:9))
indexVector <- c(TRUE, FALSE, TRUE)
myDataList[indexVector]
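
A sketch of how the index vector could be built with try() rather than by hand. `toy_fit` is a hypothetical stand-in for the real model-fitting call, and the second toy data set is made "bad" on purpose so that it triggers an error.

```r
# Toy list of data sets; the second one is deliberately problematic
myDataList <- list(set1 = data.frame(x = 1:5),
                   set2 = data.frame(x = c(NA, NA, NA)),
                   set3 = data.frame(x = 2:9))

# Hypothetical stand-in for the model-fitting call (an assumption)
toy_fit <- function(data) {
  if (anyNA(data)) stop("toy fit failed")
  colMeans(data)
}

indexVector <- NULL
for (dat in myDataList) {
  result <- try(toy_fit(dat), silent = TRUE)
  # Append TRUE when the fit ran without an error, FALSE otherwise
  indexVector <- c(indexVector, !inherits(result, "try-error"))
}

# Keep only the data sets that did not produce an error
usableData <- myDataList[indexVector]
names(usableData)
```

For 1e4 data sets, preallocating with `logical(length(myDataList))` and filling by position would avoid growing the vector inside the loop.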

I hope that helps,

Yago Luksevicius de Moraes

Jun 3, 2023, 3:02:47 PM
to lavaan
Dear Keith,

Thank you for your suggestion.
I've been trying to implement it, but unfortunately it is not easy to find 1e4 data sets that avoid this error, and it is time-consuming to generate this many data sets, test whether they give this error and then reanalyze them with simsem.
What bothers me the most is I cannot understand why lavaan v. 0.6-16.1864 solves this problem for the first model, but not for all the others.

Best,

Keith Markus

Jun 4, 2023, 9:04:00 AM
to lavaan
Yago,
I did not mean for you to search for a sequence of data sets that do not produce the error.  What I meant was a procedure like this:
1. Generate the number of data sets stipulated by your simulation design.
2. Put them in a list.
3. Use a for() loop or something like that to loop through the list.  Inside this loop, record whether each data set produces the error or not.  For example, create a vector in which every data set that produces an error is coded FALSE and every data set that does not is coded TRUE.
4. Now index the list of data sets by the error indicator variable and send the result through sim() to generate results with no errors.
5. Report the proportion of samples that produced the error due to non-convergence.
6. Report the results from sim() for the data sets that did not produce an error.

Unless I am missing something, that procedure only requires that you generate a single set of data sets and cycle through them twice to produce usable results.
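
The steps above can be sketched in base R as follows. Everything here is an assumption made for illustration: `run_two_pass`, `toy_generate` and `toy_fit` are hypothetical names, and the toy functions only mimic a generator and a fit that sometimes fails.

```r
run_two_pass <- function(n_sets, generate_fun, fit_fun) {
  # Steps 1-2: generate all data sets once and keep them in a list
  data_list <- lapply(seq_len(n_sets), generate_fun)
  # Step 3: TRUE = no error, FALSE = error
  ok <- vapply(data_list, function(dat) {
    !inherits(try(fit_fun(dat), silent = TRUE), "try-error")
  }, logical(1))
  # Step 4: data sets to pass on; step 5: proportion that produced the error
  list(usable = data_list[ok], ok = ok, prop_error = mean(!ok))
}

# Toy stand-ins (assumptions) so the sketch runs without lavaan or simsem
toy_generate <- function(i) {
  set.seed(i)
  data.frame(x = rnorm(20))
}
toy_fit <- function(dat) {
  if (mean(dat$x) < 0) stop("toy fit failed")
  mean(dat$x)
}

out <- run_two_pass(50, toy_generate, toy_fit)
out$prop_error       # step 5: proportion of data sets that produced the error
length(out$usable)   # number of data sets left for step 6
```

In the real study, `generate_fun` could wrap lavaan::simulateData() with the population model and `fit_fun` the cfa() call used earlier in the thread. The `usable` list would then go to sim() for step 6; as far as I recall this is done through its rawData argument, but the simsem documentation should be checked.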