How to interpret warning messages


Allan Leung

Mar 15, 2022, 2:15:15 AM
to lavaan
Hi all,

I'm not an academic and don't have much experience with SEM.
I'm trying a practical application of CFA (after doing an EFA and forcing the loadings) for a work project, and I'm trying to understand the implications of the warning messages I'm getting.

If I get warning messages:
'lavaan WARNING: Model estimation FAILED! Returning starting values'
'lavaan WARNING: estimation of the baseline model failed.'

Are these a big deal? I'm also passing fit.measures = TRUE to summary(), and it tells me 'lavaan WARNING: fit measures not available if model did not converge' - I'm guessing this follows from the earlier estimation failure?

For additional context...
I'm running a very large CFA: 30 factors/latent variables, each with 345 indicator variables (these are very early iterations; I will chop and change this). Because the model syntax is so large, I've written my equations in a .txt file and read them in with R's readLines function. My dataset contains 317k rows, and I'm working in local memory. I've also tested this on a 100-row subset of the data and still get the same warnings.
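For reference, a minimal sketch of this workflow, assuming a hypothetical syntax file model.txt and a data frame mydata (lavaan's model argument expects a single character string, so the lines from readLines are collapsed with newlines):

```r
library(lavaan)

# Read the model syntax from a plain-text file and collapse it into one string
model_syntax <- paste(readLines("model.txt"), collapse = "\n")

# Fit the CFA; 'mydata' is a placeholder for the real dataset
fit <- cfa(model_syntax, data = mydata)
summary(fit, fit.measures = TRUE)
```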

Any help at all is appreciated! Let me know if you'd like any further detail

Thanks,
Allan

Jasper Bogaert

Mar 15, 2022, 3:50:04 AM
to lavaan
Hi Allan,

Those warnings are a big deal, especially in this case, because the model did not converge; in other words, no solution was found. As a consequence, you will not have fit measures (or any other results). It seems you have a very large and complex model, which requires a lot of data. Could you try running the analysis again with a much larger subset of the data? Am I correct in assuming that your 100-row subset amounts to 100 observations? In that case you do not have enough data in your subset.

If it still does not work with much more data, you might want to have another look at your model (or the EFA). If that does not help, you could also try regularized SEM (e.g. the regsem package) to make your model a bit smaller. These are just some thoughts that popped up - hope it helps!

Best wishes,

Jasper Bogaert
PhD student and teaching assistant
Department of Data Analysis (PP01)
Faculty of Psychology and Educational Sciences
Ghent University

Allan Leung

Mar 15, 2022, 7:50:29 AM
to lavaan
Thanks for the reply Jasper!

Indeed, the original intention was to test on 100 observations, but even though my data was only 100 rows, it still came out to 317k observations. I will investigate why this is not working as intended, and once it's resolved I will test on a larger subset.

Regarding having another look at my model equations or my EFA, would you have any ideas about potential problems that might have caused these warnings? I'm trying to narrow down the scope of my debugging.

I could send you my model parameters and a small part of my dataset if you would prefer.

Thanks,
Allan

Jasper Bogaert

Mar 15, 2022, 8:59:24 AM
to lavaan
Hi Allan,

I cannot guarantee finding the solution, but I am willing to have a look at your code (to see what you have done for both EFA and CFA) and to try running it. It would be nice to work with a small part of the dataset, but if that is not possible (or you prefer not to do so) you can just send me the model syntax and I can simulate the data from the model.
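As a sketch of that last suggestion, lavaan can simulate data from model syntax with simulateData() and refit the model on it (model_syntax here is a placeholder for the actual syntax string):

```r
library(lavaan)

# Simulate observations from the model syntax, then refit the same model
sim_data <- simulateData(model_syntax, sample.nobs = 500)
fit <- cfa(model_syntax, data = sim_data)

# Check whether the optimizer found a solution
lavInspect(fit, "converged")
```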

Best wishes,

Jasper 

Allan Leung

Mar 16, 2022, 2:39:23 AM
to lavaan
Hi Jasper,

I have tested with simulated data myself, to see where it breaks.
The model itself works - it ran for about an hour - but it ended in a 'cannot allocate vector of size 105.6 Gb' error. This was with 50k simulated observations.

With this in mind, I've now drawn 2 conclusions:
1. The problem is with my dataset, not my model
2. Due to the large number of variables (345 for each of the 30 factors), R may be unable to allocate enough memory for the calculation

My dataset essentially contains variable names as columns, with rows as proportions, and it includes zeros.
For example,

a        b      c       d       ...
0.05541  0.034  0.0001  0
0        0      0       0.0026
...
...

The variable names match exactly the names in my model parameter file. Could the zeros be the problem, or is it something to do with how I've formatted the file?

Thanks!
Allan

Allan Leung

Mar 16, 2022, 2:49:17 AM
to lavaan
I have looked at the PoliticalDemocracy dataset provided in the package for reference, and my dataset is formatted in a similar way.
My variable names are a bit longer, but I don't think that should be a problem.


Jasper Bogaert

Mar 17, 2022, 5:51:49 AM
to lavaan
Hi Allan,

I took a look at your model and the subset of the data. I see two problems:

First of all, your model is too big (but you already mentioned this yourself). I would suggest refining your research problem and/or question of interest and using only the variables you need. If you struggle with this, I would suggest some form of regularization to obtain a more parsimonious model with only the most interesting factors and items.

A second issue could be the data itself. I only have a subset of your data, but it does not look normally distributed. One of the assumptions is that the items are normally distributed. If that is not the case, you may want to use the Satorra-Bentler correction (it is an option in the cfa and/or sem function).
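As a sketch, the Satorra-Bentler correction can be requested through the estimator argument; model_syntax and mydata are placeholders for the actual syntax string and dataset:

```r
library(lavaan)

# estimator = "MLM" gives ML estimates with Satorra-Bentler scaled test
# statistics and robust standard errors for non-normal continuous data
fit_robust <- cfa(model_syntax, data = mydata, estimator = "MLM")
summary(fit_robust, fit.measures = TRUE)
```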

These are just suggestions, I hope they can help you a bit further.

It might also be worth waiting for someone else to jump in or to have another look.

Best wishes,

Jasper


On Wednesday, March 16, 2022 at 07:49:17 UTC+1, Allan Leung wrote: