Seeking Help with Model Non-Identification and Warnings in lavaan

Dinesha Dissanayake

Apr 15, 2024, 8:46:09 AM

to lavaan

Hi everyone,

I am working on a multigroup SEM using the ESS data from Sweden (2012-2020), and I’ve encountered some persistent issues on which I am hoping to get some advice. The model includes two latent variables representing attitudes towards immigrants and towards immigration policy, the former measured by two indicators and the latter by three. After confirming measurement invariance, I introduced demographic factors (age, gender; categorical variables) and socioeconomic status (education, income; ordinal variables) as formative constructs.

The path diagram is as follows.

When fitting the model using WLSMV with ordered categorical variables, I encountered several warnings that I’m struggling to resolve:

Here are the R code for my model and the warnings I encountered:

M1 <- '
attitude1 =~ imsmetn + impcntr
attitude2 =~ imbgeco + imueclt + imwbcnt
attitude1 ~~ attitude2
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced
attitude1 ~ demographic + socioeconomic
attitude2 ~ demographic + socioeconomic
'

fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical, :
lavaan WARNING: automatically added intercepts are set to zero:
[demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats, :
lavaan WARNING: could not invert information matrix needed for robust test statistic

I’ve checked for multicollinearity and re-evaluated my formative indicators. However, these issues persist. I would greatly appreciate any insights or suggestions on:

1. How to properly scale or transform variables to address the variance issue.
2. Approaches to confirming correct model specification for formative indicators.
3. Strategies to ensure model identification and address the issues with the W matrix and information matrix inversion.

Thank you for your time and assistance.

Best regards,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 15, 2024, 9:41:00 AM

to lav...@googlegroups.com

Your attitude1 factor is underidentified, as it has only two indicators. Try either of the following:

setting both loadings to 1 (the first one is already fixed to 1 by default in your syntax):

M1 <- '
attitude1 =~ imsmetn + 1*impcntr

attitude2 =~ imbgeco + imueclt + imwbcnt
attitude1 ~~ attitude2
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced
attitude1 ~ demographic + socioeconomic
attitude2 ~ demographic + socioeconomic
'

fixing the variance of attitude1:

M1 <- '
attitude1 =~ imsmetn + impcntr
attitude2 =~ imbgeco + imueclt + imwbcnt
attitude1 ~~ attitude2
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced
attitude1 ~ demographic + socioeconomic
attitude2 ~ demographic + socioeconomic

attitude1 ~~ 1*attitude1

'

That should at least fix the identification problem.

--

Daniel Morillo, Ph.D.

GitHub | ORCID

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/f4f4a961-7ec0-469c-8a74-c2cd6786cb73n%40googlegroups.com.

Dinesha Dissanayake

Apr 15, 2024, 3:56:02 PM

to lavaan

Thank you very much for your reply. Unfortunately, I still get the same warning messages.

M1 <- '
attitude1 =~ imsmetn + 1*impcntr
attitude2 =~ imbgeco + imueclt + imwbcnt
attitude1 ~~ attitude2
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced
attitude1 ~ demographic + socioeconomic
attitude2 ~ demographic + socioeconomic
'

fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical, :
lavaan WARNING: automatically added intercepts are set to zero:
[demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats, :
lavaan WARNING: could not invert information matrix needed for robust test statistic

M1 <- '
attitude1 =~ imsmetn + impcntr
attitude2 =~ imbgeco + imueclt + imwbcnt
attitude1 ~~ attitude2
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced
attitude1 ~ demographic + socioeconomic
attitude2 ~ demographic + socioeconomic
attitude1 ~~ 1*attitude1
'

fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical, :
lavaan WARNING: automatically added intercepts are set to zero:
[demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats, :
lavaan WARNING: could not invert information matrix needed for robust test statistic

Ingo Man

Apr 16, 2024, 4:24:17 AM

to lavaan

Hello Dinesha,

Have you tried a stepwise approach?

First, I would fit a model without the grouping variable and see if that works.

Second, to further explore why your model does not fit, more information is needed (e.g., sample size, the distribution of the variables (maybe zero-inflation problems?), ...).

Third, what about first fitting single-construct CFAs? Inspect the factor loadings: are they substantial?

Fourth, try a different estimator to see if the model fits in general.
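The fitting-related suggestions above (single-construct CFAs, dropping the grouping variable, trying another estimator) could look roughly like this in lavaan. This is a minimal sketch, assuming the HolzingerSwineford1939 data that ships with lavaan as a stand-in for the ESS data (its school variable plays the role of essround); the factor and indicator names belong to that dataset, not to Dinesha's model.

```r
library(lavaan)

# Stand-in data shipped with lavaan; replace with your own data and names
dat <- HolzingerSwineford1939

# Single-construct CFAs, one factor at a time
cfa_visual  <- cfa("visual =~ x1 + x2 + x3", data = dat)
cfa_textual <- cfa("textual =~ x4 + x5 + x6", data = dat)

# Inspect the standardized loadings: are they substantial?
subset(standardizedSolution(cfa_visual), op == "=~")
subset(standardizedSolution(cfa_textual), op == "=~")

# The combined model without the grouping variable
two_factor <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  visual ~~ textual
'
fit_single <- cfa(two_factor, data = dat)

# The same model with the grouping variable added back
fit_group <- cfa(two_factor, data = dat, group = "school")

# A different estimator on the single-group model
fit_mlr <- cfa(two_factor, data = dat, estimator = "MLR")

# Did each fit converge?
sapply(list(single = fit_single, group = fit_group, mlr = fit_mlr),
       lavInspect, what = "converged")
```

If a step fails, the previous step that still worked narrows down which part of the model causes the problem.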

According to the figures you posted, demographic and socioeconomic should not be correlated; maybe add this restriction to your syntax: demographic ~~ 0*socioeconomic
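The restriction is a single extra line in the model syntax. A minimal sketch, assuming lavaan's lavaanify() is used only to parse the syntax (no data needed): the parsed parameter table should show the demographic/socioeconomic covariance as fixed (free = 0) at zero (ustart = 0).

```r
library(lavaan)

# Dinesha's model with the suggested restriction added on the last line
M1_restricted <- '
  attitude1 =~ imsmetn + impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic <~ agea + gndr + domicil
  socioeconomic <~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
  demographic ~~ 0*socioeconomic
'

# Parse the syntax into a parameter table and look at the restricted row
pt <- lavaanify(M1_restricted)
pt[pt$lhs == "demographic" & pt$op == "~~" & pt$rhs == "socioeconomic",
   c("lhs", "op", "rhs", "free", "ustart")]
```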

By the way, what does the following part of the syntax (the <~ statement) mean? I have not seen this before; maybe you meant =~?

demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced

Hope that helps.

Kindly, Marcus

Dinesha Dissanayake

Apr 16, 2024, 5:46:18 AM

to lavaan

Hi Marcus,

Thank you for your thoughtful suggestions.

In my analysis I use year as the grouping variable, and the first model covers 5 time points. My first approach was to fit a measurement model with a single endogenous latent variable and 6 indicator variables. Because of multicollinearity issues, I had to remove one indicator, so I fitted another multigroup model with 5 indicators and one latent variable. However, the RMSEA of that model was too high. An EFA then suggested two factors, so I modified the model to two endogenous latent variables, and its RMSEA was below 0.08. The model is as follows:

basic_model <- 'attitude1 =~ imsmetn + impcntr

attitude2 =~ imbgeco + imueclt + imwbcnt

attitude1 ~~ attitude2'

I was able to confirm measurement invariance for this model. I then compared latent means and identified the two time points with the most noticeable change. My second modeling approach assesses how the latent means change between those two time points when demographic and socioeconomic factors are added to the model. That is the model that now produces the warnings.
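The invariance and latent-mean steps described here can be sketched in lavaan as a sequence of increasingly constrained multigroup fits. This is a minimal sketch, assuming lavaan's built-in HolzingerSwineford1939 data (with school standing in for essround) and treating the items as continuous; with ordered items and WLSMV, the sequence would also involve equality constraints on thresholds (group.equal = "thresholds"), which this sketch does not show.

```r
library(lavaan)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat <- HolzingerSwineford1939

# Configural: same structure, all parameters free in each group
configural <- cfa(HS_model, data = dat, group = "school")
# Metric: loadings held equal across groups
metric <- cfa(HS_model, data = dat, group = "school",
              group.equal = "loadings")
# Scalar: loadings and intercepts held equal across groups
scalar <- cfa(HS_model, data = dat, group = "school",
              group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between the nested models
lavTestLRT(configural, metric, scalar)

# With equal loadings and intercepts, lavaan fixes the latent means of the
# first group to 0 and frees them in the second group, so the second-group
# estimates are latent mean differences
subset(parameterEstimates(scalar),
       op == "~1" & lhs %in% c("visual", "textual", "speed"))
```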

I have indeed attempted a stepwise approach, starting without the grouping variable and advancing to a model with a single latent variable. Despite this, the warning messages persist. My initial exploratory factor analysis suggested two factors, which led to the current two-latent-variable model that satisfies measurement invariance with an RMSEA below 0.08.

Regarding your inquiries, the sample size is 7005, and I've ensured the variables are distributed appropriately with no zero-inflation issues. My dataset structure is as follows:

> str(swedish_df_age)
'data.frame': 7005 obs. of 12 variables:
$ essround: int 6 6 6 6 6 6 6 6 6 6 ...
$ imsmetn : int 3 4 4 3 3 2 4 4 3 3 ...
$ imdfetn : int 3 4 4 3 3 2 3 4 2 2 ...
$ impcntr : int 4 4 4 2 3 2 3 4 2 2 ...
$ imbgeco : int 3 5 4 2 4 4 2 4 2 1 ...
$ imueclt : int 4 4 4 4 4 4 4 4 2 4 ...
$ imwbcnt : int 4 3 4 2 4 3 4 4 2 2 ...
$ gndr : int 2 1 1 1 1 1 2 2 2 2 ...
$ agea : int 4 4 4 4 4 4 4 4 4 4 ...
$ domicil : int 2 5 3 3 4 1 1 4 4 3 ...
$ eisced : int 1 1 1 4 3 1 1 1 1 1 ...
$ hinctnta: int 1 1 1 2 2 2 1 1 1 1 ...

> summary(swedish_df_age)
essround imsmetn imdfetn impcntr imbgeco imueclt
Min. : 6.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000
1st Qu.: 7.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.00 1st Qu.:2.000 1st Qu.:3.000
Median : 8.000 Median :3.000 Median :3.000 Median :3.00 Median :3.000 Median :4.000
Mean : 8.112 Mean :3.247 Mean :3.169 Mean :3.09 Mean :3.221 Mean :3.783
3rd Qu.:10.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.00 3rd Qu.:4.000 3rd Qu.:4.000
Max. :10.000 Max. :4.000 Max. :4.000 Max. :4.00 Max. :5.000 Max. :5.000
imwbcnt gndr agea domicil eisced hinctnta
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.000
1st Qu.:3.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:3.00 1st Qu.:2.000
Median :4.000 Median :1.000 Median :2.000 Median :3.00 Median :4.00 Median :2.000
Mean :3.508 Mean :1.489 Mean :2.335 Mean :2.87 Mean :3.52 Mean :2.208
3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:5.00 3rd Qu.:3.000
Max. :5.000 Max. :2.000 Max. :4.000 Max. :5.00 Max. :5.00 Max. :3.000

I have also tried fitting single-construct CFAs. However, the standard errors could not be computed because the information matrix could not be inverted (the same warnings appear).

Regarding the correlation between demographic and socioeconomic factors, I inserted the recommended restriction, but it did not resolve the issue.

As for the syntax demographic <~ agea + gndr + domicil, it specifies a formative construct. Because demographic and socioeconomic factors are modeled as formative in my model, I searched online and found that this operator defines formative relationships.

I appreciate your assistance and am open to any further insights you may have.

Warm regards,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 16, 2024, 5:49:10 AM

to lav...@googlegroups.com

In addition to what Marcus suggests, I'd say you may want to check your variables for missing data and multicollinearity.
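Both checks can be done in base R before any model is fitted. A minimal sketch, assuming a small simulated stand-in for swedish_df_age (three 4-point items, two of them built to be strongly related, plus some missing values); the rank-based Spearman correlation and the 0.8 cut-off are illustrative choices, not rules from the thread.

```r
# Simulated stand-in for swedish_df_age (replace with the real data)
set.seed(1)
n <- 500
base <- rnorm(n)
to_cat <- function(x) as.integer(cut(x, breaks = c(-Inf, -1, 0, 1, Inf)))
dat <- data.frame(
  imsmetn = to_cat(base + rnorm(n, sd = 0.2)),
  imdfetn = to_cat(base + rnorm(n, sd = 0.2)),
  impcntr = to_cat(base + rnorm(n, sd = 1.0))
)
dat$impcntr[sample(n, 40)] <- NA

# 1. Missingness: count per variable and number of complete cases
colSums(is.na(dat))
sum(complete.cases(dat))

# 2. Multicollinearity: rank-based correlations on pairwise-complete data,
#    flagging pairs above an (arbitrary) threshold of 0.8
r <- cor(dat, method = "spearman", use = "pairwise.complete.obs")
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
data.frame(var1 = rownames(r)[high[, "row"]],
           var2 = colnames(r)[high[, "col"]],
           r    = r[high])
```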

@Marcus the '<~' operator defines a latent variable as a formative construct; see Dinesha's diagram for the relationships between those variables, and it will become clearer.

--

Daniel Morillo, Ph.D.


Dinesha Dissanayake

Apr 16, 2024, 6:02:31 AM

to lavaan

Hi Daniel,

I did check for missing data and multicollinearity at the beginning of my analysis. To address missing values, I used listwise deletion. Regarding multicollinearity, I identified and removed one of the six indicators because of its high correlation with the other two variables, which is why the latent variable for attitudes towards immigrants currently has only two indicators.

Your insights are much appreciated.

Best,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 16, 2024, 9:47:29 AM

to lav...@googlegroups.com

I'm afraid that's as much as I can help without seeing the data; I'm so sorry :)

I'd suggest you try Marcus' suggestions or, if you haven't tried it yet, check whether adding back the third indicator for "attitude1" helps solve it, just in case.
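Adding the third indicator back is a one-line change to the syntax. A minimal sketch, assuming that imdfetn (present in swedish_df_age but not used in M1) is the indicator that was dropped; lavaanify() is used only to check the parsed model, so no data are needed, and the sem() call is left as a comment because it needs the real data.

```r
library(lavaan)

# Assumption: imdfetn is the attitude1 indicator removed earlier
M1_three <- '
  attitude1 =~ imsmetn + imdfetn + impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic <~ agea + gndr + domicil
  socioeconomic <~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
'

# Count reflective indicators per factor without fitting anything
pt <- lavaanify(M1_three)
table(pt$lhs[pt$op == "=~"])

# With the real data the call would mirror the original one:
# fit_M1_three <- sem(M1_three, data = swedish_df_age, group = "essround",
#                     ordered = c("imsmetn", "imdfetn", "impcntr",
#                                 "imbgeco", "imueclt", "imwbcnt"),
#                     estimator = "WLSMV")
```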

Otherwise, I'm sure other participants on this list will be able to give you further suggestions.

I hope you have good luck with your research!

Daniel

--

Daniel Morillo, Ph.D.


Ingo Man

Apr 17, 2024, 7:14:10 AM

to lavaan

Hello again, nice to see what you have already done with your modelling :-). Without having the data myself, I would at least try a different estimator and experiment with some settings in the sem() call (see lavaan's help page).

Thanks for introducing me to formative factor models; I have not used them in practice before.

Hope you will find a solution.

Marcus

Dinesha Dissanayake

Apr 17, 2024, 7:39:48 AM

to lavaan

Hi Daniel and Marcus,

I truly appreciate your willingness to help. Based on your advice, I revisited the model specification. Interestingly, respecifying the formative indicators as reflective ones allowed the model to run without the problem of standard errors not being computed (the non-invertible information matrix). However, using the WLSMV estimator (all variables in my dataset are ordinal/categorical with fewer than 6 levels) resulted in the following non-convergence warning:

Warning message:
In lavaan::lavaan(model = M1, data = swedish_df_age, ordered = TRUE, :
lavaan WARNING:
the optimizer warns that a solution has NOT been found!
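The respecification described above swaps the formative operator <~ ("is formed by") for the reflective =~ ("is measured by"). A minimal sketch of what that syntax might look like, assuming the same indicator sets as in M1 (the exact reflective model is not shown in the thread); lavaanify() parses it without data.

```r
library(lavaan)

# Hypothetical reflective version of M1: "<~" replaced by "=~"
# for the two former composites
M1_reflective <- '
  attitude1 =~ imsmetn + impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic =~ agea + gndr + domicil
  socioeconomic =~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
'

pt <- lavaanify(M1_reflective)
# No formative rows remain; every latent variable has reflective indicators
sum(pt$op == "<~")
table(pt$lhs[pt$op == "=~"])
```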

Subsequently, when I applied the ML estimator, the model converged without any issues. However, because normality cannot be assumed, I also tested the MLR and MLM estimators. They also fit the model well but flagged an issue:

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue
(= -6.064597e-22) is smaller than zero. This may be a symptom that
the model is not identified.

Given the data's ordinal nature, what would you suggest is the best estimator to use in this scenario? Should I be concerned about the warning regarding the variance-covariance matrix when using MLR and MLM estimators?

Thank you once again for your valuable input.

Warm regards,
Dinesha Dissanayake

Ingo Man

Apr 17, 2024, 10:38:31 AM

to lavaan

Dear Dinesha,

Fine. You should not worry much about the warning, as Terence wrote here:

https://groups.google.com/g/lavaan/c/4y5pmqRz4nk
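To see what the warning is measuring, the eigenvalues can be checked directly on a fitted model. A minimal sketch, assuming the three-factor CFA on lavaan's built-in HolzingerSwineford1939 data as a stand-in (fitted with MLR, as in the thread); lavInspect(fit, "vcov") returns the covariance matrix of the free parameters from which the standard errors are taken.

```r
library(lavaan)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(HS_model, data = HolzingerSwineford1939, estimator = "MLR")

# Covariance matrix of the free parameter estimates (class dropped)
V <- unclass(lavInspect(fit, "vcov"))
ev <- eigen(V, symmetric = TRUE, only.values = TRUE)$values

# The warning is triggered when this smallest eigenvalue is below zero
min(ev)
```

In this well-identified stand-in model the smallest eigenvalue is positive. A value such as -6.06e-22 is zero up to floating-point rounding, which points to a (near-)singular matrix rather than a clearly negative one.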

Concerning the estimator: have you tried other estimators (see https://lavaan.ugent.be/tutorial/est.html)? There is a big discussion about which estimator is best; a Google search may help here. In the context of structural equation modeling with ordinal data, the MLR (robust maximum likelihood) and WLSMV (weighted least squares, mean- and variance-adjusted) estimators have proven practicable (see Li, 2016, p. 939).

I think you can try comparing both estimators. Why the WLSMV model does not converge is beyond my understanding. Have you experimented a bit with the settings?
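A comparison of the two estimators can be as simple as fitting the same model twice and putting the standardized loadings side by side. A minimal sketch, assuming lavaan's HolzingerSwineford1939 items cut at their quartiles as stand-in 4-point ordinal data (illustration only); MLR treats the items as continuous, while WLSMV with ordered = works from polychoric correlations.

```r
library(lavaan)

# Stand-in ordinal data: each item cut at its quartiles into categories 1-4
items <- paste0("x", 1:9)
dat <- HolzingerSwineford1939[items]
dat[] <- lapply(dat, function(x)
  findInterval(x, unique(quantile(x, c(0.25, 0.5, 0.75)))) + 1L)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Items treated as continuous, robust ML
fit_mlr <- cfa(HS_model, data = dat, estimator = "MLR")
# Items declared ordered, WLSMV
fit_wlsmv <- cfa(HS_model, data = dat, ordered = items, estimator = "WLSMV")

std_loadings <- function(fit) subset(standardizedSolution(fit), op == "=~")$est.std
cbind(MLR = std_loadings(fit_mlr), WLSMV = std_loadings(fit_wlsmv))
```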

Kindly,

Marcus

Dinesha Dissanayake

Apr 17, 2024, 1:46:46 PM

to lavaan

Dear Marcus,

Thank you for your guidance and the resources. I have identified the cause of the issue: it was the gender variable. After removing this variable, the models fitted with the WLSMV and MLR estimators both run without any warning messages.

Now, I have one remaining concern regarding the choice between two estimators: MLR (Standard errors: Sandwich) and MLM (Standard errors: Robust SEM). Could you advise on which would be most suitable when all variables are categorical? I checked the link you provided, but it doesn't specifically address this question.
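Both estimators give the same maximum likelihood point estimates and differ only in how the standard errors and test statistic are corrected. According to the lavaan estimator tutorial, MLR uses Huber-White (sandwich) standard errors with a scaled test asymptotically equal to the Yuan-Bentler statistic, while MLM uses robust standard errors with a Satorra-Bentler scaled test and works with complete data only. Both treat the items as continuous. A minimal sketch, assuming lavaan's HolzingerSwineford1939 data as a stand-in:

```r
library(lavaan)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat <- HolzingerSwineford1939

fit_mlr <- cfa(HS_model, data = dat, estimator = "MLR")
fit_mlm <- cfa(HS_model, data = dat, estimator = "MLM")

# Same ML point estimates, different robust standard errors
cbind(est    = coef(fit_mlr),
      se_MLR = sqrt(diag(vcov(fit_mlr))),
      se_MLM = sqrt(diag(vcov(fit_mlm))))

# Different scaled test statistics
fitMeasures(fit_mlr, c("chisq.scaled", "df"))
fitMeasures(fit_mlm, c("chisq.scaled", "df"))
```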

Thank you again for your assistance.

Kind regards,

Dinesha

Ingo Man

Apr 17, 2024, 7:30:52 PM

to lavaan

Hey Dinesha,

Great. Consider always coding your dichotomous variables as 0/1! Does it fit now?
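Recoding gender from the 1/2 coding shown in the str() output to 0/1 is a one-liner in base R. A minimal sketch with a few hypothetical rows; which category becomes 0 is a free choice (here 1 becomes 0 and 2 becomes 1).

```r
# Hypothetical rows; in swedish_df_age, gndr is an integer coded 1/2
dat <- data.frame(gndr = c(2L, 1L, 1L, 2L, 2L))

# Shift the coding down by one: 1 -> 0, 2 -> 1
dat$gndr01 <- dat$gndr - 1L

# Cross-tabulate old and new codes to verify the mapping
table(dat$gndr, dat$gndr01)
```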

I can't imagine why there were problems; maybe because each of the 3 variables of this factor has a different metric? If you want to keep gender as a variable, you could z-standardize all variables within this factor and then use these variables in the CFA; maybe then it will work.
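z-standardizing the variables of one factor can be done with base R's scale(), which centers each column and divides it by its standard deviation. A minimal sketch with a few hypothetical rows for agea, gndr and domicil (the variables of the reflective demographic factor); the _z suffix is an illustrative naming choice.

```r
# Hypothetical rows for the three variables of the demographic factor
dat <- data.frame(agea    = c(1L, 2L, 2L, 3L, 4L),
                  gndr    = c(2L, 1L, 1L, 2L, 2L),
                  domicil = c(2L, 5L, 3L, 3L, 4L))
vars <- c("agea", "gndr", "domicil")
z_vars <- paste0(vars, "_z")

# scale() returns a one-column matrix; as.vector() turns it back into a vector
dat[z_vars] <- lapply(dat[vars], function(x) as.vector(scale(x)))

# Each standardized column should have mean 0 and SD 1
round(colMeans(dat[z_vars]), 10)
sapply(dat[z_vars], sd)
```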

Concerning the choice of estimator: please search academic databases on this issue; there should be simulation studies. Try search terms such as "estimator choice categorical variables". I really like MLR because you can use it with FIML, which is virtually the best choice when you have missing data. Try it out, because listwise deletion is only a pragmatic solution, not a good way of dealing with missingness.
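The difference between listwise deletion and FIML shows up directly in how many cases each fit uses. A minimal sketch, assuming lavaan's HolzingerSwineford1939 data with 30 values each on x2, x5 and x8 set to missing completely at random (simulated missingness for illustration); missing = "fiml" with estimator = "MLR" keeps every case that has at least one observed model variable.

```r
library(lavaan)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Stand-in data with simulated MCAR missingness on three items
set.seed(123)
dat <- HolzingerSwineford1939
for (v in c("x2", "x5", "x8")) dat[[v]][sample(nrow(dat), 30)] <- NA

fit_lw   <- cfa(HS_model, data = dat, estimator = "MLR", missing = "listwise")
fit_fiml <- cfa(HS_model, data = dat, estimator = "MLR", missing = "fiml")

# Number of observations actually used by each approach
c(listwise = lavInspect(fit_lw, "ntotal"),
  fiml     = lavInspect(fit_fiml, "ntotal"))
```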

Kindly,

Marcus

Dinesha Dissanayake

Apr 23, 2024, 7:07:27 AM

to lavaan

Hi Marcus,

Thank you for your suggestions. I tried coding the dichotomous gender variable as 0/1 and also z-standardizing all variables within the latent factor that includes gender. Unfortunately, neither method resolved the issue. Given this, I am considering removing the gender variable or proceeding with the MLR estimator.

Thank you again for your guidance.

Kind regards,
Dinesha

Dinesha Dissanayake

May 27, 2024, 5:17:47 AM

to lavaan

Hello Daniel, Marcus, and everyone,

I am currently working with the lavaan package in R to analyze a dataset containing ordinal variables and I'm considering different estimation methods. I've come across Diagonally Weighted Least Squares (DWLS) and Weighted Least Squares Mean and Variance Adjusted (WLSMV). I would like to understand better:

Are DWLS and WLSMV considered different estimation methods in lavaan?
What are the key differences between these two methods, specifically in their implementation within lavaan?
In what scenarios would one method be preferred over the other?
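One way to explore this empirically: as described in the lavaan estimator documentation, estimator = "WLSMV" is DWLS estimation combined with robust standard errors (se = "robust.sem") and a mean- and variance-adjusted, scale-shifted test statistic (test = "scaled.shifted"). The minimal sketch below, which uses quartile-cut HolzingerSwineford1939 items as stand-in ordinal data, fits the same model three ways to check that the point estimates coincide and that only the standard-error and test options differ.

```r
library(lavaan)

# Stand-in ordinal data: each item cut at its quartiles into categories 1-4
items <- paste0("x", 1:9)
dat <- HolzingerSwineford1939[items]
dat[] <- lapply(dat, function(x)
  findInterval(x, unique(quantile(x, c(0.25, 0.5, 0.75)))) + 1L)

HS_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit_wlsmv <- cfa(HS_model, data = dat, ordered = items, estimator = "WLSMV")
fit_dwls  <- cfa(HS_model, data = dat, ordered = items, estimator = "DWLS")
# DWLS with the two robust corrections spelled out explicitly
fit_dwls_rob <- cfa(HS_model, data = dat, ordered = items, estimator = "DWLS",
                    se = "robust.sem", test = "scaled.shifted")

# Point estimates: the same fitting function, so identical
all.equal(as.numeric(coef(fit_wlsmv)), as.numeric(coef(fit_dwls)))

# Standard errors: compare plain DWLS with the robust variants
se_of <- function(fit) sqrt(diag(vcov(fit)))
cbind(WLSMV = se_of(fit_wlsmv), DWLS = se_of(fit_dwls),
      DWLS_robust = se_of(fit_dwls_rob))

# Scaled test statistic of WLSMV versus DWLS + scaled.shifted
fitMeasures(fit_wlsmv, "chisq.scaled")
fitMeasures(fit_dwls_rob, "chisq.scaled")
```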

Any insights or experiences shared would be greatly appreciated!

Thank you!
