Seeking Help with Model Non-Identification and Warnings in lavaan

Dinesha Dissanayake

Apr 15, 2024, 8:46:09 AM
to lavaan
Hi everyone,

I am working on a multigroup SEM using the ESS data from Sweden (2012-2020), and I’ve encountered some persistent issues that I am hoping to get some advice on. The model includes two latent variables representing attitudes towards immigrants and immigration policy, with the former measured by two indicators and the latter by three. After confirming measurement invariance, I introduced demographic (age, gender; categorical variables) and socioeconomic status (education, income; ordinal variables) as formative variables.

The path diagram is as follows.
model 2.png

When fitting the model using WLSMV with ordered categorical variables, I encountered several warnings that I’m struggling to resolve. Here are the R code and the warnings:

M1 <- '
  attitude1 =~ imsmetn + impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic <~ agea + gndr + domicil
  socioeconomic <~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
'

fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
              ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
              estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical,  :
  lavaan WARNING: automatically added intercepts are set to zero:
    [demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not invert information matrix needed for robust test statistic


I’ve checked for multicollinearity, and re-evaluated my formative indicators. However, these issues persist. I would greatly appreciate any insights or suggestions on:

1. How to properly scale or transform variables to address the variance issue.
2. Approaches to confirming correct model specification for formative indicators.
3. Strategies to ensure model identification and address the issues with the W matrix and information matrix inversion.

Thank you for your time and assistance.

Best regards,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 15, 2024, 9:41:00 AM
to lav...@googlegroups.com
Your attitude1 variable is underidentified, as it only has 2 indicators. Try either of:

setting the two loadings to 1 (the first one is set automatically by default in your syntax):

M1 <- '
  attitude1 =~ imsmetn + 1*impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic <~ agea + gndr + domicil
  socioeconomic <~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
'

fixing the variance of attitude1:

M1 <- '
  attitude1 =~ imsmetn + impcntr
  attitude2 =~ imbgeco + imueclt + imwbcnt
  attitude1 ~~ attitude2
  demographic <~ agea + gndr + domicil
  socioeconomic <~ hinctnta + eisced
  attitude1 ~ demographic + socioeconomic
  attitude2 ~ demographic + socioeconomic
  attitude1 ~~ 1*attitude1
'
That should at least fix the identification problem.
--
Daniel Morillo, Ph.D.
GitHub | ORCID



Dinesha Dissanayake

Apr 15, 2024, 3:56:02 PM
to lavaan
Thank you very much for your reply. Unfortunately I get the same warning messages.

> M1 <- '
+   attitude1 =~ imsmetn + 1*impcntr
+   attitude2 =~ imbgeco + imueclt + imwbcnt
+   attitude1 ~~ attitude2
+   demographic <~ agea + gndr + domicil
+   socioeconomic <~ hinctnta + eisced
+   attitude1 ~ demographic + socioeconomic
+   attitude2 ~ demographic + socioeconomic
+ '

> fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
+               ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
+               estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical,  :
  lavaan WARNING: automatically added intercepts are set to zero:
    [demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not invert information matrix needed for robust test statistic


> M1 <- '
+   attitude1 =~ imsmetn + impcntr
+   attitude2 =~ imbgeco + imueclt + imwbcnt
+   attitude1 ~~ attitude2
+   demographic <~ agea + gndr + domicil
+   socioeconomic <~ hinctnta + eisced
+   attitude1 ~ demographic + socioeconomic
+   attitude2 ~ demographic + socioeconomic
+   attitude1 ~~ 1*attitude1
+ '

> fit_M1 <- sem(M1, data = swedish_df_age, group = "essround",
+               ordered = c("imsmetn", "impcntr", "imbgeco", "imueclt", "imwbcnt"),
+               estimator = "WLSMV")

Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$.categorical,  :
  lavaan WARNING: automatically added intercepts are set to zero:
    [demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic demographic socioeconomic]
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    Could not compute standard errors! The information matrix could
    not be inverted. This may be a symptom that the model is not
    identified.
3: In lav_test_satorra_bentler(lavobject = NULL, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not invert information matrix needed for robust test statistic

Ingo Man

Apr 16, 2024, 4:24:17 AM
to lavaan
Hello Dinesha,

Have you tried a stepwise approach?
First, I would fit a model without the grouping variable and see if that works.
Second, to further explore why your model does not fit, more information is needed (e.g., sample size, distribution of the variables (maybe zero-inflation problems?), ...).
Third, what about first fitting single-construct CFAs? Inspect the factor loadings: are they substantial?
Fourth, try a different estimator to see if the model fits in general.
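
For the "distribution of the variables" step, one quick check is whether every response category of each ordered indicator actually occurs in every group, since empty group-by-category cells can cause trouble when thresholds are estimated per group. A minimal base-R sketch, assuming a small made-up data frame (`toy`) in place of the real `swedish_df_age`; the helper name `empty_cells` is hypothetical:

```r
# Toy stand-in for the real data: one grouping variable, two ordinal items.
toy <- data.frame(
  essround = rep(c(6, 7), each = 50),
  imsmetn  = rep(1:4, length.out = 100),          # all categories in both rounds
  impcntr  = c(rep(1:4, length.out = 50),         # round 6 uses all categories
               rep(1:3, length.out = 50))         # round 7 never uses category 4
)

# Hypothetical helper: count group-by-category cells with zero observations.
empty_cells <- function(data, item, group) {
  lv  <- sort(unique(data[[item]]))
  tab <- table(data[[group]], factor(data[[item]], levels = lv))
  sum(tab == 0)
}

items   <- c("imsmetn", "impcntr")
n_empty <- sapply(items, empty_cells, data = toy, group = "essround")
print(n_empty)  # any non-zero entry flags an item to inspect with table()
```

Any item flagged this way could then be inspected with `table(toy$essround, toy$impcntr)` before deciding whether to collapse categories.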

According to the figures you posted, demographic and socioeconomic should not be correlated, maybe insert this restriction into your syntax: demographic ~~ 0*socioeconomic

By the way, what does the <~ statement in the following part of the syntax mean? I have not seen this before; maybe you meant =~?
demographic <~ agea + gndr + domicil
socioeconomic <~ hinctnta + eisced

Hope that helps.
Kindly, Marcus

Dinesha Dissanayake

Apr 16, 2024, 5:46:18 AM
to lavaan
Hi Marcus,

Thank you for your thoughtful suggestions.

In my analysis I use year as the grouping variable and for the first model I consider 5 time points. My first approach was to fit the measurement model with only one endogenous latent variable and 6 indicator variables. Because of the multicollinearity issues I had to remove one variable. Then I fitted another multigroup model with 5 indicators and one latent variable. But RMSEA of that model was higher. Then EFA suggested two factors. So I modified my model to two endogenous latent variables and RMSEA value of the model was below 0.08. Model is as follows.

basic_model <-  'attitude1 =~ imsmetn  + impcntr
                 attitude2 =~ imbgeco + imueclt + imwbcnt
                 attitude1 ~~  attitude2'

I was able to confirm that measurement invariance holds for this model. I then compared the latent means and identified the two time points with the most noticeable change. My second modeling step is to assess how the latent means change between those two time points once demographic and socioeconomic factors are added to the model. That is the model that now produces the warnings.

I have indeed attempted a stepwise approach, starting without the grouping variable and advancing to a model with a single latent variable. Despite this, the warning messages persist. My initial exploratory factor analysis suggested two factors, which led to the current two-latent-variable model that satisfies measurement invariance with an RMSEA below 0.08.

Regarding your inquiries, the sample size is 7005, and I've ensured the variables are distributed appropriately with no zero-inflation issues. My dataset structure is as follows:

> str(swedish_df_age)
'data.frame': 7005 obs. of  12 variables:
 $ essround: int  6 6 6 6 6 6 6 6 6 6 ...
 $ imsmetn : int  3 4 4 3 3 2 4 4 3 3 ...
 $ imdfetn : int  3 4 4 3 3 2 3 4 2 2 ...
 $ impcntr : int  4 4 4 2 3 2 3 4 2 2 ...
 $ imbgeco : int  3 5 4 2 4 4 2 4 2 1 ...
 $ imueclt : int  4 4 4 4 4 4 4 4 2 4 ...
 $ imwbcnt : int  4 3 4 2 4 3 4 4 2 2 ...
 $ gndr    : int  2 1 1 1 1 1 2 2 2 2 ...
 $ agea    : int  4 4 4 4 4 4 4 4 4 4 ...
 $ domicil : int  2 5 3 3 4 1 1 4 4 3 ...
 $ eisced  : int  1 1 1 4 3 1 1 1 1 1 ...
 $ hinctnta: int  1 1 1 2 2 2 1 1 1 1 ...

> summary(swedish_df_age)
    essround         imsmetn         imdfetn         impcntr        imbgeco         imueclt    
 Min.   : 6.000   Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000  
 1st Qu.: 7.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.00   1st Qu.:2.000   1st Qu.:3.000  
 Median : 8.000   Median :3.000   Median :3.000   Median :3.00   Median :3.000   Median :4.000  
 Mean   : 8.112   Mean   :3.247   Mean   :3.169   Mean   :3.09   Mean   :3.221   Mean   :3.783  
 3rd Qu.:10.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :10.000   Max.   :4.000   Max.   :4.000   Max.   :4.00   Max.   :5.000   Max.   :5.000  
    imwbcnt           gndr            agea          domicil         eisced        hinctnta    
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.00   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.00   1st Qu.:3.00   1st Qu.:2.000  
 Median :4.000   Median :1.000   Median :2.000   Median :3.00   Median :4.00   Median :2.000  
 Mean   :3.508   Mean   :1.489   Mean   :2.335   Mean   :2.87   Mean   :3.52   Mean   :2.208  
 3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:4.00   3rd Qu.:5.00   3rd Qu.:3.000  
 Max.   :5.000   Max.   :2.000   Max.   :4.000   Max.   :5.00   Max.   :5.00   Max.   :3.000


I have also attempted fitting single construct CFAs. However, the standard errors couldn't be computed due to the non-inversion of the information matrix (same warning issues appear).

Regarding the correlation between demographic and socioeconomic factors, I inserted the recommended restriction, but it did not resolve the issue.

As for the syntax demographic <~ agea + gndr + domicil, it indicates a formative model. Since the demographic and socioeconomic factors are formative in my model, I searched online and found that the <~ operator is used to define formative relationships.
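
To make the difference between the two operators concrete, here is a small sketch that only parses the syntax and fits nothing. It assumes the lavaan package is installed; the object names are made up for illustration:

```r
library(lavaan)

# Same indicators, two different operators.
formative_syntax  <- 'demographic <~ agea + gndr + domicil'   # composite formed BY its indicators
reflective_syntax <- 'demographic =~ agea + gndr + domicil'   # latent factor measured BY its indicators

# lavaanify() turns model syntax into a parameter table without fitting a model.
pt_formative  <- lavaanify(formative_syntax)
pt_reflective <- lavaanify(reflective_syntax)

# The 'op' column shows which operator each parameter row comes from.
print(table(pt_formative$op))
print(table(pt_reflective$op))
```

Comparing the two parameter tables row by row shows which parameters each specification introduces.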

I appreciate your assistance and am open to any further insights you may have.

Warm regards,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 16, 2024, 5:49:10 AM
to lav...@googlegroups.com
In addition to what Marcus suggests, I'd say you may want to check for data missingness and multicollinearity in your variables.
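
A rough base-R sketch of both checks, assuming a made-up data frame (`toy`) with the same covariate names; the column `dup` is deliberately constructed as an exact linear combination so the rank check has something to find:

```r
# Toy stand-in for the covariates; 'dup' is redundant on purpose.
toy <- data.frame(
  agea    = c(1, 2, 3, 4, 2, 3, 1, 4),
  gndr    = c(1, 2, 1, 2, 2, 1, 1, 2),
  domicil = c(2, 5, 3, 3, 4, 1, NA, 4)
)
toy$dup <- toy$agea + toy$gndr

# 1) Missingness: number of NA values per variable.
n_missing <- colSums(is.na(toy))
print(n_missing)

# 2) Exact collinearity: rank of the design matrix (intercept + covariates)
#    on complete cases. A rank below the number of columns means that at
#    least one variable is a linear combination of the others.
X      <- cbind(1, as.matrix(toy[complete.cases(toy), ]))
rank_X <- qr(X)$rank
print(c(rank = rank_X, columns = ncol(X)))
```

In a multigroup model the same check would need to be repeated within each group, for example after `split(toy, toy$some_group)` with a hypothetical grouping column, because a variable can be redundant or constant in one group only.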

@Marcus the '<~' operator is for defining a latent variable as a formative construct; refer to Dinesha's diagram for the relationships between those variables and you'll see it more clearly.


Dinesha Dissanayake

Apr 16, 2024, 6:02:31 AM
to lavaan
Hi Daniel,

I did check for missing data and multicollinearity at the beginning of my analysis. To address missing values I used listwise deletion. Regarding multicollinearity, I identified and removed one of the six indicators because it was highly correlated with two of the others, which explains why the latent variable for attitudes towards immigrants currently has only two indicators.

Your insights are much appreciated.

Best,
Dinesha Dissanayake

Daniel Morillo Cuadrado

Apr 16, 2024, 9:47:29 AM
to lav...@googlegroups.com
I'm afraid that's as much as I can help you without seeing the data, I'm so sorry :)

I'd suggest you try Marcus' suggestions or, if you haven't tried it yet, check whether adding back the third indicator for "attitude1" helps solve it, just in case.

Otherwise, I'm sure other participants on this list will be able to give you further suggestions.

I hope you have good luck with your research!
Daniel


Ingo Man

Apr 17, 2024, 7:14:10 AM
to lavaan
Hello again, nice to see what you have already done with your modelling :-). Without having the data myself, I would at least try a different estimator and play around with some settings in the sem() function call (see lavaan's help page).
Thanks for introducing the modelling of formative factor models, have not done this before in practice.

Hope you will find a solution.
Marcus

Dinesha Dissanayake

Apr 17, 2024, 7:39:48 AM
to lavaan
Hi Daniel and Marcus,

I truly appreciate your willingness to help. Based on your advice, I revisited the model specification. Interestingly, changing the formative indicators to reflective ones allowed the model to be fitted without the standard-error / information-matrix inversion problem. However, using the WLSMV estimator (all variables in my dataset are ordinal or categorical with fewer than 6 levels) resulted in the following non-convergence warning:

Warning message:
In lavaan::lavaan(model = M1, data = swedish_df_age, ordered = TRUE,  :
  lavaan WARNING:
    the optimizer warns that a solution has NOT been found!

Subsequently, when I applied the ML estimator, the model converged without any issues. Yet, because normality cannot be assumed, I tested both the MLR and MLM estimators. They also fit the model well but flagged an issue:

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -6.064597e-22) is smaller than zero. This may be a symptom that
    the model is not identified.

Given the data's ordinal nature, what would you suggest is the best estimator to use in this scenario? Should I be concerned about the warning regarding the variance-covariance matrix when using MLR and MLM estimators?

Thank you once again for your valuable input.

Warm regards,
Dinesha Dissanayake

Ingo Man

Apr 17, 2024, 10:38:31 AM
to lavaan
Dear Dinesha,

Fine. You should not worry much about the warning, as Terence wrote here:

https://groups.google.com/g/lavaan/c/4y5pmqRz4nk
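
To see why an eigenvalue like -6.06e-22 can be read as "zero up to rounding", here is a small base-R sketch on a made-up matrix. The tolerance is an assumption, not a lavaan default; for a fitted model the matrix itself could be obtained with lavInspect(fit, "vcov").

```r
# Toy stand-in for a parameter covariance matrix: V = A A' has rank 2,
# so its third eigenvalue is zero up to floating-point rounding.
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
V <- tcrossprod(A)

ev <- eigen(V, symmetric = TRUE, only.values = TRUE)$values

# Assumed tolerance: machine precision scaled by the largest eigenvalue.
tol <- sqrt(.Machine$double.eps) * max(abs(ev))
is_numerically_zero <- abs(min(ev)) < tol

print(min(ev))              # tiny value; its sign is not informative
print(is_numerically_zero)
```

At that magnitude the negative sign is just rounding noise. Whether the (near-)zero eigenvalue itself is harmless still depends on the model, so it is worth checking that the estimates and standard errors look sensible.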


Concerning the estimator: have you tried other estimators (see https://lavaan.ugent.be/tutorial/est.html)? There is a lot of discussion about which estimator is best; a Google search may help here. For structural equation modeling with ordinal data, MLR (robust maximum likelihood) and WLSMV (weighted least squares, mean- and variance-adjusted) have proven practicable (see Li, 2016, p. 939).
I think you can try comparing both estimators. Why the model does not converge with WLSMV is beyond my understanding. Have you played a bit with the settings?

Kindly,
Marcus

Dinesha Dissanayake

Apr 17, 2024, 1:46:46 PM
to lavaan
Dear Marcus,

Thank you for your guidance and the resources. I have identified the cause of the issue—it was the gender variable. After removing this variable, both models, fitted with the WLSMV and MLR estimators, work perfectly fine without any warning messages.

Now, I have one remaining concern regarding the choice between two estimators: MLR (Standard errors: Sandwich) and MLM (Standard errors: Robust SEM). Could you advise on which would be most suitable when all variables are categorical? I checked the link you provided, but it doesn't specifically address this question.

Thank you again for your assistance.

Kind regards,
Dinesha 

Ingo Man

Apr 17, 2024, 7:30:52 PM
to lavaan
Hey Dinesha,

Great. Consider always coding your dichotomous variables 0/1! Does it fit now?

Can't imagine why there were problems, maybe because each of these 3 variables of the factor has a different metric? If you want to keep gender as a variable, you could z-standardize all variables within this factor and then take these variables into the CFA, maybe then it will work.
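
Both suggestions (0/1 coding and z-standardizing) can be sketched in base R as follows, using a made-up data frame (`toy`) with the same variable names; whether this helps the estimation will depend on the data:

```r
# Toy stand-in for the three indicators of the 'demographic' composite.
toy <- data.frame(
  agea    = c(4, 2, 3, 1, 2, 3),
  gndr    = c(2, 1, 1, 1, 2, 2),   # 1/2 coding as in the str() output
  domicil = c(2, 5, 3, 3, 4, 1)
)

# Recode the dichotomous variable from 1/2 to 0/1.
toy$gndr01 <- toy$gndr - 1

# z-standardize the indicators (mean 0, SD 1 over the whole sample).
z_vars <- c("agea", "gndr01", "domicil")
toy_z  <- as.data.frame(scale(toy[z_vars]))

print(round(colMeans(toy_z), 10))
print(sapply(toy_z, sd))
```

Note that this standardizes over the pooled sample; standardizing within each group instead would remove between-group differences in means and variances, which may or may not be wanted.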

Concerning the choice of estimator: please search academic databases on this issue; there should be simulation studies. Try search terms such as: estimator choice categorical variables. I really like MLR because you can use FIML, virtually the best choice when you have missing data. Try it out, because listwise deletion is pragmatic but not a good way of dealing with missingness.
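
A minimal sketch of what MLR with FIML looks like in a call, assuming the lavaan package is installed. It uses lavaan's built-in HolzingerSwineford1939 data (with a few values set to missing for illustration) rather than the ESS data, and it treats the indicators as continuous:

```r
library(lavaan)

# Built-in example data; a few values are set to NA purely for illustration.
dat <- HolzingerSwineford1939
dat$x1[c(3, 17, 42)] <- NA
dat$x5[c(8, 99)]     <- NA

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# estimator = "MLR": ML estimates with robust standard errors and a scaled test statistic.
# missing = "fiml":  use all available cases instead of listwise deletion.
fit <- cfa(model, data = dat, estimator = "MLR", missing = "fiml")

print(lavInspect(fit, "converged"))
print(lavInspect(fit, "ntotal"))   # number of cases used in the analysis
```

With listwise deletion the five incomplete cases would be dropped; with FIML they stay in the analysis.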

Kindly,
Marcus

Dinesha Dissanayake

Apr 23, 2024, 7:07:27 AM
to lavaan
Hi Marcus,

Thank you for your suggestions. I tried coding the dichotomous gender variable as 0/1 and also z-standardizing all variables within the latent factor that includes the gender variable. Unfortunately, neither method resolved the issue. Considering this, I am thinking of removing the gender variable or proceeding with the MLR estimator.

Thank you again for your guidance.

Kind regards,
Dinesha

Dinesha Dissanayake

May 27, 2024, 5:17:47 AM
to lavaan
Hello Daniel, Marcus and everyone,

I am currently working with the lavaan package in R to analyze a dataset containing ordinal variables and I'm considering different estimation methods. I've come across Diagonally Weighted Least Squares (DWLS) and Weighted Least Squares Mean and Variance Adjusted (WLSMV). I would like to understand better:

  1. Are DWLS and WLSMV considered different estimation methods in lavaan?
  2. What are the key differences between these two methods, specifically in their implementation within lavaan?
  3. In what scenarios would one method be preferred over the other?
Any insights or experiences shared would be greatly appreciated!

Thank you!