lavaan.survey Warning: some of the standard errors may not be trustworthy...

Anna Ramos Chernenko

unread,

Oct 10, 2024, 3:40:13 PMOct 10

to lavaan

Hi,

I am encountering constantly this warning:

Some of the standard errors may not be trustworthy. Some of the observed covariances or means are collinear, and this has generated collinearity in your parameter estimates. This may be a sample size issue, missing data problem, or due to having too few clusters relative to the number of parameters. Problem encountered in group(s) 1

My model is the following:

model_1 <- '
##DIRECT EFFECTS

# regressions - MEDIATORS
BSR_mean ~ a1*NGP + a6*PBVC + a11*NVC
+age+eq_m_income+housing
+years_living
RaoQ_mean ~ a2*NGP + a7*PBVC + a12*NVC
+age+eq_m_income+housing
+years_living
INS ~ a3*NGP + a8*PBVC + a13*NVC
+age+eq_m_income+housing
+years_living

# regressions - WELLBEING OUTCOME
SWL ~ b1*BSR_mean + b2*RaoQ_mean + b3*INS + c1*NGP + c2*PBVC + c3*NVC
+age+eq_m_income+housing
+years_living
EW ~ b4*BSR_mean + b5*RaoQ_mean + b6*INS + c4*NGP + c5*PBVC + c6*NVC
+age+eq_m_income+housing
+years_living
affect_balance ~ b7*BSR_mean + b8*RaoQ_mean + b9*INS + c7*NGP + c8*PBVC + c9*NVC
+age+eq_m_income+housing
+years_living

# variances and covariances
NGP ~~ PBVC
SWL ~~ EW
SWL ~~ affect_balance
EW ~~ affect_balance

##INDIRECT EFFECTS

# MEDIATION - indirect of NGP on SWL
SWL_NGP_BSR:=a1*b1
SWL_NGP_RaoQ:=a2*b2
SWL_NGP_INS:=a3*b3
# MEDIATION - indirect of PBVC on SWL
SWL_PBVC_BSR:=a6*b1
SWL_PBVC_RaoQ:=a7*b2
SWL_PBVC_INS:=a8*b3
# MEDIATION - indirect of NVC on SWL
SWL_NVC_BSR:=a11*b1
SWL_NVC_RaoQ:=a12*b2
SWL_NVC_INS:=a13*b3

# MEDIATION - indirect of NGP on EW
EW_NGP_BSR:=a1*b4
EW_NGP_RaoQ:=a2*b5
EW_NGP_INS:=a3*b6
# MEDIATION - indirect of PBVC on EW
EW_PBVC_BSR:=a6*b4
EW_PBVC_RaoQ:=a7*b5
EW_PBVC_INS:=a8*b6
# MEDIATION - indirect of NVC on EW
EW_NVC_BSR:=a11*b4
EW_NVC_RaoQ:=a12*b5
EW_NVC_INS:=a13*b6

# MEDIATION - indirect of NGP on affect_balance
affect_balance_NGP_BSR:=a1*b7
affect_balance_NGP_RaoQ:=a2*b8
affect_balance_NGP_INS:=a3*b9
# MEDIATION - indirect of PBVC on affect_balance
affect_balance_PBVC_BSR:=a6*b7
affect_balance_PBVC_RaoQ:=a7*b8
affect_balance_PBVC_INS:=a8*b9
# MEDIATION - indirect of NVC on affect_balance
affect_balance_NVC_BSR:=a11*b7
affect_balance_NVC_RaoQ:=a12*b8
affect_balance_NVC_INS:=a13*b9
'

I have sampled 8 cities, and within each city, I have selected 10 squares, and finally within each square, I have sampled 10 respondents. If I am not wrong, this is called as two-stage cluster sampling. That's why I have set my sampling design like this:

scaled_data$weights <- 1 # Create a weights column with value 1

survey.design.imp <- svydesign(ids = ~city + square.code, # Two-stage cluster sampling
strata =~type_of_development, # land-sharing versus land-sparing
nest = TRUE, # Specify nested structure (squares nested within cities)
weights = ~weights,
data = scaled_data)

And then after imputing that survey design on my model (lavaan.survey(lavaan.fit = lavaan.fit_model_1, survey.design = survey.design.imp)), I get the mentioned warning.

I was researching this could be because of the number of parameters being higher than the number of clusters. But this is not my case as I have 71 parameters (see part of the summary of the model below):

lavaan 0.6-18 ended normally after 25 iterations Estimator ML Optimization method NLMINB Number of model parameters 71 Number of observations 923

And 80 clusters in the second stage cluster sampling:

Stratified 2 - level Cluster Sampling design (with replacement) With (16, 80) clusters. svydesign(ids = ~city + square.code, strata = ~type_of_development, nest = TRUE, weights = ~weights, data = scaled_data) Probabilities: Min. 1st Qu. Median Mean 3rd Qu. Max. 1 1 1 1 1 1 First-level Stratum Sizes: land sharing land sparing obs 463 460 design.PSU 8 8 actual.PSU 8 8

I have also read this warning can appear because of a very complex survey design (https://doi.org/10.18637/jss.v057.i01), that I supposse is my case.

So should I worry about this warning or it is not so important? Furthermore, I have seen my fit model indices are very good:

> fitmeasures(survey.fit.imp,c("pvalue","cfi.robust","srmr","rmsea.scaled","rmsea.ci.lower.scaled","rmsea.ci.upper.scaled")) pvalue cfi.robust srmr rmsea.scaled rmsea.ci.lower.scaled 0.003 1.000 0.021 0.000 0.000 rmsea.ci.upper.scaled 0.000

Except for p-value (should be non significant), but it is fixed when correcting for survey design:

> pval.pFsum(survey.fit.imp, survey.design=survey.design.imp) [1] 0.8351549

I will highly appreciate your input on this.

Thanks

Anna

Stas Kolenikov

unread,

Oct 12, 2024, 12:51:05 AM (14 days ago) Oct 12

to lav...@googlegroups.com

1. The limitation is the number of parameters vs. the number of the design degrees of freedom. (The covariance matrix is formed as the sum over the clusters, so its rank is no greater than the # of clusters - # of strata).

2. Did you really sample cities out of the list of cities, or did you select 8 cities for your project?

3. What is the relation of type_of_development to the sampling design? You specify it as strata, and looking at the output, it looks like it is cutting across cities.

4. nested=TRUE is used to indicate that the PSU IDs are recycled within strata. Some survey organizations like to just use structures like

stratum cluster

1 1

1 2

2 1

2 2

etc. instead of numbering the clusters uniquely. For your data structure, it may also be needed if the square.code is the same 1 through 10 within each city, rather than 101 to 110 in city 1, 201 to 210 in city 2, etc.

5. Of course there is no problem in achieving perfect fit when you have 71 parameters to manipulate to arrive at a matrix of rank 15!

-- Stas Kolenikov, PhD, PStat (ASA, SSC)

-- Principal Statistician, NORC @NORCnews

-- Opinions stated in this email are mine only, and do not reflect the position of my employer

-- Social media: @StatStas [ Twitter | mastodon.online ]

-- http://stas.kolenikov.name

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/8d6e121d-118f-465c-87e7-976b4690cecen%40googlegroups.com.

Anna Ramos Chernenko

unread,

Oct 17, 2024, 6:29:33 PM (8 days ago) Oct 17

to lavaan

Hi Stas Kolenikov,

thanks very much for the very informative answer. So I have shifted from the lavaan.survey package to lavaan package, and thus, I am using the cluster argument in the sem() function. In the cluster argument I have put the square.code variable. And for now, I am not receiving those warnings, only this one:

lavaan->lav_model_vcov(): The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -5.207304e-17) is smaller than zero. This may be a symptom that the model is not identified.

That I have understood is one warning I do not have to be very much concerned about, as Terence wrote here:

https://groups.google.com/g/lavaan/c/4y5pmqRz4nk

Kind regards,

Anna

Stas Kolenikov

unread,

Oct 17, 2024, 7:09:49 PM (8 days ago) Oct 17

to lav...@googlegroups.com

It is exactly the same issue. You cannot meaningfully avoid it.

-- Stas Kolenikov, PhD, PStat (ASA, SSC)

-- Principal Statistician, NORC @NORCnews

-- Opinions stated in this email are mine only, and do not reflect the position of my employer

-- Social media: @StatStas [ Twitter | mastodon.online ]

-- http://stas.kolenikov.name

To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/b0a96bec-8ca2-4d07-a79e-c0b25e75eabbn%40googlegroups.com.

Anna Ramos Chernenko

unread,

Oct 22, 2024, 3:20:44 PM (3 days ago) Oct 22

to lavaan

Hi here again dear Stas Kolenikov,

I am quite confussed, as some forum entries as this one I have told you about (https://groups.google.com/g/lavaan/c/4y5pmqRz4nk) tell that warning (lavaan->lav_model_vcov(): The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -5.207304e-17) is smaller than zero. This may be a symptom that the model is not identified.) is a warning one has not to be very worried about. Does then my model has some serious issues as you see it? I would be very grateful if you can share your thoughts with me.

Here is my updated model:

model_1 <- '
##DIRECT EFFECTS

# regressions - MEDIATORS

BSR_mean ~ a1*VSP + a4*POV + a7*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
RaoQ_mean ~ a2*VSP + a5*POV + a8*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
INS ~ a3*VSP + a6*POV + a9*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living

# regressions - WELLBEING OUTCOME

SWL ~ b1*BSR_mean + b8*RaoQ_mean + b15*INS + c1*VSP + c2*POV + c3*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
EW ~ b2*BSR_mean + b9*RaoQ_mean + b16*INS + c4*VSP + c5*POV + c6*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
affect_balance ~ b3*BSR_mean + b10*RaoQ_mean + b17*INS + c7*VSP + c8*POV + c9*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
PWI ~ b4*BSR_mean + b11*RaoQ_mean + b18*INS + c10*VSP + c14*POV + c18*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
free_time ~ b5*BSR_mean + b12*RaoQ_mean + b19*INS + c11*VSP + c15*POV + c19*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
environment ~ b6*BSR_mean + b13*RaoQ_mean + b20*INS + c12*VSP + c16*POV + c20*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
job_studies ~ b7*BSR_mean + b14*RaoQ_mean + b21*INS + c13*VSP + c17*POV + c21*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living

# variances and covariances
VSP ~~ POV

SWL ~~ EW
SWL ~~ affect_balance

SWL ~~ PWI
SWL ~~ free_time
SWL ~~ environment
SWL ~~ job_studies

EW ~~ affect_balance
EW ~~ PWI
EW ~~ free_time
EW ~~ environment
EW ~~ job_studies

affect_balance ~~ PWI
affect_balance ~~ free_time
affect_balance ~~ environment
affect_balance ~~ job_studies

PWI ~~ free_time
PWI ~~ environment
PWI ~~ job_studies

free_time ~~ environment
free_time ~~ job_studies

environment ~~ job_studies

##INDIRECT EFFECTS

# MEDIATION - indirect of VSP on SWL
SWL_VSP_BSR:=a1*b1
SWL_VSP_RaoQ:=a2*b8
SWL_VSP_INS:=a3*b15
# MEDIATION - indirect of POV on SWL
SWL_POV_BSR:=a4*b1
SWL_POV_RaoQ:=a5*b8
SWL_POV_INS:=a6*b15

# MEDIATION - indirect of NVC on SWL

SWL_NVC_BSR:=a7*b1
SWL_NVC_RaoQ:=a8*b8
SWL_NVC_INS:=a9*b15

# MEDIATION - indirect of VSP on EW
EW_VSP_BSR:=a1*b2
EW_VSP_RaoQ:=a2*b9
EW_VSP_INS:=a3*b16
# MEDIATION - indirect of POV on EW
EW_POV_BSR:=a4*b2
EW_POV_RaoQ:=a5*b9
EW_POV_INS:=a6*b16

# MEDIATION - indirect of NVC on EW

EW_NVC_BSR:=a7*b2
EW_NVC_RaoQ:=a8*b9
EW_NVC_INS:=a9*b16

# MEDIATION - indirect of VSP on affect_balance
affect_balance_VSP_BSR:=a1*b3
affect_balance_VSP_RaoQ:=a2*b10
affect_balance_VSP_INS:=a3*b17
# MEDIATION - indirect of POV on affect_balance
affect_balance_POV_BSR:=a4*b3
affect_balance_POV_RaoQ:=a5*b10
affect_balance_POV_INS:=a6*b17

# MEDIATION - indirect of NVC on affect_balance

affect_balance_NVC_BSR:=a7*b3
affect_balance_NVC_RaoQ:=a8*b10
affect_balance_NVC_INS:=a9*b17

# MEDIATION - indirect of VSP on PWI
PWI_VSP_BSR:=a1*b4
PWI_VSP_RaoQ:=a2*b11
PWI_VSP_INS:=a3*b18
# MEDIATION - indirect of POV on PWI
PWI_POV_BSR:=a4*b4
PWI_POV_RaoQ:=a5*b11
PWI_POV_INS:=a6*b18
# MEDIATION - indirect of NVC on PWI
PWI_NVC_BSR:=a7*b4
PWI_NVC_RaoQ:=a8*b11
PWI_NVC_INS:=a9*b18

# MEDIATION - indirect of VSP on free_time
free_time_VSP_BSR:=a1*b5
free_time_VSP_RaoQ:=a2*b12
free_time_VSP_INS:=a3*b19
# MEDIATION - indirect of POV on free_time
free_time_POV_BSR:=a4*b5
free_time_POV_RaoQ:=a5*b12
free_time_POV_INS:=a6*b19
# MEDIATION - indirect of NVC on free_time
free_time_NVC_BSR:=a7*b5
free_time_NVC_RaoQ:=a8*b12
free_time_NVC_INS:=a9*b19

# MEDIATION - indirect of VSP on environment
environment_VSP_BSR:=a1*b6
environment_VSP_RaoQ:=a2*b13
environment_VSP_INS:=a3*b20
# MEDIATION - indirect of POV on environment
environment_POV_BSR:=a4*b6
environment_POV_RaoQ:=a5*b13
environment_POV_INS:=a6*b20
# MEDIATION - indirect of NVC on environment
environment_NVC_BSR:=a7*b6
environment_NVC_RaoQ:=a8*b13
environment_NVC_INS:=a9*b20

# MEDIATION - indirect of VSP on job_studies
job_studies_VSP_BSR:=a1*b7
job_studies_VSP_RaoQ:=a2*b14
job_studies_VSP_INS:=a3*b21
# MEDIATION - indirect of POV on job_studies
job_studies_POV_BSR:=a4*b7
job_studies_POV_RaoQ:=a5*b14
job_studies_POV_INS:=a6*b21
# MEDIATION - indirect of NVC on job_studies
job_studies_NVC_BSR:=a7*b7
job_studies_NVC_RaoQ:=a8*b14
job_studies_NVC_INS:=a9*b21
'

And here is the code I run in R:

lavaan.fit_model_1 <- sem(model_1,
data = scaled_data,
cluster = "square.code",
estimator = "MLM",
se = "boot", bootstrap = 9999)

It gives me this warning as usual:

lavaan->lav_model_vcov(): The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -6.042179e-17) is smaller than zero. This may be a symptom that the model is not identified.

And here is part of the model's summary output, where the information on the number of parameters as well as the number of clusters and number of degrees of freedom, is given:

lavaan 0.6-18 ended normally after 43 iterations Estimator ML Optimization method NLMINB Number of model parameters 177 Number of observations 923 Number of clusters [square.code] 80

Model Test User Model: Standard Scaled Test Statistic 178.986 42.667 Degrees of freedom 21 21 P-value (Chi-square) 0.000 0.003 Scaling correction factor 4.195 Satorra-Bentler correction Parameter Estimates: Standard errors Robust.cluster.sem Information Expected Information saturated (h1) model Structured

In relation to your questions in one of your emails:

1. The limitation is the number of parameters vs. the number of the design degrees of freedom. (The covariance matrix is formed as the sum over the clusters, so its rank is no greater than the # of clusters - # of strata): I am not very sure if I fully understand this statement. If I am not wrong, the issue with my model is that there is much more parameters than degrees of freedom, right? Although I should say that now I do not include strata in my structure design, only cluster. Should I then reduce my model complexity (i.e. remove some variables)?

2. Did you really sample cities out of the list of cities, or did you select 8 cities for your project?: I have selected 8 cities for my project.

3. What is the relation of type_of_development to the sampling design? You specify it as strata, and looking at the output, it looks like it is cutting across cities: so I have two different strata. Each one is representing a type of urban development: land-sharing and land-sparing. In each one of the 8 cities, I have selected 5 land-sharing squares, and 5 land-sparing squares. So in each city we can find both strata represented. Anyways, I have removed strata from my model specification and now only include the cluster argument = square.code

4. nested=TRUE is used to indicate that the PSU IDs are recycled within strata. Some survey organizations like to just use structures like

stratum cluster

1 1

1 2

2 1

2 2

etc. instead of numbering the clusters uniquely. For your data structure, it may also be needed if the square.code is the same 1 through 10 within each city, rather than 101 to 110 in city 1, 201 to 210 in city 2, etc: I have removed the nested argument from my model specification. Now I only use the square.code variable (which is a unique ID for each square) as a cluster argument. In that way I suppose I do not have to use the strata argument neither.

5. Of course there is no problem in achieving perfect fit when you have 71 parameters to manipulate to arrive at a matrix of rank 15!: so I cannot rely then on my model's fit measures? Now the model parameters I have are a total of 177 while the rank of the matrix I suppose is 80, as that is the number of clusters I have.