Hi here again dear Stas Kolenikov,
I am quite confussed, as some forum entries as this one I have told you about (
https://groups.google.com/g/lavaan/c/4y5pmqRz4nk) tell that warning (
lavaan->lav_model_vcov(): The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -5.207304e-17) is smaller than zero. This may be a symptom that the model is not identified.) is a warning one has not to be very worried about. Does then my model has some serious issues as you see it? I would be very grateful if you can share your thoughts with me.
Here is my updated model:
model_1 <- '
##DIRECT EFFECTS
# regressions - MEDIATORS
BSR_mean ~ a1*VSP + a4*POV + a7*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
RaoQ_mean ~ a2*VSP + a5*POV + a8*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
INS ~ a3*VSP + a6*POV + a9*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
# regressions - WELLBEING OUTCOME
SWL ~ b1*BSR_mean + b8*RaoQ_mean + b15*INS + c1*VSP + c2*POV + c3*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
EW ~ b2*BSR_mean + b9*RaoQ_mean + b16*INS + c4*VSP + c5*POV + c6*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
affect_balance ~ b3*BSR_mean + b10*RaoQ_mean + b17*INS + c7*VSP + c8*POV + c9*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
PWI ~ b4*BSR_mean + b11*RaoQ_mean + b18*INS + c10*VSP + c14*POV + c18*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
free_time ~ b5*BSR_mean + b12*RaoQ_mean + b19*INS + c11*VSP + c15*POV + c19*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
environment ~ b6*BSR_mean + b13*RaoQ_mean + b20*INS + c12*VSP + c16*POV + c20*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
job_studies ~ b7*BSR_mean + b14*RaoQ_mean + b21*INS + c13*VSP + c17*POV + c21*NVC
+age+eq_m_income+gender+housing+latitude
+relationship_status+second_home_code+years_living
# variances and covariances
VSP ~~ POV
SWL ~~ EW
SWL ~~ affect_balance
SWL ~~ PWI
SWL ~~ free_time
SWL ~~ environment
SWL ~~ job_studies
EW ~~ affect_balance
EW ~~ PWI
EW ~~ free_time
EW ~~ environment
EW ~~ job_studies
affect_balance ~~ PWI
affect_balance ~~ free_time
affect_balance ~~ environment
affect_balance ~~ job_studies
PWI ~~ free_time
PWI ~~ environment
PWI ~~ job_studies
free_time ~~ environment
free_time ~~ job_studies
environment ~~ job_studies
##INDIRECT EFFECTS
# MEDIATION - indirect of VSP on SWL
SWL_VSP_BSR:=a1*b1
SWL_VSP_RaoQ:=a2*b8
SWL_VSP_INS:=a3*b15
# MEDIATION - indirect of POV on SWL
SWL_POV_BSR:=a4*b1
SWL_POV_RaoQ:=a5*b8
SWL_POV_INS:=a6*b15
# MEDIATION - indirect of NVC on SWL
SWL_NVC_BSR:=a7*b1
SWL_NVC_RaoQ:=a8*b8
SWL_NVC_INS:=a9*b15
# MEDIATION - indirect of VSP on EW
EW_VSP_BSR:=a1*b2
EW_VSP_RaoQ:=a2*b9
EW_VSP_INS:=a3*b16
# MEDIATION - indirect of POV on EW
EW_POV_BSR:=a4*b2
EW_POV_RaoQ:=a5*b9
EW_POV_INS:=a6*b16
# MEDIATION - indirect of NVC on EW
EW_NVC_BSR:=a7*b2
EW_NVC_RaoQ:=a8*b9
EW_NVC_INS:=a9*b16
# MEDIATION - indirect of VSP on affect_balance
affect_balance_VSP_BSR:=a1*b3
affect_balance_VSP_RaoQ:=a2*b10
affect_balance_VSP_INS:=a3*b17
# MEDIATION - indirect of POV on affect_balance
affect_balance_POV_BSR:=a4*b3
affect_balance_POV_RaoQ:=a5*b10
affect_balance_POV_INS:=a6*b17
# MEDIATION - indirect of NVC on affect_balance
affect_balance_NVC_BSR:=a7*b3
affect_balance_NVC_RaoQ:=a8*b10
affect_balance_NVC_INS:=a9*b17
# MEDIATION - indirect of VSP on PWI
PWI_VSP_BSR:=a1*b4
PWI_VSP_RaoQ:=a2*b11
PWI_VSP_INS:=a3*b18
# MEDIATION - indirect of POV on PWI
PWI_POV_BSR:=a4*b4
PWI_POV_RaoQ:=a5*b11
PWI_POV_INS:=a6*b18
# MEDIATION - indirect of NVC on PWI
PWI_NVC_BSR:=a7*b4
PWI_NVC_RaoQ:=a8*b11
PWI_NVC_INS:=a9*b18
# MEDIATION - indirect of VSP on free_time
free_time_VSP_BSR:=a1*b5
free_time_VSP_RaoQ:=a2*b12
free_time_VSP_INS:=a3*b19
# MEDIATION - indirect of POV on free_time
free_time_POV_BSR:=a4*b5
free_time_POV_RaoQ:=a5*b12
free_time_POV_INS:=a6*b19
# MEDIATION - indirect of NVC on free_time
free_time_NVC_BSR:=a7*b5
free_time_NVC_RaoQ:=a8*b12
free_time_NVC_INS:=a9*b19
# MEDIATION - indirect of VSP on environment
environment_VSP_BSR:=a1*b6
environment_VSP_RaoQ:=a2*b13
environment_VSP_INS:=a3*b20
# MEDIATION - indirect of POV on environment
environment_POV_BSR:=a4*b6
environment_POV_RaoQ:=a5*b13
environment_POV_INS:=a6*b20
# MEDIATION - indirect of NVC on environment
environment_NVC_BSR:=a7*b6
environment_NVC_RaoQ:=a8*b13
environment_NVC_INS:=a9*b20
# MEDIATION - indirect of VSP on job_studies
job_studies_VSP_BSR:=a1*b7
job_studies_VSP_RaoQ:=a2*b14
job_studies_VSP_INS:=a3*b21
# MEDIATION - indirect of POV on job_studies
job_studies_POV_BSR:=a4*b7
job_studies_POV_RaoQ:=a5*b14
job_studies_POV_INS:=a6*b21
# MEDIATION - indirect of NVC on job_studies
job_studies_NVC_BSR:=a7*b7
job_studies_NVC_RaoQ:=a8*b14
job_studies_NVC_INS:=a9*b21
'
And here is the code I run in R:
lavaan.fit_model_1 <- sem(model_1,
data = scaled_data,
cluster = "square.code",
estimator = "MLM",
se = "boot", bootstrap = 9999)
It gives me this warning as usual:
lavaan->lav_model_vcov():
The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The
smallest eigenvalue (= -6.042179e-17) is smaller than zero. This may be a symptom that the model is not identified.
And here is part of the model's summary output, where the information on the number of parameters as well as the number of clusters and number of degrees of freedom, is given:
lavaan 0.6-18 ended normally after 43 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 177
Number of observations 923
Number of clusters [square.code] 80
Model Test User Model:
Standard Scaled
Test Statistic 178.986 42.667
Degrees of freedom 21 21
P-value (Chi-square) 0.000 0.003
Scaling correction factor 4.195
Satorra-Bentler correction
Parameter Estimates:
Standard errors Robust.cluster.sem
Information Expected
Information saturated (h1) model Structured
In relation to your questions in one of your emails:
1. The limitation is the number of parameters vs. the number of the design degrees of freedom. (The covariance matrix is formed as the sum over the clusters, so its rank is no greater than the # of clusters - # of strata): I am not very sure if I fully understand this statement. If I am not wrong, the issue with my model is that there is much more parameters than degrees of freedom, right? Although I should say that now I do not include strata in my structure design, only cluster. Should I then reduce my model complexity (i.e. remove some variables)?
2. Did you really sample cities out of the list of cities, or did you select 8 cities for your project?: I have selected 8 cities for my project.
3. What is the relation of type_of_development to the sampling design? You specify it as strata, and looking at the output, it looks like it is cutting across cities: so I have two different strata. Each one is representing a type of urban development: land-sharing and land-sparing. In each one of the 8 cities, I have selected 5 land-sharing squares, and 5 land-sparing squares. So in each city we can find both strata represented. Anyways, I have removed strata from my model specification and now only include the cluster argument = square.code
4. nested=TRUE is used to indicate that the PSU IDs are recycled within strata. Some survey organizations like to just use structures like
stratum cluster
1 1
etc. instead of numbering the clusters uniquely. For your data structure, it may also be needed if the square.code is the same 1 through 10 within each city, rather than 101 to 110 in city 1, 201 to 210 in city 2, etc: I have removed the nested argument from my model specification. Now I only use the square.code variable (which is a unique ID for each square) as a cluster argument. In that way I suppose I do not have to use the strata argument neither.
5. Of course there is no problem in achieving perfect fit when you have 71 parameters to manipulate to arrive at a matrix of rank 15!: so I cannot rely then on my model's fit measures? Now the model parameters I have are a total of 177 while the rank of the matrix I suppose is 80, as that is the number of clusters I have.
Kind regards,
Anna