lavaan ERROR: initial model-implied matrix (Sigma) is not positive definite; check your model and/or starting parameters .

José Antonio Moreira de Rezende

Nov 30, 2024, 7:55:27 AM

to lavaan

Hello, everyone.

I decided to open a new thread to better organize my request for help. I am available to provide further clarification if needed.

I received this error when trying to fit a model with the lavaan package:

1: lavaan->lav_model_estimate():

initial model-implied matrix (Sigma) is not positive definite; check your model and/or starting parameters .

2: lavaan->lav_model_estimate():

initial model-implied matrix (Sigma) is not positive definite; check your model and/or starting parameters .

3: lavaan->lav_model_estimate():

initial model-implied matrix (Sigma) is not positive definite; check your model and/or starting parameters .

4: lavaan->lav_model_estimate():

initial model-implied matrix (Sigma) is not positive definite; check your model and/or starting parameters .

5: lavaan->lav_lavaan_step11_estoptim():
Model estimation FAILED! Returning starting values.

Here is the model:

model <- '
# measurement model
F1 =~ V8 + V9
F2 =~ V10 + V11 + V12 + V16 + V17 + V18 + V19 + V20 + V21 + V22
F3 =~ V13

# regressions
V3 ~ V4
V7 ~ V1 + F1
V8 ~ F1
V9 ~ F1
V10 ~ F2
V11 ~ F2
V12 ~ F2
V16 ~ F2
V17 ~ F2
V18 ~ F2
V19 ~ F2
V20 ~ F2
V21 ~ F2
V22 ~ F2
V13 ~ F3
V15 ~ V6 + V7 + F1
F1 ~ V1 + V3 + V4
F2 ~ V5
F3 ~ V5

# variances
F2 ~~ F3
V8 ~~ V9
V15 ~~ F3
V15 ~~ F2
'

The data used in my analysis comes from the 2010 demographic census conducted by IBGE, the Brazilian agency responsible for that year’s census survey. The data of interest is associated with a specific municipality in the state of Minas Gerais (Brazil), and my research aims to explore causal relationships within families whose per capita household income is up to half the minimum wage.

To create a dataset suitable for use with the lavaan package, the data was sourced from two files. The first file contains census information about households, and the second includes data about individuals surveyed. The CSV file I shared is the result of a probabilistic match, where each individual was randomly assigned to a household within the same income category. The resulting data was then adjusted to a normal distribution and scaled to have a mean of zero and a variance of one.

Thus, the CSV file contains continuous, binary, and categorical data. The categorical variables were adjusted following the approach detailed by Bollen (2014) in Chapter 9 (reference at the end of this message). As mentioned in my message from November 28th, variables V2 and V14 were excluded from the model because they had zero variance after the data filtering process. Rows with missing data were also discarded.

Below is the description of each observable and latent variable:

V1: Age (integer)
V3: Highest level of academic education (categorical: 1 to 14)
V4: Ethnicity (categorical: 1, 2, 3, 4, 5, 9)
V5: Household in urban or rural area (binary: 1, 2)
V6: Lives with a spouse or partner (categorical: 1, 2, 3)
V7: Total number of living children as of July 31, 2010 (integer)
V8: Employment status (categorical: 1 to 7)
V9: Was contributing to an official social security institution for any job held during the week of July 25–31, 2010 (categorical: 1, 2, 3)
V10: Presence of a refrigerator in the household (binary: 1, 2)
V11: Presence of a television in the household (binary: 1, 2)
V12: Number of bathrooms in the household (integer: 0 to 9)
V13: Source of water supply in the household (categorical: 1 to 10)
V15: Monthly per capita income in terms of minimum wages (real number)
V16: Presence of an electricity meter in the household (categorical: 1, 2, 3)
V17: Presence of a radio (binary: 1, 2)
V18: Presence of a washing machine (binary: 1, 2)
V19: Presence of a mobile phone (binary: 1, 2)
V20: Presence of a landline phone (binary: 1, 2)
V21: Presence of a computer (binary: 1, 2)
V22: Presence of a computer with internet access (binary: 1, 2)

The reference for Bollen (2014) is: Bollen, K. A. (2014). Structural Equations with Latent Variables. Wiley.

I hope this additional information helps clarify my query, and I am available to discuss any aspect of the dataset or model further. Thank you in advance for your assistance and have a nice weekend.

Best regards!

José Antonio Moreira de Rezende

Nov 30, 2024, 8:07:13 AM

to lavaan

The description of latent variables is:

F1: Type of employment status
F2: Level of household comfort
F3: Basic sanitation
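
For readers who run into the same message: a covariance matrix is positive definite when all of its eigenvalues are strictly positive, which is the property the error says the initial model-implied matrix (Sigma) lacks. The following is a minimal sketch in base R, not taken from the thread; the helper name `is_positive_definite`, the tolerance, and the two toy matrices are assumptions made up for illustration.

```r
# Returns TRUE when a symmetric matrix is positive definite,
# i.e. when all of its eigenvalues exceed a small tolerance.
# (Hypothetical helper; the tolerance is an arbitrary choice.)
is_positive_definite <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# Toy 2 x 2 matrices, made up for illustration only
ok  <- matrix(c(1.0, 0.5, 0.5, 1.0), nrow = 2)  # unit variances, covariance 0.5
bad <- matrix(c(1.0, 1.2, 1.2, 1.0), nrow = 2)  # covariance 1.2 exceeds the variances

is_positive_definite(ok)
is_positive_definite(bad)
```

The same kind of check can be applied to the sample covariance matrix of the data (for example `cov(dat)` for a numeric data frame `dat`) to spot redundant or near-constant variables before fitting.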

Chesnut, Ryan

Dec 1, 2024, 9:54:18 AM

to lav...@googlegroups.com

One problem you might have is that you have a latent variable (F3) that is defined by a single indicator with no additional constraints. Latent variables with no additional constraints require three indicators to be identified.
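
As an illustration of what such an additional constraint can look like, one common option for a single-indicator latent variable is to fix its loading to 1 and fix the indicator's residual variance to (1 - reliability) times the indicator's variance. The sketch below only builds the lavaan model syntax as a string. The reliability of 0.80 is an assumed placeholder, not a value from the thread, and the variance of 1 relies on the original post's statement that the data were scaled to unit variance.

```r
# Assumed (hypothetical) reliability of V13; the thread does not give one.
rel_V13 <- 0.80
# V13 was scaled to unit variance according to the original post.
var_V13 <- 1

# Fix the loading to 1 and the residual variance to (1 - reliability) * variance.
f3_syntax <- sprintf(
  "F3 =~ 1*V13\nV13 ~~ %.2f*V13",
  (1 - rel_V13) * var_V13
)
cat(f3_syntax, "\n")
```

These two lines would take the place of the plain `F3 =~ V13` definition in the model string before it is passed to lavaan. The choice of reliability is a substantive decision and should be justified rather than copied from this sketch.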


José Antonio Moreira de Rezende

Dec 1, 2024, 6:36:32 PM

to lavaan

Interesting. But how should I define the set of constraints for the latent variable F3?

Jasper Bogaert

Dec 3, 2024, 3:02:51 AM

to lavaan

Hi, maybe you can have a look at the single-indicator approach. See the paper by Savalei (2019): http://dx.doi.org/10.1037/met0000181. If this does not resolve the problem, you could try different starting values with the rstarts option (see the description below, which I copied from the lavaan documentation). Let's reevaluate afterwards.

rstarts: Integer. The number of refits that lavaan should try with random starting values. Random starting values are computed by drawing random numbers from a uniform distribution. Correlations are drawn from the interval [-0.5, +0.5] and then converted to covariances. Lower and upper bounds for (residual) variances are computed just like the standard bounds in bounded estimation. Random starting values are not computed for regression coefficients (which are always zero) and factor loadings of higher-order constructs (which are always unity). From all the runs that converged, the final solution is the one that resulted in the smallest value for the discrepancy function.

Best wishes,

Jasper
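
A minimal sketch of how the `rstarts` option described above might be passed, assuming a lavaan version that supports it. Because the CSV file from the thread is not available here, the model and data are stand-ins: the three-factor CFA on the `HolzingerSwineford1939` data set that ships with lavaan. The number of random starts (10) and the seed are arbitrary choices.

```r
library(lavaan)

# Stand-in model and data (not the model from this thread):
# the three-factor CFA on lavaan's built-in HolzingerSwineford1939 data.
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Random starting values are drawn at random, so fix the seed
# to make the run repeatable.
set.seed(1234)
fit <- cfa(hs_model, data = HolzingerSwineford1939, rstarts = 10)

# Check whether the retained solution converged
lavInspect(fit, "converged")
```

Random restarts only help when the failure comes from poor starting values; they do not repair a model that is not identified.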
