lavaan estimates under "underidentification"


Michael Filsecker

May 17, 2025, 4:54:35 PM
to lav...@googlegroups.com
Dear all,

I intentionally specified a model so that it was "underidentified". Then I got the error message:
"Warning: lavaan->lav_model_vcov():  
   Could not compute standard errors! The information matrix could not be inverted. This may be a  
   symptom that the model is not identified"

If underidentification means not enough information to produce unique estimates, what is the link between this and "the information matrix could not be inverted"?

Second, I then ran "summary(fit)" and to my surprise lavaan provided estimates anyway. So if underidentification is the problem of not being able to produce unique estimates, what are these estimates that lavaan provides? Are they just a random set of results? But after re-running the analysis I obtained the same estimates...

Third, textbooks typically explain the issue of underidentification as follows:

(1) a + b = 4; a and b "parameters"
In Eq. (1) we have two parameters and just one equation, so it is impossible to get unique values for a and b. Got it. But:

(2) y = a + bx
Eq. (2) represents a straight line, i.e. a simple regression. Here we also have 2 parameters and 1 equation, just like Eq. (1). How come we call Eq. (2) "just identified" and not underidentified? The same for a multiple regression:
(3) y = a + b1x1 + b2x2
here we have 3 parameters and one equation (like Eq. (1), more parameters than equations), but we call this "just-identified", not "underidentified". Why?

Thank you!!!

Kind regards,
Michael.

Jeremy Miles

May 18, 2025, 12:58:19 AM
to lav...@googlegroups.com
=========

Hi Michael,

Here's my attempt to explain it.  Hopefully it's clear. :)

My additions are between rows of equals signs. (Like this one).

========


Dear all,

I intentionally specified a model so that it was "underidentified". Then I got the error message:
"Warning: lavaan->lav_model_vcov():  
   Could not compute standard errors! The information matrix could not be inverted. This may be a  
   symptom that the model is not identified"

If underidentification means not enough information to produce unique estimates, what is the link between this and "the information matrix could not be inverted"?

=======

Here's my understanding:

It is not always clear to the program whether the model is underidentified. If the information matrix cannot be inverted, that can be a symptom that the model is not identified (but it might not be). The link is this: when the estimates are not unique, the likelihood is flat along some direction in the parameter space, so the information matrix (which measures the curvature of the likelihood) is singular, and a singular matrix cannot be inverted.
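
To make the connection concrete, here is a tiny base-R sketch of my own (not lavaan internals): for Michael's underidentified equation a + b = 4, the second-derivative ("information") matrix of the squared loss (a + b - 4)^2 is singular, because only the sum a + b is pinned down, so it cannot be inverted - which is exactly what lavaan's warning is about.

```r
# Hessian (curvature matrix) of the loss (a + b - 4)^2 with respect to a and b.
# Every entry is 2, because the loss only depends on the sum a + b.
info <- matrix(c(2, 2,
                 2, 2), nrow = 2)

det(info)                            # 0: the matrix is singular

inv <- try(solve(info), silent = TRUE)
inherits(inv, "try-error")           # TRUE: inversion fails, hence no standard errors
```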

=======


Second, I then ran "summary(fit)" and to my surprise lavaan provided estimates anyway. So if underidentification is the problem of not being able to produce unique estimates, what are these estimates that lavaan provides? Are they just a random set of results? But after re-running the analysis I obtained the same estimates...

=======

You would expect to obtain the same estimates from the same model and data. It's deterministic: you gave the same program the same data (and the same start values), so you get the same results. These estimates are lavaan's best attempt at an answer.

It's also useful for diagnosing issues in the model: if you look at the parameter estimates and see something you don't expect, you know you have written the model incorrectly (e.g. you expected two values to be equal and they're not, or you expected a value to be fixed to zero and it's not).

To test for identification, try different start values. 

Here's an example. 

library(dplyr)
library(lavaan)

set.seed(42)

d <- data.frame(F = rnorm(1000)) %>%
  dplyr::mutate(
    y1 = rnorm(1000) + F,
    y2 = rnorm(1000) + F,
    y3 = rnorm(1000) + F   # y3 is needed for model_3 and model_4 below
  ) %>%
  dplyr::select(-F)

model <- "F =~ y1 + y2"

fit <- lavaan::cfa(model, data = d)
print(summary(fit))


This model is not identified (it has -1 df), so I get the warning:
“lavaan->lav_model_vcov(): Could not compute standard errors! The information matrix could not be inverted. This may be a symptom that the model is not identified.”

And every time I run it I get the following estimates:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  F =~
    y1                1.000
    y2                0.933       NA

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.951       NA
   .y2                1.110       NA
    F                 1.044       NA

But if I add a start value I get different parameter estimates.


model_2 <- "F =~ y1 + start(0) * y2"

fit_2 <- lavaan::cfa(model_2, data = d)
print(summary(fit_2))



Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  F =~
    y1                1.000
    y2                0.952       NA

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.972       NA
   .y2                1.092       NA
    F                 1.023       NA

But if I fit a model that is just identified, with 0 df:
model_3 <- "F =~ y1 + y2 + y3"

fit_3 <- lavaan::cfa(model_3, data = d)
print(summary(fit_3))

Or if I fit the same model, with different start values:

model_4 <- "F =~ y1 + start(0) * y2 + start(0) * y3"

fit_4 <- lavaan::cfa(model_4, data = d)
print(summary(fit_4))

It takes one more iteration, but gets the same estimates - so the model is identified.

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  F =~
    y1                1.000
    y2                0.897    0.057   15.867    0.000
    y3                1.001    0.063   16.018    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.910    0.071   12.740    0.000
   .y2                1.145    0.070   16.438    0.000
   .y3                0.924    0.072   12.849    0.000
    F                 1.085    0.099   10.994    0.000




=======

Third, textbooks typically explain the issue of underidentification as follows:

(1) a + b = 4; a and b "parameters"
In Eq. (1) we have two parameters and just one equation, so it is impossible to get unique values for a and b. Got it. But:

(2) y = a + bx
Eq. (2) represents a straight line, i.e. a simple regression. Here we also have 2 parameters and 1 equation, just like Eq. (1). How come we call Eq. (2) "just identified" and not underidentified? The same for a multiple regression:
(3) y = a + b1x1 + b2x2
here we have 3 parameters and one equation (like Eq. (1), more parameters than equations), but we call this "just-identified", not "underidentified". Why?



======

When you fit a model like:

(2) y = a + bx

The data contain one covariance and two means. You are estimating one intercept and one slope (and also a mean of x, but regression output typically doesn't show you that, because it's just the sample mean of x). So you put three pieces of information in and get three pieces out: it's just identified.
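
As a sketch of the bookkeeping in base R (this time counting the variances on both sides too; the books balance either way):

```r
# Bookkeeping for y = a + b*x with p = 2 observed variables (x and y).
p <- 2
moments <- p * (p + 1) / 2 + p   # 3 (co)variances + 2 means = 5 pieces in
params  <- 2 + 1 + 2             # intercept + slope, residual variance of y,
                                 # mean and variance of x = 5 pieces out
moments - params                 # 0 degrees of freedom: just identified
```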

(3) y = a + b1x1 + b2x2


Think about how you would write this in lavaan:

y ~ 1
y ~ x1
y ~ x2
y ~~ y

You're getting one intercept, two slopes and a variance (that's 4 pieces of information).

Implicitly, you're also getting the covariance of x1 and x2 (which is ignored in regression), the means of x1 and x2, and the variances of x1 and x2 - that's 5 more pieces of information, so you're getting 9 pieces of information from the model.

You're giving it the covariance matrix of x1, x2 and y. That's three variances, and 3 covariances (6 pieces of information), and 3 means (mean of x1, x2 and y). That's 9 pieces of information. So it's just identified.
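
A quick way to check this empirically (a sketch of mine; it assumes the lavaan package is installed) is to fit the multiple regression as a lavaan model and ask for its degrees of freedom:

```r
library(lavaan)

set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- 1 + 0.5 * d$x1 + 0.5 * d$x2 + rnorm(200)

# meanstructure = TRUE so the intercept and means are part of the model
fit <- sem("y ~ x1 + x2", data = d, meanstructure = TRUE)
fitMeasures(fit, "df")   # 0: just identified, as the counting predicts
```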

Jeremy
 

Michael Filsecker

May 19, 2025, 2:13:41 PM
to lav...@googlegroups.com
Thank you Jeremy, really appreciate your effort to explain and exemplify this for me:)
What a great community is this lavaan group btw.

Kind regards
Michael.
