implied versus observed covariance & Estimation information

Michael Filsecker

Apr 23, 2025, 6:35:22 AM

to lav...@googlegroups.com

Dear all,

For didactic purposes, I am trying to get three things from lavaan:

1) The implied covariance matrix (which you can get with the function fitted(), if I understand correctly).

2) The observed covariances. I thought this was the function "vcov", but that returned something unexpected. If I got it right, it is the covariance matrix of all the parameter estimates. What could we use this information for? More importantly, how can I get the observed covariance matrix?

3) The measurement errors of the measurement model (e.g., CFA). 'lavInspect(fit, what = "est")' provides the $theta matrix, which I understand to be the variances of the observed variables. But how can I get the "residuals"? That is, if observed variable = loading * factor + error, where can I get the error for each indicator?

Thank you!

Michael.

Felipe Vieira

Apr 23, 2025, 7:56:47 AM

to lav...@googlegroups.com

Hi Michael,

1) "fitted(fit)$cov" will indeed give you the (co)variance matrix implied by the model.

2) Just use "cov(Data)" (from base R). "vcov", as in "lavInspect(fit, "vcov")", will give you the (co)variance matrix of the estimated model parameters. This information is useful, for instance, for computing the standard errors (SEs) of the parameters.

This link (https://rdrr.io/cran/lavaan/man/lavInspect.html) contains the answer to your questions and more. For instance, "lavInspect(fit, "cov.ov")" would give you the same as "fitted(fit)$cov".

3) The theta matrix contains the (co)variances of the residuals (measurement errors) for the observed variables. I may have misinterpreted your question, but if you want the difference between the observed (co)variance matrix and the model-implied (co)variance matrix, then "lavResiduals(fit, type = "raw")" should do it (the default type reports residuals on the correlation metric instead).

Check this link as well: https://rdrr.io/cran/lavaan/man/lavResiduals.html
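For reference, the extractors mentioned above can be tried side by side. This is only a minimal sketch: it assumes the three-factor CFA from the lavaan tutorial and the HolzingerSwineford1939 data that ship with lavaan, neither of which is part of the original question, so substitute your own model and data.

```r
library(lavaan)

# Assumption: the tutorial's three-factor CFA on lavaan's built-in data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

Sigma <- fitted(fit)$cov                  # 1) model-implied covariance matrix
S     <- lavInspect(fit, "sampstat")$cov  # 2) observed (sample) covariance matrix
V     <- vcov(fit)                        #    covariance matrix of the parameter estimates
Theta <- lavInspect(fit, "est")$theta     # 3) (co)variances of the measurement errors
R     <- resid(fit, type = "raw")$cov     #    observed minus implied covariances

# Standard errors are the square roots of the diagonal of vcov()
se <- sqrt(diag(V))
```

Here R should match S - Sigma element by element, which is a quick way to convince yourself what each matrix contains.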

Best,

Felipe.

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lavaan/CACXLW89cv_bcv%3D1ktT5Capg1BE1%2BMUpbhUrfLTPopcPgUFx%2BHg%40mail.gmail.com.

Michael Filsecker

Apr 23, 2025, 10:56:25 AM

to lav...@googlegroups.com

Thank you Felipe, that was very helpful!

As for question 3), I am interested in the errors themselves, not their variances/covariances. How can I get those from lavaan?

Thank you!

Michael.

Jeremy Miles

Apr 23, 2025, 2:35:39 PM

to lav...@googlegroups.com

What do you mean by the errors themselves? The errors are (or can be thought of as) latent variables. They don't "exist", but they have means, variances, and covariances. Variances and covariances are in the theta matrix, means are zero.

Jeremy

Michael Filsecker

Apr 23, 2025, 5:12:25 PM

to lav...@googlegroups.com

Good question, Jeremy. I need to think about it and will get back to you :)

Michael.

Felipe Vieira

Apr 24, 2025, 3:23:44 AM

to lav...@googlegroups.com

I am glad it helped!

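Related to my previous answer, I forgot about "lavInspect(fit, "sampstat")$cov" for the sample (co)variance matrix. It is important to know that, under the default ML estimator, this matrix is divided by N, so it differs from the "unscaled" matrix returned by "cov()" (which divides by N-1) by a factor of (N-1)/N. When you fit the model from a covariance matrix, either of the following gives that N-divided matrix: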

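N <- nrow(sem_data)                 # number of observations
S <- cov(sem_data) * (N - 1) / N    # rescale by hand to the N divisor
fit <- sem(model, sample.cov = S, sample.nobs = N, sample.cov.rescale = FALSE)
lavInspect(fit, "sampstat")$cov     # returns S unchanged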

or:

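N <- nrow(sem_data)                 # number of observations
S <- cov(sem_data)                  # unscaled: divided by N-1
fit <- sem(model, sample.cov = S, sample.nobs = N, sample.cov.rescale = TRUE)
lavInspect(fit, "sampstat")$cov     # rescaled by lavaan to the N divisor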

From the lavaan tutorial, concerning the sample.cov.rescale: "If the estimator is ML (the default), then the sample variance-covariance matrix will be rescaled by a factor (N-1)/N. The reasoning is the following: the elements in a sample variance-covariance matrix have (usually) been divided by N-1. But the (normal-based) ML estimator would divide the elements by N. Therefore, we need to rescale. If you don’t want this to happen (for example in a simulation study), you can provide the argument sample.cov.rescale = FALSE."

Concerning your last question, you might be thinking of "casewise" discrepancies between observed and predicted values (whose squares are summed in regression), but that is not what we minimize here. You could technically get something similar to the regression scenario by using factor scores, but from my understanding of your question I don't think that is what you want, and factor scores are indeterminate. I would suggest you check the line of work mentioned by Terrence in the answer(s) here, which I will not comment on further because I am not fully familiar with it: https://stats.stackexchange.com/questions/610751/how-to-plot-individual-case-residuals-icrs-from-lavaan-model.
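If casewise quantities are what you are after, one rough sketch is to subtract model-predicted indicator values from the observed ones. It assumes the lavaan tutorial CFA and the built-in HolzingerSwineford1939 data (not something from this thread), and it relies on lavPredict(fit, type = "ov"), which builds the predictions from estimated factor scores. Because factor scores are indeterminate, the result is only one possible estimate of the errors, not "the" errors themselves.

```r
library(lavaan)

# Assumption: the tutorial's three-factor CFA on lavaan's built-in data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Predicted indicator values, computed from estimated factor scores
pred <- lavPredict(fit, type = "ov")

# Observed values for the same indicators, in the same column order
obs <- as.matrix(HolzingerSwineford1939[, colnames(pred)])

# One row per case, one column per indicator
casewise <- obs - pred
head(casewise)
```

A different factor-score method would give different casewise values, which is exactly the indeterminacy mentioned above.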

Best,

Felipe.

Michael Filsecker

Apr 24, 2025, 6:49:33 AM

to lav...@googlegroups.com

Hey Felipe,

That was awesome! I will check it out :) BTW, I am not a statistician, so I wonder why this rescaling by (N-1)/N seems to be SO important. Similarly, why does it seem to be such a big deal whether you divide by N-1 or by N?

My last question: I cannot clearly understand why a just-identified model has perfect fit. I just do not get the link between the two :(

Thank you so much for your help.

Kind regards,

Michael.

Jeremy Miles

Apr 24, 2025, 12:20:25 PM

to lav...@googlegroups.com

On Thu, 24 Apr 2025 at 03:49, Michael Filsecker <filsec...@gmail.com> wrote:

> Hey Felipe,
>
> That was awesome! I will check it out :) BTW, I am not a statistician, so I wonder why this rescaling by (N-1)/N seems to be SO important. Similarly, why does it seem to be such a big deal whether you divide by N-1 or by N?

I think it's debated. Some SEM programs do it, and some do not.

> My last question: I cannot clearly understand why a just-identified model has perfect fit. I just do not get the link between the two :(

There isn't a link between the two. They are the same thing.

In a SEM you are testing whether you can reproduce the sample covariance matrix (and means, but we'll ignore that).

Think of a super simple SEM, with two variables: y1 and y2. They have variance = 1 and covariance = 0.3

My model, in lavaan, is:

y1 ~~ y2

I've put one covariance into the model, and I've asked the model to estimate one covariance - that of y1 and y2. The model is saturated; it has 0 degrees of freedom and will have perfect fit (chi-square = 0), because it will exactly reproduce the data. It might be an interesting finding, but it is not an interesting structural equation model.
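A minimal sketch of this two-variable example, for anyone who wants to run it. The sample size of 100 is an arbitrary assumption (none was given above), and the two variances are written out explicitly so the parameter count is visible.

```r
library(lavaan)

# The example above: unit variances, covariance 0.3
S <- matrix(c(1.0, 0.3,
              0.3, 1.0), nrow = 2,
            dimnames = list(c("y1", "y2"), c("y1", "y2")))

# The two variances are written out for clarity
model <- '
  y1 ~~ y2
  y1 ~~ y1
  y2 ~~ y2
'

# Assumption: N = 100 is an arbitrary sample size for illustration
fit <- sem(model, sample.cov = S, sample.nobs = 100)

# Chi-square and degrees of freedom of the saturated model
fitMeasures(fit, c("chisq", "df"))

# Three sample moments, three free parameters: nothing is left to test
implied  <- fitted(fit)$cov
observed <- lavInspect(fit, "sampstat")$cov
max(abs(implied - observed))
```

The implied and observed matrices should agree up to optimizer tolerance, which is what "perfect fit" means here.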

You can think of lots of statistical methods (t-test, ANOVA, regression, ANCOVA, MANOVA, logistic regression, etc.) as structural equation models. They are all saturated and have perfect fit, so we don't worry about fit (although sometimes we think about the log likelihood and calculate things like AIC). In those models we are only interested in the parameter estimates (and their CIs, etc.); in SEMs we are interested in fit and parameters.
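As a small illustration of that equivalence, the sketch below fits the same linear regression with lm() and with sem(). The data are simulated purely for illustration (the variable names and coefficients are assumptions, not anything from this thread).

```r
library(lavaan)

# Assumption: simulated data, purely for illustration
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

ols     <- lm(y ~ x1 + x2, data = d)
fit_reg <- sem('y ~ x1 + x2', data = d)

# Same slopes from both approaches (up to optimizer tolerance)
cbind(lm  = coef(ols)[c("x1", "x2")],
      sem = coef(fit_reg)[c("y~x1", "y~x2")])

# The SEM version is saturated: zero degrees of freedom
fitMeasures(fit_reg, "df")
```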

I hope that helps, a little.

Jeremy

P.S. I wrote a paper once about the equivalence of SEMs and other techniques - calculating power in SEMs is easier than calculating power in (say) MANOVA, so we can take advantage of the fact that the models are the same: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-3-27.

Yves Rosseel

Apr 26, 2025, 8:14:34 AM

to lav...@googlegroups.com

On 4/24/25 12:49, Michael Filsecker wrote:
> reasons, I am not a statistician, that this re-scaling (N-1/N) seems to
> be SO important?? and the other similar questions why seems to be a big
> deal if you divided by N-1 versus N??

In the early days of SEM, when SEM was just covariance structure
analysis, they were using maximum likelihood estimation based on the
Wishart distribution of the sample covariance matrix ('S'). In this
setting, theory dictates you need to use N-1.

But the Wishart distribution only applies when data is complete. If you
have missing data, we need to switch to the normal distribution. But the
normal theory dictates we should use N (instead of N-1).

Software (like lavaan and Mplus) that uses the normal distribution for
the missing values case also uses the normal distribution (per default)
for the complete case. That just seems more consistent. Hence, lavaan
(and Mplus) insist on using 'N' whenever maximum likelihood is used.
This includes the computation of the sample covariance matrix.

(BTW, if your data is complete, you can use the likelihood = "Wishart"
option to switch to the Wishart setting, if you prefer this)
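To see the two divisors in practice, here is a minimal sketch comparing the sample covariance matrix that lavaan stores under each likelihood setting. It assumes the tutorial CFA and the built-in HolzingerSwineford1939 data, which are complete for x1 to x9; the object names are made up for the example.

```r
library(lavaan)

# Assumption: the tutorial's three-factor CFA on lavaan's built-in data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat  <- HolzingerSwineford1939
vars <- paste0("x", 1:9)
N    <- nrow(dat)

fit_normal  <- cfa(model, data = dat)                          # default: likelihood = "normal"
fit_wishart <- cfa(model, data = dat, likelihood = "wishart")

S_normal  <- lavInspect(fit_normal,  "sampstat")$cov
S_wishart <- lavInspect(fit_wishart, "sampstat")$cov

# Normal likelihood: divisor N; Wishart likelihood: divisor N - 1
max(abs(S_normal  - cov(dat[, vars]) * (N - 1) / N))
max(abs(S_wishart - cov(dat[, vars])))
```

Both differences should be zero up to floating-point error.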

Yves.

--
Yves Rosseel
Department of Data Analysis, Ghent University

Michael Filsecker

Apr 26, 2025, 1:35:17 PM

to lav...@googlegroups.com

Thank you very much for the detailed explanation! :)
