# sampstat, sample statistic varies from input covariance matrix

### Helene von Gugelberg

May 4, 2022, 7:43:55 AM
to lavaan
Hello everyone

When I provide a covariance matrix for my model and later request the sample statistics for the observed data, there are notable discrepancies (after the third decimal).

I define the model as follows:
`fit_kongenerisch <- cfa(kongenerisch, sample.cov = input_matrix, sample.nobs = 203, estimator = "ML")`

then get the sample statistic:
`inspect(fit_kongenerisch, "sampstat")$cov`

As I understand it, sampstat should return the covariance matrix of the observed data. Since I supply the covariance matrix directly when fitting the model (via sample.cov = input_matrix, where input_matrix is a probability-based covariance matrix I calculated), shouldn't input_matrix and the sampstat matrix be identical?

Or am I missing something?
What creates these discrepancies?

kind regards,
Helene

### Shu Fai Cheung

May 4, 2022, 8:11:15 AM
Do they differ by a factor of (N - 1) / N, which is (203 - 1) / 203 in your case, as in the following example?

```r
library(lavaan)
#> This is lavaan 0.6-11
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
# cov() divides by N - 1
dat_cov <- cov(dat)
n <- nrow(dat)
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n)
fit
#> lavaan 0.6-11 ended normally after 35 iterations
#>
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        21
#>
#>   Number of observations                           301
#>
#> Model Test User Model:
#>
#>   Test statistic                                85.306
#>   Degrees of freedom                                24
#>   P-value (Chi-square)                           0.000
# Element-wise ratio of the matrix stored by lavaan to the input matrix
fit_cov <- lavInspect(fit, "sampstat")$cov
as.matrix(fit_cov / dat_cov)
#>    x1    x2    x3    x4    x5    x6    x7    x8    x9
#> x1 0.997
#> x2 0.997 0.997
#> x3 0.997 0.997 0.997
#> x4 0.997 0.997 0.997 0.997
#> x5 0.997 0.997 0.997 0.997 0.997
#> x6 0.997 0.997 0.997 0.997 0.997 0.997
#> x7 0.997 0.997 0.997 0.997 0.997 0.997 0.997
#> x8 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997
#> x9 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997
(n - 1) / n
#> [1] 0.9966777
```
Regards,
Shu Fai Cheung (張樹輝)


### Helene von Gugelberg

May 4, 2022, 9:53:00 AM
to lavaan
I just checked, and it is exactly the same as in your example above. All values differ by the factor (N - 1) / N.

What is the reason for this deviation?

Kind Regards,
Helene

### Terrence Jorgensen

May 4, 2022, 12:33:24 PM
to lavaan
> All values differ by the factor (N - 1) / N.
>
> What is the reason for this deviation?

Normal-theory ML is based on asymptotic theory. A sample covariance matrix (e.g., as returned by the cov() function) divides by N − 1 to obtain an unbiased finite-sample estimate of the population covariances. The practical effect of relying on asymptotic theory is that the model is treated as if it were fitted to the population (or an infinite sample), so no finite-sample adjustment is applied: the covariances are divided by N, as returned by cov.wt() with the argument method = "ML". By default, lavaan internally rescales your sample covariance matrix to conform to the population formula when likelihood = "normal". To turn off this behavior, you can set sample.cov.rescale = FALSE, or instead rely on likelihood = "wishart" for models without a mean structure, which uses N − 1 rather than N even in the formulas for chi-squared, RMSEA, etc.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
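
The difference between the two divisors described above can be reproduced in base R without fitting any model. The following is a minimal sketch, assuming a small simulated data set: `dat`, its column names, and the seed are made up purely for illustration and do not come from this thread.

```r
# Simulated data, used only for illustration (not from the thread)
set.seed(1)
n <- 203
dat <- data.frame(y1 = rnorm(n), y2 = rnorm(n), y3 = rnorm(n))

# Unbiased finite-sample estimate: sums of cross-products divided by N - 1
S_unbiased <- cov(dat)

# "Population" (ML) formula: sums of cross-products divided by N
S_ml <- cov.wt(dat, method = "ML")$cov

# Element-wise ratio of the two matrices, and the factor (N - 1) / N
S_ml / S_unbiased
(n - 1) / n

# Rescaling the unbiased matrix by (N - 1) / N should recover the ML matrix
all.equal(S_ml, S_unbiased * (n - 1) / n)
```

With N = 203 the factor is 202 / 203, which is close enough to 1 that differences typically only become visible around the third decimal.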

### Helene von Gugelberg

May 6, 2022, 5:14:09 AM
to lavaan
Is there a way to check what formula / type of correlation was used by lavaan when calculating the sample statistic for the cfa?

Thank you very much for the explanations so far. They are very helpful.
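
One possible way to check this, offered as a sketch rather than a definitive answer: inspect the options stored in the fitted object and compare the stored sample covariance matrix with the input. The example reuses the Holzinger-Swineford model from earlier in the thread. It assumes that the list returned by `lavInspect(fit, "options")` contains the resolved `likelihood` and `sample.cov.rescale` entries; `fit_default`, `fit_asis`, and the helper `max_diff()` are hypothetical names introduced here.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
dat_cov <- cov(dat)   # divides by N - 1
n <- nrow(dat)

# Default settings versus rescaling switched off
fit_default <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n)
fit_asis <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n,
                sample.cov.rescale = FALSE)

# Options stored with each fit (assumed to hold the resolved values)
opts_default <- lavInspect(fit_default, "options")
opts_default$likelihood
opts_default$sample.cov.rescale
lavInspect(fit_asis, "options")$sample.cov.rescale

# Hypothetical helper: largest absolute difference between the matrix
# lavaan stored and a target matrix, matched by variable name
max_diff <- function(fit, target) {
  stored <- unclass(lavInspect(fit, "sampstat")$cov)
  max(abs(stored - target[rownames(stored), colnames(stored)]))
}

# Without rescaling, the stored matrix should match the input
max_diff(fit_asis, dat_cov)

# With the default, it should match the input times (N - 1) / N
max_diff(fit_default, dat_cov * (n - 1) / n)
```

If both differences are essentially zero, the stored sample statistic is the input matrix itself in the first case and the input rescaled by (N - 1) / N in the second.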