sampstat, sample statistic varies from input covariance matrix

Helene von Gugelberg

May 4, 2022, 7:43:55 AM
to lavaan
Hello everyone

When I provide a covariance matrix for my model and later request the sample statistics for the observed data, there are notable discrepancies (after the third decimal).

I define the model as follows:
fit_kongenerisch <- cfa(kongenerisch, sample.cov = input_matrix, sample.nobs = 203, estimator = "ML" )

then get the sample statistic:
inspect(fit_kongenerisch, "sampstat")$cov

As I understand it, sampstat should provide the covariance matrix of the observed data. When I provide the covariance matrix directly when running the model (which I do with sample.cov = input_matrix; input_matrix is a probability-based covariance matrix I calculated), shouldn't input_matrix and the sampstat matrix be identical?

Or am I missing something?
What creates these discrepancies?

kind regards,
Helene

Shu Fai Cheung

May 4, 2022, 8:11:15 AM
to lav...@googlegroups.com
Do they differ by a factor of (N - 1) / N, which is (203 - 1) / 203 in your case, as in the following example?

library(lavaan)
#> This is lavaan 0.6-11
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
dat_cov <- cov(dat)
n <- nrow(dat)
fit <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n)
fit
#> lavaan 0.6-11 ended normally after 35 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        21
#>                                                       
#>   Number of observations                           301
#>                                                       
#> Model Test User Model:
#>                                                       
#>   Test statistic                                85.306
#>   Degrees of freedom                                24
#>   P-value (Chi-square)                           0.000
fit_cov <- lavInspect(fit, "sampstat")$cov
as.matrix(fit_cov / dat_cov)
#>    x1    x2    x3    x4    x5    x6    x7    x8    x9   
#> x1 0.997                                                
#> x2 0.997 0.997                                          
#> x3 0.997 0.997 0.997                                    
#> x4 0.997 0.997 0.997 0.997                              
#> x5 0.997 0.997 0.997 0.997 0.997                        
#> x6 0.997 0.997 0.997 0.997 0.997 0.997                  
#> x7 0.997 0.997 0.997 0.997 0.997 0.997 0.997            
#> x8 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997      
#> x9 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997
(n - 1) / n
#> [1] 0.9966777
Regards,
Shu Fai Cheung (張樹輝)


Helene von Gugelberg

May 4, 2022, 9:53:00 AM
to lavaan
I just checked, and it is exactly the same as in your example above. All values differ by the factor (N - 1) / N.

What is the reason for this deviation?

Kind Regards,
Helene

Terrence Jorgensen

May 4, 2022, 12:33:24 PM
to lavaan
> All values differ by the factor (N - 1) / N.
>
> What is the reason for this deviation?

Normal-theory ML is based on asymptotic theory.  A sample covariance matrix (e.g., as returned by the cov() function) divides by N − 1 to obtain an unbiased finite-sample estimate of the population covariances.  The practical effect of relying on asymptotic theory is to assume you are fitting the model to the population (or an infinite sample), so no finite-sample adjustment is applied: the sums of squares and cross-products are divided by N, as returned by cov.wt() with argument method = "ML".  By default, lavaan will internally rescale your sample covariance matrix to conform to the population formula when likelihood = "normal".  To turn off this behavior, you can set sample.cov.rescale = FALSE, or instead rely on likelihood = "wishart" for models without a mean structure, which uses N − 1 rather than N even in the formulas for chi-squared, RMSEA, etc.
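
To illustrate the difference between the two divisors, here is a minimal base-R sketch. The mtcars columns are only a stand-in data set chosen for illustration (they are not from this thread); the point is that cov() and cov.wt(..., method = "ML") differ by exactly the constant factor (N − 1) / N.

```r
# Built-in data used purely for illustration; any numeric data frame would do
dat <- mtcars[, c("mpg", "hp", "wt")]
n <- nrow(dat)

S_unbiased <- cov(dat)                   # divides by N - 1
S_ml <- cov.wt(dat, method = "ML")$cov   # divides by N

# Element-wise ratio: every entry should equal (N - 1) / N
S_ml / S_unbiased
(n - 1) / n

# Rescaling the unbiased matrix by (N - 1) / N reproduces the ML version
all.equal(S_ml, S_unbiased * (n - 1) / n)
```

This is the same rescaling that shows up as the constant ratio in the sampstat comparison earlier in the thread.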

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Helene von Gugelberg

May 6, 2022, 5:14:09 AM
to lavaan
Is there a way to check what formula / type of correlation was used by lavaan when calculating the sample statistic for the cfa?

Thank you very much for the explanations so far. They are very helpful.

Terrence Jorgensen

May 6, 2022, 1:25:24 PM
to lavaan
The default is likelihood = "normal" with sample.cov.rescale = TRUE.  If for some reason you don't know whether you changed the default, you can check the options in the fitted model using lavInspect(fit, "options")[c("likelihood","sample.cov.rescale")]
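
As a rough sketch of how this could be checked in practice, the following reuses the HolzingerSwineford1939 example from earlier in the thread. The object names fit_default and fit_raw are made up for illustration, and the code assumes a lavaan 0.6-x installation.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
dat_cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])
n <- nrow(HolzingerSwineford1939)

# Default behavior: the supplied matrix is rescaled by (N - 1) / N
fit_default <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n)

# Rescaling turned off: the supplied matrix is used as given
fit_raw <- cfa(HS.model, sample.cov = dat_cov, sample.nobs = n,
               sample.cov.rescale = FALSE)

# Inspect which settings each fit actually used
lavInspect(fit_default, "options")[c("likelihood", "sample.cov.rescale")]
lavInspect(fit_raw, "options")[c("likelihood", "sample.cov.rescale")]

# Largest absolute difference between sampstat and the input matrix
max(abs(lavInspect(fit_raw, "sampstat")$cov - dat_cov))
```

With rescaling turned off, the last line should be zero up to floating-point error, whereas the default fit should show the (N − 1) / N ratio discussed above.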