ML discrepancy function


Terrence Jorgensen

Dec 5, 2012, 1:59:06 PM
to lav...@googlegroups.com
Yves,

I've been curious about something for a while now, and I haven't known where to look for an answer.  In case anyone else is interested, I decided to post the question here.

I was under the impression that the chi-squared value provided under ML estimation is calculated by multiplying the sample size (or N - 1 in LISREL) by the ML discrepancy function.  I'm also under the impression that the ML discrepancy function = -2LL (negative 2 times the log-likelihood), which is equal to the formula provided in many sources (e.g., Bollen, 1989; Browne & Cudeck, 1992):

ML-discFunc = log( | Sigma-hat | ) - log( | S | ) + trace(S * inverse-Sigma-hat) - p

where S is the observed covariance matrix, Sigma-hat is the model-implied covariance matrix, and p = number of observed variables in your model.  An additional term can be added to include mean structure.
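This formula is easy to sketch directly in base R (a hypothetical helper for illustration, not lavaan's internal code):

```r
# ML discrepancy: F_ML = log|Sigma.hat| - log|S| + tr(S %*% Sigma.hat^-1) - p
# S and Sigma.hat are p x p covariance matrices (covariance structure only)
ml_discrepancy <- function(S, Sigma.hat) {
  p <- nrow(S)
  log(det(Sigma.hat)) - log(det(S)) +
    sum(diag(S %*% solve(Sigma.hat))) - p
}

# sanity check: a perfectly fitting model has zero discrepancy
S <- diag(3)
stopifnot(abs(ml_discrepancy(S, S)) < 1e-12)
```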

lavaan output provides a log-likelihood value, but I don't see how lavaan's "logl" is related to the "chisq" value in the same output.  Here is reproducible R syntax/output with comments to illustrate my curiosity.  Could you explain what I fail to understand, or provide a citation that will provide some illumination, please?

Thank you in advance,

Terry


> ## The famous Holzinger and Swineford (1939) example
> HS.model <- ' visual  =~ x1 + x2 + x3
+               textual =~ x4 + x5 + x6
+               speed   =~ x7 + x8 + x9 '
>
> fit <- cfa(HS.model, data=HolzingerSwineford1939)
> x <- inspect(fit, "fit")[c("chisq", "ntotal", "logl")]
> x
      chisq      ntotal        logl
   85.30552   301.00000 -3737.74493
>
> ## these should both yield the same ML discrepancy value
> x["chisq"] / x["ntotal"]
   chisq
0.283407
> -2 * x["logl"]
   logl
7475.49
>
> ## these should yield the same chi-squared value
> x["chisq"]
   chisq
85.30552
> -2 * x["logl"] * x["ntotal"]
   logl
2250122
>
> ## using Bollen's (1989) formula for the discrepancy function
> psi <- inspect(fit, "coef")$psi
> lambda <- inspect(fit, "coef")$lambda
> theta <- inspect(fit, "coef")$theta
>
> SigmaHat <- (lambda %*% psi %*% t(lambda)) + theta
> S <- cov(HolzingerSwineford1939[ , paste0("x", 1:9)])
> myTrace <- sum(diag(S %*% solve(SigmaHat)))
> p <- nrow(S)
>
> round(SigmaHat, 3)
      x1    x2    x3    x4    x5    x6    x7    x8    x9
x1 1.358 0.448 0.590 0.408 0.454 0.378 0.262 0.309 0.284
x2 0.448 1.382 0.327 0.226 0.252 0.209 0.145 0.171 0.157
x3 0.590 0.327 1.275 0.298 0.331 0.276 0.191 0.226 0.207
x4 0.408 0.226 0.298 1.351 1.090 0.907 0.173 0.205 0.188
x5 0.454 0.252 0.331 1.090 1.660 1.010 0.193 0.228 0.209
x6 0.378 0.209 0.276 0.907 1.010 1.196 0.161 0.190 0.174
x7 0.262 0.145 0.191 0.173 0.193 0.161 1.183 0.453 0.415
x8 0.309 0.171 0.226 0.205 0.228 0.190 0.453 1.022 0.490
x9 0.284 0.157 0.207 0.188 0.209 0.174 0.415 0.490 1.015
> round(S, 3)
      x1     x2    x3    x4    x5    x6     x7    x8    x9
x1 1.363  0.409 0.582 0.507 0.442 0.456  0.085 0.265 0.460
x2 0.409  1.386 0.453 0.210 0.212 0.248 -0.097 0.110 0.245
x3 0.582  0.453 1.279 0.209 0.113 0.245  0.089 0.213 0.375
x4 0.507  0.210 0.209 1.355 1.101 0.899  0.220 0.126 0.244
x5 0.442  0.212 0.113 1.101 1.665 1.018  0.143 0.181 0.296
x6 0.456  0.248 0.245 0.899 1.018 1.200  0.145 0.166 0.237
x7 0.085 -0.097 0.089 0.220 0.143 0.145  1.187 0.537 0.375
x8 0.265  0.110 0.213 0.126 0.181 0.166  0.537 1.025 0.459
x9 0.460  0.245 0.375 0.244 0.296 0.237  0.375 0.459 1.018
> myTrace
[1] 9.029998
> p
[1] 9
>
> MLdisc <- log(det(SigmaHat)) - log(det(S)) + myTrace - p
> MLdisc
[1] 0.2834569
>
> ## manually calculating Chi-Squared from the ML-discrepancy
> ## yields (close to) the same chi-squared that lavaan provides
> myChi <- MLdisc * x["ntotal"]
> myChi
  ntotal
85.32054
> x["chisq"]
   chisq
85.30552
>

yrosseel

Dec 5, 2012, 3:46:24 PM
to lav...@googlegroups.com
> I was under the impression that the chi-squared value provided under ML
> estimation is calculated by multiplying the sample size (or N - 1 in
> LISREL) by the ML discrepancy function.

That is correct. Also in lavaan.

> I'm also under the impression
> that the ML discrepancy function = -2LL (negative 2 times the
> log-likelihood), which is equal to the formula provided in many sources
> (e.g., Bollen, 1989; Browne & Cudeck, 1992):
>
> ML-discFunc = log( | Sigma-hat | ) - log( | S | ) + trace(S *
> inverse-Sigma-hat) - p

Not in lavaan, as you have observed.

There are many ways to compute a (log)likelihood. What lavaan is doing
is this: we simply calculate the (log)likelihood of the data, assuming a
multivariate normal distribution with Sigma equal to the model-implied
variance covariance matrix, and Mu equal to either the observed mean
vector (if meanstructure=FALSE) or the model-implied meanstructure (if
meanstructure=TRUE).

For example,

library(lavaan)
example(cfa)
fitMeasures(fit, "logl")
# logl
#-3737.745

Now, we need three components: the Data, the model-implied
variance/covariance matrix, and the (observed) mean vector:

Data <- HolzingerSwineford1939[,7:15]
Sigma <- unclass(fitted(fit)$cov)
Mean <- as.numeric(inspect(fit, "sampstat")$mean)

Using 'dmnorm' from the mnormt package to compute likelihoods under the
multivariate normal:

liks <- dmnorm(as.matrix(Data), mean=as.numeric(Mean), varcov=Sigma)

Taking logs, and sum over all N cases:

logl <- sum(log(liks))
# -3737.745

Now, suppose you compute the same logl under the unrestricted model
(using the observed variance-covariance matrix, and observed mean
vector), you get:

Sigma.obs <- unclass(inspect(fit, "sampstat")$cov)
Mean.obs <- as.numeric(inspect(fit, "sampstat")$mean)

liks.h1 <- dmnorm(as.matrix(Data), mean=as.numeric(Mean.obs),
varcov=Sigma.obs)
logl.h1 <- sum(log(liks.h1))
# -3695.092

Now take the difference between logl and logl.h1, and multiply by (-2):

-2 * (logl - logl.h1)
# 85.30552

confirming that the chi-squared statistic is indeed the likelihood ratio
statistic.
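The identity behind this — that N times Bollen's discrepancy equals the likelihood-ratio statistic when the mean is fixed at the sample mean — can be checked in base R without any packages (toy data; the diagonal "model" Sigma0 is just for illustration):

```r
set.seed(1)
N <- 200; p <- 3
X <- matrix(rnorm(N * p), N, p)
xbar <- colMeans(X)
S_N  <- crossprod(sweep(X, 2, xbar)) / N   # ML (divide-by-N) covariance

# casewise multivariate-normal log-density, base R only
logdmvn <- function(x, mu, Sigma) {
  k <- length(mu)
  d <- x - mu
  -0.5 * (k * log(2 * pi) + log(det(Sigma)) +
          drop(t(d) %*% solve(Sigma) %*% d))
}

Sigma0 <- diag(diag(S_N))   # toy restricted "model": independence

logl.h0 <- sum(apply(X, 1, logdmvn, mu = xbar, Sigma = Sigma0))
logl.h1 <- sum(apply(X, 1, logdmvn, mu = xbar, Sigma = S_N))

lr   <- -2 * (logl.h0 - logl.h1)
F_ML <- log(det(Sigma0)) - log(det(S_N)) +
        sum(diag(S_N %*% solve(Sigma0))) - p

stopifnot(isTRUE(all.equal(lr, N * F_ML)))   # LR statistic = N * F_ML
```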


>>## using Bollen's (1989) formula for the discrepancy function
>> psi <- inspect(fit, "coef")$psi
>> lambda <- inspect(fit, "coef")$lambda
>> theta <- inspect(fit, "coef")$theta
>>
>> SigmaHat <- (lambda %*% psi %*% t(lambda)) + theta
>> S <- cov(HolzingerSwineford1939[ , paste0("x", 1:9)])

If likelihood="normal" (which is the default in lavaan), you need to use

S <- unclass(inspect(fit, "sampstat")$cov)

This will give you 85.30552 exactly.

(The difference is that the internal version of S is rescaled by (N-1)/N, so that the covariance elements are divided by N rather than N-1; the ML estimator assumes division by N.)
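That rescaling can be verified in a couple of lines of base R (toy data):

```r
set.seed(1)
N <- 50
X <- matrix(rnorm(N * 2), N, 2)

S.unbiased <- cov(X)                     # cov() divides by N - 1
S.ml <- S.unbiased * (N - 1) / N         # rescale to divide by N

# equivalent direct divide-by-N computation
S.direct <- crossprod(sweep(X, 2, colMeans(X))) / N
stopifnot(isTRUE(all.equal(S.ml, S.direct)))
```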

Hope this helps.

Yves.

Terrence D. Jorgensen

Dec 5, 2012, 4:08:45 PM
to lav...@googlegroups.com
Yves,

That is very clear, thank you!  The textbooks/articles I've read on model fit have not discussed these different methods, and I'm not sure whether to look in a computing or a statistical journal for more information.

Thanks again,

Terry



--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at http://groups.google.com/group/lavaan?hl=en.



yrosseel

Dec 5, 2012, 4:38:29 PM
to lav...@googlegroups.com
On 12/05/2012 10:08 PM, Terrence D. Jorgensen wrote:
> Yves,
>
> That is very clear, thank you! The textbooks/articles I've read on
> model fit have not discussed these different methods, and I'm not sure
> whether to look in computing or statistical journal for more information.

One day, lavaan will have technical documentation. And this will be part of it.

Yves.
