logLik() and loglik.casewise by lavInspect() under missing = "fiml.x" and fixed.x = TRUE

Shu Fai Cheung

Dec 16, 2021, 5:06:59 AM
to lavaan
I noticed that, when missing = "fiml.x" and fixed.x = TRUE (the default), the sum of the casewise log likelihoods does not equal the log likelihood of the final solution if some cases have missing data on x:

library(lavaan)
#> This is lavaan 0.6-9
#> lavaan is FREE software! Please report any bugs.
mod <-
"
x1 ~ x4 + x5
x7 ~ x1
"
dat <- HolzingerSwineford1939
dat[1:10, "x1"] <- NA
dat[11:20, "x5"] <- NA
dat[21:30, "x7"] <- NA
head(dat[, c("x1", "x7", "x4", "x5")], 30)
#>          x1       x7       x4   x5
#> 1        NA 3.391304 2.333333 5.75
#> 2        NA 3.782609 1.666667 3.00
#> 3        NA 3.260870 1.000000 1.75
#> 4        NA 3.000000 2.666667 4.50
#> 5        NA 3.695652 2.666667 4.00
#> 6        NA 4.347826 1.000000 3.00
#> 7        NA 4.695652 3.333333 6.00
#> 8        NA 3.391304 3.666667 4.25
#> 9        NA 4.521739 2.666667 5.75
#> 10       NA 4.130435 2.666667 5.00
#> 11 3.666667 3.739130 2.000000   NA
#> 12 5.833333 3.695652 2.666667   NA
#> 13 5.666667 5.869565 2.666667   NA
#> 14 6.000000 5.130435 4.666667   NA
#> 15 5.833333 4.000000 5.000000   NA
#> 16 4.666667 4.086957 2.666667   NA
#> 17 4.333333 3.695652 2.000000   NA
#> 18 5.000000 4.000000 2.000000   NA
#> 19 5.666667 3.913044 4.333333   NA
#> 20 6.333333 3.478261 3.666667   NA
#> 21 5.833333       NA 1.666667 2.50
#> 22 6.666667       NA 2.000000 3.25
#> 23 5.000000       NA 3.333333 5.75
#> 24 3.833333       NA 2.666667 3.00
#> 25 5.666667       NA 2.333333 3.75
#> 26 5.333333       NA 1.666667 3.50
#> 27 5.500000       NA 2.666667 2.25
#> 28 6.000000       NA 1.666667 3.00
#> 29 4.666667       NA 2.000000 3.00
#> 30 5.000000       NA 2.666667 3.25
fit_fixed_x <- sem(mod, data = dat, missing = "fiml.x", fixed.x = TRUE)
loglik_i <- lavInspect(fit_fixed_x, "loglik.casewise")
sum(loglik_i)
#> [1] -871.7387
logLik(fit_fixed_x)
#> 'log Lik.' -858.8469 (df=7)
Created on 2021-12-16 by the reprex package (v2.0.1)

I believe the log likelihood given by logLik() is correct. Does this mean that the casewise log likelihood given by lavInspect() should not be used when missing = "fiml.x" and fixed.x = TRUE and there are cases with missing data on x? Or is this discrepancy expected in this case?

-- Shu Fai

Ed Merkle

Jan 6, 2022, 9:54:24 AM
to lavaan
Shu Fai,

I don't think there should be a discrepancy here, but I will have to look into it some more. My guess is that logLik() uses some sufficient statistic of the x variables, which does not immediately translate to cases with missing x.

You might already know, but if you use missing = "ml" (which removes cases with missing x) or if you use fixed.x = FALSE, then everything matches.
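
For anyone who wants to check this on their own installation, the comparison can be sketched as below. This is only a rough sketch that reuses the model and data from the original post; compare_loglik() is a hypothetical helper written for this illustration, not a lavaan function, and the numbers may differ across lavaan versions.

```r
library(lavaan)

# Same model and missing-data pattern as in the original post
mod <- "
x1 ~ x4 + x5
x7 ~ x1
"
dat <- HolzingerSwineford1939
dat[1:10, "x1"] <- NA
dat[11:20, "x5"] <- NA
dat[21:30, "x7"] <- NA

# Hypothetical helper (not part of lavaan): fit the model under one
# combination of settings and return both versions of the log-likelihood.
compare_loglik <- function(missing, fixed.x) {
  fit <- sem(mod, data = dat, missing = missing, fixed.x = fixed.x)
  ll_case <- lavInspect(fit, "loglik.casewise")
  c(
    # na.rm = TRUE is a precaution in case excluded cases are returned as NA
    sum_casewise = sum(ll_case, na.rm = TRUE),
    logLik       = as.numeric(logLik(fit)),
    n_used       = lavInspect(fit, "nobs")
  )
}

res <- rbind(
  fiml.x_fixed.x_TRUE  = compare_loglik("fiml.x", TRUE),
  ml_fixed.x_TRUE      = compare_loglik("ml", TRUE),
  fiml.x_fixed.x_FALSE = compare_loglik("fiml.x", FALSE)
)
res
```

Each row shows the summed casewise values next to logLik() for one setting, so any row where the first two columns differ is a setting where the two functions disagree.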

Ed

Ed Merkle

Jan 6, 2022, 2:55:54 PM
to lavaan
Just to follow up, I looked at this a bit more, and I found that logLik() uses a covariance matrix of the x variables, which seems not to be sufficient for fixed.x = TRUE. So I think that the sum of the casewise log-likelihoods (-871.7387) is the correct value, rather than the value from logLik(). I will file an issue about it on GitHub.

Ed

Shu Fai Cheung

Jan 6, 2022, 9:32:45 PM
to lavaan
Thanks a lot! I also checked those functions yesterday out of curiosity and found that my assumption might be wrong. If I read the code correctly, the call to lav_mvnorm_h1_loglik_samplestats() in lav_mvnorm_missing_loglik_samplestats() (used by logLik()) does not take into account the missing patterns in x variables, while lav_mvnorm_missing_llik_casewise() (used by lavInspect()) does. This may be the source of the discrepancy.
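
For readers who want to look at the same functions, a minimal sketch follows. It assumes the function names quoted above still exist in the installed lavaan version; they are unexported internals, so they may have been renamed or moved, which is why the code checks for them first.

```r
library(lavaan)

# Unexported internals named in this thread; their presence in any
# particular lavaan version is an assumption, so check before using them.
fn_names <- c("lav_mvnorm_missing_loglik_samplestats",
              "lav_mvnorm_h1_loglik_samplestats",
              "lav_mvnorm_missing_llik_casewise")

ns <- asNamespace("lavaan")
found <- vapply(fn_names, exists, logical(1), envir = ns, inherits = FALSE)
found

# Print the argument list of each function that exists in this version
for (nm in fn_names[found]) {
  cat(nm, "\n")
  print(args(get(nm, envir = ns)))
}
```

Typing lavaan:::lav_mvnorm_missing_llik_casewise (without parentheses) at the console prints the full source of a function that was found.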

I will wait for the comments on GitHub, as I am not familiar with the lavaan code. Thanks for following up on this issue.

-- Shu Fai

Terrence Jorgensen

Jan 10, 2022, 5:04:50 PM
to lavaan
For anyone interested in updates, the GitHub issue is here: https://github.com/yrosseel/lavaan/issues/226

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
