Different distributions, missing data, and small sample size

CSM

Jan 26, 2019, 2:41:55 PM
to lavaan
Dear all,
I have a simple question about which arguments to include in the 'sem' function. My variables have very different distributions; in particular, some of them are clearly nonnormal. There are also some missing values.

As far as I understood from 

it is reasonable to include
missing="fiml" , estimator="MLR" 
in the fit code, in order to obtain robust standard errors.

My question is: taking into account the distributional differences and the small sample size (115), do you recommend further arguments in the fit code (test, se,...), or even an additional bootstrap / or a Bayesian comparison?

Kind regards,
csm

Terrence Jorgensen

Jan 27, 2019, 6:53:13 AM
to lavaan
> taking into account the distributional differences and the small sample size (115), do you recommend further arguments in the fit code (test, se,...),

Setting estimator = "MLR" is a shortcut for setting 3 options:

estimator = "ml"
test = "Yuan.Bentler.Mplus"
se = "robust.huber.white"

See the ?lavOptions help page for details.
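
A minimal sketch of that equivalence, fitting the same model once with the shortcut and once with the three options spelled out. The model and data are assumptions: lavaan's built-in HolzingerSwineford1939 example stands in for the original poster's model, restricted to 115 rows with a few values set to missing for illustration.

```r
library(lavaan)

# Assumption: lavaan's built-in example model and data stand in for the
# poster's own; the first 115 rows mimic the sample size in the thread.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat <- HolzingerSwineford1939[1:115, ]

# Set a few values to NA so that FIML has missing data to handle
# (illustrative only).
set.seed(1)
dat$x1[sample(nrow(dat), 10)] <- NA
dat$x5[sample(nrow(dat), 10)] <- NA

# Shortcut form
fit_short <- sem(model, data = dat, missing = "fiml", estimator = "MLR")

# Long form: the three options written out
fit_long <- sem(model, data = dat, missing = "fiml", estimator = "ML",
                test = "yuan.bentler.mplus", se = "robust.huber.white")

# Compare point estimates and standard errors of the two fits
all.equal(coef(fit_short), coef(fit_long))
all.equal(parameterEstimates(fit_short)$se,
          parameterEstimates(fit_long)$se)
```

If the shortcut expands as described, the two fits should agree; lavInspect(fit_short, "options") shows which se and test settings were actually used.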

> or even an additional bootstrap / or a Bayesian comparison?

N is pretty small, so the asymptotic properties might not kick in, but your Type I error rates should be only slightly inflated. Bootstrapping treats the sample as though it were the population, so it only works as well as the sample resembles the population. In small samples, differences between sample and population (sampling error) are greater, so in practice, bootstrapping kind of relies on an asymptotic assumption too. A Bayesian analysis does not rely on asymptotic/large-N theory, and incorporating any available prior information could make your estimates more efficient. But accounting for the nonnormality in a Bayesian model will require careful thought about specifying a proper distribution for your data/errors.
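
For anyone who wants to try the bootstrap anyway, here is a rough sketch of requesting bootstrap standard errors and percentile intervals in lavaan. The model and data are again assumptions (lavaan's built-in example, cut to 115 rows), and 200 draws is only to keep the example quick; the small-sample caveat above still applies to whatever it returns.

```r
library(lavaan)

# Assumption: built-in example model and data as stand-ins for the real ones.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat <- HolzingerSwineford1939[1:115, ]

# Bootstrap standard errors instead of the robust (sandwich) ones.
# 200 draws keeps the sketch fast; a real analysis would use many more.
set.seed(2019)
fit_boot <- sem(model, data = dat, missing = "fiml", estimator = "ML",
                se = "bootstrap", bootstrap = 200)

# Percentile bootstrap confidence intervals for the factor loadings
pe <- parameterEstimates(fit_boot, boot.ci.type = "perc")
print(pe[pe$op == "=~",
         c("lhs", "op", "rhs", "est", "se", "ci.lower", "ci.upper")])
```

Comparing these intervals with the ones from estimator = "MLR" gives a rough sense of how sensitive the conclusions are to the choice of standard errors.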

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

CSM

Jan 28, 2019, 5:54:29 AM
to lavaan
Dear Professor Terrence Jorgensen,

Thank you very much for your clear and complete answer. I just would like to know if you could provide me a reference supporting your first statement:

> your Type I error rates should be only slightly inflated.


Also, could this fact justify the importance of reporting marginally significant results, in this particular study?

Kind regards, csm


Terrence Jorgensen

Jan 29, 2019, 7:52:01 AM
to lavaan

> I just would like to know if you could provide me a reference supporting your first statement:
>
> > your Type I error rates should be only slightly inflated.


As always, a more accurate answer would have been "it depends". Here is some reading about the standard Satorra-Bentler (SB) correction (MLM with complete data):



Actually, it seems MLR has deflated Type I error rates at small N:


> Also, could this fact justify the importance of reporting marginally significant results, in this particular study?


If Type I error rates are inflated, you would need to use a smaller alpha to compensate. But it seems MLR might have deflated error rates (I only found one study, which might not have simulated data like you have).

This is a general issue, not a lavaan issue, so you could post on SEMNET for more guidance.

Célia Sofia Moreira

Jan 29, 2019, 9:31:57 AM
to lav...@googlegroups.com
Professor Terrence Jorgensen,
Thank you very much for your valuable guidance.
Thank you very much for being always there.
Kind regards,
csm