WLSMV estimator for a likert scale but with lots of missing data--what to do?


Anh Hua

Aug 9, 2019, 10:07:45 PM8/9/19
to lavaan

Hi all,


I have been trying to fit a CFA model on a 5-category Likert scale with over 50 items and 8 latent factors, using a sample size of over 28,000. I used WLSMV as the estimator, as this is recommended for CFAs with ordinal data.
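(For context, my call looks roughly like the sketch below. The item and factor names and the data object `mydata` are placeholders for my actual variables, and only two of the eight factors are shown.)

```r
library(lavaan)

# Two-factor excerpt of the full 8-factor model; names are placeholders.
model <- '
  f1 =~ item1 + item2 + item3
  f2 =~ item4 + item5 + item6
'

fit <- cfa(model, data = mydata,
           ordered = c("item1", "item2", "item3",
                       "item4", "item5", "item6"),  # treat items as ordinal
           estimator = "WLSMV")  # DWLS estimates with a mean-and-variance-adjusted test
summary(fit, fit.measures = TRUE)
```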


In my first preliminary round of CFAs, I used listwise deletion because that is the default method of handling missing data in lavaan. After listwise deletion, my sample size dropped by roughly 20%. That is a substantial amount of missing data, but ultimately not a big deal, as my sample size is still quite large. However, I understand that listwise deletion is not ideal because it assumes the missing data are MCAR (I conducted Little's MCAR test in SPSS and, unfortunately, the test came out statistically significant, so listwise deletion is not a viable approach).


Unfortunately, according to Timothy Brown, there are issues with pairwise deletion too (e.g., if data are MAR, parameter estimates as well as standard errors are severely biased).


So I have a few questions: Is WLSMV still the best estimator given the nature of the missing data in my sample? If not, is there another estimator that can handle both ordinal data and missing data well? And if there is no other estimator that handles ordinal data like WLSMV does, do people recommend data imputation as the next step? If so, which imputation method? I really hope I don't have to do imputation, but I will if that's the best way of handling it.


Many thanks to those who have been helping me on my journey with CFAs. I hope to continue getting valuable insights from you and the rest of the lavaan community!


Anh


Alex Uzdavines

Aug 10, 2019, 6:38:20 PM8/10/19
to lavaan
Hi Anh!

I'd strongly recommend looking into the mice library. It's excellent for multiple imputation and is one of the options in the runMI() function for lavaan/semTools. You can learn the basics of mice (and multiple imputation/missing data analysis) from the vignettes, here: https://www.gerkovink.com/miceVignettes/
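In case it helps, here's a rough sketch of that workflow. The model syntax object, data object, and number of imputations are placeholders, and I believe runMI() accepts a mids object from mice directly:

```r
library(mice)
library(semTools)  # provides runMI()

# Impute the incomplete items, then fit the CFA to each imputed
# data set and pool the results.  'model' and 'mydata' are placeholders.
imp <- mice(mydata, m = 20, seed = 1234)

fit.mi <- runMI(model, data = imp, fun = "cfa",
                estimator = "WLSMV",
                ordered = names(mydata))
summary(fit.mi)
```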

-Alex U

Anh Hua

Aug 11, 2019, 4:25:28 PM8/11/19
to lavaan
Hi Alex,

Thank you for your helpful suggestions. Multiple imputation looks promising, and less mysterious :) now that I have checked out the mice package.

I wonder what you think about the assumptions of multiple imputation within the context of my data. I read into the literature after reading your response and learned that this method of imputation is appropriate if the data are MCAR or MAR. In particular, Timothy Brown (2015) wrote: "When either MCAR or MAR is true (and the data have a multivariate normal distribution), ML and multiple imputation produce parameter estimates, standard errors, and test statistics (e.g., model χ²) that are consistent and efficient" (Confirmatory Factor Analysis for Applied Research, 2nd edition, p. 337).

Because my data are ordinal, they do not have a multivariate normal distribution, so I wonder if multiple imputation is still the best approach?

Also, how should I efficiently test MCAR and MAR in R, especially when I have close to 60 variables in my dataset? I tried to use the LittleMCAR function in the BaylorEdPsych package, but it only allows analyses of up to 50 variables (hence, I had to switch to SPSS). Additionally, how do I test MAR in R? In searching for an answer to this latter question, I came across another package called "MissMech", but I believe it only tests for MCAR.

I'd appreciate any suggestions you may have.  Thank you so much!

Anh

Terrence Jorgensen

Aug 12, 2019, 7:40:13 AM8/12/19
to lavaan
Because my data are ordinal, they do not have a multivariate normal distribution, so I wonder if multiple imputation is still the best approach?

The whole point of mice is that you can select the appropriate imputation model for each variable (e.g., ordinal logistic regression for incomplete ordinal variables). 
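For example (a sketch with placeholder column names; "polr" fits a proportional-odds model and expects those columns to be ordered factors, which is also what mice defaults to for ordered-factor columns):

```r
library(mice)

# Dry run with maxit = 0, just to collect mice's default settings.
ini  <- mice(mydata, maxit = 0)
meth <- ini$method

# Use ordinal logistic regression for the incomplete Likert items.
meth[c("item1", "item2")] <- "polr"

imp <- mice(mydata, method = meth, m = 20, seed = 1234)
```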

how should I efficiently test MCAR and MAR in R, especially when I have close to 60 variables in my datasets?  

Don't.  Any "MCAR test" is deceptively labeled.  Rejecting the H0 of MCAR is good because you found variables that are related to missingness, which makes MAR more easily justified.  Failing to reject H0 of MCAR is consistent with both MCAR and MNAR.  That goes for Little's (1988) test (which I've heard him say is useless at a conference presentation) as well as newer developments like Li & Yu (2015), whose Discussion makes the same point.


Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Terrence Jorgensen

Aug 12, 2019, 7:46:50 AM8/12/19
to lavaan
Sorry, hit "post" too soon.  I was going to finish by saying that "testing" for M(C)AR is fraught with the problems of significance testing.  You don't want to miss out on an important opportunity to make your results less biased because you simply lack the power to detect a predictor of missingness as "significant".  Better to rely on theory, try to think of why data may have gone missing, and what you measured that would at least correlate with those causes.  The default agnostic advice is to use everything you observed as a potential predictor of missing values, so you don't miss anything.  That can get problematic when you have more variables than rows in your data, in which case you need to thoughtfully choose your imputation models (also something you can do in mice for each incomplete variable).

Alex Uzdavines

Aug 12, 2019, 9:32:55 AM8/12/19
to lavaan
To add to what Terrence said, my general read of the multiple imputation literature from a couple of years ago, when I was still learning, is that an agnostic or theoretically driven, even if imperfect, imputation model is probably going to lead to less biased estimates than listwise deletion.

With an N ~ 28000 you're not in danger of having more variables than rows, so a "throw everything in the hopper, run the imputation" approach would probably be okay. One of the helper functions in the mice package lets you set a minimum correlation threshold for variables to include in an automated imputation model (the default is .10), which can save some computation time. But all that said, it's still best to take a look from a theoretical perspective and see what would plausibly help explain the missingness.
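That helper is quickpred(); a sketch (mydata is a placeholder):

```r
library(mice)

# For each incomplete variable, keep only predictors whose correlation
# with the variable (or with its missingness indicator) exceeds mincor.
pred <- quickpred(mydata, mincor = 0.1)  # 0.1 is also the default

imp <- mice(mydata, predictorMatrix = pred, m = 20, seed = 1234)
```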

Anh Hua

Aug 14, 2019, 1:18:03 PM8/14/19
to lavaan
Hi Terrence and Alex,

Your responses have been extremely helpful! Thank you so very much! I have been looking into the definitions of MCAR, MAR, and MNAR over the past few days, and I've learned that it's impossible to test for MAR and MNAR directly, so the agnostic approach that Alex mentioned seems to make the most sense. But Terrence also made a good point about making an effort to understand why the data are missing. I'd need to talk to the people who collected the data to gain some insight into why the data are missing.

Alex, do you have a reference that states what you said, i.e., "even if imperfect, an imputation model is probably going to lead to less biased estimates than listwise deletion"?

Thank you again for all your helpful advice!

Anh

Terrence Jorgensen

Aug 14, 2019, 4:12:33 PM8/14/19
to lavaan
do you have a reference that states what you said, i.e., "even if imperfect, an imputation model is probably going to lead to less biased estimates than listwise deletion"?

Try Craig Enders' (2010) book. Imputation models are no different from substantive/target models: they are never going to be absolutely correct, only approximations to the real data-generating (or missing-data) processes. All missing data are probably some combination of MCAR, MAR, and MNAR. Any imputation model can account for some of the predictability of missingness, but it is unlikely to account for all of it. Accounting for some part of the missingness process is better than accounting for none of it.

Anh Hua

Aug 14, 2019, 5:08:24 PM8/14/19
to lavaan
Thanks Terrence.  As usual, your advice is so helpful!

Anh

Alex Uzdavines

Aug 15, 2019, 8:31:00 AM8/15/19
to lavaan
Yeah, like Terrence said, that's the takeaway I got from looking into how to deal with complex missing data issues for my MA project.

Anh Hua

Aug 15, 2019, 1:38:32 PM8/15/19
to lavaan
Sounds good, Alex!  Thanks again for sharing your perspectives on this. 

Anh