Measurement invariance for ordered categorical data (Likert) and missing item responses


christophe...@nicholls.edu

Jun 27, 2018, 2:54:45 PM
to lavaan
Hi folks,

I’m currently revisiting some prior analytical work in light of decisions I made in the past (e.g., dealing with missing data, examining measurement invariance). I’m basically examining whether psychological tests that have been administered under different methodological conditions (e.g., randomized items vs. non-randomized items) satisfy constraints for invariance. The datasets are small (N ~ 150) and contain missing data, so I'm trying to make every captured datapoint count.

I'm not entirely sure where I should begin. The various semTools functions, specifically measurementInvarianceCat() and cfa.mi(), seem helpful, but I have not gotten very far. While the former can deal with missing data using pairwise deletion, I would much prefer to impute the missing data and then analyze the imputed data sets. The latter does not seem to test measurement invariance.

Additionally, the data that I’ve collected are highly skewed. In fact, when I fit a model to my data using DWLS estimation, lavaan notifies me that certain response categories of some variables are empty in one group. These are often response categories toward the low end of the scale (e.g., "strongly disagree" and "disagree"). I’ve decided to collapse responses for low base-rate response categories (e.g., “strongly disagree” and “strongly agree”), which addresses the error. Admittedly, I’m unsure of exactly what this implies for my model or whether there are better ways of handling such skewed data.

What advice would you give on this? I’ve been working at it for several hours now and am at a loss as to where to begin.

Thanks in advance, 

Chris

Terrence Jorgensen

Jun 29, 2018, 11:15:04 AM
to lavaan
the measurementInvarianceCat() and cfa.mi() functions seem to be helpful but I have not gotten very far. While the former can deal with missing data using pairwise deletion, I would much prefer to impute missing data and then analyze that dataset. The latter does not seem to test measurement invariance.

measurementInvarianceCat() simply automates the fitting of measurement-invariance models (using a very trouble-prone identification method that I am working on getting rid of).  If you use it to specify your set of models, you can extract the parameter tables to use in place of syntax when you fit each model with cfa.mi().  

mi <- measurementInvarianceCat(model = <your syntax>, data = <original missing data>, ...)

## extract parameter tables from the fitted invariance models
PT.config     <- parTable(mi$fit.configural)
PT.loadings   <- parTable(mi$fit.loadings)
PT.thresholds <- parTable(mi$fit.thresholds)

## fit the same models to the imputations
fit.configural <- cfa.mi(model = PT.config,     data = <list of imputed data sets>, ...)
fit.loadings   <- cfa.mi(model = PT.loadings,   data = <list of imputed data sets>, ...)
fit.thresholds <- cfa.mi(model = PT.thresholds, data = <list of imputed data sets>, ...)


I’ve decided to collapse responses for low base-rate response categories (e.g., “strongly disagree” and “strongly agree”), which addresses the error. Admittedly, I’m unsure of exactly what this implies for my model or if there are better ways of handling such skewed data.

Yes, that is pretty much the only thing you can do about that, unless you can gather more data.  Without enough information to estimate the threshold between those categories in one group, you can't test whether that parameter is equivalent in the other group, so you just have to assume it.  There's nothing inherently wrong with collapsing categories, it just ignores some individual differences, but you don't have enough people responding to each category to draw inferences about those individual differences anyway.  (Actually, your sample is so small that you can't really trust DWLS results much anyway -- they don't seem to stabilize until N ~ 500 in most simulation studies I have read, even under ideal conditions.)  The thresholds you can estimate still have the same interpretation after collapsing those 2 categories.  The highest category in the data you model, then, would now be "agree or strongly agree", and the threshold tells you the level of the latent item response at which people respond at least as high as "agree", but no more detail about when they not only agree but strongly agree.  
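As an illustration only (assuming the raw responses are stored as integers 1-5 in a data frame "dat"; the item name is a placeholder), collapsing the bottom category into the next one could look like:

dat$y1[dat$y1 == 1] <- 2      # merge "strongly disagree" (1) into "disagree" (2)
dat$y1 <- ordered(dat$y1)     # now a 4-category ordered factor, so one fewer threshold is estimated

Whatever recoding you choose, apply the same collapsing to that item in every group, so the remaining thresholds stay comparable across groups.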

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

christophe...@nicholls.edu

Jun 30, 2018, 10:03:17 AM
to lavaan
Thanks Terrence. This is helpful. 

As for the sample size issue, I agree that my sample is rather small. I do have more data on these scales (there were multiple studies), but I'm not at that point in my analysis yet.