Dear all,
First of all, I'd like to thank Yves Rosseel and Terrence Jorgensen for providing the lavaan package and for maintaining this community here. I have been reading a lot of the postings in this google group and they have been tremendously helpful.
I know there have been a ton of postings about measurement invariance with categorical variables on here. Unfortunately, being rather a novice at CFA, I am struggling to make sure that I understand all the recommendations correctly. I would immensely appreciate it if someone could help me out and confirm that my model specifications make sense.
So, I am using a dataset with several dozen countries and several tens of thousands of observations. I had the following simple model (Y1-4 are categorical variables with three categories):
model <- 'F1 =~ Y1 + Y2 + Y3 + Y4'
fit <- cfa(model, data = data, ordered = c("Y1", "Y2", "Y3", "Y4"),
           std.lv = TRUE, parameterization = "delta", estimator = "WLSMV")
The overall fit of this model was not very good; in particular, the robust RMSEA was 0.160 (I understand from this conversation, https://groups.google.com/g/lavaan/c/pfVem3X_N9A/m/INnOcZWSAgAJ, that the robust RMSEA is the only relevant one for WLSMV). When I tried to test for (configural) measurement invariance, I ran this model in each country separately and found the fit measures to be even worse in quite a lot of countries.
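For concreteness, a minimal sketch of what such a country-by-country run could look like. It uses simulated stand-in data, and the grouping column `country`, the object names, and the cut points are all hypothetical rather than taken from the real dataset. It extracts the scaled RMSEA; the robust variant can be requested as "rmsea.robust" where the installed lavaan version reports it for ordered data.

```r
library(lavaan)

# Simulated stand-in data (assumption): four 3-category items and a
# hypothetical grouping column `country`. Nothing here is the real dataset.
set.seed(1)
pop <- 'F1 =~ 0.7*Y1 + 0.7*Y2 + 0.7*Y3 + 0.7*Y4'
sim <- simulateData(pop, sample.nobs = 2000)
items <- c("Y1", "Y2", "Y3", "Y4")
sim[items] <- lapply(sim[items], function(x) {
  cut(x, breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)
})
sim$country <- rep(c("A", "B"), each = 1000)

model <- 'F1 =~ Y1 + Y2 + Y3 + Y4'

# Fit the same one-factor model separately within each country.
fits <- lapply(split(sim, sim$country), function(d) {
  cfa(model, data = d, ordered = items, std.lv = TRUE,
      parameterization = "delta", estimator = "WLSMV")
})

# Collect one fit measure per country into a named numeric vector.
rmsea_by_country <- sapply(fits, function(f) {
  as.numeric(fitMeasures(f, "rmsea.scaled"))
})
print(round(rmsea_by_country, 3))
```

With several dozen countries, sorting `rmsea_by_country` would make it easier to see which countries fit worst.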
Therefore, I looked at the modification indices and found that several were quite high. Y3 ~~ Y4 was the highest (very large actually, but I assumed this is not worrisome because my n is very large). I know there is some discussion on whether it is okay to respecify the model just on the basis of the modification indices, but since the two variables are measured on the same scale (they are both "agree"/"hard to say"/"disagree"), I thought I could free this parameter and argue that the two variables have a similar type of measurement error. Here are my first two questions:
1. Is this a sound way of proceeding, especially given that Y1 and Y2 are measured on the same scale as Y3 and Y4?
2. Should I therefore also free Y1 ~~ Y2?
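As an illustration of the modification-index step described above, here is a minimal sketch on simulated stand-in data. The population model, including the residual covariance between Y3 and Y4, and the cut points are assumptions made only so that the example runs; they are not estimates from the real data.

```r
library(lavaan)

# Simulated stand-in data (assumption): a one-factor population model with
# a residual covariance between Y3 and Y4, cut into three categories.
set.seed(2)
pop <- 'F1 =~ 0.7*Y1 + 0.7*Y2 + 0.7*Y3 + 0.7*Y4
        Y3 ~~ 0.4*Y4'
sim <- simulateData(pop, sample.nobs = 2000)
items <- c("Y1", "Y2", "Y3", "Y4")
sim[items] <- lapply(sim[items], function(x) {
  cut(x, breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)
})

# Fit the model without any residual covariances.
model <- 'F1 =~ Y1 + Y2 + Y3 + Y4'
fit <- cfa(model, data = sim, ordered = items, std.lv = TRUE,
           parameterization = "delta", estimator = "WLSMV")

# Modification indices for (residual) covariances only, largest first.
mi <- modindices(fit, sort. = TRUE, op = "~~")
print(mi[, c("lhs", "op", "rhs", "mi", "sepc.all")])
```

Looking at the standardized expected parameter change (`sepc.all`) next to `mi` helps separate indices that are large only because n is large from those that point to a sizeable misspecification.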
Sticking with Y3 ~~ Y4 for the moment, my next question is:
3. Do I understand correctly that I cannot include this while using delta parameterization and that I need to switch to theta parameterization? Hence my code would be:
model <- 'F1 =~ Y1 + Y2 + Y3 + Y4
          Y3 ~~ Y4'
fit <- cfa(model, data = data, ordered = c("Y1", "Y2", "Y3", "Y4"),
           std.lv = TRUE, parameterization = "theta", estimator = "WLSMV")
Is this correct?
As I move on to test measurement invariance, I had wanted to follow the recommendations of Wu & Estabrook (2016), as illustrated in the paper by Svetina et al. (Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-Group Invariance with Categorical Outcomes Using Updated Guidelines: An Illustration Using Mplus and the lavaan/semTools Packages. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 111–130. https://doi.org/10.1080/10705511.2019.1602776). However,
4. Am I correct in assuming that this is no longer possible given that I use theta parameterization? Do I need to go with the Millsap & Tein (2004) procedure?
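To make question 4 concrete, here is a minimal sketch of how the Wu & Estabrook identification might be requested through semTools' `measEq.syntax()` together with theta parameterization. The data are simulated stand-ins, and the grouping column `country`, the object names, and the choice of which parameters to constrain are assumptions for illustration only. The sketch shows that the arguments can be passed together; it is not a claim about which procedure is appropriate here.

```r
library(lavaan)
library(semTools)

# Simulated stand-in data (assumption): two hypothetical countries with the
# same population model, items cut into three categories.
set.seed(3)
pop <- 'F1 =~ 0.7*Y1 + 0.7*Y2 + 0.7*Y3 + 0.7*Y4
        Y3 ~~ 0.4*Y4'
sim <- simulateData(pop, sample.nobs = 2000)
items <- c("Y1", "Y2", "Y3", "Y4")
sim[items] <- lapply(sim[items], function(x) {
  cut(x, breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)
})
sim$country <- rep(c("A", "B"), each = 1000)

config_model <- 'F1 =~ Y1 + Y2 + Y3 + Y4
                 Y3 ~~ Y4'

# Generate configural syntax using the Wu & Estabrook (2016) identification.
syntax_config <- measEq.syntax(configural.model = config_model, data = sim,
                               ordered = items, parameterization = "theta",
                               ID.fac = "std.lv",
                               ID.cat = "Wu.Estabrook.2016",
                               group = "country")

# Same setup, with thresholds and loadings constrained equal across groups.
syntax_metric <- measEq.syntax(configural.model = config_model, data = sim,
                               ordered = items, parameterization = "theta",
                               ID.fac = "std.lv",
                               ID.cat = "Wu.Estabrook.2016",
                               group = "country",
                               group.equal = c("thresholds", "loadings"))

# Fit both models from the generated syntax.
fit_config <- cfa(as.character(syntax_config), data = sim, group = "country",
                  ordered = items, parameterization = "theta",
                  estimator = "WLSMV")
fit_metric <- cfa(as.character(syntax_metric), data = sim, group = "country",
                  ordered = items, parameterization = "theta",
                  estimator = "WLSMV")

# Compare the nested models.
print(lavTestLRT(fit_config, fit_metric))
```

Printing `cat(as.character(syntax_config))` shows exactly which parameters are fixed, freed, or constrained in the generated model, which makes it easier to check the specification against the paper.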
Any advice would be greatly appreciated! Thank you so much for your time.