Although I keep the imputed datasets the same, I get different model fit indices and test statistics every time I run the cfa.mi models with imputed categorical data.


Enes Bayrakoglu

Feb 14, 2024, 3:25:54 PM
to lavaan
Hello everyone,
 
I am trying to test measurement invariance in my one-latent-factor model. I have 24 questions coded "correct" and "incorrect" and 3 different groups in my data (sample sizes are approximately 200 for each group). Since there is some missing data, as far as I know, I can't run the CFA models directly with the WLSMV estimator. So I first imputed the data with the Amelia package and then tried to run the cfa.mi() function in the semTools package, as described in the runMI() documentation.
My problem is that every time I run the configural or metric model, the fit indices (CFI, TLI, RMSEA) and the model test statistics are very different, even though I keep the imputed datasets the same.

Here's the code I am running:
###############################################################
library(Amelia)
library(semTools)   # provides cfa.mi() / runMI()

data_mck <- MCK_DATA_
data_mck <- data_mck[, 1:32]
set.seed(12345)
Amelia.imputation <- amelia(data_mck[, -c(14, 15, 16, 17, 18)], m = 20, idvars = "ID",
                            ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"),
                            p2s = FALSE, incheck = TRUE)
imps_mck <- Amelia.imputation$imputations  
## here I had 20 imputed datasets with 24 binary questions ###
for (i in 1:20) { imps_mck[[i]]$COUNTRY <- factor(imps_mck[[i]]$COUNTRY) }
### There are 3 different groups (countries) in the dataset ###
model_mck <- 'MCK =~ Q1+Q2A+Q3A+Q3B+Q3C+Q5+Q6+Q8A+Q8B+Q8C+Q10+Q12B+Q13A+Q13B+Q13C+Q17A+Q18B+Q19A+Q19B+Q19C+Q19D+Q20A+Q22+Q23'
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE)
summary(cfa.configural, fit.measures = TRUE, standardized = TRUE)

cfa.metric <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE,
                     group.equal = "loadings")
summary(cfa.metric, fit.measures = TRUE, standardized = TRUE)

cfa.scalar <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE,
                     group.equal = c("loadings", "intercepts"))
summary(cfa.scalar, fit.measures = TRUE, standardized = TRUE)

### All models give very different and inconsistent fit indices and test statistics when I delete them from the R environment and re-run them ###

lavTestLRT.mi(cfa.metric, h1=cfa.configural)
###############################################################

Is there anything I am doing wrong in the code? I couldn't find any information in the group discussions or on other platforms. I would very much appreciate it if someone could take a look and help.
Many thanks in advance. I wish everyone a productive and beautiful day.
Best Wishes, 
Enes.   

Enes Bayrakoglu

Feb 16, 2024, 11:04:00 AM
to lavaan
Might it be because my model is too wide, i.e., too many questions for such small sample sizes?
Might it be because I did something wrong with the imputation method?
Or is it something like a program bug?
When I run the CFA models with 20 imputed datasets, sometimes the model converges on all 20 imputed datasets, and sometimes it doesn't converge on every imputed dataset.
I also tried running the model with fewer imputed datasets (m = 5 or m = 10), but it keeps giving inconsistent model fit results. I can't figure this problem out even though I have tried so many different approaches.
Could someone please assist me?

On Wednesday, February 14, 2024 at 9:25:54 PM UTC+1, Enes Bayrakoglu wrote:

Gavin T. L. Brown

Feb 16, 2024, 5:13:54 PM
to lav...@googlegroups.com
Enes,
Getting different results seems logical to me. The formulae are probabilistic; the logic of doing multiple iterations is to see what the central tendency and spread of the values are. That is the logic of simulation research, is it not? When N is small, the standard errors will be large, so the resulting output will vary. I don't think your result is an error.


Enes Bayrakoglu

Feb 16, 2024, 6:27:59 PM
to lavaan
Dear Prof. Brown, 

First of all, thank you for taking the time to respond. I understand that this is essentially a simulation exercise and that the reason for the iterations is to minimise the standard errors. However, this much variation in the results makes me unconfident about running the models and presenting the results. For example, when I run two identical configural models with the same imputed datasets at different times, one model gives a test statistic of 1475.530 with 1131 df and fit statistics of 0.558 (CFI), 0.524 (TLI), and 0.039 (RMSEA), whereas the other gives a test statistic of 1279.201 with 1131 df and fit statistics of 0.906 (CFI), 0.899 (TLI), and 0.026 (RMSEA). Is this much variation normal?
On this point, should I interpret these results as indicating that CFA models are not reliable with my data, and thus not appropriate for testing measurement invariance?
Or should I just pick the best-fitting model and report its results in my thesis?
My overall aim, by the way, is to establish partial invariance and compare the groups' mathematical content knowledge with this partially invariant model (some items constrained and some items freely estimated).

Thanks a lot again for your time and precious comments. It helps me a lot.
Yours sincerely,
Enes.

On Saturday, February 17, 2024 at 1:13:54 AM UTC+3, Prof. Gavin Brown wrote:

Gavin T. L. Brown

Feb 17, 2024, 3:22:15 PM
to lav...@googlegroups.com
Dear Enes,
The difference in χ² of about 200 points seems like a lot. But again, I remember you said n = 200, so this might not be too much given the size of the standard errors. Look into what other people have found in simulation studies with n of that size.
RMSEA and CFI are sensitive to model specification, model complexity, and sample size, so the problem could be related to those being unstable estimators. See
Fan, X., & Sivo, S. A. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42(3), 509–529. https://doi.org/10.1080/00273170701382864
I would calculate the M and SD of the parameters across the multiple runs to see where the central tendency lies. If the 95% CI does not include positive values, then maybe the problem is N, or it could be your model, or you may need to run 1,000 iterations.
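
A minimal sketch of that calculation in R (illustrative only: fit_runs is a hypothetical object holding the fit indices recorded from each re-run, seeded here with the two runs you reported):

## Hypothetical record of fit indices, one row per re-run of the configural model
fit_runs <- data.frame(
  chisq = c(1475.530, 1279.201),   # add one value per additional run
  cfi   = c(0.558, 0.906),
  tli   = c(0.524, 0.899),
  rmsea = c(0.039, 0.026)
)

## Central tendency and spread across runs
colMeans(fit_runs)
apply(fit_runs, 2, sd)

## A rough 95% interval for each index across runs
apply(fit_runs, 2, quantile, probs = c(0.025, 0.975))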
You can't reach any conclusions based on the 2 data points you report.
best wishes  

Shu Fai Cheung (張樹輝)

Feb 17, 2024, 8:22:23 PM
to lavaan
I am interested in the programming side of this problem, but I am not sure I have understood the scenario correctly.

You have a dataset with missing data, so you did multiple imputation using Amelia and then used cfa.mi(). But you found that the model fit statistics change every time you run the code. Do I understand the case correctly?

This is strange, because you used set.seed(): the imputation, though "random," should still be reproducible. I simulated a dataset (though see my comment on COUNTRY, as I do not know how it was stored in the original data) and ran a modified version of your code:

###############################################################
# Based on the code from https://groups.google.com/g/lavaan/c/KyPkZUM-NnQ/m/NSvYEBlJAAAJ
# My comments start with 'SF: '
# data_mck <- MCK_DATA_
# data_mck <- data_mck[,1:32]
# SF: Create simulated data
vnames <- c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23")
p <- length(vnames)
set.seed(12345)
rho <- matrix(.6, p, p)
diag(rho) <- 1
data_mck <- MASS::mvrnorm(600, rep(0, p), rho)
data_mck[data_mck > 0] <- 1
data_mck[data_mck <= 0] <- 0
data_mck[sample(length(data_mck), round(length(data_mck) * .05))] <- NA
data_mck <- as.data.frame(data_mck)
colnames(data_mck) <- vnames
data_mck <- cbind(ID = seq_len(nrow(data_mck)),
                  data_mck)
# SF: Not sure how COUNTRY was stored in the original data
data_mck$COUNTRY <- sample(c(1, 2, 3), nrow(data_mck), replace = TRUE)
head(data_mck)
library(Amelia)
library(semTools)
library(lavaan)
# Amelia.imputation <- amelia(data_mck[,-c(14,15,16,17,18)],  m = 20, idvars = "ID",  ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = T)
set.seed(12345)
Amelia.imputation <- amelia(data_mck,  m = 20, idvars = "ID",  ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = TRUE)

imps_mck <- Amelia.imputation$imputations
## here I had 20 imputed datasets with 24 binary questions ###
for (i in 1:20) { imps_mck[[i]]$COUNTRY <- factor(imps_mck[[i]]$COUNTRY) }
### There are 3 different groups (countries) in the dataset ###
model_mck <- 'MCK =~ Q1+Q2A+Q3A+Q3B+Q3C+Q5+Q6+Q8A+Q8B+Q8C+Q10+Q12B+Q13A+Q13B+Q13C+Q17A+Q18B+Q19A+Q19B+Q19C+Q19D+Q20A+Q22+Q23'
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE)
# SF: There is an error but it may not be related to the estimation.

summary(cfa.configural, fit.measures = TRUE, standardized = TRUE)
###############################################################

I ran it several times, each time in a fresh session of R, and the results are the same, as expected due to the use of set.seed():

#> Model Test User Model:
#>
#>                                               Standard      Scaled
#>   Test statistic                               423.524     754.047
#>   Degrees of freedom                               756         756
#>   P-value                                        1.000       0.513
#>   Scaling correction factor                                  1.015


Could you share more about your dataset so that we can accurately simulate the format (not the data itself) of the dataset you are using, and see why the results change every time you run the code?

If I misunderstood your scenario, sorry about that.

-- Shu Fai

P.S.: The test statistics you get "may" be different from mine even if you run the same code on your computer, because they may depend on the random number generator. But the results should still be reproducible when run several times on the same machine.

Enes Bayrakoglu

Feb 19, 2024, 3:29:25 PM
to lavaan
Dear Prof. Brown,

Thanks a lot again for your comments. Apparently, the relatively small sample size causes a big variation in the test statistics. It is still not clear to me why the program gives varying test statistics on every run. Your advice is very valuable; however, I am not that experienced in applying it methodologically. I tried to set the number of iterations to 1000, as Yves once described in a lavaan group discussion (https://groups.google.com/g/lavaan/c/3zCqlADa0k0).
I used the argument # control = list(iter.max = 1000) #. It doesn't make a difference, though; it gives the same results as the model without the specified iteration limit. As far as I understood from that discussion, the default maximum number of iterations used in CFA model estimation is already more than 1000. Is there any other way to set the number of iterations?
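
Concretely, this is roughly how I added it to my call (a sketch; the other arguments are the same as in my earlier code):

## control sets the optimizer's iteration limit; the number of imputations is still m = 20 in amelia()
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV",
                         group = "COUNTRY", ordered = TRUE,
                         control = list(iter.max = 1000))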

Thank you again for your help.
Yours sincerely,
Enes.

On Sunday, February 18, 2024 at 2:22:23 AM UTC+1, shufai...@gmail.com wrote:

Enes Bayrakoglu

Feb 19, 2024, 3:48:30 PM
to lavaan
Dear Prof. Cheung, 

Thank you for your comment and your interest. That's correct: the test statistics and model fit indices change substantially each time I re-run the models. That is exactly the problem. Even if I found some positive results, I couldn't report them, since they were inconsistent. I mean, I expect some variation, but this much variation is too much to report the results confidently. Prof. Brown also stated that I can't reach any conclusions with these results.
About the sample:
There are 3 groups, with sample sizes of 144, 291, and 165, respectively. The missing data are in the last group. As stated above, the data are binary, with around 24 independent variables. I can also share the data with you.

I appreciate your time and effort in helping.
Yours sincerely, 
Enes. 
On Monday, February 19, 2024 at 9:29:25 PM UTC+1, Enes Bayrakoglu wrote:

Shu Fai Cheung (張樹輝)

Feb 20, 2024, 9:08:09 AM
to lavaan
Thanks for the clarification. This is strange because, with set.seed() used, the imputation results are supposed to be reproducible. For cfa.mi(), same data, same results. There should be no random process in the optimization in lavaan.

When you run the code, these two lines should always be run together:

set.seed(12345)
Amelia.imputation <- amelia(data_mck[,-c(14,15,16,17,18)],  m = 20, idvars = "ID",  ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = T)

Before you run cfa.mi(), you can check the imputed datasets to make sure they are indeed the same dataset. E.g., this checks the first imputed dataset:

summary(imps_mck[[1]])

You did mention that you kept the imputed datasets the same, but it does no harm to check again before running cfa.mi(), just in case.
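
If you want a stricter check than looking at summaries, one option (just a sketch using base R; the file name is arbitrary) is to save the list of imputed datasets in one session and compare it with the list produced in the next session:

## Session 1: save the imputation list right after running amelia()
saveRDS(imps_mck, file = "imps_mck_run1.rds")

## Session 2: re-run set.seed() + amelia(), then compare with the saved list;
## TRUE means the imputed datasets are exactly the same in both sessions
identical(imps_mck, readRDS("imps_mck_run1.rds"))

## If they differ, all.equal() gives some detail on where they differ
all.equal(imps_mck, readRDS("imps_mck_run1.rds"))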

If imps_mck is indeed the same before running cfa.mi(), then, as far as I understand, you should see the same results on the same computer.

If you have confirmed that imps_mck is the same right before running cfa.mi(), but the cfa.mi() results still differ, then something is wrong.

-- Shu Fai

Enes Bayrakoglu

Mar 3, 2024, 3:38:43 PM
to lavaan
Dear Prof. Cheung, 

Thanks a lot again for your time and effort. You are completely right: I made a mistake when using the set.seed() function. It is still quite strange that the model gives such different results every time I run it, although I don't change any settings for the imputations. As Prof. Brown said, the sample size is small and the standard errors are large; this probably explains the problem.

Thanks though, 
Have a productive day. 
Yours sincerely, 
Enes.  

On Tuesday, February 20, 2024 at 3:08:09 PM UTC+1, shufai...@gmail.com wrote:

Terrence Jorgensen

Jun 24, 2024, 10:51:16 AM
to lavaan
Can you try this again after installing the new lavaan.mi package (along with the latest lavaan and semTools from GitHub)?
 
remotes::install_github("yrosseel/lavaan") 
remotes::install_github("TDJorgensen/lavaan.mi") 
remotes::install_github("simsem/semTools/semTools")

I deprecated runMI() functionality from semTools, and the lavaan.mi package is now updated with many bug-fixes and new features. Read the README and NEWS files closely before proceeding, because small changes have been made to the user interface (e.g., some argument names).

https://github.com/TDJorgensen/lavaan.mi

https://github.com/TDJorgensen/lavaan.mi/blob/main/NEWS.md
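
If it helps, refitting the earlier configural model with lavaan.mi should look roughly like the sketch below; treat it as a sketch and check the README/NEWS first, since some argument names have changed:

## Rough sketch: the earlier configural model refit with lavaan.mi instead of
## the deprecated semTools::cfa.mi(); see NEWS for any renamed arguments
library(lavaan.mi)
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV",
                         group = "COUNTRY", ordered = TRUE)
summary(cfa.configural, fit.measures = TRUE, standardized = TRUE)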

I hope to send this to CRAN by the end of the month or in early July, and would appreciate the opportunity to debug these software updates before then.

Thanks,

Terrence D. Jorgensen    (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen