I am interested in the programming side of this problem, but I am not sure I understood the scenario correctly.
You have a dataset with missing data, so you did multiple imputation using Amelia and then used cfa.mi(). But you found that the model fit statistics changed every time you ran the code. Do I understand the case correctly?
This is strange because you used set.seed(). The imputation, though "random," should still be reproducible.
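As a quick sanity check, here is a minimal sketch (using R's built-in airquality data, not your data) showing that two amelia() runs started from the same seed should produce identical imputations:
library(Amelia)
set.seed(1)
imp1 <- amelia(airquality, m = 2, p2s = 0)
set.seed(1)
imp2 <- amelia(airquality, m = 2, p2s = 0)
all.equal(imp1$imputations, imp2$imputations)  # SF: Expected to be TRUE
To mimic your case, I simulated a dataset (though see my comment on COUNTRY, as I do not know how it was stored in the original data) and ran a modified version of your code: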
# My comments start with 'SF: '
# data_mck <- MCK_DATA_
# data_mck <- data_mck[,1:32]
# SF: Create simulated data
vnames <- c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23")
p <- length(vnames)
set.seed(12345)
rho <- matrix(.6, p, p)  # SF: Common correlation of .6 among all items
diag(rho) <- 1
data_mck <- MASS::mvrnorm(600, rep(0, p), rho)
# SF: Dichotomize the items to mimic binary responses
data_mck[data_mck > 0] <- 1
data_mck[data_mck <= 0] <- 0
# SF: Set about 5% of the values to missing completely at random
data_mck[sample(length(data_mck), round(length(data_mck) * .05))] <- NA
data_mck <- as.data.frame(data_mck)
colnames(data_mck) <- vnames
data_mck <- cbind(ID = seq_len(nrow(data_mck)), data_mck)
# SF: Not sure how COUNTRY was stored in the original data
data_mck$COUNTRY <- sample(c(1, 2, 3), nrow(data_mck), replace = TRUE)
head(data_mck)
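# SF: Quick check of the simulated format: about 5% of the item
# SF: responses should be missing
mean(is.na(data_mck[vnames]))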
library(Amelia)
library(semTools)
library(lavaan)
# Amelia.imputation <- amelia(data_mck[,-c(14,15,16,17,18)], m = 20, idvars = "ID", ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = T)
set.seed(12345)
Amelia.imputation <- amelia(data_mck, m = 20, idvars = "ID", ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = T)
imps_mck <- Amelia.imputation$imputations
## here I had 20 imputed datasets with 24 binary questions ###
for (i in seq_along(imps_mck)) { imps_mck[[i]]$COUNTRY <- factor(imps_mck[[i]]$COUNTRY) }
### There are 3 different groups (countries) in the dataset ###
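# SF: As a quick check, each imputed dataset should have no missing
# SF: values left after imputation
sapply(imps_mck, function(d) sum(is.na(d)))  # SF: Expect all zeros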
model_mck <- 'MCK =~ Q1+Q2A+Q3A+Q3B+Q3C+Q5+Q6+Q8A+Q8B+Q8C+Q10+Q12B+Q13A+Q13B+Q13C+Q17A+Q18B+Q19A+Q19B+Q19C+Q19D+Q20A+Q22+Q23'
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE)
# SF: There is an error but it may not be related to the estimation.
summary(cfa.configural, fit.measures = TRUE, standardized = TRUE)
###############################################################
I ran it several times, each time in a fresh R session, and the results were the same, as expected given the use of set.seed():
#> Model Test User Model:
#>
#> Standard Scaled
#> Test statistic 423.524 754.047
#> Degrees of freedom 756 756
#> P-value 1.000 0.513
#> Scaling correction factor 1.015
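If you want to check this programmatically on your own machine, one option (a sketch, assuming fitMeasures() in your semTools version accepts the output of cfa.mi()) is to re-impute and refit from the same seed and compare the fit measures:
set.seed(12345)
imps_mck2 <- amelia(data_mck, m = 20, idvars = "ID", ords = vnames, p2s = FALSE, incheck = TRUE)$imputations
# SF: Same post-processing as before
for (i in seq_along(imps_mck2)) { imps_mck2[[i]]$COUNTRY <- factor(imps_mck2[[i]]$COUNTRY) }
cfa.configural2 <- cfa.mi(model_mck, data = imps_mck2, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE)
# SF: Should report that the two sets of fit measures are equal
all.equal(fitMeasures(cfa.configural), fitMeasures(cfa.configural2))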
Could you share more about your dataset, so that we can accurately simulate the format (not the data) of the dataset you are using and see why the results change every time you run the code?
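For example, the structure and the missingness pattern (without posting any raw values) would already help:
str(data_mck)              # SF: Variable names and types
colMeans(is.na(data_mck))  # SF: Proportion missing per variable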
If I misunderstood your scenario, sorry about that.
-- Shu Fai
P.S.: The test statistics you get may be different from mine even if you run the same code on your computer, because they may depend on the random number generator. But the results should still be reproducible when the code is run several times on the same machine.
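If cross-machine differences matter, it helps to report the RNG settings and session information along with the seed:
RNGkind()      # SF: e.g., "Mersenne-Twister" "Inversion" "Rejection"
sessionInfo()  # SF: R and package versions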