Hi,
I am trying to understand the procedure for building a CFA model and testing its measurement invariance using the lavaan, semTools, and simsem packages.
My dataset consists of eight 5-point Likert variables. In addition, I have a categorical variable "sex" (1 = male, 2 = female). The dataset is unbalanced with respect to "sex": I have 66 males and 297 females.
I found an interesting paper by Yoon and Lai, “Testing Factorial Invariance With Unbalanced Samples”. The authors propose a simple subsampling procedure for groups of different sizes.
However, let me first introduce my baseline model. Based on theory, I propose a one-factor model with residual covariances between two pairs of variables (ss2 with ss4, and ss2 with ss3):
model <- 'f1 =~ ss1 + ss2 + ss3 + ss4 + ss5 + ss6 + ss7 + ss8
ss2 ~~ ss4
ss2 ~~ ss3'
Next, I prepared a list in which each component contains a dataset balanced on the sex variable; in my case 66 + 66 = 132 cases. The R code is as follows:
# The grouping variable (sex) is in the 9th column
group_col <- 9
# Prepare 100 random partitions
nsam <- 100
data_list <- list()
id <- dat[, group_col]
min_n <- min(table(id))
max_n <- max(table(id))
ids <- split(seq_along(id), id)
for (i in seq_len(nsam)) {
  # Draw min_n row indices from each group, then keep those rows
  new_id <- unlist(lapply(ids, sample, min_n))
  dat_sam <- dat[new_id, ]
  data_list[[i]] <- dat_sam
}
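The same subsampling step can also be wrapped in a small function and checked for balance. This is only a minimal sketch on simulated toy data: balanced_subsamples(), toy, and toy_list are hypothetical names that do not come from lavaan, semTools, or simsem, and the item responses are random numbers, not my real data.

```r
# Toy stand-in for `dat`: eight 5-point items plus a sex column.
# All values are simulated for illustration only.
set.seed(1)
n_male <- 66
n_female <- 297
toy <- as.data.frame(
  matrix(sample(1:5, (n_male + n_female) * 8, replace = TRUE), ncol = 8)
)
names(toy) <- paste0("ss", 1:8)
toy$sex <- rep(c(1, 2), times = c(n_male, n_female))

# Hypothetical helper: draw `n_samples` subsamples in which every group
# has as many cases as the smallest group.
balanced_subsamples <- function(data, group, n_samples = 100) {
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  n_per_group <- min(lengths(rows_by_group))
  lapply(seq_len(n_samples), function(i) {
    keep <- unlist(lapply(rows_by_group, function(rows) {
      # Index via sample.int() so a group with one row is handled correctly
      rows[sample.int(length(rows), n_per_group)]
    }))
    data[keep, , drop = FALSE]
  })
}

toy_list <- balanced_subsamples(toy, "sex", n_samples = 100)

# One row per subsample, one column per group
group_counts <- t(sapply(toy_list, function(d) table(d$sex)))
```

Inspecting group_counts is a quick way to confirm that each element of the list really contains equal numbers of males and females before it is passed on as rawData.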
Finally, I used the sim() function from the simsem package to run a Monte Carlo simulation to test for measurement invariance:
out1 <- sim(nRep = NULL, model = baseline, rawData = data_list,
group = "sex", std.lv = TRUE, lavaanfun = "cfa")
where the baseline object is defined as:
baseline <- measEq.syntax(
configural.model = model,
ordered = c("ss1","ss2","ss3","ss4","ss5","ss6","ss7", "ss8"),
parameterization = "delta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "sex",
group.equal = "configural")
baseline <- as.character(baseline)
The second model, which constrains the thresholds to be equal across groups, is defined as:
prop4 <- measEq.syntax(
configural.model = model,
ordered = c("ss1","ss2","ss3","ss4","ss5","ss6","ss7", "ss8"),
parameterization = "delta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "sex",
group.equal = c("thresholds"))
prop4 <- as.character(prop4)
out2 <- sim(nRep = NULL, model = prop4, rawData = data_list,
group = "sex", std.lv = TRUE, lavaanfun = "cfa")
And now the issue and my question. When I compare the summaries of out1 and out2, I get exactly the same results. For example:
> out1
RESULT OBJECT
[1] "lavaan"
Model Type: lavaan
Convergence 100 / 100
Sample size: 132
Percent Completely Missing at Random: 0
Percent Missing at Random: 0
========= Fit Indices Cutoffs ============
                Alpha
Fit Indices      0.05      Mean      SD
chisq          51.944    38.691   8.298
aic          3330.531  3283.751  29.411
bic          3480.437  3433.657  29.411
rmsea           0.082     0.033   0.030
cfi             0.947     0.984   0.021
tli             0.917     0.986   0.045
srmr            0.063     0.054   0.006
NOTE: The data generation model is not the same as the analysis model. See the summary of the population underlying data generation by the summaryPopulation function.
I would appreciate any help, particularly an explanation of how to properly specify my model.
Thanks in advance.
Best, Andrej