Measurement invariance with unbalanced samples

Andrej

Jul 24, 2019, 6:15:45 AM
to lavaan
Hi,

I am trying to understand the procedure for building a CFA model and testing its measurement invariance using the lavaan, semTools, and simsem packages.

My dataset consists of eight 5-point Likert variables. In addition, I have a categorical variable "sex" referring to gender (1 = male, 2 = female). The dataset is unbalanced with respect to the "sex" variable: I have 66 males and 297 females.

I found an interesting paper by Yoon and Lai, "Testing Factorial Invariance With Unbalanced Samples". The authors propose a simple subsampling procedure for cases where the groups are of different sizes.

However, let me first introduce my baseline model. Based on theory, I proposed a one-factor model with covariances between two pairs of variables:

model <- 'f1 =~ ss1 + ss2 + ss3 + ss4 + ss5 + ss6 + ss7 + ss8
          ss2 ~~ ss4
          ss2 ~~ ss3'

Next, I prepared a list in which each element contains a dataset balanced on the sex variable; in my case 66 + 66 = 132 cases. The R code is as follows:

# The grouping variable (sex) is in the 9th column
group_col <- 9
# Prepare 100 random balanced subsamples
nsam <- 100
data_list <- list()

id <- dat[, group_col]
min_n <- min(table(id))
max_n <- max(table(id))
# Row indices of the cases in each group
ids <- split(seq_along(id), id)
for (i in seq_len(nsam)) {
  # Draw min_n rows from each group without replacement
  new_id <- unlist(lapply(ids, sample, min_n))
  dat_sam <- dat[new_id, ]
  data_list[[i]] <- dat_sam
}
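
A quick way to confirm that every subsample is balanced, sketched here on made-up data: `balanced_subsamples()` and the `toy` data frame are hypothetical names used only for illustration, and the toy data keep just one item column plus the grouping column.

```r
# Hypothetical helper: draw `nsam` subsamples with equal group sizes
balanced_subsamples <- function(dat, group_col, nsam) {
  id <- dat[, group_col]
  ids <- split(seq_along(id), id)   # row indices per group
  min_n <- min(lengths(ids))        # size of the smallest group
  lapply(seq_len(nsam), function(i) {
    dat[unlist(lapply(ids, sample, min_n)), ]
  })
}

# Toy data with the group sizes from the post (66 and 297)
set.seed(1)
toy <- data.frame(ss1 = sample(1:5, 363, replace = TRUE),
                  sex = rep(1:2, times = c(66, 297)))
toy_list <- balanced_subsamples(toy, group_col = 2, nsam = 100)

# One row per subsample, one column per group; each entry should be 66
group_sizes <- t(sapply(toy_list, function(d) table(d$sex)))
all(group_sizes == 66)
```

Because the smaller group has exactly min_n cases, all 66 males appear in every subsample; only the selection of females changes from one subsample to the next.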

Finally, I used the sim() function from the simsem package to run a Monte Carlo simulation over the subsamples and test for measurement invariance:

out1 <- sim(nRep = NULL, model = baseline, rawData = data_list,
            group = "sex", std.lv = TRUE, lavaanfun = "cfa")

where the baseline object is defined as:

baseline <- measEq.syntax(
  configural.model = model,
  ordered = c("ss1", "ss2", "ss3", "ss4", "ss5", "ss6", "ss7", "ss8"),
  parameterization = "delta",
  ID.fac = "std.lv",
  ID.cat = "Wu.Estabrook.2016",
  group = "sex",
  group.equal = "configural")

baseline <- as.character(baseline)

The second model is defined as:

prop4 <- measEq.syntax(
  configural.model = model,
  ordered = c("ss1", "ss2", "ss3", "ss4", "ss5", "ss6", "ss7", "ss8"),
  parameterization = "delta",
  ID.fac = "std.lv",
  ID.cat = "Wu.Estabrook.2016",
  group = "sex",
  group.equal = c("thresholds"))

prop4 <- as.character(prop4)

out2 <- sim(nRep = NULL, model = prop4, rawData = data_list,
            group = "sex", std.lv = TRUE, lavaanfun = "cfa")

And now the issue and my question. When I compare the summaries of out1 and out2, I get the same results:

> out1
RESULT OBJECT
[1] "lavaan"
Model Type: lavaan
Convergence 100 / 100
Sample size: 132
Percent Completely Missing at Random: 0
Percent Missing at Random: 0
========= Fit Indices Cutoffs ============
               Alpha
Fit Indices     0.05     Mean     SD
      chisq   51.944   38.691  8.298
      aic   3330.531 3283.751 29.411
      bic   3480.437 3433.657 29.411
      rmsea    0.082    0.033  0.030
      cfi      0.947    0.984  0.021
      tli      0.917    0.986  0.045
      srmr     0.063    0.054  0.006
NOTE: The data generation model is not the same as the analysis model. See the summary of the population underlying data generation by the summaryPopulation function.

I would appreciate any help, particularly an explanation of how to properly specify my model.

Thanks in advance.

Best, Andrej

Terrence Jorgensen

Aug 6, 2019, 11:33:23 AM
to lavaan
This line doesn't do anything; just don't specify group.equal if there are no parameters to equate:
 
  group.equal = "configural")

I don't think there is any reason to use simsem.  Look into ?cfaList.  You can specify the data-generating function as the dataFunction= argument and save the fit stats/indices using the FUN= argument.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
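
As an illustration of the cfaList() suggestion above, here is a minimal sketch rather than a definitive implementation. It assumes lavaan is installed. The simulated `dat`, the helper `draw_balanced()`, the choice of fit measures, and `ndat = 5` (kept small so the example runs quickly) are all assumptions made for this example, not part of the original thread. The model shown is the configural model only; the measEq.syntax() output could be passed as `model` in the same way.

```r
library(lavaan)

# Toy stand-in for the real data: eight 5-category items, 66 + 297 cases
set.seed(2019)
n <- c(66, 297)
eta <- rnorm(sum(n))                      # common factor scores
items <- sapply(1:8, function(j) {
  y <- 0.7 * eta + rnorm(sum(n), sd = sqrt(1 - 0.49))
  # Cut at standard-normal quintiles to get categories 1..5
  as.integer(cut(y, breaks = c(-Inf, qnorm(c(.2, .4, .6, .8)), Inf)))
})
colnames(items) <- paste0("ss", 1:8)
dat <- data.frame(items, sex = rep(1:2, times = n))

model <- 'f1 =~ ss1 + ss2 + ss3 + ss4 + ss5 + ss6 + ss7 + ss8
          ss2 ~~ ss4
          ss2 ~~ ss3'

# Hypothetical data-generating function: one balanced subsample per call
draw_balanced <- function(data, group) {
  ids <- split(seq_len(nrow(data)), data[[group]])
  min_n <- min(lengths(ids))
  data[unlist(lapply(ids, sample, min_n)), ]
}

fits <- cfaList(model,
                dataFunction = draw_balanced,
                dataFunction.args = list(data = dat, group = "sex"),
                ndat = 5,
                # Remaining arguments are passed on to cfa()
                group = "sex", ordered = paste0("ss", 1:8), std.lv = TRUE,
                # Store selected fit measures for each subsample
                FUN = function(fit) {
                  fitMeasures(fit, c("chisq.scaled", "df",
                                     "cfi.scaled", "rmsea.scaled"))
                })

# One row of fit measures per subsample
fit_table <- do.call(rbind, fits@funList)
```

The values returned by FUN are stored in the `funList` slot, so the distribution of each fit measure across subsamples can be summarized directly from `fit_table`.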
