Monte Carlo Simulation for optimal sample size (CFA)

374 views
Skip to first unread message

Vivi

unread,
May 10, 2023, 10:36:49 AM5/10/23
to lavaan

Hello everybody!

Our research team is currently working on coding a Monte Carlo Simulation to determine the optimal sample size for a CFA with a power of .80 in R.
We specified a theoretical population model, as well as an analysis model with the empirical factor loadings extracted from a prior EFA. We also accounted for the non- normality of the data by including the empirical kurtosis and skewness values of each indicator to the analysis model.
Based on the analysis model, we ran sequential simulations for different sample sizes (n= 100 to 1000). The output shows the requested sample size to estimate the respective parameters with a power of .80. Our problem lies in the output for the residuals, which shows “NA”, implying, that no sample size from n= 100 to n= 1000 is deemed as sufficient to result in reliable estimates for the parameters. This doesn´t sound realistic to us.
Our question is whether the values for the residuals are actually important for determining an appropriate sample size for CFA. We couldn´t find any sources, which specified residuals in the analysis model, but merely specified factor loadings and covariances between the factors, which we did.

Thanks for every helpful input!

Our R code:

install.packages("lavaan")
library(lavaan)

# specify population model
pop.model <-"
F1 =~ 0.62*x1 + 0.54*x3 + 0.66*x10 + 0.36*x11 + 0.48*x15 + 0.55*x20 + 0.43*x21 +
0.54*x26 + 0.74*x29 + 0.59*x32 + 0.75*x33 + 0.75*x35 + 0.30*x36 + 0.65*x37
F2 =~ 0.78*x4 + 0.51*x7 + 0.74*x13
F3 =~ 0.65*x5 + 0.52*x6 + 0.66*x16 + 0.59*x22 + 0.68*x25 + 0.58*x27 + 0.74*x31
F4 =~ 0.73*x8 + 0.32*x12 + 0.57*x14 + 0.69*x18 + 0.60*x24 + 0.54*x38
F1 ~~ 0.21*F2 + 0.64*F3 + 0.66*F4
F2 ~~ 0.20*F3 + 0.04*F4
F3 ~~ 0.59*F4"

#test model for robustness
pop.fit <- sem(pop.model, fixed.x= FALSE)
summary(pop.fit, standardized= TRUE)

# model implied covariances
pop.cov <- fitted(pop.fit)$cov
cov2cor(pop.cov)

# include non- normality of items
install.packages("simsem")
install.packages("snow")
library(simsem)
library(snow)
help("bindDist")

distrib <- bindDist(skewness = c(-0.08305893, -0.78280374, -0.44110141, -0.09056704, 0.09063894, -0.08775931, -0.14590414, 0.43980384, -0.19083498, -0.06874235, 0.22079645, -0.51512445, -0.05187143, -0.18728330, -0.03846664, 0.04027715, 0.43868066, 0.76877483, 0.13465496, 0.48089062, 0.96015386, 0.73063389, 0.83554706, 0.88319190, -0.32420421, 0.11271975, -0.09386470, -0.05251572, -0.07685611, -0.36717964), kurtosis = c(2.028326, 2.899795, 2.428532, 2.136235, 2.066293, 2.135203, 1.878323, 2.293114, 2.167396, 2.030732, 2.107740, 2.357917, 1.920932, 2.115568,1.963168, 1.983064, 1.973506,2.659004, 1.899194, 2.097104, 2.816803, 2.491650, 2.560447, 2.458451, 2.597595, 2.163946, 1.981656, 2.228974, 1.974136, 2.216078))

analysis.model <-"
F1 =~ x1 + x3 + x10 + x11 + x15 + x20 + x21 +
x26 + x29 + x32 + x33 + x35 + x36 + x37
F2 =~ x4 + x7 + x13
F3 =~ x5 + x6 + x16 + x22 + x25 + x27 + x31
F4 =~ x8 + x12 + x14 + x18 + x24 + x38
F1 ~~ F2 + F3 + F4
F2 ~~ F3 + F4
F3 ~~ F4"

# simulate data with non-normal Y
install.packages("simsem")
library(simsem)
analysis.nn.237 <- sim(model=analysis.model, n=seq(100, 1000, 50), pmMCAR=c(0, 0.1, 0.2),generate=pop.model, lavaanfun = "cfa", indDist=distrib, seed=565, multicore=FALSE)

summaryParam(analysis.nn.237, detail = TRUE, alpha = 0.05)

plotCutoff(analysis.nn.237, alpha = 0.05)

# find n for power of .80
power.n <- getPower(analysis.nn.237, alpha=.05) 

findPower(power.n, "iv.N", power=0.80) 


Terrence Jorgensen

unread,
May 12, 2023, 7:49:43 AM5/12/23
to lavaan
For anything computationally intensive, please use saveRDS() to attach a file we can simply import the sim() output.  Posting your output probably would have been sufficient in this case, too.  getPower() takes a long time (~19,000 conditions) when you use the default (seq() from the min to max value by 1), so I ran it with your specified N and MCAR values to save time.  From what I see, your residual variances are not NA, then are Inf.   That means they were significant under all conditions (N and MCAR), not none (see ?findPower help-page Details).  Your intercepts are all NA because the are zero in your population.

Terrence D. Jorgensen    (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen



power.n <- getPower(analysis.nn.237, alpha=.05,
                    nVal = seq(100, 1000, by = 50), # or by=10 for precision
                    pmMCARval = c(0, 0.1, 0.2)) 
findPower(power.n, "iv.N", power=0.80)  

     iv.MCAR F1=~x3 F1=~x10 F1=~x11 F1=~x15 F1=~x20 F1=~x21 F1=~x26 F1=~x29 F1=~x32
[1,]     0.0    Inf     Inf     Inf     Inf     Inf     200     Inf     Inf     Inf
[2,]     0.1    Inf     Inf     100     Inf     Inf     100     Inf     Inf     Inf
[3,]     0.2    Inf     Inf     150     Inf     Inf     Inf     Inf     Inf     Inf
     F1=~x33 F1=~x35 F1=~x36 F1=~x37 F2=~x7 F2=~x13 F3=~x6 F3=~x16 F3=~x22 F3=~x25
[1,]     Inf     Inf     150     Inf    150     150    Inf     Inf     Inf     Inf
[2,]     Inf     Inf     150     Inf    200     150    Inf     Inf     Inf     Inf
[3,]     Inf     Inf     150     Inf    250     150    Inf     Inf     Inf     Inf
     F3=~x27 F3=~x31 F4=~x12 F4=~x14 F4=~x18 F4=~x24 F4=~x38 F1~~F2 F1~~F3 F1~~F4
[1,]     Inf     Inf     150     Inf     Inf     150     150    350    Inf    150
[2,]     100     Inf     200     Inf     Inf     100     100    350    100    100
[3,]     150     Inf     200     Inf     Inf     Inf     Inf    400    150    Inf
     F2~~F3 F2~~F4 F3~~F4 x1~~x1 x3~~x3 x10~~x10 x11~~x11 x15~~x15 x20~~x20 x21~~x21
[1,]    400     NA    150    Inf    Inf      Inf      Inf      Inf      Inf      Inf
[2,]    400     NA    150    Inf    Inf      Inf      Inf      Inf      Inf      Inf
[3,]    400     NA    150    Inf    Inf      Inf      Inf      Inf      Inf      Inf
     x26~~x26 x29~~x29 x32~~x32 x33~~x33 x35~~x35 x36~~x36 x37~~x37 x4~~x4 x7~~x7
[1,]      Inf      Inf      Inf      Inf      Inf      Inf      Inf    100    150
[2,]      Inf      Inf      Inf      Inf      Inf      Inf      Inf    100    150
[3,]      Inf      Inf      Inf      Inf      Inf      Inf      Inf    300    150
     x13~~x13 x5~~x5 x6~~x6 x16~~x16 x22~~x22 x25~~x25 x27~~x27 x31~~x31 x8~~x8
[1,]      Inf    Inf    Inf      Inf      Inf      Inf      Inf      Inf    Inf
[2,]      Inf    Inf    Inf      Inf      Inf      Inf      Inf      Inf    Inf
[3,]      Inf    Inf    Inf      Inf      Inf      Inf      Inf      Inf    Inf
     x12~~x12 x14~~x14 x18~~x18 x24~~x24 x38~~x38 F1~~F1 F2~~F2 F3~~F3 F4~~F4 x1~1
[1,]      Inf      Inf      Inf      Inf      Inf    Inf    100    Inf    150   NA
[2,]      Inf      Inf      Inf      Inf      Inf    Inf    150    100    Inf   NA
[3,]      Inf      Inf      Inf      Inf      Inf    Inf    200    150    100   NA
     x3~1 x10~1 x11~1 x15~1 x20~1 x21~1 x26~1 x29~1 x32~1 x33~1 x35~1 x36~1 x37~1 x4~1
[1,]   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA
[2,]   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA
[3,]   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA
     x7~1 x13~1 x5~1 x6~1 x16~1 x22~1 x25~1 x27~1 x31~1 x8~1 x12~1 x14~1 x18~1 x24~1
[1,]   NA    NA   NA   NA    NA    NA    NA    NA    NA   NA    NA    NA    NA    NA
[2,]   NA    NA   NA   NA    NA    NA    NA    NA    NA   NA    NA    NA    NA    NA
[3,]   NA    NA   NA   NA    NA    NA    NA    NA    NA   NA    NA    NA    NA    NA
     x38~1
[1,]    NA
[2,]    NA
[3,]    NA  
Reply all
Reply to author
Forward
Message has been deleted
0 new messages