Hello
everybody!
Our research team is currently coding a Monte Carlo simulation in R to
determine the optimal sample size for a CFA with a power of .80.
We specified a theoretical population model, as well as an analysis model with
the empirical factor loadings extracted from a prior EFA. We also accounted for
the non-normality of the data by including the empirical kurtosis and skewness
values of each indicator in the analysis model.
Based on the analysis model, we ran sequential simulations for different sample
sizes (n = 100 to 1000). The output shows the required sample size to estimate
the respective parameters with a power of .80. Our problem lies in the output
for the residuals, which shows "NA", implying that no sample size from n = 100
to n = 1000 is sufficient to yield reliable estimates for these parameters.
This doesn't sound realistic to us.
Our question is whether the values for the residuals are actually important for
determining an appropriate sample size for a CFA. We couldn't find any sources
that specified residuals in the analysis model; they merely specified factor
loadings and covariances between the factors, which is what we did as well.
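If residual variances had to be specified explicitly, we assume it would look something like the following, fixing each residual variance to 1 - λ² so that every indicator has unit variance (a hypothetical sketch assuming standardized loadings, not part of our current code; only the first two indicators of F1 are shown):

```
F1 =~ 0.62*x1 + 0.54*x3 + ...
x1 ~~ 0.6156*x1   # 1 - 0.62^2
x3 ~~ 0.7084*x3   # 1 - 0.54^2
```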
Thanks for every helpful input!
Our R code:
install.packages("lavaan")
library(lavaan)
# specify population model
pop.model <- "
F1 =~ 0.62*x1 + 0.54*x3 + 0.66*x10 + 0.36*x11 + 0.48*x15 + 0.55*x20 + 0.43*x21 +
0.54*x26 + 0.74*x29 + 0.59*x32 + 0.75*x33 + 0.75*x35 + 0.30*x36 + 0.65*x37
F2 =~ 0.78*x4 + 0.51*x7 + 0.74*x13
F3 =~ 0.65*x5 + 0.52*x6 + 0.66*x16 + 0.59*x22 + 0.68*x25 + 0.58*x27 + 0.74*x31
F4 =~ 0.73*x8 + 0.32*x12 + 0.57*x14 + 0.69*x18 + 0.60*x24 + 0.54*x38
F1 ~~ 0.21*F2 + 0.64*F3 + 0.66*F4
F2 ~~ 0.20*F3 + 0.04*F4
F3 ~~ 0.59*F4"
# check that the population model is correctly specified
pop.fit <- sem(pop.model, fixed.x = FALSE)
summary(pop.fit, standardized = TRUE)
# model implied covariances
pop.cov <- fitted(pop.fit)$cov
cov2cor(pop.cov)
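As a sanity check on the standardized metric: an indicator's model-implied variance is λ²·Var(F) plus its residual variance, so with Var(F) = 1 and residual variances of 1 - λ², every indicator variance is exactly 1. A minimal base-R illustration (plain arithmetic with the first four F1 loadings from our population model, not lavaan code):

```r
# standardized loadings of the first four F1 indicators (from the population model)
lambda <- c(0.62, 0.54, 0.66, 0.36)

# residual variances that would complete each indicator to unit variance
theta <- 1 - lambda^2

# model-implied indicator variance: lambda^2 * Var(F1) + theta, with Var(F1) = 1
implied_var <- lambda^2 * 1 + theta
print(implied_var)  # each entry equals 1
```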
# include non-normality of the items via empirical skewness and kurtosis
install.packages("simsem")
install.packages("snow")
library(simsem)
library(snow)
help("bindDist")
distrib <- bindDist(
  skewness = c(-0.08305893, -0.78280374, -0.44110141, -0.09056704, 0.09063894,
               -0.08775931, -0.14590414, 0.43980384, -0.19083498, -0.06874235,
               0.22079645, -0.51512445, -0.05187143, -0.18728330, -0.03846664,
               0.04027715, 0.43868066, 0.76877483, 0.13465496, 0.48089062,
               0.96015386, 0.73063389, 0.83554706, 0.88319190, -0.32420421,
               0.11271975, -0.09386470, -0.05251572, -0.07685611, -0.36717964),
  kurtosis = c(2.028326, 2.899795, 2.428532, 2.136235, 2.066293, 2.135203,
               1.878323, 2.293114, 2.167396, 2.030732, 2.107740, 2.357917,
               1.920932, 2.115568, 1.963168, 1.983064, 1.973506, 2.659004,
               1.899194, 2.097104, 2.816803, 2.491650, 2.560447, 2.458451,
               2.597595, 2.163946, 1.981656, 2.228974, 1.974136, 2.216078))
analysis.model <- "
F1 =~ x1 + x3 + x10 + x11 + x15 + x20 + x21 +
x26 + x29 + x32 + x33 + x35 + x36 + x37
F2 =~ x4 + x7 + x13
F3 =~ x5 + x6 + x16 + x22 + x25 + x27 + x31
F4 =~ x8 + x12 + x14 + x18 + x24 + x38
F1 ~~ F2 + F3 + F4
F2 ~~ F3 + F4
F3 ~~ F4"
# simulate data with non-normal Y
analysis.nn.237 <- sim(model = analysis.model, n = seq(100, 1000, 50),
                       pmMCAR = c(0, 0.1, 0.2), generate = pop.model,
                       lavaanfun = "cfa", indDist = distrib, seed = 565,
                       multicore = FALSE)
summaryParam(analysis.nn.237, detail = TRUE, alpha = 0.05)
plotCutoff(analysis.nn.237, alpha = 0.05)
# find n for power of .80
power.n <- getPower(analysis.nn.237, alpha=.05)
findPower(power.n, "iv.N", power=0.80)
Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen