Hello
everybody!
Our research team is currently coding a Monte Carlo simulation in R to
determine the optimal sample size for a CFA with a power of .80.
We specified a theoretical population model, as well as an analysis model with
the empirical factor loadings extracted from a prior EFA. We also accounted for
the non-normality of the data by including the empirical kurtosis and skewness
values of each indicator in the analysis model.
Based on the analysis model, we ran sequential simulations for different sample
sizes (n = 100 to 1000). The output shows the required sample size to estimate
the respective parameters with a power of .80. Our problem lies in the output
for the residuals, which shows "NA", implying that no sample size from n = 100
to n = 1000 is sufficient to yield reliable estimates for these parameters.
This doesn't sound realistic to us.
Our question is whether the values for the residuals are actually important for
determining an appropriate sample size for a CFA. We couldn't find any sources
that specified residuals in the analysis model; they merely specified factor
loadings and covariances between the factors, which is what we did as well.
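If residual variances had to be specified explicitly, we assume it would look something like the following, fixing each residual variance to 1 - λ² so that every indicator has unit variance (a hypothetical sketch assuming standardized loadings, not part of our current code; only the first two indicators of F1 are shown):

```
F1 =~ 0.62*x1 + 0.54*x3 + ...
x1 ~~ 0.6156*x1   # 1 - 0.62^2
x3 ~~ 0.7084*x3   # 1 - 0.54^2
```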
Thanks for every helpful input!
Our R code:
install.packages("lavaan")
library(lavaan)
# specify population model
pop.model <- "
F1 =~ 0.62*x1 + 0.54*x3 + 0.66*x10 + 0.36*x11 + 0.48*x15 + 0.55*x20 + 0.43*x21 +
0.54*x26 + 0.74*x29 + 0.59*x32 + 0.75*x33 + 0.75*x35 + 0.30*x36 + 0.65*x37
F2 =~ 0.78*x4 + 0.51*x7 + 0.74*x13
F3 =~ 0.65*x5 + 0.52*x6 + 0.66*x16 + 0.59*x22 + 0.68*x25 + 0.58*x27 + 0.74*x31
F4 =~ 0.73*x8 + 0.32*x12 + 0.57*x14 + 0.69*x18 + 0.60*x24 + 0.54*x38
F1 ~~ 0.21*F2 + 0.64*F3 + 0.66*F4
F2 ~~ 0.20*F3 + 0.04*F4
F3 ~~ 0.59*F4"
# check that the population model is correctly specified
pop.fit <- sem(pop.model, fixed.x = FALSE)
summary(pop.fit, standardized = TRUE)
# model implied covariances
pop.cov <- fitted(pop.fit)$cov
cov2cor(pop.cov)
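As a sanity check on the standardized metric: an indicator's model-implied variance is λ²·Var(F) plus its residual variance, so with Var(F) = 1 and residual variances of 1 - λ², every indicator variance is exactly 1. A minimal base-R illustration (plain arithmetic with the first four F1 loadings from our population model, not lavaan code):

```r
# standardized loadings of the first four F1 indicators (from the population model)
lambda <- c(0.62, 0.54, 0.66, 0.36)

# residual variances that would complete each indicator to unit variance
theta <- 1 - lambda^2

# model-implied indicator variance: lambda^2 * Var(F1) + theta, with Var(F1) = 1
implied_var <- lambda^2 * 1 + theta
print(implied_var)  # each entry equals 1
```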
# include non-normality of the items via empirical skewness and kurtosis
install.packages("simsem")
install.packages("snow")
library(simsem)
library(snow)
help("bindDist")
distrib <- bindDist(
  skewness = c(-0.08305893, -0.78280374, -0.44110141, -0.09056704, 0.09063894,
               -0.08775931, -0.14590414, 0.43980384, -0.19083498, -0.06874235,
               0.22079645, -0.51512445, -0.05187143, -0.18728330, -0.03846664,
               0.04027715, 0.43868066, 0.76877483, 0.13465496, 0.48089062,
               0.96015386, 0.73063389, 0.83554706, 0.88319190, -0.32420421,
               0.11271975, -0.09386470, -0.05251572, -0.07685611, -0.36717964),
  kurtosis = c(2.028326, 2.899795, 2.428532, 2.136235, 2.066293, 2.135203,
               1.878323, 2.293114, 2.167396, 2.030732, 2.107740, 2.357917,
               1.920932, 2.115568, 1.963168, 1.983064, 1.973506, 2.659004,
               1.899194, 2.097104, 2.816803, 2.491650, 2.560447, 2.458451,
               2.597595, 2.163946, 1.981656, 2.228974, 1.974136, 2.216078))
analysis.model <- "
F1 =~ x1 + x3 + x10 + x11 + x15 + x20 + x21 +
x26 + x29 + x32 + x33 + x35 + x36 + x37
F2 =~ x4 + x7 + x13
F3 =~ x5 + x6 + x16 + x22 + x25 + x27 + x31
F4 =~ x8 + x12 + x14 + x18 + x24 + x38
F1 ~~ F2 + F3 + F4
F2 ~~ F3 + F4
F3 ~~ F4"
# simulate data with non-normal Y
analysis.nn.237 <- sim(model = analysis.model, n = seq(100, 1000, 50),
                       pmMCAR = c(0, 0.1, 0.2), generate = pop.model,
                       lavaanfun = "cfa", indDist = distrib, seed = 565,
                       multicore = FALSE)
summaryParam(analysis.nn.237, detail = TRUE, alpha = 0.05)
plotCutoff(analysis.nn.237, alpha = 0.05)
# find n for power of .80
power.n <- getPower(analysis.nn.237, alpha=.05)
findPower(power.n, "iv.N", power=0.80)
Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen