Applying Beginner's Understanding of Coding: R Package 'simsem' for Sample Size

Brandon Chin

Sep 26, 2020, 4:05:32 AM
to lavaan
Dear Coding Engineers,

I am currently working on a sample size calculation using the R package 'simsem' (Pornprasertmanit, Miller, Schoemann, & Jorgensen, 2020) for a CFA-based questionnaire validation project. To emphasize, this is social science research.

To provide context for my model:
My model is a latent variable model with 3 latent variables.

Indicators and latent variables:
Two latent variables have 3 indicators each, and the last latent variable has 4.
Hence, a total of 3 latent and 10 measured variables.

Parameterization of the CFA model:
Two latent variables are correlated with each other,
while the last latent variable is negatively correlated with both of the other two.

All indicators (measured variables) are expected to load positively onto their respective latent variables.

In R, my model should look like this:
X =~ x1 + x2 + x3
Y =~ y1 + y2 + y3
Z =~ z1 + z2 + z3 + z4

X ~~ Y
Z ~~ X
Z ~~ Y

For the conceptual framework, see the attached file below.

I will walk through my attempt using the example at this link as a template, broken down into components, and ask questions about the parts I do not know how to handle:
https://rdrr.io/cran/simsem/man/continuousPower.html

Full Example
# Specify Sample Size by n
loading <- matrix(0, 6, 1)
loading[1:6, 1] <- NA
LY <- bind(loading, 0.7)
RPS <- binds(diag(1))
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType="CFA")
dat <- generate(CFA.Model, 50)
out <- analyze(CFA.Model, dat)

# Specify both continuous sample size and percent missing completely at random.
# Note that more fine-grained values of n and pmMCAR are needed, e.g., n=seq(50, 500, 1)
# and pmMCAR=seq(0, 0.2, 0.01)
Output <- sim(NULL, CFA.Model, n=seq(100, 200, 20), pmMCAR=c(0, 0.1, 0.2))
summary(Output)

# Find the power of all combinations of different sample size and percent MCAR missing
Cpow <- continuousPower(Output, contN = TRUE, contMCAR = TRUE)
Cpow

# Find the power of parameter estimates when sample size is 200 and percent MCAR missing is 0.3
Cpow2 <- continuousPower(Output, contN = TRUE, contMCAR = TRUE, pred=list(N = 200, pmMCAR = 0.3))
Cpow2  

Example 1#
# Specify Sample Size by n
loading <- matrix(0, 6, 1)
loading[1:6, 1] <- NA

My Coding 1#
loading <- matrix(0, 10, 3)
loading[1:3, 1] <- NA

loading[4:6, 2] <- NA 
loading[7:10, 3] <- NA
 
Question (1)
What does NA indicate?

Question (2)
What about covariance?

I found one simulation that specifies it like this:
#Specify latent variances and covariances
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPH <- symMatrix(latent.cor, 0.1)

#Specify measurement errors
error.cor <- matrix(0, 6, 6)
diag(error.cor) <- 1
RTD <- symMatrix(error.cor)  

Question (3)
How do I identify and apply relevant matrix and diag settings?

Example 2#
LY <- bind(loading, 0.7)
RPS <- binds(diag(1))
RTE <- binds(diag(6))

My Coding 2#
I have no idea how to deal with this.

Example 3#
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType="CFA")
dat <- generate(CFA.Model, 50)
out <- analyze(CFA.Model, dat)

My Coding 3#
CFA.Model <- model(LY=LY, RPS=RPS, RTE=RTE, modelType="CFA")
dat <- generate(CFA.Model, 50)
out <- analyze(CFA.Model, dat)


Question (4)
For 
"dat <- generate(CFA.Model, 50)"

The 50 above is the sample size that I want to simulate for power analysis.
Is this interpretation correct?
If I am correct, should 50 be the default number, or are there conditions or assumptions to consider?

Example 4#
# Specify both continuous sample size and percent missing completely at random.
# Note that more fine-grained values of n and pmMCAR are needed, e.g., n=seq(50, 500, 1)
# and pmMCAR=seq(0, 0.2, 0.01)

Output <- sim(NULL, CFA.Model, n=seq(100, 200, 20), pmMCAR=c(0, 0.1, 0.2))
summary(Output)

My Coding 4#  
Output <- sim(NULL, CFA.Model, n=seq(50, 500, 1), pmMCAR=c(0, 0.1, 0.2))
summary(Output)  


Example 5#  
# Find the power of all combinations of different sample size and percent MCAR missing
Cpow <- continuousPower(Output, contN = TRUE, contMCAR = TRUE)
Cpow

My Coding 5#  
Cpow <- continuousPower(Output, contN = TRUE, contMCAR = TRUE)
Cpow  


Example 6#  
# Find the power of parameter estimates when sample size is 200 and percent MCAR missing is 0.3

Cpow2 <- continuousPower(Output, contN = TRUE, contMCAR = TRUE, pred=list(N = 200, pmMCAR = 0.3))

Cpow2 

My Coding 6# 
Cpow2 <- continuousPower(Output, contN = TRUE, contMCAR = TRUE, pred=list(N = 200, pmMCAR = 0.3))

Cpow2 

Question (5)
Is there a wiki or book that summarizes what argument values such as NULL and TRUE mean for all of the functions above?

Thanks,
Brandon
(Baron, 2017) User Guide for the Expectancy-Value-Cost Survey of Student Motivation.pdf

Terrence Jorgensen

Sep 28, 2020, 7:35:06 AM
to lavaan
Question (1)
What does NA indicate?

That indicates "missing value" in R. In simsem, an NA in a parameter matrix marks a free parameter: it is not fixed to a specific value, so it is estimated. Before asking any other questions, try reading through the examples provided on simsem.org that explain how the package works: https://github.com/simsem/simsem/wiki/Vignette
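
To illustrate the point about NA for the three-factor model in the question, here is a minimal sketch in base R only. The row and column names are added purely for readability and are not required by anything in the thread; 0 is read as "fixed to zero" and NA as "free, to be estimated".

```r
# Pattern matrix: 10 indicators (rows) by 3 latent variables (columns).
# 0  = loading fixed to zero (indicator does not load on that factor)
# NA = free loading, to be estimated
# The dimnames are an illustrative addition, not part of the original code.
indicators <- c(paste0("x", 1:3), paste0("y", 1:3), paste0("z", 1:4))
loading <- matrix(0, nrow = 10, ncol = 3,
                  dimnames = list(indicators, c("X", "Y", "Z")))
loading[1:3, "X"] <- NA
loading[4:6, "Y"] <- NA
loading[7:10, "Z"] <- NA

print(loading)

# Count the free loadings per latent variable
free_per_factor <- colSums(is.na(loading))
print(free_per_factor)
```

Counting the NA cells per column is a quick way to check that each indicator loads on exactly one latent variable before the matrix is passed on to simsem.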

Question (2)
What about covariance?

Question (3)
How do I identify and apply relevant matrix and diag settings?

These questions are too vague to answer.  Again, perhaps learning about the package via the examples will help. 
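
For readers with the same questions, one possible reading of Questions (2) and (3) is "how do I set up the latent correlation matrix and the error matrix for three factors and ten indicators?" The following is a minimal sketch under that reading, not a definitive answer. The correlation magnitudes (0.3 and -0.3) are assumed values chosen only for illustration; the signs follow the description in the question, and the loading value 0.7 is taken from the documentation example. It also assumes that binds() accepts a matrix of population values with the same dimensions as the pattern matrix, as described on its help page.

```r
# Latent correlation matrix (3 x 3): diagonal fixed to 1, off-diagonals free (NA)
latent.cor <- matrix(NA, 3, 3)
diag(latent.cor) <- 1

# Assumed population correlations (illustrative values, not from the thread):
# X and Y positively correlated, Z negatively correlated with both.
pop.cor <- matrix(c( 1.0,  0.3, -0.3,
                     0.3,  1.0, -0.3,
                    -0.3, -0.3,  1.0), nrow = 3, byrow = TRUE)

# Error correlation matrix (10 x 10): identity, i.e., uncorrelated errors
error.cor <- diag(10)

# The simsem part runs only if the package is installed
if (requireNamespace("simsem", quietly = TRUE)) {
  library(simsem)
  loading <- matrix(0, 10, 3)
  loading[1:3, 1] <- NA
  loading[4:6, 2] <- NA
  loading[7:10, 3] <- NA
  LY  <- bind(loading, 0.7)           # free loadings, population value 0.7
  RPS <- binds(latent.cor, pop.cor)   # free latent correlations
  RTE <- binds(error.cor)             # fixed, uncorrelated errors
  CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")
}
```

The matrix dimensions follow from the model: the latent correlation matrix is (number of latent variables) squared, and the error matrix is (number of indicators) squared.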

Question (4)
For 
"dat <- generate(CFA.Model, 50)"

The 50 above is the sample size that I want to simulate for power analysis.
Is this interpretation correct?

You can find a description of each argument on the ?generate help page.
 
If I am correct, should 50 be the default number, or are there conditions or assumptions to consider?

Depends on context.  As your next line of syntax indicates, the point of this example is to vary N so that power can be estimated at different Ns.
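
As a small sketch of that distinction, using the one-factor documentation example quoted in the question: generate() draws a single data set of the requested size, while the grid of sample sizes for the power analysis is what gets passed to sim(). The object names dat50, dat200, and n_grid are made up for this illustration.

```r
# Grid of sample sizes used in the documentation example for sim()
n_grid <- seq(100, 200, 20)
print(n_grid)

# The simsem part runs only if the package is installed
if (requireNamespace("simsem", quietly = TRUE)) {
  library(simsem)
  loading <- matrix(0, 6, 1)
  loading[1:6, 1] <- NA
  CFA.Model <- model(LY = bind(loading, 0.7), RPS = binds(diag(1)),
                     RTE = binds(diag(6)), modelType = "CFA")

  # Each call simulates ONE data set with n rows
  dat50  <- generate(CFA.Model, n = 50)
  dat200 <- generate(CFA.Model, n = 200)
  print(dim(dat50))
  print(dim(dat200))
}
```

So the 50 in generate() is not a default; it is simply the size of that one example data set.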
 
Question (5)
Is there a wiki or book that summarizes what argument values such as NULL and TRUE mean for all of the functions above?

You can read the help page for whatever function you are asking about, and the vignettes (especially early ones) explain much about the package:  https://github.com/simsem/simsem/wiki/Vignette

And here is a paper written by the package authors, in which they use the method you are using (but with planned missing data designs): https://doi.org/10.1177/0165025413515169

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Alex Schoemann

Sep 28, 2020, 9:27:11 AM
to lavaan
To add to the discussion, simsem can use either the matrix style input (as you're using above) or lavaan syntax as input for models. If you're running through a lot of conditions and changing parameter values often, using the matrix style input is easier. But if you're just running simulations for small models with a limited set of parameter values, it's probably easier to use lavaan syntax as input. On the vignette page (https://github.com/simsem/simsem/wiki/Vignette), be sure to look at the lavaan syntax specification examples.

Alex
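
A minimal sketch of the lavaan-syntax route for the three-factor model in the question, patterned on the first vignette example. All numeric population values here are assumptions for illustration only: loadings of 0.7, residual variances of 0.51 (that is, 1 - 0.7^2), and latent correlations of 0.3 and -0.3. The small number of replications (20) is chosen only to keep the run short; a real power analysis would need far more.

```r
indicators <- c(paste0("x", 1:3), paste0("y", 1:3), paste0("z", 1:4))

# Population (data-generating) model with assumed parameter values
popModel <- "
X =~ 0.7*x1 + 0.7*x2 + 0.7*x3
Y =~ 0.7*y1 + 0.7*y2 + 0.7*y3
Z =~ 0.7*z1 + 0.7*z2 + 0.7*z3 + 0.7*z4
X ~~ 1*X
Y ~~ 1*Y
Z ~~ 1*Z
X ~~ 0.3*Y
X ~~ -0.3*Z
Y ~~ -0.3*Z
"
# Residual variances: one line per indicator, e.g. "x1 ~~ 0.51*x1"
resid <- paste0(indicators, " ~~ 0.51*", indicators, collapse = "\n")
popModel <- paste(popModel, resid, sep = "\n")

# Analysis model: the same structure with all parameters left free
analyzeModel <- "
X =~ x1 + x2 + x3
Y =~ y1 + y2 + y3
Z =~ z1 + z2 + z3 + z4
"

# The simsem part runs only if the package is installed
if (requireNamespace("simsem", quietly = TRUE)) {
  library(simsem)
  Output <- sim(nRep = 20, model = analyzeModel, n = 200,
                generate = popModel, lavaanfun = "cfa", std.lv = TRUE)
  summary(Output)
}
```

With std.lv = TRUE the latent variances are fixed to 1 in the analysis model, which matches the scaling used in the population model.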
