Simulate Dichotomous Data

Monica Casella

Mar 31, 2020, 11:05:35 AM

to lavaan

Hello,

I am also posting the following question in this group:

I’m working on my degree thesis and I’m using the R package “lavaan” and its simulateData() function to simulate data from this model:

Model <- '

f1 =~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + 0.2*V10 + 0.2*V11 + 0.2*V12 + 0.2*V13 + 0.2*V14 + 0.2*V15

f2 =~ V16 + V17 + V18 + V19 + V20 + V21 + V22 +V23 + V24 + 0.2*V25 + 0.2*V26 + 0.2*V27 + 0.2*V28 + 0.2*V29 + 0.2*V30

f3 =~ V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38 + V39 + 0.2*V40 + 0.2*V41 + 0.2*V42 + 0.2*V43 + 0.2*V44 + 0.2*V45

O1 ~ 1*f1 + 0.2*f2 + 0.2*f3 '

The observed variables and the criterion variable O1 should be dichotomous.

I know that the "|" operator can be used to define the thresholds of categorical endogenous variables, but how should I choose the threshold values? Does this approach work for the criterion variable? Is it possible to simulate dichotomous data in lavaan?

Thanks for any help,

Monica

Nickname

Apr 1, 2020, 10:52:52 AM

to lavaan

Monica,
Taking your questions in reverse order:

Yes, simulateData() will simulate dichotomous variables.

Yes, this will work for both exogenous and endogenous variables. In some sense, all dichotomous variables are endogenous because they are modeled through a latent continuous variable.

Choose the thresholds based on the desired frequencies. The latent continuous variable is (normally) parameterized as a standard normal variable. So, you can use the standard normal distribution to calculate threshold values corresponding to a desired proportion of cases. For example, if you wanted 1/3 of the cases to have the lower value, you could use qnorm(1/3) to obtain the value of -0.4307273. For an even split, use zero.
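
As a minimal sketch of the threshold calculation described above, the following base R snippet converts desired lower-category proportions into thresholds with qnorm() and checks one of them by cutting simulated standard normal draws. The proportions, the seed, and the variable names (p_lower, y_star) are illustrative assumptions, not values from the question.

```r
# Desired proportion of cases in the lower category (illustrative values)
p_lower <- c(1/3, 0.5, 0.8)

# Matching thresholds on the standard normal latent response scale
thresholds <- qnorm(p_lower)
print(round(thresholds, 4))

# Check the first threshold by dichotomizing simulated latent responses
set.seed(1)
y_star <- rnorm(100000)                          # hypothetical latent responses
observed <- as.integer(y_star > thresholds[1])   # 0 = lower, 1 = upper category
print(mean(observed == 0))                       # should be close to 1/3
```

The same idea extends to variables with more than two categories by passing cumulative proportions to qnorm(), one per threshold.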

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Terrence Jorgensen

Apr 9, 2020, 6:45:06 PM

to lavaan

> to simulate data from this model:
>
> Model <- '
> f1 =~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + 0.2*V10 + 0.2*V11 + 0.2*V12 + 0.2*V13 + 0.2*V14 + 0.2*V15
> f2 =~ V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23 + V24 + 0.2*V25 + 0.2*V26 + 0.2*V27 + 0.2*V28 + 0.2*V29 + 0.2*V30
> f3 =~ V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38 + V39 + 0.2*V40 + 0.2*V41 + 0.2*V42 + 0.2*V43 + 0.2*V44 + 0.2*V45
> O1 ~ 1*f1 + 0.2*f2 + 0.2*f3 '

Over half of your loadings (V1-V9, V16-V24, V31-V39) have no specified population values.

> The observed variables and the criterion variable O1 should be dichotomous.
>
> I know that the "|" operator can be used to define the thresholds of categorical endogenous variables, but how should I choose the threshold values?

That's up to you. If you want the categories to have roughly equal counts, you can set the thresholds to zero, e.g. V1 | 0*t1.
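
A sketch of how these two points might look in practice, assuming a much smaller model than the one in the question: every loading is given an explicit population value, and each indicator and the criterion get a threshold of zero. The loading and regression values (0.7, 0.2, 0.5), the sample size, the seed, and the object names (pop.model, simdat, props) are illustrative assumptions.

```r
library(lavaan)

# Hypothetical population model: all loadings have explicit values
pop.model <- '
  f1 =~ 0.7*V1 + 0.7*V2 + 0.7*V3 + 0.2*V4
  O1 ~ 0.5*f1

  # a threshold of zero for each dichotomous variable
  V1 | 0*t1
  V2 | 0*t1
  V3 | 0*t1
  V4 | 0*t1
  O1 | 0*t1
'

set.seed(123)
simdat <- simulateData(pop.model, sample.nobs = 5000L)

# Proportion of cases in each category, one column per variable
props <- sapply(simdat, function(x) prop.table(table(x)))
print(round(props, 2))
```

Because a zero threshold sits at the mean of a zero-mean latent response, the split should be close to 50:50 whatever that response's variance. Non-zero thresholds would also depend on how the latent response is scaled.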

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

http://www.uva.nl/profile/t.d.jorgensen

James Grace

Jun 23, 2020, 9:11:41 AM

to lavaan

I tried to translate the advice in this thread to a similar but simpler problem: observed variables only, with a dichotomous Treatment and two continuous responses. I admit that I really don't know what I am doing with the syntax.

Any advice on how to get dichotomous values for treatment (a 50:50 split) would be greatly appreciated.

## Sim Model

sim.two <- '

Grazers1 ~ -1.5*(Trt | 0*t1)

Grazers2 ~ -0.8*(Trt | 0*t1)'

## Generate data

set.seed(2)

simdat.two <- simulateData(sim.two, sample.nobs=10000L)

James Grace

Jun 23, 2020, 9:21:32 AM

to lavaan

I found the answer to my own question about syntax.

Using

## Sim Model

sim.two <- '

Grazers1 ~ -1.5*Trt

Grazers2 ~ -0.8*Trt

Trt | 0*t1'

does the trick!
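
A quick way to check the split, sketched here under the assumption that lavaan is installed, and reusing the model, seed, and sample size posted earlier in this thread (trt.props is a name introduced only for this check):

```r
library(lavaan)

sim.two <- '
  Grazers1 ~ -1.5*Trt
  Grazers2 ~ -0.8*Trt
  Trt | 0*t1
'

set.seed(2)
simdat.two <- simulateData(sim.two, sample.nobs = 10000L)

# Share of cases in each Trt category; both should be near 0.5
trt.props <- prop.table(table(simdat.two$Trt))
print(round(trt.props, 3))
```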
