# Simulate Dichotomous Data


### Monica Casella

Mar 31, 2020, 11:05:35 AM, to lavaan

Hello,

I am also posting the following question in this group:

I'm working on my degree thesis, and I'm using the R package lavaan and its simulateData() function to simulate data from this model:

```r
Model <- '
f1 =~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + 0.2*V10 + 0.2*V11 + 0.2*V12 + 0.2*V13 + 0.2*V14 + 0.2*V15
f2 =~ V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23 + V24 + 0.2*V25 + 0.2*V26 + 0.2*V27 + 0.2*V28 + 0.2*V29 + 0.2*V30
f3 =~ V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38 + V39 + 0.2*V40 + 0.2*V41 + 0.2*V42 + 0.2*V43 + 0.2*V44 + 0.2*V45
O1 ~ 1*f1 + 0.2*f2 + 0.2*f3
'
```

The observed variables and the criterion variable O1 should be dichotomous.

I know that the "|" operator can be used to define the thresholds of categorical endogenous variables, but how should I choose the threshold values? Does this way work for the criterion variable? Is it possible to simulate dichotomous data in Lavaan?

Thanks for any help,

Monica

### Nickname

Apr 1, 2020, 10:52:52 AM, to lavaan

Monica,

Taking your questions in reverse order:

Yes, simulateData() will simulate dichotomous variables.

Yes, this will work for both exogenous and endogenous variables.  In some sense, all dichotomous variables are endogenous because they are modeled through a latent continuous variable.
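Written out, the latent-response idea looks like this, with $y^*$ the latent continuous variable, $\tau$ its threshold, and $\Phi$ the standard normal CDF (notation ours, not from the thread; $y^*$ is taken as standard normal, as described in the next paragraph):

$$
y = \begin{cases} 0 & \text{if } y^* \le \tau \\ 1 & \text{if } y^* > \tau \end{cases},
\qquad y^* \sim \mathcal{N}(0, 1)
\;\Longrightarrow\;
P(y = 0) = \Phi(\tau), \qquad \tau = \Phi^{-1}\big(P(y = 0)\big).
$$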

Choose the thresholds based on the desired frequencies.  The latent continuous variable is (normally) parameterized as a standard normal variable.  So, you can use the standard normal distribution to calculate threshold values corresponding to a desired proportion of cases.  For example, if you wanted 1/3 of the cases to have the lower value, you could use qnorm(1/3) to obtain the value of -0.4307273.  For an even split, use zero.
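Keith's recipe in R: `qnorm()` turns a desired lower-category proportion into a threshold, and a quick simulation confirms the split. This is a sketch assuming a standard normal latent response; if the latent variable has some other variance, the threshold scales with its standard deviation (the `sd = s` line, where `s = sqrt(2)` is only an illustrative value).

```r
# Threshold that leaves a proportion p of cases in the lower category,
# assuming the latent response is standard normal (mean 0, variance 1).
p_lower <- 1/3
tau <- qnorm(p_lower)
tau          # -0.4307273

# An even split corresponds to a threshold of zero.
qnorm(0.5)   # 0

# Quick check by simulation: dichotomize a standard normal at tau.
set.seed(1)
y_star <- rnorm(1e5)
y <- ifelse(y_star > tau, 1L, 0L)
mean(y == 0)  # approximately 1/3

# If the latent response had variance s^2 instead of 1 (illustrative s),
# the threshold for the same proportion is scaled by s.
s <- sqrt(2)
tau_scaled <- qnorm(p_lower, mean = 0, sd = s)
all.equal(tau_scaled, s * tau)
```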

Keith

Keith A. Markus, John Jay College of Criminal Justice, CUNY: http://jjcweb.jjay.cuny.edu/kmarkus

Frontiers of Test Validity Theory: Measurement, Causation and Meaning: http://www.routledge.com/books/details/9781841692203/

### Terrence Jorgensen

Apr 9, 2020, 6:45:06 PM, to lavaan

> I know that the "|" operator can be used to define the thresholds of categorical endogenous variables, but how should I choose the threshold values?

That's up to you. If you want the categories to have roughly equal counts, you can set the thresholds to zero (e.g., V1 | 0*t1).
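Applying that advice to the original model means adding one threshold line per dichotomous variable. A minimal sketch, assuming every indicator V1 to V45 and the criterion O1 should get a threshold of zero; the helper names (`dich_vars`, `threshold_lines`, `model_dich`) are hypothetical, not part of lavaan. The lavaan call is guarded with `requireNamespace()` so the string-building part runs even without lavaan installed.

```r
# The model from the original question.
model <- '
f1 =~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + 0.2*V10 + 0.2*V11 + 0.2*V12 + 0.2*V13 + 0.2*V14 + 0.2*V15
f2 =~ V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23 + V24 + 0.2*V25 + 0.2*V26 + 0.2*V27 + 0.2*V28 + 0.2*V29 + 0.2*V30
f3 =~ V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38 + V39 + 0.2*V40 + 0.2*V41 + 0.2*V42 + 0.2*V43 + 0.2*V44 + 0.2*V45
O1 ~ 1*f1 + 0.2*f2 + 0.2*f3
'

# One "| 0*t1" line per dichotomous variable (V1..V45 plus the criterion O1).
# A threshold of 0 aims at a roughly even split.
dich_vars <- c(paste0("V", 1:45), "O1")
threshold_lines <- paste0(dich_vars, " | 0*t1")
model_dich <- paste(model, paste(threshold_lines, collapse = "\n"), sep = "\n")

# Simulate only if lavaan is available.
if (requireNamespace("lavaan", quietly = TRUE)) {
  set.seed(123)
  dat <- lavaan::simulateData(model_dich, sample.nobs = 1000L)
  # Each thresholded variable should now take two distinct values.
  print(sapply(dat[, dich_vars], function(x) length(unique(x))))
}
```

To give a particular variable a different split, replace its `0` with the matching `qnorm()` value.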

Terrence D. Jorgensen, Assistant Professor, Methods and Statistics, Research Institute for Child Development and Education, the University of Amsterdam

### James Grace

Jun 23, 2020, 9:11:41 AM, to lavaan

I tried to apply the advice in this thread to a similar but simpler problem: observed variables only, with a dichotomous treatment and two continuous responses. I admit that I really don't know what I am doing with the syntax. Any advice on how to get dichotomous values for the treatment (a 50:50 split) would be greatly appreciated.

```r
## Sim Model
sim.two <- '
Grazers1 ~ -1.5*(Trt | 0*t1)
Grazers2 ~ -0.8*(Trt | 0*t1)'

## Generate data
set.seed(2)
simdat.two <- simulateData(sim.two, sample.nobs=10000L)
```

### James Grace

Jun 23, 2020, 9:21:32 AM, to lavaan

Using

```r
## Sim Model
sim.two <- '
Grazers1 ~ -1.5*Trt
Grazers2 ~ -0.8*Trt
Trt | 0*t1'
```

does the trick!
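One way to confirm the roughly 50:50 split is to tabulate `Trt` after simulating (first part, guarded so it only runs when lavaan is installed). The second part is a hypothetical by-hand base-R version of the same data-generating idea, written only to make the mechanism explicit; it is not lavaan's code, and its assumptions (standard normal latent treatment, residual variances of 1, uncorrelated residuals) are ours.

```r
# 1) Check the split from the working lavaan model (only if lavaan is installed).
sim.two <- '
Grazers1 ~ -1.5*Trt
Grazers2 ~ -0.8*Trt
Trt | 0*t1'

if (requireNamespace("lavaan", quietly = TRUE)) {
  set.seed(2)
  simdat.two <- lavaan::simulateData(sim.two, sample.nobs = 10000L)
  print(prop.table(table(simdat.two$Trt)))  # roughly half in each category
}

# 2) Hypothetical by-hand version in base R.
#    Assumptions: latent treatment ~ N(0, 1), residual variances of 1,
#    uncorrelated residuals for the two responses.
set.seed(2)
n <- 10000L
trt_star <- rnorm(n)                      # latent continuous treatment
grazers1 <- -1.5 * trt_star + rnorm(n)    # slopes act on trt_star, not on trt
grazers2 <- -0.8 * trt_star + rnorm(n)
trt <- as.integer(trt_star > 0)           # threshold 0 -> about 50:50
prop.table(table(trt))
```

In the by-hand version the slopes act on the continuous `trt_star`, so regressing `grazers1` on the 0/1 `trt` will not return exactly -1.5. Whether simulateData() behaves the same way is an assumption worth checking against its documentation.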
