Simulating data from path model - bias


Kjorte

Mar 23, 2023, 5:34:41 PM
to lavaan

Hello, 

 

I am trying to fit a path model to a data set simulated from a population model, but I am having trouble consistently keeping bias below 10% for all parameters. Specifically, the ELAB ~ GENDER parameter is the problem. The model I use to simulate the data is as follows:

 

popModel <- '
  # regressions
  READING ~ -.309*MEMO + -.180*ELAB + .550*CSTRAT
  MEMO    ~ .044*ESCS + .106*GENDER + .059*IMMIGR
  ELAB    ~ .110*ESCS + -.012*GENDER + .051*IMMIGR
  CSTRAT  ~ .221*ESCS + .131*GENDER + .078*IMMIGR

  # covariances
  CSTRAT ~~ .591*ELAB
  CSTRAT ~~ .625*MEMO
  CSTRAT ~~ .251*READING
  CSTRAT ~~ .196*ESCS
  CSTRAT ~~ .128*GENDER
  CSTRAT ~~ .015*IMMIGR
  ELAB ~~ .468*MEMO
  ELAB ~~ .001*READING
  ELAB ~~ .095*ESCS
  ELAB ~~ -.013*GENDER
  ELAB ~~ .019*IMMIGR
  MEMO ~~ -.049*READING
  MEMO ~~ .025*ESCS
  MEMO ~~ .106*GENDER
  MEMO ~~ .047*IMMIGR
  READING ~~ .394*ESCS
  READING ~~ .145*GENDER
  READING ~~ -.088*IMMIGR
  ESCS ~~ -.015*GENDER
  ESCS ~~ -.291*IMMIGR
  GENDER ~~ .011*IMMIGR

  # variances
  READING ~~ 1*READING
  CSTRAT ~~ 1*CSTRAT
  ELAB ~~ 1*ELAB
  MEMO ~~ 1*MEMO
  ESCS ~~ 1*ESCS
  GENDER ~~ 1*GENDER
  IMMIGR ~~ 1*IMMIGR
'

 

I simulate 500,000 observations to measure how accurate my simulation is, and sometimes the bias of the model fitted from simulated data is acceptable for all parameters:

 

myData <- simulateData(popModel, model.type = "sem", sample.nobs = 500000,
                       int.ov.free = FALSE, fixed.x = TRUE)
fit <- sem(myModel, data = myData)
summary(fit)

lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16
  Number of observations                        500000

Model Test User Model:

  Test statistic                            552047.956
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.306    0.001 -235.491    0.000
    ELAB             -0.179    0.001 -137.747    0.000
    CSTRAT            0.548    0.001  421.200    0.000
  MEMO ~
    ESCS              0.044    0.001   30.018    0.000
    GENDER            0.103    0.001   73.531    0.000
    IMMIGR            0.058    0.001   39.647    0.000
  ELAB ~
    ESCS              0.109    0.001   74.248    0.000
    GENDER           -0.012    0.001   -8.378    0.000     (1% bias)
    IMMIGR            0.050    0.001   34.191    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.618    0.000
    GENDER            0.130    0.001   94.480    0.000
    IMMIGR            0.079    0.001   54.812    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.987    0.002  500.000    0.000
   .ELAB              0.989    0.002  500.000    0.000
   .CSTRAT            0.939    0.002  500.000    0.000

 

But sometimes, running the same code, bias is upwards of 12% for the ELAB ~ GENDER parameter, while all other parameters stay within 1-2% bias every time.

 

summary(fit)

lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16
  Number of observations                        500000

Model Test User Model:

  Test statistic                            550912.037
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.311    0.001 -239.239    0.000
    ELAB             -0.176    0.001 -135.546    0.000
    CSTRAT            0.549    0.001  421.599    0.000
  MEMO ~
    ESCS              0.044    0.001   29.931    0.000
    GENDER            0.106    0.001   75.804    0.000
    IMMIGR            0.060    0.001   40.664    0.000
  ELAB ~
    ESCS              0.110    0.001   74.695    0.000
    GENDER           -0.013    0.001   -9.531    0.000     (11% bias)
    IMMIGR            0.052    0.001   35.667    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.669    0.000
    GENDER            0.129    0.001   94.765    0.000
    IMMIGR            0.078    0.001   54.366    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.985    0.002  500.000    0.000
   .ELAB              0.987    0.002  500.000    0.000
   .CSTRAT            0.938    0.002  500.000    0.000

 

I have been unable to find a way to ensure that the model fitted to the simulated data consistently has little to no bias, even with a sample this large. Is there some argument to simulateData() I am missing, or am I misspecifying my model in some way? Thank you in advance for any help.

Keith Markus

Mar 24, 2023, 10:16:50 AM
to lavaan
Kjorte,
Are you calculating relative bias? Your parameter value is very small: .013 for variables with variances of 1, and your standard error displays as .001. A margin of error approximated as twice the standard error can account for 15.38% relative bias with such a small parameter value:

> .002/.013
[1] 0.1538462

So, perhaps there is no problem with your estimates.  Perhaps relative bias is a misleading index when your parameter is so close to zero.
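The arithmetic above can be laid out in a few lines of R (a sketch; the numbers are the ELAB ~ GENDER values from the thread, and dividing by the population value rather than the estimate is one common convention for relative bias):

    # Relative bias of a single estimate vs. the sampling margin it sits inside.
    pop_value <- -0.012   # population ELAB ~ GENDER coefficient
    estimate  <- -0.013   # one simulated-sample estimate
    se        <- 0.001    # its standard error

    relative_bias <- (estimate - pop_value) / pop_value   # (-.013 - (-.012)) / (-.012)
    margin        <- 2 * se / abs(pop_value)              # ~95% margin, in relative-bias units

    relative_bias   # ~0.083: about 8% "bias" from this one draw
    margin          # ~0.167: sampling error alone can move relative bias by ~17%

Since the margin dwarfs the 10% cutoff, any single replication can land inside or outside it by chance; averaging estimates over many replications before computing relative bias would separate true bias from sampling noise.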
Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

