Hello,
I am trying to fit a path model to data simulated from a population model, and I am having trouble getting estimates that consistently show less than 10% bias for every parameter. Specifically, the problem is the ELAB ~ GENDER parameter. The model I am using to simulate the data is as follows:
popModel <- '
  # regressions
  READING ~ MEMO*-.309 + ELAB*-.180 + CSTRAT*.550
  MEMO ~ ESCS*.044 + GENDER*.106 + IMMIGR*.059
  ELAB ~ ESCS*.110 + GENDER*-.012 + IMMIGR*.051
  CSTRAT ~ ESCS*.221 + GENDER*.131 + IMMIGR*.078

  # covariances
  CSTRAT ~~ .591 * ELAB
  CSTRAT ~~ .625 * MEMO
  CSTRAT ~~ .251 * READING
  CSTRAT ~~ .196 * ESCS
  CSTRAT ~~ .128 * GENDER
  CSTRAT ~~ .015 * IMMIGR
  ELAB ~~ .468 * MEMO
  ELAB ~~ .001 * READING
  ELAB ~~ .095 * ESCS
  ELAB ~~ -.013 * GENDER
  ELAB ~~ .019 * IMMIGR
  MEMO ~~ -.049 * READING
  MEMO ~~ .025 * ESCS
  MEMO ~~ .106 * GENDER
  MEMO ~~ .047 * IMMIGR
  READING ~~ .394 * ESCS
  READING ~~ .145 * GENDER
  READING ~~ -.088 * IMMIGR
  ESCS ~~ -.015 * GENDER
  ESCS ~~ -.291 * IMMIGR
  GENDER ~~ .011 * IMMIGR

  READING ~~ 1 * READING
  CSTRAT ~~ 1 * CSTRAT
  ELAB ~~ 1 * ELAB
  MEMO ~~ 1 * MEMO
  ESCS ~~ 1 * ESCS
  GENDER ~~ 1 * GENDER
  IMMIGR ~~ 1 * IMMIGR
'
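The fitted model `myModel` passed to `sem()` is not shown in this post. The summary output (16 free parameters, i.e. 12 regression coefficients plus 4 residual variances, with 6 degrees of freedom) is consistent with the four regressions of `popModel` with their coefficients left free and no covariances specified. The block below is a hypothetical reconstruction under that assumption, not the author's actual syntax; the lavaan calls are left as comments so the block runs in base R.

```r
# HYPOTHETICAL reconstruction of myModel (not shown in the post):
# the same four regressions as popModel, coefficients left free,
# no covariances, so lavaan's defaults supply the residual variances.
myModel <- '
  READING ~ MEMO + ELAB + CSTRAT
  MEMO    ~ ESCS + GENDER + IMMIGR
  ELAB    ~ ESCS + GENDER + IMMIGR
  CSTRAT  ~ ESCS + GENDER + IMMIGR
'

# Sanity check: count the regression paths, which should match the
# 12 regression estimates printed by summary(fit).
reg_lines <- grep("~", strsplit(myModel, "\n")[[1]], value = TRUE)
rhs       <- trimws(sub(".*~", "", reg_lines))
n_paths   <- sum(lengths(strsplit(rhs, "\\s*\\+\\s*")))
n_paths
# [1] 12

# With lavaan installed, the fit would then be:
# library(lavaan)
# fit <- sem(myModel, data = myData)
# summary(fit)
```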
I simulate 500,000 observations to check how accurate my simulation is, and sometimes the bias in the model fitted to the simulated data is acceptable for all parameters:
myData <- simulateData(popModel, model.type = "sem", sample.nobs = 500000, int.ov.free = FALSE, fixed.x = TRUE)
fit <- sem(myModel, data = myData)
summary(fit)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16

  Number of observations                        500000

Model Test User Model:

  Test statistic                            552047.956
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.306    0.001 -235.491    0.000
    ELAB             -0.179    0.001 -137.747    0.000
    CSTRAT            0.548    0.001  421.200    0.000
  MEMO ~
    ESCS              0.044    0.001   30.018    0.000
    GENDER            0.103    0.001   73.531    0.000
    IMMIGR            0.058    0.001   39.647    0.000
  ELAB ~
    ESCS              0.109    0.001   74.248    0.000
    GENDER           -0.012    0.001   -8.378    0.000   (1% bias)
    IMMIGR            0.050    0.001   34.191    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.618    0.000
    GENDER            0.130    0.001   94.480    0.000
    IMMIGR            0.079    0.001   54.812    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.987    0.002  500.000    0.000
   .ELAB              0.989    0.002  500.000    0.000
   .CSTRAT            0.939    0.002  500.000    0.000
But sometimes, running exactly the same code, the bias for the ELAB ~ GENDER parameter is upwards of 12%, while all other parameters stay within 1-2% bias every time:
summary(fit)

lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16

  Number of observations                        500000

Model Test User Model:

  Test statistic                            550912.037
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.311    0.001 -239.239    0.000
    ELAB             -0.176    0.001 -135.546    0.000
    CSTRAT            0.549    0.001  421.599    0.000
  MEMO ~
    ESCS              0.044    0.001   29.931    0.000
    GENDER            0.106    0.001   75.804    0.000
    IMMIGR            0.060    0.001   40.664    0.000
  ELAB ~
    ESCS              0.110    0.001   74.695    0.000
    GENDER           -0.013    0.001   -9.531    0.000   (11% bias)
    IMMIGR            0.052    0.001   35.667    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.669    0.000
    GENDER            0.129    0.001   94.765    0.000
    IMMIGR            0.078    0.001   54.366    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.985    0.002  500.000    0.000
   .ELAB              0.987    0.002  500.000    0.000
   .CSTRAT            0.938    0.002  500.000    0.000
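Assuming the percentages above are percent relative bias, 100 * (estimate - population value) / population value, the arithmetic can be sketched from the printed ELAB ~ GENDER estimates. The printed values are rounded to three decimals, so the results differ somewhat from the 1% and 11% quoted above, which presumably come from unrounded estimates. The second part expresses the printed standard error of 0.001 as a percentage of the population value, for ELAB ~ GENDER and for a larger path (CSTRAT ~ ESCS) taken from `popModel`.

```r
# Percent relative bias: 100 * (estimate - population value) / population value.
# Values are the rounded ELAB ~ GENDER estimates printed in the two runs above.
pop_value <- -0.012
estimates <- c(run1 = -0.012, run2 = -0.013)

rel_bias <- 100 * (estimates - pop_value) / pop_value
round(rel_bias, 1)
# run1 run2
#  0.0  8.3

# One printed standard error (0.001) as a percentage of the population
# value, for a small path and a larger one from popModel.
se_printed <- 0.001
pop_paths  <- c(ELAB_GENDER = -0.012, CSTRAT_ESCS = 0.221)
se_pct     <- 100 * se_printed / abs(pop_paths)
round(se_pct, 2)
# ELAB_GENDER CSTRAT_ESCS
#        8.33        0.45
```

The same absolute deviation of 0.001 is roughly 18 times larger in relative terms for a population value of -.012 than for .221.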
I have been unable to find a way to ensure that the model fitted to simulated data consistently shows little to no bias with a sample this large. Is there some argument in simulateData() I am missing, or am I misspecifying my model in some way? Thank you in advance for any help.
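The runs above each compute bias from a single simulated data set. A common Monte Carlo convention is instead to average the estimates over many replications and compute relative bias from that mean. The sketch below illustrates the difference on a toy simple regression (an assumption: it is not the path model above) with a true slope of -.012, standard normal predictor and error, and n = 500,000. In lavaan, each replication could be drawn with a different value of `simulateData()`'s `seed` argument; that call is shown only as a comment.

```r
# Toy illustration only: one simple regression, NOT the full path model.
# The true slope mirrors the ELAB ~ GENDER value; x and e are standard
# normal, so the sampling SD of the OLS slope is about 1 / sqrt(n).
set.seed(2022)
n      <- 500000
b_true <- -0.012
n_reps <- 50

slopes <- replicate(n_reps, {
  # In lavaan, one replication might instead be e.g.
  # simulateData(popModel, sample.nobs = n, seed = r)
  x <- rnorm(n)
  y <- b_true * x + rnorm(n)
  cov(x, y) / var(x)  # OLS slope of y on x (with intercept)
})

# Relative bias computed separately for each single data set
rel_bias_each <- 100 * (slopes - b_true) / b_true
summary(rel_bias_each)

# Relative bias of the mean estimate across replications
mean_rel_bias <- 100 * (mean(slopes) - b_true) / b_true
mean_rel_bias
```

With these settings the per-replication values in `rel_bias_each` are expected to spread by roughly 100 * (1 / sqrt(500000)) / 0.012, about 12 percentage points, whereas the relative bias of the mean shrinks as `n_reps` grows.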