Simulating data from path model - bias


Kjorte

Mar 23, 2023, 5:34:41 PM
to lavaan

Hello, 

 

I am trying to fit a path model to a data set simulated from a population model, but I am having trouble consistently keeping bias below 10% for all parameters. Specifically, the ELAB ~ GENDER parameter is the problem. The model I use to simulate the data is as follows:

 

popModel <- '
  # regressions
  READING ~ -.309*MEMO + -.180*ELAB + .550*CSTRAT
  MEMO    ~ .044*ESCS + .106*GENDER + .059*IMMIGR
  ELAB    ~ .110*ESCS + -.012*GENDER + .051*IMMIGR
  CSTRAT  ~ .221*ESCS + .131*GENDER + .078*IMMIGR

  # covariances
  CSTRAT ~~ .591*ELAB
  CSTRAT ~~ .625*MEMO
  CSTRAT ~~ .251*READING
  CSTRAT ~~ .196*ESCS
  CSTRAT ~~ .128*GENDER
  CSTRAT ~~ .015*IMMIGR
  ELAB ~~ .468*MEMO
  ELAB ~~ .001*READING
  ELAB ~~ .095*ESCS
  ELAB ~~ -.013*GENDER
  ELAB ~~ .019*IMMIGR
  MEMO ~~ -.049*READING
  MEMO ~~ .025*ESCS
  MEMO ~~ .106*GENDER
  MEMO ~~ .047*IMMIGR
  READING ~~ .394*ESCS
  READING ~~ .145*GENDER
  READING ~~ -.088*IMMIGR
  ESCS ~~ -.015*GENDER
  ESCS ~~ -.291*IMMIGR
  GENDER ~~ .011*IMMIGR

  # variances
  READING ~~ 1*READING
  CSTRAT ~~ 1*CSTRAT
  ELAB ~~ 1*ELAB
  MEMO ~~ 1*MEMO
  ESCS ~~ 1*ESCS
  GENDER ~~ 1*GENDER
  IMMIGR ~~ 1*IMMIGR
'

 

I simulate 500,000 observations to measure how accurate my simulation is, and sometimes the bias of the model fitted from simulated data is acceptable for all parameters:

 

myData <- simulateData(popModel, model.type = "sem", sample.nobs = 500000,
                       int.ov.free = FALSE, fixed.x = TRUE)
fit <- sem(myModel, data = myData)
summary(fit)

lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16
  Number of observations                        500000

Model Test User Model:

  Test statistic                            552047.956
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.306    0.001 -235.491    0.000
    ELAB             -0.179    0.001 -137.747    0.000
    CSTRAT            0.548    0.001  421.200    0.000
  MEMO ~
    ESCS              0.044    0.001   30.018    0.000
    GENDER            0.103    0.001   73.531    0.000
    IMMIGR            0.058    0.001   39.647    0.000
  ELAB ~
    ESCS              0.109    0.001   74.248    0.000
    GENDER           -0.012    0.001   -8.378    0.000     (1% bias)
    IMMIGR            0.050    0.001   34.191    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.618    0.000
    GENDER            0.130    0.001   94.480    0.000
    IMMIGR            0.079    0.001   54.812    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.987    0.002  500.000    0.000
   .ELAB              0.989    0.002  500.000    0.000
   .CSTRAT            0.939    0.002  500.000    0.000

 

But sometimes, running the same code, bias is upwards of 12% for the ELAB ~ GENDER parameter, while all other parameters stay within 1-2% bias every time.

 

summary(fit)

lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16
  Number of observations                        500000

Model Test User Model:

  Test statistic                            550912.037
  Degrees of freedom                                 6
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  READING ~
    MEMO             -0.311    0.001 -239.239    0.000
    ELAB             -0.176    0.001 -135.546    0.000
    CSTRAT            0.549    0.001  421.599    0.000
  MEMO ~
    ESCS              0.044    0.001   29.931    0.000
    GENDER            0.106    0.001   75.804    0.000
    IMMIGR            0.060    0.001   40.664    0.000
  ELAB ~
    ESCS              0.110    0.001   74.695    0.000
    GENDER           -0.013    0.001   -9.531    0.000     (11% bias)
    IMMIGR            0.052    0.001   35.667    0.000
  CSTRAT ~
    ESCS              0.221    0.001  154.669    0.000
    GENDER            0.129    0.001   94.765    0.000
    IMMIGR            0.078    0.001   54.366    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .READING           0.846    0.002  500.000    0.000
   .MEMO              0.985    0.002  500.000    0.000
   .ELAB              0.987    0.002  500.000    0.000
   .CSTRAT            0.938    0.002  500.000    0.000

 

I have been unable to find a way to ensure that the model fitted to the simulated data consistently has little to no bias, even with a sample this large. Is there some argument to simulateData() I am missing, or am I misspecifying my model in some way? Thank you in advance for any help.

Keith Markus

Mar 24, 2023, 10:16:50 AM
to lavaan
Kjorte,
Are you calculating relative bias? Your parameter value is very small: .013 for variables with variances of 1, and your standard error displays as .001. A margin of error approximated as twice the standard error can account for 15.38% relative bias with such a small parameter value:

> .002/.013
[1] 0.1538462

So, perhaps there is no problem with your estimates.  Perhaps relative bias is a misleading index when your parameter is so close to zero.
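The arithmetic above can be laid out in a few lines of R (a sketch; the numbers are the ELAB ~ GENDER values from the thread, and dividing by the population value rather than the estimate is one common convention for relative bias):

    # Relative bias of a single estimate vs. the sampling margin it sits inside.
    pop_value <- -0.012   # population ELAB ~ GENDER coefficient
    estimate  <- -0.013   # one simulated-sample estimate
    se        <- 0.001    # its standard error

    relative_bias <- (estimate - pop_value) / pop_value   # (-.013 - (-.012)) / (-.012)
    margin        <- 2 * se / abs(pop_value)              # ~95% margin, in relative-bias units

    relative_bias   # ~0.083: about 8% "bias" from this one draw
    margin          # ~0.167: sampling error alone can move relative bias by ~17%

Since the margin dwarfs the 10% cutoff, any single replication can land inside or outside it by chance; averaging estimates over many replications before computing relative bias would separate true bias from sampling noise.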
Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

