How does sem() calculate the variances of categorical endogenous variables that depend on a mediator?


Jana F

Dec 9, 2020, 12:08:25 PM
to lavaan

Hello,

I am trying to understand how lavaan calculates variances of a categorical endogenous variable, when the categorical output depends on a mediator variable.

I have this simple model for illustration:
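library(lavaan)

n <- 1000000
dat <- data.frame(a = rnorm(n))
dat$b <- dat$a * 0.80 + rnorm(n, 0, 0.1)
# c is 1 when the latent response 5 - 2*b + e is positive, 0 otherwise
dat$c <- factor(ifelse(5 - 2 * dat$b + rnorm(n, 0, 1) > 0, 1, 0), ordered = TRUE)
fit <- sem('b ~ a
            c ~ b', data = dat, meanstructure = TRUE)
summary(fit)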

which gives the following output:

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  b ~                                                 
    a                 0.800    0.000 8027.399    0.000
  c ~                                                 
    b                -1.970    0.019 -105.891    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .b                -0.000    0.000   -0.538    0.590
   .c                 0.000                           

Thresholds:
                   Estimate  Std.Err  z-value  P(>|z|)
    c|t1             -4.936    0.034 -146.734    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .b                 0.010    0.000  707.188    0.000
   .c                 0.961                           

Scales y*:
                   Estimate  Std.Err  z-value  P(>|z|)
    c                 1.000       

I am wondering how the estimate for the residual variance of the categorical variable is calculated (.c = 0.961 under Variances). If b were an exogenous variable, the variance of c would be 1. But here, b is endogenous (and a mediator) and the variance is 0.961. I understand that the threshold estimates for c are obtained from the intercept of an ordered probit regression model. But how is the variance calculated here?

Could someone help me here please?
(I think a similar question remained unanswered in this post)

Thank you very much!

Terrence Jorgensen

Dec 13, 2020, 8:53:19 AM
to lavaan
The total variance of c is 1 (see its scaling factor; this is the default: parameterization = "delta").  The residual variance of c is thus 1 minus its R-squared (variance explained by its only predictor: b).  Because b is endogenous, its total variance is not a model parameter, so the R-squared is a sum of 2 components:
  • the squared direct effect of b times b's residual variance
  • the squared indirect effect of a (via b) times a's variance
Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
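
As a rough check of the arithmetic, the sketch below plugs the estimates from the output in the question into the "1 minus explained variance" rule. With these numbers, the first component alone (squared effect of b times b's residual variance) reproduces the reported 0.961. That would be consistent with the variance of the latent response c* being fixed to 1 conditional on the exogenous a, so that a's variance does not enter; this is an assumption made here, not something the output states. The second half compares the estimates with the values implied by the data-generating code under the same assumption, and further assumes that c = 1 corresponds to c* exceeding the threshold.

```r
# Estimates copied from the output in the question
beta_cb  <- -1.970   # c ~ b
resvar_b <-  0.010   # residual variance of b

# Residual variance of c* if its variance, given a, is fixed to 1 (assumption)
resvar_c <- 1 - beta_cb^2 * resvar_b
round(resvar_c, 3)   # 0.961

# Population counterparts from the data-generating code: c* = 5 - 2*b + e.
# Given a, Var(-2*b + e) = 4 * 0.1^2 + 1 = 1.04; rescale so this becomes 1.
s <- sqrt(4 * 0.1^2 + 1)
pop <- c(slope = -2 / s, threshold = -5 / s, resvar = 1 / s^2)
round(pop, 3)
```

The rescaled population values land close to the reported slope (-1.970), threshold (-4.936), and residual variance (0.961).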
 

Jana F

Dec 16, 2020, 9:16:50 AM
to lav...@googlegroups.com
Thank you very much for your helpful response!

Can I use this residual variance of c to generate new data using the estimates (coefficients, intercepts, thresholds and variances) from sem?

Meaning that I would generate the binary variable from an underlying normal distribution with a mean equal to the product of the parent values and coefficients and a variance equal to the one reported in the SEM output, and then apply the threshold reported in the output?

Based on the previous example, can we generate new c using the following formula:
b <- 0 + 0.8 * a + rnorm(n, 0, sqrt(0.01))
c_normal <- 0 - 1.970 * b + rnorm(n, 0, sqrt(0.961))
c <- c_normal >= -4.936  # threshold c|t1 from the output

If not, how can I generate it?
Is there any command that does this automatically?


Thank you very much!



--
You received this message because you are subscribed to a topic in the Google Groups "lavaan" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lavaan/dJeMBlPL9_c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/a2ffa842-1485-4d5d-ac36-76bd9a45c904n%40googlegroups.com.

Terrence Jorgensen

Dec 16, 2020, 11:09:32 AM
to lavaan
> Can I use this residual variance of c to generate new data using the estimates (coefficients, intercepts, thresholds and variances) from sem?

Yes.

> Meaning that I would generate the binary variable from an underlying normal distribution with a mean equal to the product of the parent values and coefficients and a variance equal to the one reported in the SEM output, and then apply the threshold reported in the output?

You generate the latent responses for c as the sum of the b effect and the c residuals, then use the threshold to dichotomize them.  Your syntax (quoted below) looks correct, except you would need to simulate a first (or reuse the same a that was used to fit your model, which would be consistent with fixed.x=TRUE).
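
A minimal sketch of the generation steps just described, with a simulated first. It assumes a is standard normal, as in the original example; the seed, the sample size, and the names c_star, c_new, and new_dat are arbitrary choices made for illustration.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
n <- 10000     # arbitrary sample size

# Exogenous predictor, assumed standard normal as in the original example
a <- rnorm(n)

# Mediator: intercept 0, slope 0.800, residual variance 0.010
b <- 0 + 0.800 * a + rnorm(n, 0, sqrt(0.010))

# Latent response for c: b effect plus residual with variance 0.961
c_star <- 0 - 1.970 * b + rnorm(n, 0, sqrt(0.961))

# Dichotomize at the estimated threshold c|t1
c_new <- as.integer(c_star > -4.936)

new_dat <- data.frame(a = a, b = b, c = c_new)
mean(new_dat$c)   # share of 1s; should be high, as in the original data
```

To stay consistent with fixed.x=TRUE, the a column of the data used to fit the model could be reused in place of the rnorm() draw.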
 
> Based on the previous example, can we generate new c using the following formula:
> b <- 0 + 0.8 * a + rnorm(n, 0, sqrt(0.01))
> c_normal <- 0 - 1.970 * b + rnorm(n, 0, sqrt(0.961))
> c <- c_normal >= -4.936  # threshold c|t1 from the output
>
> If not, how can I generate it?
> Is there any command that does this automatically?

No, but you can specify a population model using these parameters, and simulateData() can do the rest of the job for you.  Pretty easy to use paste() to construct the syntax automatically from your model's parTable() output.
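
One way the paste() idea might look is sketched below. The helper partable_to_syntax() is hypothetical (it is not part of lavaan), and the data frame pt is a hand-typed stand-in for parTable(fit), filled with the estimates from this thread so that the example runs without a fitted model. It assumes the table has the columns lhs, op, rhs, and est. The resulting string is meant to be passed to simulateData(); that call is not run here, and it would be worth refitting the model to the simulated data to check that the scaling of the categorical variable comes out as intended.

```r
# Hypothetical helper: turn a parameter table into population-model syntax.
# Expects columns lhs, op, rhs, est (the layout assumed for parTable(fit)).
partable_to_syntax <- function(pt) {
  # Intercepts are stored with op "~1" and an empty rhs; write "lhs ~ est*1"
  is_int <- pt$op == "~1"
  op  <- ifelse(is_int, "~", pt$op)
  rhs <- ifelse(is_int, "1", pt$rhs)
  paste(paste0(pt$lhs, " ", op, " ", pt$est, "*", rhs), collapse = "\n")
}

# Hand-typed stand-in for parTable(fit), using the estimates from the thread
pt <- data.frame(
  lhs = c("b", "c", "b", "c", "c"),
  op  = c("~", "~", "~~", "~~", "|"),
  rhs = c("a", "b", "b", "c", "t1"),
  est = c(0.800, -1.970, 0.010, 0.961, -4.936),
  stringsAsFactors = FALSE
)

pop_model <- partable_to_syntax(pt)
cat(pop_model, "\n")
```

Each row becomes one line of model syntax with the estimate as a fixed value, for example the threshold row becomes "c | -4.936*t1".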

Jana F

Dec 17, 2020, 3:43:29 AM
to lav...@googlegroups.com
That works! Thank you so much!

