How does sem() calculate the variances of categorical endogenous variables that depend on a mediator?

Jana F

Dec 9, 2020, 12:08:25 PM

to lavaan

Hello,

I am trying to understand how lavaan calculates the variance of a categorical endogenous variable when that variable depends on a mediator.

I have this simple model for illustration:

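library(lavaan)

# Simulate a -> b -> c, where c is a binary (ordered) outcome
n <- 1000000
dat <- data.frame(a = rnorm(n))
dat$b <- dat$a * 0.80 + rnorm(n, 0, 0.1)
dat$c <- factor(ifelse(5 - 2 * dat$b + rnorm(n, 0, 1) > 0, 1, 0), ordered = TRUE)

fit <- sem('b ~ a
            c ~ b', data = dat, meanstructure = TRUE)
summary(fit)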

which gives the following output:

Regressions:
                   Estimate Std.Err z-value P(>|z|)
b ~
    a                 0.800    0.000 8027.399    0.000
c ~
    b                -1.970    0.019 -105.891    0.000

Intercepts:
                   Estimate Std.Err z-value P(>|z|)
   .b                -0.000    0.000   -0.538    0.590
   .c                 0.000

Thresholds:
                   Estimate Std.Err z-value P(>|z|)
    c|t1             -4.936    0.034 -146.734    0.000

Variances:
                   Estimate Std.Err z-value P(>|z|)
   .b                 0.010    0.000 707.188    0.000
   .c                 0.961

Scales y*:
                   Estimate Std.Err z-value P(>|z|)
    c                 1.000

I am wondering how the estimate for the variance of the categorical variable is calculated (.c under Variances, 0.961; marked in yellow in the original post). If b were an exogenous variable, the variance of c would be 1. But here b is endogenous (a mediator), and the variance is 0.961. I understand that the threshold estimate for c is obtained from the intercept of an ordered probit regression model. But how is the variance calculated here?

Could someone help me here please?

(I think a similar question remained unanswered in this post)

Thank you very much!

Terrence Jorgensen

Dec 13, 2020, 8:53:19 AM

to lavaan

The total variance of c is 1 (see its scaling factor; this is the default: parameterization = "delta"). The residual variance of c is thus 1 minus its R-squared (variance explained by its only predictor: b). Because b is endogenous, its total variance is not a model parameter, so the R-squared is a sum of 2 components:

- the squared direct effect of b times b's residual variance
- the squared indirect effect of a (via b) times a's variance
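
As a worked check of this decomposition against the output above (a sketch; the symbols \beta_{cb}, \beta_{ba}, \psi_b and \theta_c are notation introduced here, not lavaan's): with the printed estimates, the first component alone reproduces the reported residual variance of 0.961. The second component, assuming Var(a) is about 1 as in the simulated data, would come to roughly 2.48, which is larger than the total variance of 1. That suggests the unit total variance in this particular fit applies to the latent response c* conditional on the exogenous a. This would be consistent with lavaan conditioning on exogenous covariates when endogenous variables are categorical (the conditional.x option), but that reading is an inference from the numbers, not something stated in the thread.

```latex
\begin{aligned}
\hat\theta_{c} &= 1 - \hat\beta_{cb}^{2}\,\hat\psi_{b}
                = 1 - (-1.970)^{2}(0.010)
                \approx 1 - 0.039 = 0.961 \\[4pt]
(\hat\beta_{cb}\,\hat\beta_{ba})^{2}\,\operatorname{Var}(a)
               &\approx (-1.970 \times 0.800)^{2}(1) \approx 2.48
\end{aligned}
```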

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

http://www.uva.nl/profile/t.d.jorgensen

Jana F

Dec 16, 2020, 9:16:50 AM

to lav...@googlegroups.com

Thank you very much for your helpful response!

Can I use this residual variance of c to generate new data using the estimates (coefficients, intercepts, thresholds and variances) from sem?

Meaning that I would generate the binary variable from an underlying normal distribution with a mean equal to the product of the parent values and coefficients and a variance equal to the one reported in the SEM output, then apply the threshold reported in the output?

Based on the previous example, can we generate new c using the following formula:

b = 0 + 0.8*a + rnorm(n, 0, sqrt(0.01))

c_normal = 0 - 1.970*b + rnorm(n, 0, sqrt(0.961))

c = c_normal >= -4.936   # th = -4.936 (threshold c|t1)

If not, how can I generate it?

Is there any command that does this automatically?

Thank you very much!

--
You received this message because you are subscribed to a topic in the Google Groups "lavaan" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lavaan/dJeMBlPL9_c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/a2ffa842-1485-4d5d-ac36-76bd9a45c904n%40googlegroups.com.

Terrence Jorgensen

Dec 16, 2020, 11:09:32 AM

to lavaan

> Can I use this residual variance of c to generate new data using the estimates (coefficients, intercepts, thresholds and variances) from sem?

Yes

> Meaning that I would generate the binary variable from an underlying normal distribution with a mean equal to the product of the parent values and coefficients and a variance equal to the one reported in the SEM output, then apply the threshold reported in the output?

You generate the latent responses for c as a sum of the b effect and the c residuals, then use the threshold to dichotomize it. Your syntax below looks correct, except you would need to simulate a first (or use the same a used to fit your model, which would be consistent with fixed.x=TRUE).
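
A minimal base-R sketch of the generation steps just described, using the estimates from the summary output earlier in the thread. It assumes a is drawn as a standard normal, as in the original simulation; the variable names, seed and sample size are illustrative choices, not part of the thread.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
n <- 100000     # illustrative sample size

# Estimates copied from the summary() output earlier in the thread
b_on_a   <- 0.800    # b ~ a
c_on_b   <- -1.970   # c ~ b
resvar_b <- 0.010    # residual variance of b
resvar_c <- 0.961    # residual variance of the latent response for c
th_c     <- -4.936   # threshold c|t1

# Step 1: simulate the exogenous variable first (assumption: a ~ N(0, 1))
a <- rnorm(n)

# Step 2: mediator = effect of a + residual
b <- 0 + b_on_a * a + rnorm(n, 0, sqrt(resvar_b))

# Step 3: latent response for c = effect of b + residual
c_star <- 0 + c_on_b * b + rnorm(n, 0, sqrt(resvar_c))

# Step 4: dichotomize the latent response at the threshold
c_new <- as.integer(c_star > th_c)

mean(c_new)  # proportion of 1s in the generated data
```

To reuse the a values from the data the model was fitted to, replace the rnorm(n) draw in step 1 with that column.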

> Based on the previous example, can we generate new c using the following formula:
> b = 0 + 0.8*a + rnorm(n, 0, sqrt(0.01))
> c_normal = 0 - 1.970*b + rnorm(n, 0, sqrt(0.961))
> c = c_normal >= -4.936   # th = -4.936 (threshold c|t1)
>
> If not, how can I generate it?
> Is there any command that does this automatically?

No, but you can specify a population model using these parameters, and simulateData() can do the rest of the job for you. Pretty easy to use paste() to construct the syntax automatically from your model's parTable() output.
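
A rough sketch of the paste() idea, under stated assumptions: the data frame pt below is a hand-written, hypothetical stand-in for the lhs, op, rhs and est columns of parTable(fit), filled with the estimates from this thread plus an assumed variance of 1 for a. With a real fit one would start from parTable(fit) instead. The simulateData() call is left commented out, and how it treats the threshold row and the residual variance of an ordinal variable should be checked against its documentation.

```r
# Hypothetical stand-in for the relevant rows of parTable(fit)
pt <- data.frame(
  lhs = c("b", "c", "c",  "b",  "b",  "c",  "a"),
  op  = c("~", "~", "|",  "~1", "~~", "~~", "~~"),
  rhs = c("a", "b", "t1", "",   "b",  "c",  "a"),
  est = c(0.800, -1.970, -4.936, 0, 0.010, 0.961, 1),  # last value: assumed Var(a)
  stringsAsFactors = FALSE
)

# Intercept rows are written as "lhs ~ value*1";
# every other row is written as "lhs op value*rhs".
is_int <- pt$op == "~1"
lines <- ifelse(
  is_int,
  paste0(pt$lhs, " ~ ", pt$est, "*1"),
  paste0(pt$lhs, " ", pt$op, " ", pt$est, "*", pt$rhs)
)

# One population-model string, one parameter per line
pop_model <- paste(lines, collapse = "\n")
cat(pop_model, "\n")

# With lavaan installed, the string could then be passed on, for example:
# sim <- lavaan::simulateData(pop_model, sample.nobs = 1000)
```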

Jana F

Dec 17, 2020, 3:43:29 AM

to lav...@googlegroups.com

That works! Thank you so much!

