Interpreting coefficients with categorical endogenous variables

Blain Waan

Feb 25, 2018, 10:03:58 PM
to lavaan
I have run a cross-lagged SEM with two time points, year 1 and 2. The variable 'emp' is binary and 'SA' is ordinal. 'PHQ8' is continuous. I declared them both as 'ordered' before using 'lavaan' to estimate the coefficients.

1) Could I use nominal variables as endogenous by declaring them as 'factor' variables instead of 'ordered' in R?  

2) The results are as follows:

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  PHQ8_1 ~
    emp1             -3.779    0.507   -7.451    0.000
  SA1 ~
    emp1              0.180    0.109    1.663    0.096
    PHQ8_1            0.017    0.009    1.886    0.059
  emp2 ~
    emp1              1.875    0.133   14.070    0.000
    SA1               0.679    0.068    9.965    0.000
    PHQ8_1           -0.057    0.010   -5.605    0.000
  PHQ8_2 ~
    emp2             -1.489    0.171   -8.714    0.000
  SA2 ~
    emp2              0.419    0.038   11.129    0.000
    PHQ8_2            0.047    0.009    5.053    0.000


I'm wondering how I should interpret these coefficients. The estimator used was 'DWLS', and the output also includes some threshold parameters. Is there an example I can look at that would help me understand the interpretation?



Terrence Jorgensen

Feb 26, 2018, 5:39:54 AM
to lavaan
1) Could I use nominal variables as endogenous by declaring them as 'factor' variables instead of 'ordered' in R?  

lavaan does not support nominal multicategory outcomes.

I'm wondering how I should interpret these coefficients. The estimator used was 'DWLS', and the output also includes some threshold parameters. Is there an example I can look at that would help me understand the interpretation?

It is probit regression for the ordinal outcome, linear regression for the continuous outcome.  If you are using the default "delta" parameterization and have a single-group model (or no constraints across multiple groups), then you can interpret the probit slope as a regular linear effect on the assumed standard-normal (i.e., z-score) latent response underlying the observed ordinal response.
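
For reference, the regression paths shown in the output above correspond to model syntax along the following lines. This is only a sketch: the actual model syntax was not posted, any residual covariances are omitted, `dat` and `fit_cross_lagged` are hypothetical names, and the choice of which variables to pass to `ordered` is an assumption (only the categorical outcomes are listed here).

```r
# Regression paths reconstructed from the "Regressions:" output in the first post.
# Covariances and any other parts of the original model are not shown in the thread.
model <- '
  PHQ8_1 ~ emp1
  SA1    ~ emp1 + PHQ8_1
  emp2   ~ emp1 + SA1 + PHQ8_1
  PHQ8_2 ~ emp2
  SA2    ~ emp2 + PHQ8_2
'

# Hypothetical wrapper: `dat` is assumed to be a data frame with these columns.
fit_cross_lagged <- function(dat) {
  lavaan::sem(model, data = dat,
              ordered = c("SA1", "emp2", "SA2"),  # categorical outcomes (assumed)
              parameterization = "delta")          # lavaan's default, made explicit
}
```

Calling `summary(fit_cross_lagged(dat))` would then print the regressions together with the threshold estimates.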

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Blain Waan

Feb 27, 2018, 4:11:17 PM
to lavaan
Thank you for your reply. Do you remember any work that interpreted similar SEM outputs? If emp1 and emp2 both are binary, should I be able to say "if emp1 changes from 0 to 1, the probability for the variable emp2 taking value one rises by 187.5% points, or in other words, almost doubles"?

Blain Waan

Feb 27, 2018, 4:14:54 PM
to lavaan
Also, could you please suggest what the interpretation would be for SA2 over SA1, both of which are ordinal?

Terrence Jorgensen

Feb 28, 2018, 5:39:15 AM
to lavaan
Also, could you please suggest what the interpretation would be for SA2 over SA1, both of which are ordinal?

In a probit model, slopes have the same interpretation whether the outcome has 2, 3, 4... ordered categories.  See below.
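
To illustrate why the number of categories does not change the interpretation, here is a minimal R sketch of how a shift in the latent response maps onto category probabilities. The threshold values in `tau` are made up for illustration (the thread does not show the estimated thresholds), `ordinal_probs` is a hypothetical helper, and unit residual variance on the latent scale is assumed for simplicity.

```r
# Category probabilities for an ordinal outcome under a probit model.
# eta:        linear predictor on the latent-response scale
# thresholds: increasing cut points on that same scale
ordinal_probs <- function(eta, thresholds) {
  cum <- pnorm(c(thresholds, Inf) - eta)  # P(y <= k) for each category k
  diff(c(0, cum))                         # P(y == k)
}

tau <- c(-0.5, 0.4, 1.2)         # hypothetical thresholds for a 4-category item
p0 <- ordinal_probs(0.0, tau)    # latent mean 0
p1 <- ordinal_probs(0.419, tau)  # latent mean shifted by the emp2 -> SA2 slope
round(rbind(p0, p1), 3)
```

The slope moves the whole latent distribution relative to the fixed thresholds, so the same 0.419 shift applies whether there is one threshold (binary) or several; only the way the probability mass is split into categories differs.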

Do you remember any work that interpreted similar SEM outputs?

Not path analyses, just common-factor and IRT models (where the predictors are latent traits).  But you can search the web for readings about probit regression.  Path analyses are simultaneous regression models, so what you find should generally apply to path models too.  

If emp1 and emp2 both are binary, should I be able to say "if emp1 changes from 0 to 1, the probability for the variable emp2 taking value one rises by 187.5% points, or in other words, almost doubles"?

No, the slopes are not effects on the probability; they are linear effects on the latent variable that underlies the discrete observed response. The slope of 1.875 indicates that the latent variable (assumed normally distributed, probably a z score if you use the delta parameterization) is 1.875 units higher for the emp1==1 group than for the emp1==0 group, holding the other covariates constant. The corresponding change in the probability that emp2==1 depends on the values at which you hold the other covariates constant (the intercept is fixed at zero, so it does not contribute).

$$ \operatorname{probit}[P(\text{emp2}=1)] = 0 + 1.875\,(\text{emp1}) + 0.679\,(\text{SA1}) - 0.057\,(\text{PHQ8\_1}) $$

For example, if both covariates (SA1 and PHQ8_1) equal 0, then the expected probit(emp2==1) is
  • for emp1==0: 0 + 1.875(0) + 0.679(0) - 0.057(0) = 0
  • for emp1==1: 0 + 1.875(1) + 0.679(0) - 0.057(0) = 1.875
The cumulative probabilities that correspond to these z-scores are

> pnorm(c(0, 1.875))
[1] 0.5000000 0.9696036

If you hold the covariates constant at different values, you will get different expected values for the probit, and therefore different (differences between) probabilities. 
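
As a rough sketch of that last point, the following R code applies the equation above, exactly as written (zero intercept, and ignoring the threshold estimates, which are not shown in the thread), at a few covariate settings. The helper names `prob_emp2` and `diff_at` and the nonzero covariate values are hypothetical, chosen only for illustration.

```r
# Slopes for emp2 taken from the regression output in the first post.
b <- c(emp1 = 1.875, SA1 = 0.679, PHQ8_1 = -0.057)

# Expected P(emp2 == 1) from the probit equation with intercept fixed at zero.
prob_emp2 <- function(emp1, SA1, PHQ8_1) {
  pnorm(b["emp1"] * emp1 + b["SA1"] * SA1 + b["PHQ8_1"] * PHQ8_1)
}

# Difference in probability between emp1 == 1 and emp1 == 0,
# holding the other covariates at the given values.
diff_at <- function(SA1, PHQ8_1) {
  unname(prob_emp2(1, SA1, PHQ8_1) - prob_emp2(0, SA1, PHQ8_1))
}

diff_at(SA1 = 0, PHQ8_1 = 0)   # covariates at zero, as in the example above
diff_at(SA1 = 0, PHQ8_1 = 10)  # hypothetical PHQ8_1 value
diff_at(SA1 = 1, PHQ8_1 = 0)   # hypothetical SA1 value
```

The same 1.875 shift on the latent scale yields a different change in probability at each setting, because the normal CDF is steeper near zero than in the tails.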

Because SA1 is endogenous, I assume its estimated effect on emp2 is the effect of SA1's underlying latent response, but I am not certain about that. 