# WLSMV: Interpreting probit coefficients and thresholds in terms of probability

### Myriam

Jan 14, 2021, 5:08:59 PM
to lavaan
Hello,

I am estimating a path model by using probit estimation (WLSMV).

In one equation, the dependent variable (Y1) is binary and the two (endogenous) predictors are continuous (X1 and X2). Knowing that it is a probit regression based on WLSMV and not a logit regression, my question is:

How could I interpret the results of this regression in terms of probability?

Here are the unstandardized parameters:
- X1: B1 = -.23
- X2: B2 = .14
- Threshold = .59

(I understand that the regression coefficients (B1 or B2) can be interpreted as the change in the probit value of Y1* (the latent response variable) for each 1-unit increase in X1 or X2, respectively. I also understand that the threshold is the value of the latent response variable at which the observed binary variable switches from 0 to 1.)

Any help would be greatly appreciated!

Myriam

### Terrence Jorgensen

Jan 17, 2021, 10:45:10 PM
to lavaan
> How could I interpret the results of this regression in terms of probability?

You mean the way logistic regression slopes can be converted to odds ratios? You can't. The effect on probability is nonlinear.

> I understand that the regression coefficients (B1 or B2) can be interpreted as the change in probit value of Y1* (latent response variable) for each 1 unit increase in X1 or X2 (respectively).

There is no single number that describes the change in P(Y=1 | X) per unit change in X.  There is a linear / additive change in Y1* (probit), or in logistic regression, a linear / additive change in the logit OR a multiplicative change in odds.
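To make the nonlinearity concrete, here is a minimal Python sketch that plugs the estimates from the question into the probit link. It assumes the residual SD of Y1* is 1 (as it would be under theta parameterization), that the intercept of Y1* is fixed at 0, and that X2 is held at 0. The starting values of X1 are arbitrary illustrations, not values from the data, and `prob_y1` is a hypothetical helper name.

```python
from statistics import NormalDist

# Estimates reported in the question
B1, B2, TAU = -0.23, 0.14, 0.59

def prob_y1(x1, x2, resid_sd=1.0):
    """P(Y1 = 1 | X1, X2) = Phi((B1*x1 + B2*x2 - tau) / resid_sd).

    resid_sd = 1.0 is an assumption (theta parameterization,
    intercept of Y1* fixed at 0).
    """
    return NormalDist().cdf((B1 * x1 + B2 * x2 - TAU) / resid_sd)

# The same 1-unit increase in X1, taken at different starting points (X2 = 0)
for start in (-6.0, -3.0, 0.0, 3.0):
    change = prob_y1(start + 1, 0.0) - prob_y1(start, 0.0)
    print(f"X1: {start:+.0f} -> {start + 1:+.0f}   change in P(Y1 = 1): {change:+.4f}")
```

Each step shifts Y1* by the same B1 = -.23, but the resulting change in probability differs from row to row, and it is largest where the linear predictor is close to the threshold.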

If X were a grouping variable, you could calculate the expected probability in each group and calculate your own risk ratio.  Likewise, for continuous X you could choose any 2 representative values of X and calculate the expected probabilities, to calculate a risk ratio for those 2 levels of X.
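A minimal sketch of that calculation in Python, using the estimates from the question. The two X1 values (-1 and +1) and the fixed X2 = 0 are hypothetical choices for illustration, `prob_y1` and `risk_ratio` are made-up helper names, and the residual SD of Y1* is assumed to be 1 (theta parameterization) with the intercept of Y1* fixed at 0.

```python
from statistics import NormalDist

# Estimates reported in the question
B1, B2, TAU = -0.23, 0.14, 0.59

def prob_y1(x1, x2, resid_sd=1.0):
    """Expected P(Y1 = 1) at given predictor values.

    resid_sd = 1.0 is an assumption (theta parameterization).
    """
    return NormalDist().cdf((B1 * x1 + B2 * x2 - TAU) / resid_sd)

def risk_ratio(x1_low, x1_high, x2=0.0):
    """Ratio of P(Y1 = 1) at two chosen X1 values, holding X2 fixed."""
    return prob_y1(x1_high, x2) / prob_y1(x1_low, x2)

# Hypothetical "representative" values of X1
p_low = prob_y1(-1.0, 0.0)
p_high = prob_y1(1.0, 0.0)
print(f"P(Y1 = 1 | X1 = -1) = {p_low:.3f}")
print(f"P(Y1 = 1 | X1 = +1) = {p_high:.3f}")
print(f"Risk ratio (+1 vs. -1) = {risk_ratio(-1.0, 1.0):.3f}")
```

The ratio applies only to the pair of values chosen: a different pair (say, X1 = 1 vs. X1 = 3) gives a different risk ratio, which is why the representative values should be reported alongside it.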

> I also understand that the threshold is the value of the latent response variable at which the observed binary variable switches from 0 to 1.

Right.  So if you used the default delta parameterization, then 0.59 is a z score (because Y1* has unit variance) beyond which Y=1.
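Read as a z score, the threshold maps onto a proportion via the standard normal distribution. A minimal Python sketch, assuming Y1* is standard normal overall (mean 0, SD 1); whether the mean of Y1* is actually 0 in this particular model is an assumption, not something established above.

```python
from statistics import NormalDist

tau = 0.59  # threshold reported in the question

# Assumption: Y1* ~ N(0, 1) marginally (delta parameterization, mean of Y1* = 0)
p_y1_equals_1 = 1 - NormalDist().cdf(tau)
print(f"Implied marginal P(Y1 = 1) = {p_y1_equals_1:.3f}")
```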

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

### Myriam

Jan 19, 2021, 7:30:27 PM
to lavaan
Hi Terrence,

I really appreciate the clarifications. If you don't mind 2 follow-up questions:

> Right. So if you used the default delta parameterization, then 0.59 is a z score (because Y1* has unit variance) beyond which Y=1.

In this model I used theta parameterization, so it's the residual variance of Y1* that has unit variance, right? In that case, I imagine that the threshold cannot be interpreted as a z score, but is there another meaningful way that I can interpret the threshold besides what was described above? (i.e., the threshold is the value of the latent response variable at which the observed binary variable switches from 0 to 1).

2. Scales y*
Am I right to think that in my output, the coefficient under "Scales y*" is the estimated variance of the y* variable? (Since in theta parameterization, it's the residual variance that is fixed at 1.)

Also, I just wanted to thank you for, and highlight for others in my situation, the PPT handout posted in another conversation on measurement equivalence in models with categorical indicators. It is super useful for clarifying the identification trade-offs among thresholds, variances, and intercepts, and some differences between delta and theta parameterization.

Best,
Myriam

### Terrence Jorgensen

Jan 19, 2021, 11:44:48 PM
to lavaan
> In this model I used theta parameterization, so it's the residual variance of Y1* that has unit variance, right? In that case, I imagine that the threshold cannot be interpreted as a z score, but is there another meaningful way that I can interpret the threshold besides what was described above?

Yes, Y1* is simply not a z score (its SD > 1).

> (i.e., the threshold is the value of the latent response variable at which the observed binary variable switches from 0 to 1).

Right, you answered your own question.  You can print the fully standardized solution (std.all) if you want to know the standardized threshold (the z score above which Y1* is classified in the higher category).  Or equivalently, Y1 = 1 when Y1* exceeds the unstandardized threshold, and the SD of Y1* is the reciprocal of its scale parameter.

> 2. Scales y*
>
> Am I right to think that in my output, the coefficient under "Scales y*" is the estimated variance of the y* variable? (Since in theta parameterization, it's the residual variance that is fixed at 1.)

No, it is the reciprocal of the marginal SD.  So if the y* scale parameter is 0.5, the marginal/total SD of y* is 1/0.5 = 2.
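A minimal Python sketch of that arithmetic, using the hypothetical scale value of 0.5 from the example above together with the threshold of .59 from the question. `ystar_summary` is a made-up helper, not a lavaan function. The standardized threshold is computed here simply as the threshold divided by the total SD, which is an assumption about what `std.all` reports rather than lavaan's exact computation, and the R-squared line relies on the residual variance being fixed at 1 (theta parameterization).

```python
def ystar_summary(scale, threshold):
    """Convert a y* scale parameter into SD-based quantities."""
    total_sd = 1.0 / scale             # marginal SD of y* = reciprocal of the scale
    total_var = total_sd ** 2
    std_threshold = threshold * scale  # same as threshold / total_sd
    # Under theta parameterization the residual variance of y* is fixed at 1,
    # so the rest of the total variance is attributable to the predictors.
    r_squared = 1.0 - 1.0 / total_var
    return {
        "total_sd": total_sd,
        "total_var": total_var,
        "std_threshold": std_threshold,
        "r_squared": r_squared,
    }

# Hypothetical scale of 0.5 with the reported threshold of .59
for name, value in ystar_summary(scale=0.5, threshold=0.59).items():
    print(f"{name}: {value:.3f}")
```

With a scale of 0.5, the total SD is 2, so an unstandardized threshold of .59 sits only about 0.3 SDs above zero on the Y1* metric.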