interpretation of ordinal regression coefficients


Thomas K

unread,
Feb 9, 2017, 7:09:25 AM2/9/17
to brms-users
Hi there,

thanks for this package, it's a great opportunity to learn and actually do Bayesian data analysis.
I'm working with acceptability rating data (discrete, from 1 to 9) and therefore need to use ordinal regression (as I understand it).
Doing so works fine but also raises some questions regarding the interpretation of the coefficients.

My current brms model -- simplified for this thread -- looks like this:

brm(Rating ~ metricPredictor + (1 | subject),
    family = cumulative, data = ratings)

Running this model and printing the summary gives me 8 intercepts and 1 slope parameter.
As I understand ordinal regression (mainly based on John Kruschke's DBDA book), the 8 intercepts are the thresholds of the underlying cumulative function (all probability density below Intercept[1] is summed up to get the probability of rating 1, all probability density between Intercept[1] and Intercept[2] is summed and mapped to rating 2, etc.).
Since I have ratings ranging from 1 to 9, I get 8 thresholds.
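
To make my understanding concrete, here is a small R sketch of the cumulative-link idea. The threshold and eta values are made up for illustration, and the logit link plus the threshold-minus-predictor parameterization used by brms are assumed:

tau <- c(-2.0, -1.2, -0.5, 0.0, 0.6, 1.1, 1.9, 2.8)  # 8 made-up thresholds for 9 ratings
eta <- 0.3                                           # linear predictor of one observation
cum <- c(plogis(tau - eta), 1)                       # P(Rating <= k) for k = 1, ..., 9
prob <- diff(c(0, cum))                              # P(Rating = k); sums to 1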

Now some questions:
- are the thresholds actual intercepts of linear functions, or are they just named that way to be compatible with other models while really being thresholds for the pdf? I'm not sure this is clearly formulated, so let me put the question differently: if the intercepts are intercepts of a linear function, I could draw this linear function (x-axis: predictor, y-axis: rating). But why do I have more intercepts than slopes?
- I'm getting negative intercepts (a.k.a. thresholds), but my rating data only allows positive ratings -- how does this fit together?
- can I interpret the coefficient for metricPredictor as the slope of a linear function on the same scale as the rating data? For example, if I have a coefficient of 3.5, can I interpret this as "increasing metricPredictor by 1 unit leads to a rating that is 3.5 points higher (compared to the baseline)"?
- does the choice of the link function (logit by default) change the way I should think about the coefficients? In his book, Kruschke explains the model via an underlying metric variable (see Figure 23.6, page 647, available from https://sites.google.com/site/doingbayesiandataanalysis/figures), and he states that this underlying variable is normally distributed (and then binned into the distinct categories using the thresholds). If I got this right, his explanation implies a probit link function? Can I still use Kruschke's explanation if I'm using a logit link (which is actually quicker and fits the data better in terms of pp_check)?

Thanks in advance for an answer,
Thomas


Paul Buerkner

unread,
Feb 9, 2017, 12:16:31 PM2/9/17
to brms-users
Hi Thomas,

They are both thresholds and intercepts, in the sense that the thresholds happen to be the model intercepts in this parameterization.

In all your questions, don't forget that you are using a link function (logit by default, as you correctly point out).
This basically explains question 2 and also question 3, since the slopes are on the logit scale.
More details on the parameterization of ordinal models can be found in vignette("brms_families").

When assuming a normal distribution, you have to use the probit link. Using logit implies a logistic distribution.

Best,
Paul

Thomas K

unread,
Feb 14, 2017, 4:27:18 AM2/14/17
to brms-users
Hi Paul,

thanks for the quick reply.
I'm not sure, however, that I follow this comment of yours:


This basically explains question 2 and also question 3, since the slopes are on the logit scale.

I wrapped my head around this once again and will try to restate what I've understood -- and why it somewhat conflicts with your comment. There is most certainly some redundancy in the lines below, sorry for that. Please point out where my thoughts go astray.

As far as I understood, the logistic and the normal distribution do not differ dramatically. In both versions of ordinal regression, they can be thought of as a metric variable that underlies the ordinal outcome. I reckon that whether it is a logistic or a normal distribution does not matter much here, am I right? It's kind of like choosing Student's t distribution instead of the normal distribution for a robust standard linear regression? If yes, I would go for the better fitting model (which is the logistic/logit in my case).

Since for me it is easier to reason with, e.g., the mean of the underlying metric variable: is it correct that the output of the linear function represents this central tendency (denoted eta in your vignettes)? This central tendency should be the same parameter used as the mean of the underlying normal distribution (when using the probit link) or logistic distribution (when using the logit link). This central tendency is then used to compute probabilities for each possible outcome in the data using the thresholded cumulative normal (or logistic) function.

So, if I have a nominal predictor with two levels (or a metric one, looking at values 0 and 1), the central tendency for level 0 is eta = 0 + slope*0 = 0 and for level 1 it is eta = 0 + slope*1 = slope.
(Based on the vignette, there is no intercept in the model, if I got this correctly, which is why I put a 0 + ... in front.)
Given that this is correct, I think it is safe to say that going from level 0 to level 1 increases the predicted data point by slope points.
Since the eta value is subtracted from the threshold values within the link function, the threshold values are also on the same scale as the underlying metric variable representing the ordinal outcomes.
Since, at the end of the day, it is not legitimate to treat the ordinal outcomes as metric (though it's good for the intuition), the link function is needed.
The main job of the link function in this model is to bin this metric variable into the distinct outcomes. If I'm okay with thinking of the ordinal outcome as metric (e.g., the rating goes up by 0.5), I would not need to transform the slope of the model using the inverse link function to get an interpretable value. If I want to report/predict the actual numbers of, say, ratings 7 or 8, I would need to take the link function and its cumulative use into account.
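
A small sketch of what I mean (made-up thresholds, logit link assumed): increasing eta shifts probability mass toward higher categories:

tau <- c(-1, 0, 1)                                      # made-up thresholds: 4 rating categories
probs <- function(eta) diff(c(0, plogis(tau - eta), 1))
round(probs(0), 2)    # level 0: eta = 0           -> 0.27 0.23 0.23 0.27
round(probs(1.5), 2)  # level 1: eta = slope = 1.5 -> 0.08 0.11 0.20 0.62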

If you are saying the thresholds are also intercepts, I may intuitively grasp this in the following picture:
x-axis: the metric predictor; y-axis: the metric variable underlying the predicted variable (distributed as either a normal or a logistic distribution). The thresholds are then the intercepts on the y-axis that determine which y-values fall into each of the distinct categories of the ordinal outcome. Since these thresholds intersect the y-axis, they are intercepts?

Best,
Thomas

Paul Buerkner

unread,
Feb 14, 2017, 4:58:59 AM2/14/17
to brms-users
Hi Thomas,

sorry for my first reply being a bit brief; I believe your explanations are mostly correct.

Logistic and normal distributions don't differ dramatically in their shape, and going for the logistic is usually fine and easier/faster to fit. I wouldn't worry too much about this choice for practical purposes.
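
A quick way to see this (a logistic distribution rescaled to unit variance almost coincides with the standard normal):

curve(dnorm(x), -4, 4, ylab = "density")
curve(dlogis(x, scale = sqrt(3) / pi), add = TRUE, lty = 2)  # unit-variance logistic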

Indeed, the linear function (often called the linear predictor) corresponds to the central tendency / mean of the latent metric distribution. Note that the scale of this latent distribution is not identifiable using only an ordinal outcome. Accordingly, the latent distribution is assumed to be standardized (i.e., standard normal / standard logistic), which is done implicitly by applying the link function. Please do not interpret the linear predictor as being on the scale of the ordinal ratings: the linear predictor (and the thresholds) are on the scale of the assumed latent metric variable, which is thought to have produced the ordinal outcome through categorization. Does that make sense?

I believe your picture of the intercepts to be reasonable.

Thomas K

unread,
Feb 14, 2017, 6:09:23 AM2/14/17
to brms-users

Indeed, the linear function (often called the linear predictor) corresponds to the central tendency / mean of the latent metric distribution. Note that the scale of this latent distribution is not identifiable using only an ordinal outcome. Accordingly, the latent distribution is assumed to be standardized (i.e., standard normal / standard logistic), which is done implicitly by applying the link function. Please do not interpret the linear predictor as being on the scale of the ordinal ratings: the linear predictor (and the thresholds) are on the scale of the assumed latent metric variable, which is thought to have produced the ordinal outcome through categorization. Does that make sense?

It makes sense, thanks. But how could I then interpret the linear predictor/slope/thresholds in terms of the ordinal scale (as this is what interests me in the analysis, apart from the question of whether the effect is positive or negative)? Maybe the following is the thing to do:
When you say the latent distribution is standardized, I read this as mean = 0, SD = 1, correct? And since the slope and thresholds are on this standardized scale, they could be said to be standardized parameters.

From Kruschke's book (p. 689 / 625), I have the following equation that turns standardized parameters back into parameters on the non-standardized scale (which in my case would be a metric scale that "fits" the ordinal scale):

b1 = b1_standardized / sd(predictor1)

Does this equation allow me to interpret the converted model parameters on the ordinal scale? And if yes, is it possible to apply this re-conversion automatically in the brms/Stan model, so that it's printed in the output?
 

Paul Buerkner

unread,
Feb 14, 2017, 7:00:01 AM2/14/17
to brms-users
When I said standardized, I just meant a scale parameter of the latent distribution fixed to one, i.e., SD = 1 in the case of the probit link or scale = 1 in the case of the logit link. The mean is given by threshold + linear predictor.

I believe you have one misconception: there is no natural way to interpret the obtained regression coefficients in terms of the ordinal scale without applying the link function. The equation you are referring to is something very different; it is concerned with regression parameters when predictors are standardized. It cannot be applied in our case.

The regression coefficients you get for family "cumulative" are always on the latent metric scale and should be interpreted as such.
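
If you want at least a rough feel for the magnitude, one option (assuming the logit link) is the usual cumulative odds-ratio reading: exp(slope) is the factor by which the odds of a higher rating (Y > k, for any k) change per unit increase of the predictor. A sketch with your example value:

b <- 3.5   # latent-scale slope under the logit link
exp(b)     # ~ 33: odds of a higher rating are multiplied by about 33 per unit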


Thomas K

unread,
Feb 14, 2017, 8:18:08 AM2/14/17
to brms-users
Okay, almost got it ;)
The mean of the latent metric distribution is determined by the thresholds and the linear predictor, and the scale/SD = 1. Now, whether the mean is 3.5 or 7.5 does not matter, as the ordinal outcomes are generated via categorization (which doesn't care about the absolute location of the latent distribution).
Also, whether the slope is 2.5 or 4.7 does not matter for the interpretation on the ordinal scale -- the only thing one can say is: ratings are getting higher (since slope > 0), right?

How then would one apply the link function to be able to interpret things in terms of the ordinal scale?
I'm somewhat familiar with logistic regression, where the regression coefficient is on the log-odds scale: negative means more of outcome A, positive more of outcome B.
Having a slope of 3.5 in the ordinal regression, could I somehow apply the logit function to get, say, the number of ratings > 6 for condition A and the number of ratings < 4 for condition B?
So that a paper could read: "70% of people's ratings were higher than 6 for condition A, whereas 65% of the ratings were lower than 4 for condition B (beta = 3.5)." (Obviously, I'm making up numbers ...)
Otherwise, I'm stuck with saying: the coefficient is unequal to zero, so there is a positive/negative effect of the factor (at least this is my feeling at the moment).

Paul Buerkner

unread,
Feb 14, 2017, 9:17:40 AM2/14/17
to brms-users
"Does not matter" is not entirely correct. It does matter, but the effect of the slope (apart from the sign of course) is hard to grasp on the ordinal scale.

Here is what I meant by interpreting on the ordinal scale. Suppose the following simple model:

fit <- brm(y ~ gender, mydata, family = cumulative())

Then we can generate predicted probabilities of the response categories for the two levels of gender (male, female) as follows:

fitted(fit, newdata = data.frame(gender = c("male", "female")))
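
From the returned probabilities you can also build statements like "probability of a rating above 6" by summing categories. A hedged sketch (assuming the outcome has 9 categories, and an array layout of rows x summary statistics x categories, which may differ across brms versions):

pr <- fitted(fit, newdata = data.frame(gender = c("male", "female")))
p_above_6_male <- sum(pr[1, "Estimate", 7:9])  # point estimate of P(Y > 6) for males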

Thomas K

unread,
Feb 15, 2017, 6:54:54 AM2/15/17
to brms-users
Alright, this makes sense. I could then state something like "Males have a higher probability of giving rating 1 (40%) or 2 (30%), in contrast to females, who mostly gave ratings 3 (60%) or 4 (10%)", correct? (assuming the percentages come from the fitted() call)

Does it make sense to compute the difference of the probabilities for all ratings? Suppose I have four ratings and the probabilities for males are
P(Y=1) = 0.4, P(Y=2) = 0.3, P(Y=3) = 0.2, P(Y=4) = 0.1, whereas the probabilities for females are
P(Y=1) = 0.1, P(Y=2) = 0.2, P(Y=3) = 0.6, P(Y=4) = 0.1. The difference (male - female) would then be
P(Y=1) = 0.3, P(Y=2) = 0.1, P(Y=3) = -0.4, P(Y=4) = 0.0.
And this might be easier to interpret, like: no difference for rating 4, but females give more ratings of 3, whereas males give more ratings of 1 and 2.
I'm just not sure whether this subtraction is valid to do with these probabilities?
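
In R terms, that subtraction would simply be (with my made-up numbers from above):

male   <- c(0.4, 0.3, 0.2, 0.1)  # P(Y = 1), ..., P(Y = 4) for males
female <- c(0.1, 0.2, 0.6, 0.1)  # same for females
male - female                    # 0.3  0.1 -0.4  0.0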

Another thing, of more importance, is how to choose a prior for the slope if the effect of the slope is not that easy to see on the ordinal scale.
(Close to) real-world example: acceptability ratings on a scale from 1 to 9. Previous research, using t-tests, showed that condition A resulted in significantly higher ratings than condition B (mean difference 3.49).
What I did until this conversation was to use this mean difference (or possibly more differences from other studies) as a prior for the slope of the ordinal regression model. Since you say that the interpretation of the slope on the ordinal scale is not straightforward, do you still think this is a valid way to set a prior? One could argue that calculating mean (difference) ratings is not a valid approach for these ordinal data, but this is what I get from the literature.
Do you think one could assume that previous research was implicitly reasoning with the latent metric variable, and that it is therefore reasonable to set the slope prior according to a mean difference in ratings?
The interesting thing is actually that the slopes of the ordinal regression models fit to replication data of the effects from the literature are of a similar magnitude as the previously reported mean rating differences -- which is where part of my intuition about interpreting the slope as being on the ordinal scale comes from.
Related to this: if I have two different ordinal models for different effects (on the same rating scale), could one say that a slope estimate of 3.49 represents a greater effect than a slope value of 0.3?

Paul Buerkner

unread,
Feb 15, 2017, 11:45:31 AM2/15/17
to brms-users
Subtracting probabilities is valid.

I wouldn't put a prior on the slope unless you desperately feel you need to. The results of the t-test won't tell you much about what kind of prior you should put on the slope, since they are by no means on the latent scale assumed by the ordinal models.

The similar magnitude of the coefficients is a coincidence from my perspective. I wouldn't start interpreting it.

If you have two ordinal models (with the same family and link function) as well as the same rating scale, you can compare coefficients.

Thomas Kluth

unread,
Feb 15, 2017, 1:05:12 PM2/15/17
to brms-...@googlegroups.com
Alright, thanks a lot for your explanations!

Thomas

Thomas K

unread,
Feb 21, 2017, 7:29:51 AM2/21/17
to brms-users
Just a quick follow-up:

Suppose I do have some rating data from a previous study (same rating scale) and ran an ordinal regression on these data.
This now gives me posterior distributions for the intercepts/thresholds as well as for the slope.
I guess I might use the slope to inform the prior for my analyses.
But can/should I also use the intercepts/thresholds? Despite the same rating scale, they do not appear to be comparable from one analysis to the other.

Paul Buerkner

unread,
Feb 21, 2017, 1:21:35 PM2/21/17
to brms-users
You may use the previous slope to inform your prior on the new slope, but make sure not to use too restrictive a prior. Also, make this explicit in your paper, and run the same analysis using default priors as well. Otherwise reviewers may (rightly) wonder whether your results are just due to your informative prior.

I would just use default priors for the thresholds (or very wide normal priors), but I would not base them on previous thresholds.
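
For example (a sketch; the normal(1, 2) values are purely illustrative, not a recommendation):

# informative but not too narrow prior on the slope; thresholds keep their defaults
priors <- set_prior("normal(1, 2)", class = "b", coef = "metricPredictor")
fit <- brm(Rating ~ metricPredictor + (1 | subject),
           family = cumulative(), data = ratings, prior = priors)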

Thomas K

unread,
Mar 2, 2017, 8:12:37 AM3/2/17
to brms-users

The following is a question about the interpretation of ordinal regression coefficients from an interaction model. It is related to the GitHub issue I opened earlier today about using marginal_effects plots for ordinal regression (https://github.com/paul-buerkner/brms/issues/190).

I do have the following ordinal regression model:

rating ~ relDistCentered * proxOrientationCentered * CoMOrientationRadian  + (1 | subject)

[Using loo() I figured out that all predictors and interactions actually add something useful to the model. Otherwise I'd have no trouble interpreting only the coefficient of a single predictor.]
The predictors are all continuous variables.

The estimate for the relDistCentered predictor (the one I'm actually interested in) is positive (0.17) but not credibly different from 0 (credible interval: -0.01, 0.35).
The estimates for CoMOrientationRadian and proximalOrientationRadian are both negative and credibly different from 0 (-1.62 and -11.76).
I've ignored the interaction coefficients for now (which might not be legitimate) and used the marginal_effects plot to make sense of the results.

The two negative coefficients for CoMOrientationRadian and proximalOrientationRadian tell me to expect lower ratings for higher values of these predictors.
This is what I see in the marginal_effects plot for proximalOrientationRadian:

https://lh3.googleusercontent.com/-kcmO-Rg7I6w/WLgZdoxjSRI/AAAAAAAAAHA/c40zxH5YKUcCN0qHn0NqTOKTBCxDwclEgCLcB/s1600/prox.png

But not in the plot for CoMOrientationRadian:

https://lh3.googleusercontent.com/-bn3kjwYSfag/WLgZ3OGsyJI/AAAAAAAAAHE/hyCqyUQzuTgrAU2Q1klI6pAl1WkRKmgrQCLcB/s1600/com.png

More surprisingly, the plot for the relDistCentered predictor looks completely different from what I expected. Due to the positive coefficient, I'd expect a blue line going up, i.e., higher ratings for higher values of relDistCentered. Since the coefficient is not credibly different from zero, however, I'd expect the line to be almost flat and the gray intervals also to include the possibility of a reversed effect. All in all, something like what I see for CoMOrientationRadian above.
But here is what I see for relDistCentered:
https://lh3.googleusercontent.com/-TP4ucsAxFsE/WLgZ_J9utnI/AAAAAAAAAHI/Ian4Vx-UN94gl4Ufb07ZmKG7fS0V_ufIwCLcB/s1600/reldist.png



Now I wonder whether I have totally misunderstood something here (very likely) or whether there is a bug in brms that flipped the lines for relDistCentered and CoMOrientationRadian. The latter is not that likely; in particular, the values on the x-axis do fit the label printed on them. This is why I guess that I'd need to include the interaction effects in my plot interpretation as well.

Paul Buerkner

unread,
Mar 2, 2017, 9:31:44 AM3/2/17
to brms-users
When marginal_effects generates plots for, say, relDistCentered, it uses the means of the other predictors to condition on (you may change the conditioning values using the conditions argument).

As a result, the interaction coefficients come into play as soon as one of the conditioned predictors has a non-zero mean. Most likely, this is why the plots do not intuitively match your main effects.
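
For example (a sketch; the zero values are illustrative, and fit is assumed to be your fitted interaction model):

# condition the relDistCentered plot on zero instead of the means of the others
cond <- data.frame(proxOrientationCentered = 0, CoMOrientationRadian = 0)
marginal_effects(fit, effects = "relDistCentered", conditions = cond)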

Thomas Kluth

unread,
Mar 3, 2017, 9:03:00 AM3/3/17
to brms-...@googlegroups.com
Alright, that helped, thanks.
This conditioning on the means of the other predictors is not done in the summary table, right?

So, when looking at a summary table that contains a regression coefficient for a single predictor that is not credibly different from zero in such an interaction model: one could say that this predictor has no effect "on its own", but that it has an effect in combination with other predictors (as the interaction coefficients are non-zero), correct?

Probably the same question, just with different terminology: is what you called a main effect truly a main effect, or rather a simple effect, if one considers the following definition (borrowed from http://talklab.psy.gla.ac.uk/tvw/catpred/)?

"Put simply, in an A×B design, the simple effect of A is the effect of A
controlling for B, while the main effect of A is the effect of A
ignoring B."

Paul Buerkner

unread,
Mar 3, 2017, 9:42:49 AM3/3/17
to brms-users
No, the conditioning does not happen in the summary table.

Interpreting main effects in the presence of interactions is always difficult, and I teach my students to be very careful with it. It gets easier when all predictors are centered around zero, though, since then the "main effects" can be interpreted as the effect of the predictor when all other predictors are at their means.
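
Centering is a one-liner per predictor, e.g. (variable and data names are illustrative, assuming an uncentered relDist in your data):

mydata$relDistCentered <- mydata$relDist - mean(mydata$relDist)  # 0 now means "average"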

I think the term "on its own" might be misleading. I mean, what is the mathematical translation of that?

I'm not sure what the difference between "controlling" and "ignoring" is in this case (I cannot load the webpage). From my understanding, the former would be when B is present in the model, and "ignoring" would be when B is not modeled at all. Does that match your understanding?

Thomas Kluth

unread,
Mar 3, 2017, 11:41:08 AM3/3/17
to brms-...@googlegroups.com
> No, the conditioning does not happen in the summary table.
>
> Interpreting main effects in the presence of interactions is always difficult,
> and I teach my students to be very careful with it. It gets easier when all
> predictors are centered around zero, though, since then the "main effects" can
> be interpreted as the effect of the predictor when all other predictors are at
> their means.

I understand this as follows: such a "main effect" is the difference that the value of a predictor makes on the outcome. This difference compares whatever is "in the intercept" with what happens when we change one (and only one) predictor. "The intercept" is the value 0.0 for all predictors. Centering the predictors then means to "put the means of all predictors into the intercept".
Accordingly, one could interpret such a "main effect" of predictor A as the change in the outcome variable when holding all predictors at their means and only changing predictor A.

If this is correct, the marginal_effects plot should follow my intuitions about the "main effects" in the summary table, iff I'm centering all predictors. This is in fact true for my model.
But couldn't the summary table then be said to be "conditioned on all predictors == 0"? Depending on what 0 means for a given predictor, this can be hard to interpret, which is why the interpretation with centered variables is easier.

>
> I think the term "on its own" might be misleading. I mean, what is the
> mathematical translation of that?

So, what about (predictors all centered, coefficient of A greater than 0 and credibly different from 0):

"The higher predictor A, the higher the outcome (keeping predictors B and C constant at their means)."

>
> I'm not sure what the difference between "controlling" and "ignoring" is in
> this case (I cannot load the webpage). From my understanding, the former would
> be when B is present in the model, and "ignoring" would be when B is not
> modeled at all. Does that match your understanding?

Here's the correct link: http://talklab.psy.gla.ac.uk/tvw/catpred/
Your interpretation could be correct. I'm not familiar with these terms simple/main effect; they just came up during a discussion in our group last week. As far as I understood it, the main effect collapses across the levels of all other predictors. So, no matter whether predictor B has level 1 or 2, the effect of predictor A is positive.
A simple effect of A, however, would hold B constant at, say, level 1, and look at whether A makes a difference.
The more I think about this, the more I think your interpretation is right: if B is not in the model at all, it is "ignored" and the summary would give a so-called "main effect". If B is modeled, however, it has to have some value in order to interpret the then so-called "simple effect" of A.
My gut feeling says this is the same as the conditioning in the marginal_effects plot and/or the question of "what is in the intercept".

Paul Buerkner

unread,
Mar 3, 2017, 11:49:07 AM3/3/17
to brms-users
I think your interpretations are correct.

The explanation of "main" and "simple" effects seems reasonable, but I guess that different people might use these terms differently. Anyway, I think when centering all predictors you should be good to go with interpreting the results.