interpretation of ordinal regression coefficients


Thomas K

unread,
Feb 9, 2017, 7:09:25 AM2/9/17
to brms-users
Hi there,

thanks for this package, it's a great opportunity to learn and actually do Bayesian data analysis.
I'm working with acceptability rating data (discrete, from 1 to 9) and therefore need to use ordinal regression (as I understand it).
Doing so works fine but also raises some questions regarding the interpretation of the coefficients.

My current brms model -- simplified for this thread -- looks like this:

brm(Rating ~ metricPredictor + (1 | subject),
    family = cumulative, data = ratings)

Running this model and printing the summary gives me 8 intercepts and 1 slope parameter.
As I understand ordinal regression (mainly based on John Kruschke's DBDA book), the 8 intercepts are the thresholds of the underlying cumulative function (all probability density below Intercept[1] is summed up to get the probability of rating 1, all probability density between Intercept[1] and Intercept[2] is summed and mapped to rating 2, etc.).
Since I have ratings ranging from 1 to 9, I get 8 thresholds.
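
To make my understanding concrete, here is a small R sketch of the cumulative-link idea. The threshold and eta values are made up for illustration, and the logit link plus the threshold-minus-predictor parameterization used by brms are assumed:

tau <- c(-2.0, -1.2, -0.5, 0.0, 0.6, 1.1, 1.9, 2.8)  # 8 made-up thresholds for 9 ratings
eta <- 0.3                                           # linear predictor of one observation
cum <- c(plogis(tau - eta), 1)                       # P(Rating <= k) for k = 1, ..., 9
prob <- diff(c(0, cum))                              # P(Rating = k); sums to 1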

Now some questions:
- are the thresholds actual intercepts of linear functions, or are they just named that way to be compatible with other models while really being thresholds for the pdf? I'm not sure this is clearly formulated, so let me put the question differently: if the intercepts are intercepts of a linear function, I could draw this linear function (x-axis: predictor, y-axis: rating). But why do I have more intercepts than slopes?
- I'm getting negative intercepts (a.k.a. thresholds), but my rating data only allows positive ratings -- how does this fit together?
- can I interpret the coefficient for metricPredictor as the slope of a linear function on the same scale as the rating data? For example, if I have a coefficient of 3.5, can I interpret this as "increasing metricPredictor by 1 unit leads to a rating that is 3.5 points higher (compared to the baseline)"?
- does the choice of the link function (logit by default) change the way I should think about the coefficients? In his book, Kruschke explains the model via an underlying metric variable (see Figure 23.6, page 647, available from https://sites.google.com/site/doingbayesiandataanalysis/figures), and he states that this underlying variable is normally distributed (and then binned into the distinct categories using the thresholds). If I got this right, his explanation implies a probit link function? Can I still use Kruschke's explanation if I'm using a logit link (which is actually quicker and fits the data better in terms of pp_check)?

Thanks in advance for an answer,
Thomas


Paul Buerkner

unread,
Feb 9, 2017, 12:16:31 PM2/9/17
to brms-users
Hi Thomas,

They are both thresholds and intercepts, in the sense that the thresholds happen to be the model intercepts in this parameterization.

In all your questions, don't forget that you are using a link function (logit by default, as you correctly point out).
This basically explains question 2 and also question 3, since the slopes are on the logit scale.
More details on the parameterization of ordinal models can be found in vignette("brms_families").

When assuming a normal distribution, you have to use the probit link. Using logit implies a logistic distribution.

Best,
Paul

Thomas K

unread,
Feb 14, 2017, 4:27:18 AM2/14/17
to brms-users
Hi Paul,

thanks for the quick reply.
I'm not sure, however, that I follow this comment of yours:


This basically explains question 2 and also question 3, since the slopes are on the logit scale.

I wrapped my head around this once again and will try to restate what I've understood -- and why it somewhat conflicts with your comment. There is most certainly some redundancy in the lines below, sorry for that. Please point out where my thoughts go astray.

As far as I understood, the logistic and the normal distribution do not differ dramatically. In both versions of ordinal regression, they can be thought of as a metric variable that underlies the ordinal outcome. I reckon that whether it is a logistic or a normal distribution does not matter much here, am I right? It's kind of like choosing Student's t distribution instead of the normal distribution for a robust standard linear regression? If yes, I would go for the better fitting model (which is the logistic/logit in my case).

Since for me it is easier to reason with, e.g., the mean of the underlying metric variable: is it correct that the output of the linear function represents this central tendency (denoted eta in your vignettes)? This central tendency should be the same parameter used as the mean of the underlying normal distribution (when using the probit link) or logistic distribution (when using the logit link). This central tendency is then used to compute probabilities for each possible outcome in the data using the thresholded cumulative normal (or logistic) function.

So, if I have a nominal predictor with two levels (or a metric one, looking at values 0 and 1), the central tendency for level 0 is eta = 0 + slope*0 = 0 and for level 1 it is eta = 0 + slope*1 = slope.
(Based on the vignette, there is no intercept in the model, if I got this correctly, which is why I put a 0 + ... in front.)
Given that this is correct, I think it is safe to say that going from level 0 to level 1 increases the predicted data point by slope points.
Since the eta value is subtracted from the threshold values within the link function, the threshold values are also on the same scale as the underlying metric variable representing the ordinal outcomes.
Since, at the end of the day, it is not legitimate to treat the ordinal outcomes as metric (though it's good for the intuition), the link function is needed.
The main job of the link function in this model is to bin this metric variable into the distinct outcomes. If I'm okay with thinking of the ordinal outcome as metric (e.g., the rating goes up by 0.5), I would not need to transform the slope of the model using the inverse link function to get an interpretable value. If I want to report/predict the actual numbers of, say, ratings 7 or 8, I would need to take the link function and its cumulative use into account.
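
A small sketch of what I mean (made-up thresholds, logit link assumed): increasing eta shifts probability mass toward higher categories:

tau <- c(-1, 0, 1)                                      # made-up thresholds: 4 rating categories
probs <- function(eta) diff(c(0, plogis(tau - eta), 1))
round(probs(0), 2)    # level 0: eta = 0           -> 0.27 0.23 0.23 0.27
round(probs(1.5), 2)  # level 1: eta = slope = 1.5 -> 0.08 0.11 0.20 0.62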

If you are saying the thresholds are also intercepts, I may intuitively grasp this in the following picture:
x-axis: the metric predictor; y-axis: the metric variable underlying the predicted variable (distributed as either a normal or a logistic distribution). The thresholds are then the intercepts on the y-axis that determine which y-values fall into each of the distinct categories of the ordinal outcome. Since these thresholds intersect the y-axis, they are intercepts?

Best,
Thomas

Paul Buerkner

unread,
Feb 14, 2017, 4:58:59 AM2/14/17
to brms-users
Hi Thomas,

sorry for my first reply being a bit brief; I believe your explanations are mostly correct.

Logistic and normal distributions don't differ dramatically in their shape, and going for the logistic is usually fine and easier/faster to fit. I wouldn't worry too much about this choice for practical purposes.
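
A quick way to see this (a logistic distribution rescaled to unit variance almost coincides with the standard normal):

curve(dnorm(x), -4, 4, ylab = "density")
curve(dlogis(x, scale = sqrt(3) / pi), add = TRUE, lty = 2)  # unit-variance logistic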

Indeed, the linear function (often called the linear predictor) corresponds to the central tendency / mean of the latent metric distribution. Note that the scale of this latent distribution is not identifiable using only an ordinal outcome. Accordingly, the latent distribution is assumed to be standardized (i.e., standard normal / standard logistic), which is done implicitly by applying the link function. Please do not interpret the linear predictor as being on the scale of the ordinal ratings: the linear predictor (and the thresholds) are on the scale of the assumed latent metric variable, which is thought to have produced the ordinal outcome through categorization. Does that make sense?

I believe your picture of the intercepts to be reasonable.

Thomas K

unread,
Feb 14, 2017, 6:09:23 AM2/14/17
to brms-users

Indeed, the linear function (often called the linear predictor) corresponds to the central tendency / mean of the latent metric distribution. Note that the scale of this latent distribution is not identifiable using only an ordinal outcome. Accordingly, the latent distribution is assumed to be standardized (i.e., standard normal / standard logistic), which is done implicitly by applying the link function. Please do not interpret the linear predictor as being on the scale of the ordinal ratings: the linear predictor (and the thresholds) are on the scale of the assumed latent metric variable, which is thought to have produced the ordinal outcome through categorization. Does that make sense?

It makes sense, thanks. But how could I then interpret the linear predictor/slope/thresholds in terms of the ordinal scale (as this is what interests me in the analysis, apart from the question of whether the effect is positive or negative)? Maybe the following is the thing to do:
When you say the latent distribution is standardized, I read this as mean = 0, SD = 1, correct? And since the slope and thresholds are on this standardized scale, they could be said to be standardized parameters.

From Kruschke's book (p. 689 / 625), I have the following equation that turns standardized parameters back into parameters on the non-standardized scale (which in my case would be a metric scale that "fits" the ordinal scale):

b1 = b1_standardized / sd(predictor1)

Does this equation allow me to interpret the converted model parameters on the ordinal scale? And if yes, is it possible to apply this re-conversion automatically in the brms/Stan model, so that it's printed in the output?
 

Paul Buerkner

unread,
Feb 14, 2017, 7:00:01 AM2/14/17
to brms-users
When I said standardized, I just meant a scale parameter of the latent distribution fixed to one, i.e., SD = 1 in the case of the probit link or scale = 1 in the case of the logit link. The mean is given by threshold + linear predictor.

I believe you have one misconception: there is no natural way to interpret the obtained regression coefficients in terms of the ordinal scale without applying the link function. The equation you are referring to is something very different; it is concerned with regression parameters when predictors are standardized. It cannot be applied in our case.

The regression coefficients you get for family "cumulative" are always on the latent metric scale and should be interpreted as such.
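
If you want at least a rough feel for the magnitude, one option (assuming the logit link) is the usual cumulative odds-ratio reading: exp(slope) is the factor by which the odds of a higher rating (Y > k, for any k) change per unit increase of the predictor. A sketch with your example value:

b <- 3.5   # latent-scale slope under the logit link
exp(b)     # ~ 33: odds of a higher rating are multiplied by about 33 per unit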


Thomas K

unread,
Feb 14, 2017, 8:18:08 AM2/14/17
to brms-users
Okay, almost got it ;)
The mean of the latent metric distribution is determined by the thresholds and the linear predictor, and the scale/SD = 1. Now, whether the mean is 3.5 or 7.5 does not matter, as the ordinal outcomes are generated via categorization (which doesn't care about the absolute location of the latent distribution).
Also, whether the slope is 2.5 or 4.7 does not matter for the interpretation on the ordinal scale -- the only thing one can say is: ratings are getting higher (since slope > 0), right?

How then would one apply the link function to be able to interpret things in terms of the ordinal scale?
I'm somewhat familiar with logistic regression, where the regression coefficient is on the log-odds scale: negative means more of outcome A, positive more of outcome B.
Having a slope of 3.5 in the ordinal regression, could I somehow apply the logit function to get, say, the number of ratings > 6 for condition A and the number of ratings < 4 for condition B?
So that a paper could read: "70% of people's ratings were higher than 6 for condition A, whereas 65% of the ratings were lower than 4 for condition B (beta = 3.5)." (Obviously, I'm making up numbers ...)
Otherwise, I'm stuck with saying: the coefficient is unequal to zero, so there is a positive/negative effect of the factor (at least this is my feeling at the moment).

Paul Buerkner

unread,
Feb 14, 2017, 9:17:40 AM2/14/17
to brms-users
"Does not matter" is not entirely correct. It does matter, but the effect of the slope (apart from the sign of course) is hard to grasp on the ordinal scale.

Here is what I meant by interpreting on the ordinal scale. Suppose the following simple model:

fit <- brm(y ~ gender, mydata, family = cumulative())

Then we can generate predicted probabilities of the response categories for the two levels of gender (male, female) as follows:

fitted(fit, newdata = data.frame(gender = c("male", "female")))
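
From the returned probabilities you can also build statements like "probability of a rating above 6" by summing categories. A hedged sketch (assuming the outcome has 9 categories, and an array layout of rows x summary statistics x categories, which may differ across brms versions):

pr <- fitted(fit, newdata = data.frame(gender = c("male", "female")))
p_above_6_male <- sum(pr[1, "Estimate", 7:9])  # point estimate of P(Y > 6) for males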

Thomas K

unread,
Feb 15, 2017, 6:54:54 AM2/15/17
to brms-users
Alright, this makes sense. I could then state something like "Males have a higher probability of giving rating 1 (40%) or 2 (30%), in contrast to females, who mostly gave ratings 3 (60%) or 4 (10%)", correct? (assuming the percentages come from the fitted() call)

Does it make sense to compute the difference of the probabilities for all ratings? Suppose I have four ratings and the probabilities for males are
P(Y=1) = 0.4, P(Y=2) = 0.3, P(Y=3) = 0.2, P(Y=4) = 0.1, whereas the probabilities for females are
P(Y=1) = 0.1, P(Y=2) = 0.2, P(Y=3) = 0.6, P(Y=4) = 0.1. The difference (male - female) would then be
P(Y=1) = 0.3, P(Y=2) = 0.1, P(Y=3) = -0.4, P(Y=4) = 0.0.
And this might be easier to interpret, like: no difference for rating 4, but females give more ratings of 3, whereas males give more ratings of 1 and 2.
I'm just not sure whether this subtraction is valid to do with these probabilities?
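
In R terms, that subtraction would simply be (with my made-up numbers from above):

male   <- c(0.4, 0.3, 0.2, 0.1)  # P(Y = 1), ..., P(Y = 4) for males
female <- c(0.1, 0.2, 0.6, 0.1)  # same for females
male - female                    # 0.3  0.1 -0.4  0.0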

Another thing, of more importance, is how to choose a prior for the slope if the effect of the slope is not that easy to see on the ordinal scale.
(Close to) real-world example: acceptability ratings on a scale from 1 to 9. Previous research, using t-tests, showed that condition A resulted in significantly higher ratings than condition B (mean difference 3.49).
What I did until this conversation was to use this mean difference (or possibly more differences from other studies) as a prior for the slope of the ordinal regression model. Since you say that the interpretation of the slope on the ordinal scale is not straightforward, do you still think this is a valid way to set a prior? One could argue that calculating mean (difference) ratings is not a valid approach for these ordinal data, but this is what I get from the literature.
Do you think one could assume that previous research was implicitly reasoning with the latent metric variable, and that it is therefore reasonable to set the slope prior according to a mean difference in ratings?
The interesting thing is actually that the slopes of the ordinal regression models fit to replication data of the effects from the literature are of a similar magnitude as the previously reported mean rating differences -- which is where part of my intuition about interpreting the slope as being on the ordinal scale comes from.
Related to this: if I have two different ordinal models for different effects (on the same rating scale), could one say that a slope estimate of 3.49 represents a greater effect than a slope value of 0.3?

Paul Buerkner

unread,
Feb 15, 2017, 11:45:31 AM2/15/17
to brms-users
Subtracting probabilities is valid.

I wouldn't put a prior on the slope unless you desperately feel you need to. The results of the t-test won't tell you much about what kind of prior you should put on the slope, since they are by no means on the latent scale assumed by the ordinal models.

The similar magnitude of the coefficients is a coincidence from my perspective. I wouldn't start interpreting it.

If you have two ordinal models (with the same family and link function) as well as the same rating scale, you can compare coefficients.

Thomas Kluth

unread,
Feb 15, 2017, 1:05:12 PM2/15/17
to brms-...@googlegroups.com
Alright, thanks a lot for your explanations!

Thomas

Thomas K

unread,
Feb 21, 2017, 7:29:51 AM2/21/17
to brms-users
Just a quick follow-up:

Suppose I do have some rating data from a previous study (same rating scale) and ran an ordinal regression on these data.
This now gives me posterior distributions for the intercepts/thresholds as well as for the slope.
I guess I might use the slope to inform the prior for my analyses.
But can/should I also use the intercepts/thresholds? Despite the same rating scale, they do not appear to be comparable from one analysis to the other.

Paul Buerkner

unread,
Feb 21, 2017, 1:21:35 PM2/21/17
to brms-users
You may use the previous slope to inform your prior on the new slope, but make sure not to use too restrictive a prior. Also, make this explicit in your paper, and run the same analysis using default priors as well. Otherwise reviewers may (rightly) wonder whether your results are just due to your informative prior.

I would just use default priors for the thresholds (or very wide normal priors), but I would not base them on previous thresholds.
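
For example (a sketch; the normal(1, 2) values are purely illustrative, not a recommendation):

# informative but not too narrow prior on the slope; thresholds keep their defaults
priors <- set_prior("normal(1, 2)", class = "b", coef = "metricPredictor")
fit <- brm(Rating ~ metricPredictor + (1 | subject),
           family = cumulative(), data = ratings, prior = priors)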

Thomas K

unread,
Mar 2, 2017, 8:12:37 AM3/2/17
to brms-users

The following is a question about the interpretation of ordinal regression coefficients from an interaction model. It is related to the GitHub issue I opened earlier today about using marginal_effects plots for ordinal regression (https://github.com/paul-buerkner/brms/issues/190).

I do have the following ordinal regression model:

rating ~ relDistCentered * proxOrientationCentered * CoMOrientationRadian  + (1 | subject)

[Using loo() I figured out that all predictors and interactions actually add something useful to the model. Otherwise I'd have no trouble interpreting only the coefficient of a single predictor.]
The predictors are all continuous variables.

The estimate for the relDistCentered predictor (the one I'm actually interested in) is positive (0.17) but not credibly different from 0 (credible interval: -0.01, 0.35).
The estimates for CoMOrientationRadian and proximalOrientationRadian are both negative and credibly different from 0 (-1.62 and -11.76).
I've ignored the interaction coefficients for now (which might not be legitimate) and used the marginal_effects plot to make sense of the results.

The two negative coefficients for CoMOrientationRadian and proximalOrientationRadian tell me to expect lower ratings for higher values of these predictors.
This is what I see in the marginal_effects plot for proximalOrientationRadian:

https://lh3.googleusercontent.com/-kcmO-Rg7I6w/WLgZdoxjSRI/AAAAAAAAAHA/c40zxH5YKUcCN0qHn0NqTOKTBCxDwclEgCLcB/s1600/prox.png

But not in the plot for CoMOrientationRadian:

https://lh3.googleusercontent.com/-bn3kjwYSfag/WLgZ3OGsyJI/AAAAAAAAAHE/hyCqyUQzuTgrAU2Q1klI6pAl1WkRKmgrQCLcB/s1600/com.png

More surprisingly, the plot for the relDistCentered predictor looks completely different from what I expected. Due to the positive coefficient, I'd expect a blue line going up, i.e., higher ratings for higher values of relDistCentered. Since the coefficient is not credibly different from zero, however, I'd expect the line to be almost flat and the gray intervals also to include the possibility of a reversed effect. All in all, something like what I see for CoMOrientationRadian above.
But here is what I see for relDistCentered:
https://lh3.googleusercontent.com/-TP4ucsAxFsE/WLgZ_J9utnI/AAAAAAAAAHI/Ian4Vx-UN94gl4Ufb07ZmKG7fS0V_ufIwCLcB/s1600/reldist.png



Now I wonder whether I have totally misunderstood something here (very likely) or whether there is a bug in brms that flipped the lines for relDistCentered and CoMOrientationRadian. The latter is not that likely; in particular, the values on the x-axis do fit the label printed on them. This is why I guess that I'd need to include the interaction effects in my plot interpretation as well.

Paul Buerkner

unread,
Mar 2, 2017, 9:31:44 AM3/2/17
to brms-users
When marginal_effects generates plots for, say, relDistCentered, it uses the means of the other predictors to condition on (you may change the conditioning values using the conditions argument).

As a result, the interaction coefficients come into play as soon as one of the conditioned predictors has a non-zero mean. Most likely, this is why the plots do not intuitively match your main effects.
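
For example (a sketch; the zero values are illustrative, and fit is assumed to be your fitted interaction model):

# condition the relDistCentered plot on zero instead of the means of the others
cond <- data.frame(proxOrientationCentered = 0, CoMOrientationRadian = 0)
marginal_effects(fit, effects = "relDistCentered", conditions = cond)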

Thomas Kluth

unread,
Mar 3, 2017, 9:03:00 AM3/3/17
to brms-...@googlegroups.com
Alright, that helped, thanks.
This conditioning on the means of the other predictors is not done in the summary table, right?

So, when looking at a summary table that contains a regression coefficient for a single predictor that is not credibly different from zero in such an interaction model: one could say that this predictor has no effect "on its own", but that it has an effect in combination with other predictors (as the interaction coefficients are non-zero), correct?

Probably the same question, just with different terminology: is what you called a main effect truly a main effect, or rather a simple effect, if one considers the following definition (borrowed from http://talklab.psy.gla.ac.uk/tvw/catpred/)?

"Put simply, in an A×B design, the simple effect of A is the effect of A
controlling for B, while the main effect of A is the effect of A
ignoring B."

Paul Buerkner

unread,
Mar 3, 2017, 9:42:49 AM3/3/17
to brms-users
No, the conditioning does not happen in the summary table.

Interpreting main effects in the presence of interactions is always difficult, and I teach my students to be very careful with it. It gets easier when all predictors are centered around zero, though, since then the "main effects" can be interpreted as the effect of the predictor when all other predictors are at their means.
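
Centering is a one-liner per predictor, e.g. (variable and data names are illustrative, assuming an uncentered relDist in your data):

mydata$relDistCentered <- mydata$relDist - mean(mydata$relDist)  # 0 now means "average"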

I think the term "on its own" might be misleading. I mean, what is the mathematical translation of that?

I'm not sure what the difference between "controlling" and "ignoring" is in this case (I cannot load the webpage). From my understanding, the former would be when B is present in the model, and "ignoring" would be when B is not modeled at all. Does that match your understanding?

Thomas Kluth

unread,
Mar 3, 2017, 11:41:08 AM3/3/17
to brms-...@googlegroups.com
> No, the conditioning does not happen in the summary table.
>
> Interpreting main effects in the presence of interactions is always difficult,
> and I teach my students to be very careful with it. It gets easier when all
> predictors are centered around zero, though, since then the "main effects" can
> be interpreted as the effect of the predictor when all other predictors are at
> their means.

I understand this as follows: such a "main effect" is the difference that the value of a predictor makes on the outcome. This difference compares whatever is "in the intercept" with what happens when we change one (and only one) predictor. "The intercept" is the value 0.0 for all predictors. Centering the predictors then means to "put the means of all predictors into the intercept".
Accordingly, one could interpret such a "main effect" of predictor A as the change in the outcome variable when holding all predictors at their means and only changing predictor A.

If this is correct, the marginal_effects plot should follow my intuitions about the "main effects" in the summary table, iff I'm centering all predictors. This is in fact true for my model.
But couldn't the summary table then be said to be "conditioned on all predictors == 0"? Depending on what 0 means for a given predictor, this can be hard to interpret, which is why the interpretation with centered variables is easier.

>
> I think the term "on its own" might be misleading. I mean, what is the
> mathematical translation of that?

So, what about (predictors all centered, coefficient of A greater than 0 and credibly different from 0):

"The higher predictor A, the higher the outcome (keeping predictors B and C constant at their means)."

>
> I'm not sure what the difference between "controlling" and "ignoring" is in
> this case (I cannot load the webpage). From my understanding, the former would
> be when B is present in the model, and "ignoring" would be when B is not
> modeled at all. Does that match your understanding?

Here's the correct link: http://talklab.psy.gla.ac.uk/tvw/catpred/
Your interpretation could be correct. I'm not familiar with these terms simple/main effect; they just came up during a discussion in our group last week. As far as I understood it, the main effect collapses across the levels of all other predictors. So, no matter whether predictor B has level 1 or 2, the effect of predictor A is positive.
A simple effect of A, however, would hold B constant at, say, level 1, and look at whether A makes a difference.
The more I think about this, the more I think your interpretation is right: if B is not in the model at all, it is "ignored" and the summary would give a so-called "main effect". If B is modeled, however, it has to have some value in order to interpret the then so-called "simple effect" of A.
My gut feeling says this is the same as the conditioning in the marginal_effects plot and/or the question of "what is in the intercept".

Paul Buerkner

unread,
Mar 3, 2017, 11:49:07 AM3/3/17
to brms-users
I think your interpretations are correct.

The explanation of "main" and "simple" effects seems reasonable, but I guess that different people might use these terms differently. Anyway, I think when centering all predictors you should be good to go with interpreting the results.