logistic regression w/ Design::lrm


daniel

Feb 25, 2010, 10:57:50 AM
to StatForLing with R
you guys... ;)

I have a small question concerning the interpretation of a plot that
the Design package allows for logistic regression analyses.

Let's say we have an lrm object that models a binary response as a
function of 3 binary predictors, and call it "model".

now apparently, we can plot all logit coefficients (turned back into
probabilities) for each level of each factor with

par(mfrow=c(1,3)) # because we have 3 predictors
plot(model, fun=plogis, ylab="Pr(Y=1)", ylim=c(0,1))
# the crucial part is "fun=plogis"; we could also choose different
# scales, e.g. fun=exp for exponentiated logits ... or plain logits

If I understand these plots correctly, they do in fact show
Pr(Y=1|factor_level_what_have_you).

Q1: Is this correct?
Q2: How does R know ... nope, let me rephrase this: how can I get from
the changes in predicted probability (i.e. the values I get from
inv.logit(coefficients(model))) to these values?

(I guess the two questions are interrelated)
thanks in advance

-daniel

Stefan Th. Gries

Feb 25, 2010, 11:29:12 PM
to statforli...@googlegroups.com
Yes, you're right, but the tricky (and sometimes treacherous) issue is
that the plotted probabilities are often those that you find when the
other predictor variables are set to some fixed value (e.g., in the
case of continuous predictors, their median; cf. the subtitle in that
kind of plot). You would therefore have to plug the same median or
factor level into

predict(some.model, newdata=...)

to get that predicted probability.
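To spell out the arithmetic: the probability plotted for one factor level is the inverse logit of the whole linear predictor, with the other predictors plugged in at their adjust-to values. A minimal sketch in Python (all coefficient and median values below are invented purely for illustration):

```python
import math

def ilogit(x):
    """Inverse logit (R's plogis): map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# invented coefficients, purely for illustration: an intercept, the
# coefficient of the factor level being plotted, and a continuous
# covariate held at its median (the "adjust-to" value from the subtitle)
intercept = -0.3
beta_level = 0.8
beta_covariate = 0.05
covariate_median = 10.0

# the y-value the plot shows for that level is the inverse logit of the
# full linear predictor, other predictors fixed at their adjust-to values
linear_predictor = intercept + beta_level + beta_covariate * covariate_median
p_plotted = ilogit(linear_predictor)
```

With these made-up numbers the linear predictor is 1.0, so the plotted value would be plogis(1.0) ≈ 0.73.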

HTH,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries
-----------------------------------------------

daniel

Feb 26, 2010, 3:40:39 AM
to StatForLing with R
first things first:
thanks stefan! # amazing how much help you offer per month to so many
people; should refer to you as "saint stefan" really ;)

but I guess the formulation of my question was a little clumsy;
let me see if I understand ...

when I plot the effects w/

plot(model, fun=plogis, ylab="Pr(Y=1)", adj.subtitle=T)

I get to see the adjustments for each plot.

So I understand that what I see in the plots is the transformed value
of "logit p", where

logit p = beta0 + beta1_level_1(of factor_1) + beta2_adjusted_level(of
factor_2) + ... + betak_adjusted_level(of factor_k)

right?

> the tricky (and sometimes treacherous) issue is
> that the plotted probabilities are often those that you find when
> other predictor variables are set to x

I understand that the adjustments are sometimes treacherous w/
continuous predictors. But my problem can in fact be illustrated when
there is only 1 predictor.

It boils down to: how exactly do the values I get from
inv.logit(coefficients(model)) [= the transformed logits] relate to
the Pr in the plot (or to the output of predict.lrm())?

Let me give an example: I have fit another model (model2) that
includes only one of my three predictors (for the sake of simplicity).
I can look at the transformed effect with

> inv.logit(coefficients(model2))
Intercept extern.height=ext.low
0.58880779 0.06452229

when I plot the model with

plot(model2, fun=plogis, ...)

I get something like .59 for the level "ext.low" (hard to read off the
plot), so I apply

predict(model2, type="fitted.ind") # turns out the value is .588; and
the value for the complementary level is .089

all that is fine, but I don't understand how these values (.588, .089)
relate to the 0.06452229 above; or, to put it differently, what
exactly the value of inv.logit tells me ... I was hoping it would be a
ratio of the two Pr ... but apparently it's not.

all is good when I operate with exp(coefficients) and plot(model2,
fun=exp) and then look at the predicted values (--> odds ratio) ...
hold on *IT'S A BINGO* ... or it might be ...

is perhaps inv.logit(coefficients(model)) = Pr(Y=1|levelA) / (1 +
Pr(Y=1|levelB)) ? # that would make some sense ...

Stefan Th. Gries

Feb 26, 2010, 11:43:59 PM
to statforli...@googlegroups.com
So, stuff like that is always easier to understand with an example:

##### pasteable code: beginning
# set up
rm(list=ls(all=TRUE))
library(Design)
logit<-function(x) { log(x/(1-x)) }
ilogit<-function(x) { 1/(1+exp(-x)) }

# generate random data
set.seed(1) # homogenize random values across users
z<-data.frame(xxx=factor(sample(letters[1:3], 100, replace=TRUE)),
              yyy=factor(sample(letters[1:2], 100, replace=TRUE))); attach(z)
example.dd<-datadist(z); options(datadist="example.dd")

# compute bin log regr.
model.lrm <- lrm(yyy ~ xxx, data=z, x=TRUE, y=TRUE, linear.predictors=TRUE) # model
plot(model.lrm, fun=plogis, ylab="Pr(Y=1)", adj.subtitle=T) # from your mail

# see the predicted probability of yyy being "b" for each level of xxx
table(xxx, predict(model.lrm, type="fitted.ind"))

prediction.for.xxx.a <- ilogit(sum(model.lrm$coef[1]))       # prediction for when xxx is a
prediction.for.xxx.b <- ilogit(sum(model.lrm$coef[1:2]))     # prediction for when xxx is b
prediction.for.xxx.c <- ilogit(sum(model.lrm$coef[c(1, 3)])) # prediction for when xxx is c

# when I type
locator(3)[[2]]
# and then click into the middle of the three points, I get the
# following predicted probabilities / y-values:
# [1] 0.7040869 0.3948993 0.5714715
##### pasteable code: end

This seems to me to be reasonably close (given mouse-pointing
inaccuracy) to the three predicted values ... can you tell us what the
problem is with these data?

Cheers,

daniel wiechmann

Feb 27, 2010, 9:02:03 AM
to statforli...@googlegroups.com
well, there is no problem with the data; numerically everything is just fine. my question is more conceptual in nature ...

it concerns how we can derive the beta coefficients with the inverse logit function applied, i.e. the output of inv.logit(coef(model)) [or your "ilogit" function], from the actual predicted probabilities Pr(Y=1|some.factor.level), i.e. the values we get from predict.lrm(model, type="fitted.ind") [or the values we get in those effect plots when fun=plogis, for that matter]. I can see how it works for odds, but have had trouble getting there when I work with Pr.

let's take your example but make it even simpler (--> 1 binary predictor)


# set up
rm(list=ls(all=TRUE))
library(Design)
logit<-function(x) { log(x/(1-x)) }
ilogit<-function(x) { 1/(1+exp(-x)) }

# generate random data
set.seed(1) # homogenize random values across users
z<-data.frame(xxx=factor(sample(letters[1:2], 100, replace=TRUE)),
              yyy=factor(sample(letters[1:2], 100, replace=TRUE))); attach(z) # CHANGE: 3 levels to 2 levels
example.dd<-datadist(z); options(datadist="example.dd")

# compute bin log regr.
model.lrm <- lrm(yyy ~ xxx, data=z, x=TRUE, y=TRUE, linear.predictors=TRUE) # model
plot(model.lrm, fun=exp, ylab="Odds(Y=1)", adj.subtitle=T) # CHANGE: scale from Pr to Odds

In this setting, the exponentiated coefficients (the output of exp(coef(model.lrm))) can be straightforwardly related to the predicted values in the plot. All I need to do is divide the predicted value for odds(Y=1|xxx=b) by the predicted value for odds(Y=1|xxx=a) to get the (exponentiated) regression coefficient. so that's nice. but I can't do the same for Pr (ceteris paribus).

so, when asked for the meaning of exp(coef(model.lrm)), I can answer that it is the odds ratio odds(Y=1|xxx=b) / odds(Y=1|xxx=a), and the plot above shows these two odds. hence, I can directly relate what the plot shows me to the exponentiated model coefficient.

my question was: in analogy to this, what is the meaning of ilogit(coef(model.lrm)); can we compute it from the predicted probabilities in the plot, and if yes, how?
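That asymmetry between the two scales can be checked numerically; here is a short Python sketch (probabilities are made up for illustration; the helpers mirror the thread's logit/ilogit functions):

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def ilogit(x):
    """Inverse logit (R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))

# made-up predicted probabilities for the two levels of the predictor
p_a = 0.70   # Pr(Y=1 | xxx = a), the reference level
p_b = 0.40   # Pr(Y=1 | xxx = b)

# the model coefficient for xxx=b is a difference of logits
coef_b = logit(p_b) - logit(p_a)

# on the odds scale the coefficient IS a ratio of the two plotted values
odds_ratio = (p_b / (1 - p_b)) / (p_a / (1 - p_a))   # equals exp(coef_b)

# on the probability scale there is no analogous ratio:
# ilogit(coef_b) is neither p_b / p_a nor p_b - p_a
```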

but your way of getting the predicted probabilities has already helped me a lot...

many thanks
daniel

Stefan Th. Gries

Feb 27, 2010, 9:58:07 PM
to statforli...@googlegroups.com
> it concerns how we can derive the beta coefficients with the inverse logit function applied, i.e. the output of inv.logit(coef(model)) [or your "ilogit" function], from the actual predicted probabilities Pr(Y=1|some.factor.level)
Well, you get them with the logit function, the inverse of which is
ilogit. That is, re the probability for the alphabetically first level
(which corresponds to the intercept), you type this:

logit(0.703703696627804)

to get the intercept: 0.8649.

And re the probability for the alphabetically second level, you type this:

logit(0.703703696627804)-logit(0.394736842105285) # intercept minus the logit of the prob for "b"

Etc.
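The same two computations, written out in Python with an explicit logit helper (the numbers are the fitted probabilities quoted above):

```python
import math

def logit(p):
    """Log-odds of a probability; the inverse of ilogit/plogis."""
    return math.log(p / (1 - p))

# the probability for the reference level (xxx = a) gives back the intercept
intercept = logit(0.703703696627804)                              # ~0.865

# the difference of logits across the two levels gives back the xxx=b
# coefficient, up to its sign
coef_size = logit(0.703703696627804) - logit(0.394736842105285)   # ~1.292
```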

daniel wiechmann

Feb 28, 2010, 5:01:58 AM
to statforli...@googlegroups.com
hm ... I think we might be talking about different issues ... I am sorry for being so imprecise ... let me try it differently. I meant this:

for the model w/ 1 binary predictor:
given a predicted probability Pr(y=1|xxx=a) = 0.5576923 and a probability Pr(y=1|xxx=b) = 0.5208333,
how can I derive the value of ilogit(coef(model.lrm)) for xxx=b, i.e. 0.4629630, using these numbers *without rescaling*?

when I rescale to odds, I can simply divide one value by the other (as the exponentiated beta coefficient is the ratio of the odds):

prediction.for.xxx.a <- exp(sum(model.lrm$coef[1]))   # 1.260870
prediction.for.xxx.b <- exp(sum(model.lrm$coef[1:2])) # 1.086957
the beta coefficient on the odds-scale is now simply
prediction.for.xxx.b/prediction.for.xxx.a = 1.086957 = 1.260870
which is exactly what I get from exp(coef(model.lrm)

so, the exponentiated coefficient I can get w/ odds(y=1|xxx=b)/odds(y=1|xxx=a)
but I of course can't get the value of the inverse-logit coefficient for xxx=b (= 0.4629630) from Pr(y=1|xxx=b)/Pr(y=1|xxx=a), nor can I just subtract the predicted Pr of one level from that of the other (this only works on the logit scale, right? ... you can use logit(0.703703696627804)-logit(0.394736842105285) = 1.29, which is the beta coefficient for xxx=b on the logit scale ... well, the sign is different but OK);

but is there a direct way to relate the predicted values 0.5576923 and 0.5208333 to 0.4629630 (a way that does not involve transformation)?

I am just asking because I'd like to understand the maths and logic behind logistic regression models a little better ...

I am looking for an answer like "no, you can't use only the predicted Pr values (0.5576923 and 0.5208333) to get the value for the change in predicted probability (0.4629630) without transformation", or "yes, of course you can do it without transformation, all you need to do is ..."
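For what it's worth, the closest thing to a direct formula still goes through the odds: since exp(coef) is the odds ratio OR, ilogit(coef) = exp(coef)/(1+exp(coef)) = OR / (1 + OR), so the two probabilities do determine the value, but only after each one is converted to odds. A Python sketch with the numbers above:

```python
p_a = 0.5576923   # Pr(y=1 | xxx = a)
p_b = 0.5208333   # Pr(y=1 | xxx = b)

# each probability first has to be rescaled to odds ...
odds_ratio = (p_b / (1 - p_b)) / (p_a / (1 - p_a))   # = exp(coef for xxx=b), ~0.862069

# ... and only then does ilogit(coef) = OR / (1 + OR) fall out
ilogit_coef = odds_ratio / (1 + odds_ratio)          # ~0.4629630
```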

cheers,
daniel

daniel wiechmann

Feb 28, 2010, 11:39:44 AM
to statforli...@googlegroups.com
sorry, there are a couple of typos in my last post that might cause problems:


I wrote:
"the beta coefficient on the odds-scale is now simply
prediction.for.xxx.b/
prediction.for.xxx.a = 1.086957 = 1.260870
which is exactly what I get from exp(coef(model.lrm)
so, the exponentiated coefficient I can get w/ odds(y=1|xxx=b)/odds(y=1|xxx=a)"

but it should say

"the beta coefficient on the odds-scale is now simply
prediction.for.xxx.b/prediction.for.xxx.a =1.086957/1.260870 = 0.862069
, which is exactly what I get from exp(coef(model.lrm)
so, the exponentiated coefficient IS the odds(y=1|xxx=b) / odds(y=1|xxx=a)"

sorry again and cheers
daniel