Ordering factors in a logistic regression model


Nana Kodee

Jan 22, 2014, 6:01:58 PM
to meds...@googlegroups.com

Hello all,

I would be grateful if someone could explain why my logistic regression output changes when I declare one of the predictors, which has four levels, as an ordered factor.
I used this code to order the four factor levels:

dat$sch.ordered <- factor(dat$schools, levels = c("grp1", "grp2", "grp3", "grp4"), ordered = TRUE)

model1 <- glm(y ~ X1 + X2 + X3 + sch.ordered, data = dat, family = binomial(link = "logit"))

model2 <- glm(y ~ X1 + X2 + X3 + schools, data = dat, family = binomial(link = "logit"))

The only difference between model1 and model2 is that schools is ordered in model1 and unordered in model2.
The outputs of the two models are different, and I would like to know why.

Thank you.

Naaaaa

Swank, Paul R

Jan 22, 2014, 6:15:33 PM
to meds...@googlegroups.com, meds...@googlegroups.com
When a predictor is categorical (not ordered), the differences between adjacent categories can be anything, but when it is ordered, the differences between adjacent categories are assumed equal. Comparing these two models is a test of deviations from linearity.
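
A minimal sketch of one way to set up such a linearity test in R, on simulated data. The data set and the names `score`, `fit.linear` and `fit.factor` are illustrative assumptions, not taken from the original post, and the groups are assumed to be equally spaced (scores 1 to 4). Note that in R the equal-spacing constraint comes from replacing the factor with a numeric score; declaring the factor as ordered with all of its contrasts retained does not by itself impose it.

```r
# Simulated data: binary outcome, one covariate, one 4-level grouping factor
set.seed(1)
n <- 400
grp <- c("grp1", "grp2", "grp3", "grp4")
dat <- data.frame(
  X1      = rnorm(n),
  schools = factor(sample(grp, n, replace = TRUE), levels = grp)
)
dat$score <- as.numeric(dat$schools)   # assumed equally spaced scores 1..4
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$X1 + 0.3 * dat$score))

# Linear trend across groups: one parameter for the grouping variable
fit.linear <- glm(y ~ X1 + score, data = dat, family = binomial(link = "logit"))

# Unconstrained group effects: three parameters for the grouping variable
fit.factor <- glm(y ~ X1 + schools, data = dat, family = binomial(link = "logit"))

# Likelihood-ratio test of departure from linearity (4 - 1 - 1 = 2 df)
lin.test <- anova(fit.linear, fit.factor, test = "Chisq")
print(lin.test)
```

A small p-value here would suggest that a single slope across the four groups does not describe the group effects adequately.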

Paul Swank

Sent from my iPad
--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marc Schwartz

Jan 22, 2014, 7:04:27 PM
to MedStats MedStats
In R, the default contrasts for a K-level unordered factor are K - 1 "treatment" contrasts, where there is a reference level (by default, the first lexically sorted level) and every other level is compared to it.

When you define a K-level ordered factor, orthogonal polynomial contrasts are used instead, resulting in a different contrast matrix. In your example above, with 4 levels, you will end up with linear (.L), quadratic (.Q) and cubic (.C) contrasts.

You can see the difference in how the contrast coding is done:

> contrasts(factor(c("grp1","grp2","grp3","grp4")))
     grp2 grp3 grp4
grp1    0    0    0
grp2    1    0    0
grp3    0    1    0
grp4    0    0    1


> contrasts(factor(c("grp1","grp2","grp3","grp4"), ordered = TRUE))
             .L   .Q         .C
[1,] -0.6708204  0.5 -0.2236068
[2,] -0.2236068 -0.5  0.6708204
[3,]  0.2236068 -0.5 -0.6708204
[4,]  0.6708204  0.5  0.2236068


It all comes down to what hypotheses you want to test, given your data.
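
To illustrate the point about contrast coding, here is a small sketch on simulated data (the data and effect sizes are made up for illustration; only the variable and model names follow the original question). It fits the same model with the unordered and the ordered version of the factor and compares the results. The coefficient tables differ, but since both codings span the same model space, the deviance and fitted probabilities should agree.

```r
# Simulated data with a 4-level grouping factor (illustrative values only)
set.seed(42)
n <- 500
grp <- c("grp1", "grp2", "grp3", "grp4")
dat <- data.frame(
  X1      = rnorm(n),
  schools = factor(sample(grp, n, replace = TRUE), levels = grp)
)
dat$sch.ordered <- factor(dat$schools, levels = grp, ordered = TRUE)
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.5 * dat$X1 + 0.4 * (dat$schools == "grp3")))

# Ordered factor: polynomial contrasts (.L, .Q, .C) by default
model1 <- glm(y ~ X1 + sch.ordered, data = dat, family = binomial(link = "logit"))

# Unordered factor: treatment contrasts (grp1 as reference) by default
model2 <- glm(y ~ X1 + schools, data = dat, family = binomial(link = "logit"))

# The coefficients are parametrized differently ...
print(coef(model1))
print(coef(model2))

# ... but the fit itself is the same
same.deviance <- all.equal(deviance(model1), deviance(model2), tolerance = 1e-6)
same.fitted   <- all.equal(unname(fitted(model1)), unname(fitted(model2)),
                           tolerance = 1e-6)
print(c(same.deviance, same.fitted))
```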

There is a page at UCLA that might be helpful:

http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm


Regards,

Marc Schwartz

BXC (Bendix Carstensen)

Jan 22, 2014, 9:34:47 PM
to meds...@googlegroups.com
First you should check that you have what you think:

with( dat, table(schools,sch.ordered) )

What are the residual deviances for the two models?

Try for example:

anova( model1, model2, test="Chisq" )

My suspicion is that you merely have a reparametrization. And what that is depends on the values of

options( "contrasts" )
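
The checks above can be strung together as follows. This is only a sketch on simulated data (the data frame is invented so that the code runs on its own; the names `dat`, `schools`, `sch.ordered`, `model1` and `model2` mirror the original question). If the two models are indeed a reparametrization of each other, the comparison should show zero degrees of freedom and a deviance difference of essentially zero.

```r
# Simulated stand-in for the poster's data (illustrative only)
set.seed(123)
n <- 500
grp <- c("grp1", "grp2", "grp3", "grp4")
dat <- data.frame(
  X1      = rnorm(n),
  schools = factor(sample(grp, n, replace = TRUE), levels = grp)
)
dat$sch.ordered <- factor(dat$schools, levels = grp, ordered = TRUE)
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.5 * dat$X1))

# Check the recoding: all counts should lie on the diagonal
print(with(dat, table(schools, sch.ordered)))

# Contrast types currently used for unordered and ordered factors
print(options("contrasts"))

model1 <- glm(y ~ X1 + sch.ordered, data = dat, family = binomial(link = "logit"))
model2 <- glm(y ~ X1 + schools, data = dat, family = binomial(link = "logit"))

# Compare residual deviances and degrees of freedom of the two fits
cmp <- anova(model1, model2, test = "Chisq")
print(cmp)
```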

It's considered good manners to identify yourself.

Best regards
Bendix Carstensen
______________________________________________

Bendix Carstensen
Senior Statistician
Epidemiology
Steno Diabetes Center A/S
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk http://BendixCarstensen.com
www.steno.dk

Nana Kodee

Jan 23, 2014, 9:48:07 AM
to meds...@googlegroups.com
Hello all,
Thanks so much all for the responses and links, they were all helpful.

A follow-up question: in the same model, suppose I want to add the interaction of X1 (a continuous variable) with school, as

model.int <- glm(y ~ X1 + X2 + X3 + schools + X1:schools, data = dat, family = binomial(link = "logit"))

Is this the same as subsetting the data by the four levels of school and fitting each subset separately, as follows?

school1: model1 <- glm(y ~ X1 + X2 + X3, data = dat.school1, family = binomial(link = "logit"))
school2: model2 <- glm(y ~ X1 + X2 + X3, data = dat.school2, family = binomial(link = "logit"))
school3: model3 <- glm(y ~ X1 + X2 + X3, data = dat.school3, family = binomial(link = "logit"))
school4: model4 <- glm(y ~ X1 + X2 + X3, data = dat.school4, family = binomial(link = "logit"))
If it is not the same, what mistake would I be making by doing it this way?

A friend suggested this method instead of the interaction model fitted above, and I would like to know whether it is standard practice.
If it is valid, how do you compare the school 1 and school 2 effects: by overlap of confidence intervals, or in some other way? Your comments would
be very much appreciated.

Thank you.

Naa

BXC (Bendix Carstensen)

Jan 23, 2014, 9:58:44 AM
to meds...@googlegroups.com
No, this will give you different results, because the splitting of the data in 4 chunks corresponds to the analysis where you have school interacting with X2 and X3 too.
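
A sketch of this point on simulated data (the data and the names `model.full`, `fits` and `dev.split` are illustrative assumptions; `model.int` follows the question). The four separate fits together should reproduce the deviance of the model in which schools interacts with X1, X2 and X3, not the deviance of the model with the X1:schools interaction only.

```r
# Simulated data: three covariates and a 4-level school factor (illustrative)
set.seed(7)
n <- 800
grp <- c("grp1", "grp2", "grp3", "grp4")
dat <- data.frame(
  X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n),
  schools = factor(sample(grp, n, replace = TRUE), levels = grp)
)
dat$y <- rbinom(n, 1, plogis(-0.2 + 0.6 * dat$X1 - 0.4 * dat$X2 + 0.3 * dat$X3))

# Model from the question: only X1 interacts with schools
model.int <- glm(y ~ X1 + X2 + X3 + schools + X1:schools,
                 data = dat, family = binomial(link = "logit"))

# Model in which schools interacts with X1, X2 and X3
model.full <- glm(y ~ (X1 + X2 + X3) * schools,
                  data = dat, family = binomial(link = "logit"))

# Separate fits, one per school
fits <- lapply(split(dat, dat$schools), function(d)
  glm(y ~ X1 + X2 + X3, data = d, family = binomial(link = "logit")))
dev.split <- sum(sapply(fits, deviance))

# The separate fits match the full interaction model, not model.int
print(all.equal(dev.split, deviance(model.full), tolerance = 1e-6))
print(c(split = dev.split, full = deviance(model.full), int = deviance(model.int)))

# Test whether the extra X2:schools and X3:schools terms are needed
print(anova(model.int, model.full, test = "Chisq"))
```

In a single interaction model with the default treatment contrasts, a coefficient such as `X1:schoolsgrp2` estimates the difference in the X1 slope between grp2 and grp1 and comes with its own standard error, which is generally a more direct comparison than checking whether confidence intervals from separate fits overlap.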

Best
Bendix Carstensen
