[R] bugs and misfeatures in polr(MASS).... fixed!

1,381 views
Skip to first unread message

Mr Timothy James BENHAM

unread,
Nov 2, 2010, 8:03:10 PM11/2/10
to r-h...@r-project.org
In polr.R the (several) functions gmin and fmin contain the code

> theta <- beta[pc + 1L:q]
> gamm <- c(-100, cumsum(c(theta[1L], exp(theta[-1L]))), 100)

That's bad. There's no reason to suppose beta[pc+1L] is larger than
-100 or that the cumulative sum is smaller than 100. For practical
datasets those assumptions are frequently violated, causing the
optimization to fail. A work-around is to center the explanatory
variables. This helps keep the zetas small.

The correct approach is to use the values -Inf and Inf as the first
and last cut points. The functions plogis, dnorm, etc all behave
correctly when the input is one of these values. The dgumbel function
does not, returning NaN for -Inf. Correct this as follows

dgumbel <- function (x, loc = 0, scale = 1, log = FALSE)
{
x <- (x - loc)/scale
d <- log(1/scale) - x - exp(-x)
d[is.nan(d)] <- -Inf # -tjb
if (!log) exp(d) else d
}

The documentation states

> start: initial values for the parameters. This is in the format
> 'c(coefficients, zeta)': see the Values section.

The relevant code is

> s0 <- if(pc > 0) c(start[seq_len(pc+1)], diff(start[-seq_len(pc)]))
> else c(start[1L], diff(start))

This doesn't take the logs of the differences as required to repose
the zetas into the form used in the optimization. The fix is
obvious. polr.fit has the same problem which is responsible for
summary() frequently failing when the Hessian is not provided.

I'm not convinced the t values reported by summary() are
reliable. I've noticed that a one dimensional linear transformation
the independent variables can cause the reported t values to change by
a factor of more than 100, which doesn't seem right.

--tjb

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

tjb

unread,
Nov 6, 2010, 6:00:49 PM11/6/10
to r-h...@r-project.org

The start value generation code in polr is also known to fail quite
frequently. For example, against the Iris data as recently posted to this
list by blackscorpio ( Sep 6, 2010).

> polr(Species~Sepal_Length+Sepal_Width+Petal_Length+Petal_Width,data=iris)
Error in polr(Species ~ Sepal_Length + Sepal_Width + Petal_Length +
Petal_Width, :
attempt to find suitable starting values failed
In addition: Warning messages:
1: In glm.fit(X, y1, wt, family = binomial(), offset = offset) :
algorithm did not converge
2: In glm.fit(X, y1, wt, family = binomial(), offset = offset) :
fitted probabilities numerically 0 or 1 occurred

I suggest that simply setting the coefficients beta to zero and the
cutpoints zeta to sensible values will always produce a feasible starting
point for non-pathological data. Here is my code that does this:

if(missing(start)) {
# try something that should always work -tjb
u <- as.integer(table(y))
u <- (cumsum(u)/sum(u))[1:q]
zetas <-
switch(method,
"logistic"= qlogis(u),
"probit"= qnorm(u),
"cauchit"= qcauchy(u),
"cloglog"= -log(-log(u)) )
s0 <- c(rep(0,pc),zetas[1],log(diff(zetas)))

Using this start code the problem is not manifested.

> source('fixed-polr.R')
> polr(Species~Sepal_Length+Sepal_Width+Petal_Length+Petal_Width,data=iris)
Call:
polr(formula = Species ~ Sepal_Length + Sepal_Width + Petal_Length +
Petal_Width, data = iris)

Coefficients:
Sepal_Length Sepal_Width Petal_Length Petal_Width
-2.466724 -6.671515 9.431689 18.270058

Intercepts:
setosa|versicolor versicolor|virginica
4.080189 42.639320

Residual Deviance: 11.89857
AIC: 23.89857

My change would also likely fix the problem reported by Kevin Coombes on May
6, 2010.
--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p3030405.html
Sent from the R help mailing list archive at Nabble.com.

tjb

unread,
Nov 6, 2010, 6:12:46 PM11/6/10
to r-h...@r-project.org

Note that that the enhancements in my original post solve the unresolved
problem of Chaehyung Ahn (22 Mar 2005) whose data I reproduce:

y,x,lx
0,3.2e-02,-1.49485
0,3.2e-02,-1.49485
0,1.0e-01,-1.00000
0,1.0e-01,-1.00000
0,3.2e-01,-0.49485
0,3.2e-01,-0.49485
1,1.0e+00,0.00000
0,1.0e+00,0.00000
1,3.2e+00,0.50515
1,3.2e+00,0.50515
0,1.0e+01,1.00000
1,1.0e+01,1.00000
1,3.2e+01,1.50515
2,3.2e+01,1.50515
2,1.0e+02,2.00000
1,1.0e+02,2.00000
2,3.2e+02,2.50515
1,3.2e+02,2.50515
2,1.0e+03,3.00000
2,1.0e+03,3.00000

Using the MASS version we get

> ahn$y<-as.factor(ahn$y)
> summary(polr(y~lx,data=ahn))

Re-fitting to get Hessian

Error in optim(s0, fmin, gmin, method = "BFGS", hessian = Hess, ...) :
initial value in 'vmmin' is not finite

Whereas,

> source('fixed-polr.R')
> summary(polr(y~lx,data=ahn))

Re-fitting to get Hessian

Call:
polr(formula = y ~ lx, data = ahn)

Coefficients:
Value Std. Error t value
lx 2.421 0.8146 2.971

Intercepts:
Value Std. Error t value
0|1 0.5865 0.8118 0.7224
1|2 4.8966 1.7422 2.8106

Residual Deviance: 20.43286
AIC: 26.43286

--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p3030411.html


Sent from the R help mailing list archive at Nabble.com.

______________________________________________

ahs

unread,
Oct 18, 2012, 8:05:19 AM10/18/12
to r-h...@r-project.org
Hello,
I am trying to use this fix for the convergence problem in polr, but I don't
seem to get the change of code right. I redefine the function polr by the
lines


tjb wrote
> if(missing(start)) {
> # try something that should always work -tjb
> u <- as.integer(table(y))
> u <- (cumsum(u)/sum(u))[1:q]
> zetas <-
> switch(method,
> "logistic"= qlogis(u),
> "probit"= qnorm(u),
> "cauchit"= qcauchy(u),
> "cloglog"= -log(-log(u)) )
> s0 <- c(rep(0,pc),zetas[1],log(diff(zetas)))

but - which part should I erase? The whole old if(missing(start)) {...} or
something else or nothing? I have tried all ways I could think of.

I also can't see where the new object s0 is used in the old code?

Any input on my problem would be very much appreciated!




--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4646600.html

tjb

unread,
Oct 24, 2012, 8:28:47 AM10/24/12
to r-h...@r-project.org
Sorry, I have been away. I'll have a look at this and get back to you
tomorrow.

Tim

On Thu, Oct 18, 2012 at 10:35 PM, ahs [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

> Hello,
> I am trying to use this fix for the convergence problem in polr, but I
> don't seem to get the change of code right. I redefine the function polr by
> the lines
>
> tjb wrote
> if(missing(start)) {
> # try something that should always work -tjb
> u <- as.integer(table(y))
> u <- (cumsum(u)/sum(u))[1:q]
> zetas <-
> switch(method,
> "logistic"= qlogis(u),
> "probit"= qnorm(u),
> "cauchit"= qcauchy(u),
> "cloglog"= -log(-log(u)) )
> s0 <- c(rep(0,pc),zetas[1],log(diff(zetas)))
>
> but - which part should I erase? The whole old if(missing(start)) {...} or
> something else or nothing? I have tried all ways I could think of.
>
> I also can't see where the new object s0 is used in the old code?
>
> Any input on my problem would be very much appreciated!
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4646600.html
> To unsubscribe from bugs and misfeatures in polr(MASS).... fixed!, click
> here<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=3024677&code=dGltb3RoeS5iZW5oYW1AdXFjb25uZWN0LmVkdS5hdXwzMDI0Njc3fDE5NTE2NDMxMjk=>
> .
> NAML<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




-----
Tim J. Benham
--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4647307.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

ahs

unread,
Oct 24, 2012, 8:38:00 AM10/24/12
to r-h...@r-project.org
Great!
You can skip my question about s0 though, I found where it is being used,
but I still struggle with the code and convergence problem.

I also found something here
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/CharlesDupontStuff/newPolr.R
that seems like someone tried to fix it, but with this code I get mu
starting values out of range, nor do I get it right when I try to force
starting values.

If you got your code to work fine, you might have the answer to my problem.
If you want, maybe you could post your whole *fixed-polr.R*? :-)



I also can't see where the new object s0 is used in the old code?



--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4647311.html
Sent from the R help mailing list archive at Nabble.com.

tjb

unread,
Oct 25, 2012, 8:28:10 AM10/25/12
to r-h...@r-project.org
This could well be out of date because I have not paid any attention to the
official POLR code in a year or more. Attached is fixed-polr.R.

cheers,
Tim

On Wed, Oct 24, 2012 at 10:38 PM, ahs [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

> Great!
> You can skip my question about s0 though, I found where it is being used,
> but I still struggle with the code and convergence problem.
>
> I also found something here
> http://biostat.mc.vanderbilt.edu/wiki/pub/Main/CharlesDupontStuff/newPolr.R
> that seems like someone tried to fix it, but with this code I get mu
> starting values out of range, nor do I get it right when I try to force
> starting values.
>
> If you got your code to work fine, you might have the answer to my
> problem. If you want, maybe you could post your whole *fixed-polr.R*? :-)
>
>
>
> I also can't see where the new object s0 is used in the old code?
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4647311.html
fixed-polr.R (28K) <http://r.789695.n4.nabble.com/attachment/4647403/0/fixed-polr.R>




-----
Tim J. Benham
--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4647403.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

tjb

unread,
Aug 12, 2014, 9:32:08 AM8/12/14
to r-h...@r-project.org
The official maintainers were dismissive when I suggested there were some
problems I could fix with the then implementation of polr. I haven't looked
at it since, sorry.


On Tue, Aug 12, 2014 at 7:44 PM, Guido Biele [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

> I modified (where neccessary) the file polr.R of the current MASS package
> (7.3-33) following the fixes in fixed-polr.R* and it is still working.
> the original polr.R file had implemented some of Tim's suggestion, but not
> the new method to generate starting values for the optimization.
>
> Does anybody know why polr was only partially "fixed"?
>
> Regards - Guido
>
> *http://r.789695.n4.nabble.com/attachment/4647403/0/fixed-polr.R
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4695392.html
-----
Tim J. Benham
--
View this message in context: http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p4695394.html
Reply all
Reply to author
Forward
0 new messages