[R] Safeguarded Newton method for function minimization

J C Nash

Apr 18, 2017, 1:33:12 PM
to r-help
Recently Marie Boehnstedt reported a bug in the nlm() function for function minimization
when both gradient and hessian are provided. She has provided a work-around for some
cases and it seems this will get incorporated into the R function eventually.

However, despite the great number of packages on CRAN, there does not appear to be a
straightforward Newton approach to function minimization. This may be because providing
the code for a hessian (the matrix of second derivatives) is a lot of work and error-prone.
(R could also use some good tools for Automatic Differentiation). I have also noted that
a number of researchers try to implement textbook methods and run into trouble when maths
and computing are not quite in sync. Therefore, I wrote a simple safeguarded Newton method
and put a small package on R-forge at

https://r-forge.r-project.org/R/?group_id=395

Note that Newton's method is properly a method for solving nonlinear equations. For function
minimization, we apply it to solve g(x) = 0, where g is the gradient and x is the vector
of parameters. In part, the safeguards ensure we reduce the function f(x) at each step, to avoid
some of the difficulties that may arise from a non-positive-definite hessian.
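
For concreteness, a rough sketch of one such iteration is given below. This is for
illustration only, not the code in the R-forge package; the name snewton_step, the
user-supplied functions fn, gr and he (for f(x), the gradient and the hessian), and the
lambda augmentation are all just choices made for the sketch.

## One safeguarded Newton iteration (illustrative sketch, not the package code).
## fn, gr, he: user-supplied functions returning f(x), gradient g(x), hessian H(x).
snewton_step <- function(x, fn, gr, he, lambda = 1e-2) {
    g <- gr(x)
    H <- he(x)
    ## Try the pure Newton direction; if the solve fails (e.g. H is singular),
    ## fall back to a Levenberg-style augmented hessian H + lambda * I.
    d <- tryCatch(solve(H, -g),
                  error = function(e) solve(H + lambda * diag(length(g)), -g))
    ## Safeguard: backtrack (halve the step) until the function value decreases,
    ## giving up when the step becomes negligibly small.
    f0 <- fn(x)
    step <- 1
    while (fn(x + step * d) >= f0 && step > .Machine$double.eps) step <- step / 2
    x + step * d
}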

In the package, I have a very simple quadratic test, the Rosenbrock test function and
the Wood test function. The method fails on the Wood function -- the hessian is not
positive definite at the point where it stops.
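
For reference, the kind of hand-coded derivatives involved, for the two-parameter
Rosenbrock function, looks like the sketch below. These are the standard formulas,
written out for illustration rather than copied from the package test code:

## Rosenbrock function, gradient and hessian, hand-coded from the standard formulas
fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                      200 * (x[2] - x[1]^2))
hrr <- function(x) matrix(c(1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
                            -400 * x[1],                     200),
                          nrow = 2)

Even for this tiny example, getting the hessian entries right by hand takes some care,
which is part of the reason hessian code is so often missing or wrong.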

Before submitting this package to CRAN, I would like to see its behaviour on other
test problems, but am lazy enough to wish to avoid creating the hessian code. If anyone
has such code, it would be very welcome. Please contact me off-list. If I get some workable
examples that are open for public view, I'll report back here.

John Nash

______________________________________________
R-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Martin Maechler

Apr 19, 2017, 8:57:03 AM
to J C Nash, r-help
>>>>> J C Nash <profj...@gmail.com>
>>>>> on Tue, 18 Apr 2017 13:32:52 -0400 writes:

> Recently Marie Boehnstedt reported a bug in the nlm()
> function for function minimization when both gradient and
> hessian are provided.
Indeed, on R's Bugzilla here :
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249

> She has provided a work-around for some cases and it seems
> this will get incorporated into the R function eventually.

indeed.... the first part -- fixing the wrong choldc() -- in the C code has
been in my version of R-devel for a while now.

See my follow up
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249#c4 and #c5

including my attachment which is an extended version of Marie's original :
https://bugs.r-project.org/bugzilla/attachment.cgi?id=2246

As that mentions _and_ shows: Fixing choldc() solves the problem
for the 2D Rosenbrock example, but _not_ the 4D Wood example.

> However, despite the great number of packages on CRAN,
> there does not appear to be a straightforward Newton
> approach to function minimization. This may be because
> providing the code for a hessian (the matrix of second
> derivatives) is a lot of work and error-prone.

> (R could also use some good tools for Automatic Differentiation).

The last part of what you say above is not at all true:
R -- and S (& S+ | S-PLUS) before it! -- always had deriv() and deriv3()
and my attachment above _shows_ you how to use deriv3() to get
both the gradient and the hessian via a version of automatic
differentiation completely effortlessly !!

For ease of readers, here is that part, with an example:

##' Wood function (4 arguments 'x1' ... 'x4')
fwood <- function(x1, x2, x3, x4) {
    100*(x1^2 - x2)^2 + (1 - x1)^2 + 90*(x3^2 - x4)^2 + (1 - x3)^2 +
        10.1*((1 - x2)^2 + (1 - x4)^2) + 19.8*(1 - x2)*(1 - x4)
}
## automatically construct correct gradient and hessian:
woodf.gh <- function(x) {
    stopifnot(is.numeric(x))
    woodGH <- deriv3(body(fwood)[[2]],
                     c("x1", "x2", "x3", "x4"), function.arg = TRUE)
    if (length(x) == 4)
        woodGH(x[1], x[2], x[3], x[4])
    else if (is.matrix(x) && ncol(x) == 4)
        woodGH(x[, 1], x[, 2], x[, 3], x[, 4])
    else stop("'x' must have length 4 or be a matrix with 4 columns")
}

and now get the function f(x), the gradient g(x) and the Hessian H(x)
for
x = (0 0 0 0),
x = (1 1 1 1), and
x = (1 2 3 4)

with such a simple call:

> woodf.gh(rbind(0, 1, 1:4))
[1] 42.0 0.0 2514.4
attr(,"gradient")
x1 x2 x3 x4
[1,] -2 -40.0 -2 -40.0
[2,] 0 0.0 0 0.0
[3,] -400 279.6 5404 -819.6
attr(,"hessian")
, , x1

x1 x2 x3 x4
[1,] 2 0 0 0
[2,] 802 -400 0 0
[3,] 402 -400 0 0

, , x2

x1 x2 x3 x4
[1,] 0 220.2 0 19.8
[2,] -400 220.2 0 19.8
[3,] -400 220.2 0 19.8

, , x3

x1 x2 x3 x4
[1,] 0 0 2 0
[2,] 0 0 722 -360
[3,] 0 0 8282 -1080

, , x4

x1 x2 x3 x4
[1,] 0 19.8 0 200.2
[2,] 0 19.8 -360 200.2
[3,] 0 19.8 -1080 200.2

J C Nash

Apr 19, 2017, 10:06:34 AM
to Martin Maechler, r-help
I should have given a more detailed explanation about Automatic Differentiation.
As Martin points out, there are some AD elements in deriv() etc., but in my view
they are related more to symbolic differentiation, and are not in the same space
as AD Model Builder or its successor, whose name eludes me at the moment,
nor as ADIFOR or ADOLC, which can provide the derivatives of a subprogram.

However, Martin's example of the Wood function shows what can be done, and this
is a valuable case. Indeed it shows a syntax of which I was previously unaware!
Because it is fairly complicated, I'm not sure I would say "effortlessly", but I
will certainly be copying it. Thanks. Previously, I have carried out some of the steps
manually to get to the functional form, as in Chapter 10 of my Nonlinear Parameter
Optimization book.

Note that package nlsr, by Duncan Murdoch and me, has functions written entirely in R that
are similar to deriv() and deriv3(). There is a facility to extend the derivatives table
to new functions, and we can also generate R gradient functions from expressions, as
sketched below.
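
For example, something along the following lines (a sketch only; see the nlsr
documentation for the exact names and argument forms of nlsDeriv() and fnDeriv()):

library(nlsr)
## symbolic derivative of an expression with respect to "x1"
nlsDeriv(quote(100 * (x2 - x1^2)^2 + (1 - x1)^2), "x1")
## build an R function that returns the value with a "gradient" attribute
g <- fnDeriv(quote(100 * (x2 - x1^2)^2 + (1 - x1)^2), c("x1", "x2"))
g(x1 = -1.2, x2 = 1)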

I'll take this chance to note that the Wood problem is NOT solved by nlm() with an
analytic hessian, nor by a safeguarded or unsafeguarded Newton method. I also ran a
version of the Fortran code UNCMIN, which is the origin of nlm(), and got a similar
early termination. But the hessian is not positive definite when these codes halt,
so we are outside the assumed conditions for these methods. Dealing with a
non-positive-definite hessian in function minimization has created a small industry
of PhD theses for researchers in optimization. A sketch of such a call to nlm() is below.
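
For anyone who wants to experiment, here is a sketch of how Martin's woodf.gh() above
can be fed to nlm() with analytic derivatives. The wrapper fwood.nlm and the standard
Wood starting point (-3, -1, -3, -1) are illustrative choices, not code from the bug report:

## wrap woodf.gh() so nlm() sees a scalar value carrying "gradient" and
## "hessian" attributes in the shapes nlm() expects
fwood.nlm <- function(x) {
    gh <- woodf.gh(x)
    f <- as.vector(gh)                             # drop attributes, keep the value
    attr(f, "gradient") <- as.vector(attr(gh, "gradient"))
    attr(f, "hessian")  <- matrix(attr(gh, "hessian"), 4, 4)
    f
}
res <- nlm(fwood.nlm, c(-3, -1, -3, -1), check.analyticals = TRUE)
res$code   # termination code; compare with the behaviour described above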

Thanks again for Martin's work.

JN