Nonlinear solvers for expensive functions

Adrian Ferramosca

unread,

Mar 11, 2002, 12:20:46 PM3/11/02

to

I have a multidimensional nonlinear problem to solve where the
function evaluations are very expensive and the number of variables
can be large. Therefore derivative based solvers (user-supplied or
numerical) are not an option. I am using a Wegstein type solver at
present, but are there any other solutions methods out there which can
handle these types of problems? I tried implementing GDEM from the
original paper but couldn't get it working.

Adrian

dave fournier

unread,

Mar 11, 2002, 3:33:35 PM3/11/02

to

Actually the opposite conclusion is probably true. Using the reverse mode
of automatic differentiation you can calculate the derivatives of your
funciton with
only about 4 times the nubmer of claculations as the function itself. So
give a resonably scaled problem etc. derivatvie based methods would be
the *best* first way to try and solvbe your problem.

Dave

Adrian Ferramosca wrote:

--
Dave Fournier, Otter Research Ltd
PO Box 2040, Sidney, B.C. V8L 3S3, Canada
250-655-3364
email: ot...@otter-rsch.com http://otter-rsch.com

bendel boy

unread,

Mar 11, 2002, 6:10:00 PM3/11/02

to

adr...@bjahouston.com (Adrian Ferramosca) wrote in message news:<204e7041.02031...@posting.google.com>...

Try netlib for polytope ('Simplex', Nelder-Mead, etc.) and conjugate
direction methods (Powell's method, Brent's method, Praxis, etc.).

Dr Chaos

unread,

Mar 12, 2002, 12:11:40 AM3/12/02

to

If you have an expensive to evaluate problem then using derivatives
would seem *more* important than not.

Peter Spellucci

unread,

Mar 12, 2002, 5:37:14 AM3/12/02

to

In article <204e7041.02031...@posting.google.com>,

until now i believed to know something about optimization but must state that
neither I know was a wegstein solver is nor what GDEM.
It would be a good idea not to assume that others know your slang.
you did say nothing about the nature of your function evaluations. as others
correctly stated, the use of gradient (and even Hessian) information becomes
more important then the function evaluation is costly and dimension is large.
(what is "large" for you? n=50000?)
I assume that your function evaluation come from a software (or even worse
hardware) blackbox, and in this case indeed automatic differentiation is not
possible. If you have an algorithm in the form of a subprogram, then automatic
differentiation is the way to go.
but here come some info's, which possibly might be of help
peter

Jones, Donald R.; Schonlau, Matthias; Welch, William J.
Efficient global optimization of expensive black-box functions. (English)
[J] J. Glob. Optim. 13, No.4, 455-492 (1998). [ISSN 0925-5001]

Toropov, V.V.
Multipoint approximation method in optimization problems with
expensive function values. (English)
[CA] Computational systems analysis, Proc. 4th Int. Symp. Syst. Anal.
Simulation, Berlin/Ger. 1992, 207-212 (1992).

Schagen, I.P.
Stochastic interpolation applied to the optimisation of expensive
objective functions. (English)
[CA] Computational statistics, Proc. 4th Symp., Edinburgh 1980, 362-367 (1980).

Adrian Ferramosca

unread,

Mar 12, 2002, 6:48:40 AM3/12/02

to

For a derivative based solution with 10 dimensions, I would need to
calculate 100 derivatives for a full Jacobian. Each numerical
derivative would require at least one extra function evaluation for a
perturbation from a base evaluation (for a forward difference
approximation). Therefore I would need of the order of 200 function
evaluations for each Newton step. Granted I might be able to utilize
some previous information to cut down on this. But at the moment I
use exactly one function evaluation for each iteration.

Adrian

dave fournier <otter...@home.com> wrote in message news:<3C8C9465...@home.com>...

Michal Kvasnicka

unread,

Mar 12, 2002, 8:32:03 AM3/12/02

to

See http://www.schonlau.net/space.html

Michal

"Peter Spellucci" <spel...@mathematik.tu-darmstadt.de> píąe v diskusním
příspěvku news:a6kloq$dpt$1...@sun27.hrz.tu-darmstadt.de...

David Pearson

unread,

Mar 12, 2002, 9:39:18 AM3/12/02

to Adrian Ferramosca

Adrian,

Automatic differentiation does not require multiple calls of
the function, as it does not work by perturbing it. You do have
to modify the function code though. Look here:

http://www-unix.mcs.anl.gov/autodiff/tech_reports.html

and try a Google search for "automatic differentiation".

-David Pearson.

--
David Pearson, Phone: +44 (0)118 9318741
ESSC, Fax: +44 (0)118 9316413
University of Reading, Email: dw...@mail.nerc-essc.ac.uk
Reading RG6 6AL,
UK. www.nerc-essc.ac.uk/~dwcp/Home.html

dave fournier

unread,

Mar 12, 2002, 10:27:34 AM3/12/02

to

Adrian Ferramosca wrote:

I don't think anyone can properly advise you about this until you
describe
your function to be optimized in more detail.

Dave

--
David A. Fournier
Otter Research Ltd
PO Box 265 Sidney, B.C. V8L 3S3, Canada
phone/fax: (250) 655-3364
http://otter-rsch.com

Adrian Ferramosca

unread,

Mar 12, 2002, 10:57:36 AM3/12/02

to

spel...@mathematik.tu-darmstadt.de (Peter Spellucci) wrote in message news:<a6kloq$dpt$1...@sun27.hrz.tu-darmstadt.de>...

> In article <204e7041.02031...@posting.google.com>,
> adr...@bjahouston.com (Adrian Ferramosca) writes:
> |> I have a multidimensional nonlinear problem to solve where the
> |> function evaluations are very expensive and the number of variables
> |> can be large. Therefore derivative based solvers (user-supplied or
> |> numerical) are not an option. I am using a Wegstein type solver at
> |> present, but are there any other solutions methods out there which can
> |> handle these types of problems? I tried implementing GDEM from the
> |> original paper but couldn't get it working.
> |>
> |> Adrian
>
> until now i believed to know something about optimization but must state that
> neither I know was a wegstein solver is nor what GDEM.
> It would be a good idea not to assume that others know your slang.

My apologies, I thought these terms were well-known. In the chemical
engineering industry, most flowsheeting problems are solved with an
acceleration technique such as Wegstein (Wegstein, J. H.,
'Accelerating convergence of iterative processes', Comm. ACM, 1, No 6,
9-13 (1958)).

One problem with this type of acceleration is that interaction between
variables is not accounted for. This is where, for examples, General
Dominant Eigenvalue Methods (GDEM) are useful. See: Crowe, C. M., and
Nishio, M., 'Convergence promotion in the simulation of chemical
processes - the general dominant eigenvalue method', AIChE Journal,
21, 3, May 1975, pp 528-533.

This type of solution is useful for converging system of equations
which can be expressed as x(i) = f(j)(x(i)), where repeated
substitution is the simplest method of solution. This is the form of
my system of equations. The Wegstein method of acceleration works
well, but, as it assumes that the cross dependencies of the variables
are negligible, convergence in these cases sometimes suffers.

Restructuring this system as an optimization problem, such as:
minimize f = sum (x(i) - f(j)(x(i)))
would enable generalized optimization methods to be used, however I am
concerned that derivative calculations would increase the time for
convergence quite substantially over the Wegstein solution. It is the
f(j)(x) functions which are expensive to calculate.

The f(j)(x) functions are a call to a separate (large) package for
which we have the code. Large is 5000.

Adrian

bendel boy

unread,

Mar 12, 2002, 11:58:51 AM3/12/02

to

a forward difference approximation would need 101 evaluations, not
200. central difference would need 201. If you knew something about
the structure of your equations you can reduce the number of calls. if
you knew that there was no interaction then you would need only 2
calls per differential. there are codes that attempt to estimate the
interaction - expense at the start of the solution to save on expense
later.

if you use the quasi-newton methods then you start with the identity
matrix (same as powell's method) and build up knowledge of the
Jacobian matrix. Usually (not always) gradient-based methods are
faster. But if your equations are obtained as the solution of a system
of equations in their turn (eg your system is an ODE) then
finite-difference approximation or quasi-newton approximation may
fail. under those cases gradient-free methods are to be preferred.
netlib contains codes; peter spellucci's web page on optimisation
codes contains advice.

adr...@bjahouston.com (Adrian Ferramosca) wrote in message news:<204e7041.02031...@posting.google.com>...

Peter Spellucci

unread,

Mar 12, 2002, 1:45:04 PM3/12/02

to

In article <204e7041.02031...@posting.google.com>,
adr...@bjahouston.com (Adrian Ferramosca) writes:

snip

|> My apologies, I thought these terms were well-known. In the chemical
|> engineering industry, most flowsheeting problems are solved with an
|> acceleration technique such as Wegstein (Wegstein, J. H.,
|> 'Accelerating convergence of iterative processes', Comm. ACM, 1, No 6,
|> 9-13 (1958)).
|>
|> One problem with this type of acceleration is that interaction between
|> variables is not accounted for. This is where, for examples, General
|> Dominant Eigenvalue Methods (GDEM) are useful. See: Crowe, C. M., and
|> Nishio, M., 'Convergence promotion in the simulation of chemical
|> processes - the general dominant eigenvalue method', AIChE Journal,
|> 21, 3, May 1975, pp 528-533.
|>
|> This type of solution is useful for converging system of equations
|> which can be expressed as x(i) = f(j)(x(i)), where repeated
|> substitution is the simplest method of solution. This is the form of
|> my system of equations. The Wegstein method of acceleration works
|> well, but, as it assumes that the cross dependencies of the variables
|> are negligible, convergence in these cases sometimes suffers.

|> The f(j)(x) functions are a call to a separate (large) package for
|> which we have the code. Large is 5000.
|>

if I understand you right, your original problem is to solve a
nonlinear system of equations already in fixed point form

(*) x = F (x )

where x and F both have dimension 5000 and F is evaluated in some large software
package. this evaluation is timeconsuming.
(forget automatic differentiation).
Presently you try direct iteration plus some more or less heuristic
convergence acceleration techniques from the older applied literature.

(An idea in between: maybe that system is already the transformation of
some other, say A(x)*x-b=0 ? with special properties of A? then it
might be useful to go back to the roots of the problem).

but, let us presently assume that (*) is the right way to go.
then convergence acceleration for vector valued sequences is the way to go
(obviously you try such things already).
but there is a lot of newer literature about this task, I append some
promising titles below. maybe a look at these papers give you some
promising ways to go.
peter

Rhanizar, B.
Hybrid procedures for solving some unconstrained nonlinear optimization
problems. (English)
[J] Appl. Numer. Math. 30, No.4, 459-474 (1999). [ISSN 0168-9274]
http://www.elsevier.nl/locate/apnum/
The vector $\varepsilon$-algorithm (a generalization of Aitken's
${\Delta}^2$-process) coupled with some hybrid procedures is used for solving a
convex unconstrained optimization problem.

Noda, Tatsuo
A general Steffensen iteration method for systems of nonlinear equations.
(English)
[J] Math. Jap. 46, No.1, 91-97 (1997). [ISSN 0025-5513]
The paper is concerned with the solution of a system of nonlinear
equations using a method of iteration which is considered an improvement of the
Aitken-Steffensen formula. A general Steffensen iteration method is
proposed which does not require both the partial derivatives of the nonlinear
functions and the computation of the inverse matrix. The convergence of the
method is proved and a numerical example is presented to compare the
Newton method, the Steffensen iteration method and the general Steffensen
iteration method.
[ F.Luban (Bucuresti) ]

Graves-Morris, P.R.
A new approach to acceleration of convergence of a sequence of vectors.
(English)
[J] Numer. Algorithms 11, No.1-4, 189-201 (1996). [ISSN 1017-1398]

Le Ferrand, Herve
Une generalisation au cas vectoriel du procede $\Delta\sp 2$ d'Aitken et les
suites a comportement lineaire. (A generalization to
the vector case of Aitken's $\Delta\sp 2$-process and sequences with
linear comportment). (French. English summary)
[J] RAIRO, Modelisation Math. Anal. Numer. 29, No.1, 53-62 (1995).
The author generalizes the $\varepsilon\sb 2$-transformation to the
vectorial case and proves that this generalization accelerates the convergence
of sequences which converge $\lambda I$ linearly in {\it Y. Nievergelt's}
sense [Numer. Math. 59, No. 3, 295-310 (1991; Zbl. 726.65055)].
[ A.Lopez-Carmona (Granada) ]

Brezinski, C.; Redivo-Zaglia, M.
Extrapolation methods. (English)
[J] Appl. Numer. Math. 15, No.2, 123-131 (1994).
This paper is really an introduction to the book [Extrapolation methods,
theory and practice (1991; Zbl. 744.65004)], by the same authors. The paper
begins with a discussion of terminology and notation, using Aitken's method
as an example. The sequence transformation for Aitken's algorithm is
then generalized to produce the Shanks transformation, and there is a brief
discussion of the $\varepsilon$-algorithm for computing the Shanks
transformation. This leads to a discussion of the possibility of a
universal transformation for accelerating the convergence of all convergent
sequences. The concept of remanence is introduced to discuss sets of
sequences that are non-accelerable. The paper concludes with some general
comments about extrapolation methods and their applications.
[ A.C.Genz (Pullman) ]

Graves-Morris, P.R.
Extrapolation methods for vector sequences. (English)
[J] Numer. Math. 61, No.4, 475-487 (1992).
Author's summary: An analogue of Aitken's $\Delta\sp 2$ method,
suitable for vector sequences, is proposed. Aspects of the numerical
performance
of the vector $\varepsilon$-algorithm, based on using the
Moore-Penrose inverse, are investigated. The fact that the
denominator polynomial
associated with a vector Pade approximant is the square of its equivalent
in the scalar case is shown to be a source of approximation errors.\par In
cases where the convergence of the vector sequence is dominated by
real eigenvalues, a hybrid form of the vector Pade approximant, having a
denominator polynomial of minimal degree, is proposed and its
effectiveness is demonstrated on several standard examples.
[ L.Gatteschi (Torino) ]

Nievergelt, Yves
Aitken's and Steffensen's accelerations in several variables. (English)
[J] Numer. Math. 59, No.3, 295-310 (1991).

Mitin, A.V.
Linear extrapolation in an iterative method for solving systems
of equations. (English)
[J] U.S.S.R. Comput. Math. Math. Phys. 25, No.2, 1-6 (1985).
Translation from Zh. Vychisl. Mat. Mat. Fiz. 25, No.3, 325-334
(Russian) (1985; Zbl. 583.65026)

Brezinski, C.
A subroutine for the general interpolation and extrapolation problems.
(English )
[J] ACM Trans. Math. Software 8, 290-301 (1982)..

(code is not self explaining. you need the paper to use it).

Adrian Ferramosca

unread,

Mar 13, 2002, 3:25:52 AM3/13/02

to

Excellent! I'll do some reading.
Thanks,
Adrian

Adrian Ferramosca

unread,

Mar 13, 2002, 3:27:40 AM3/13/02

to

Thanks for the info, I'll continue with Peter's suggestions in the other thread.
Adrian

sbdp...@aol.com (bendel boy) wrote in message news:<4db10a80.02031...@posting.google.com>...

David Pearson

unread,

Mar 21, 2002, 11:09:46 AM3/21/02

to Adrian Ferramosca

Adrian,

There's some stuff on this at citeseer, e.g.:

http://citeseer.nj.nec.com/5760.html

http://citeseer.nj.nec.com/booker98rigorous.html

Regards,
David.