Numerical Optimization Notes

0 views

Skip to first unread message

Octaviano Collars

unread,

Aug 4, 2024, 4:29:12 PM8/4/24

to outlitacnetz

Thisis a set of lecture notes for MATH 555, Penn State's graduate Numerical Optimization course. Numerical Optimization is the study of maximizing or minimizing functions through numerical techniques. Generally, it's rare to optimize anything other than through numerical techniques (unless of course you're talking about something really simple). Numerical optimization is used every day and is built on techniques from multi-variable calculus, optimization theory (obviously) numerical linear algebra (for algorithm efficiency) and other branches of mathematics.

The lecture notes are loosely based on Nocedal and Wright's book Numerical Optimization, Avriel's text on Nonlinear Optimization, Bazaraa, Sherali and Shetty's book on Non-linear Programming, Bazaraa, Jarvis and Sherali's book on Linear Programming and several other books. All of the books mentioned are good books (some great). The problem is, some books don't cover things in enough depth. The other problem is for students taking this course, this may be the first time they're seeing optimization, so we have to cover some preliminaries. This set of notes correct some of the problems I mention by presenting the material in a format for that can be used easily in Penn State in MATH 555. These notes are probably really inappropriate if you have a strong Operations Research program. You'd be better off reading Nocedal and Wright's book directly.

I am a Research Professor at the Applied Research Laboratory (ARL) at Penn State. In the broadest possible sense, my work is in applied math. Some of my work is on applied statistics on (real-world) dynamical systems.

This course will cover a wide range of topics in numerical optimization. The major goal is to learn a set of tools that will be useful for research in Artificial Intelligence and Computer Graphics. The course is a graduate-level course that combines instruction of basic material, written homeworks , and a final project. The course material integrates the theory of optimization and concrete real applications. Grading is based on homeworks (50%) and the final project (50%).

Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives.[1][2] It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering[3] to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.[4]

In the more general approach, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics.

Problems formulated using this technique in the fields of physics may refer to the technique as energy minimization,[5] speaking of the value of the function f as representing the energy of the system being modeled. In machine learning, it is always necessary to continuously evaluate the quality of a data model by using a cost function where a minimum implies a set of possibly optimal parameters with an optimal (lowest) error.

Typically, A is some subset of the Euclidean space R n \displaystyle \mathbb R ^n , often specified by a set of constraints, equalities or inequalities that the members of A have to satisfy. The domain A of f is called the search space or the choice set, while the elements of A are called candidate solutions or feasible solutions.

While a local minimum is at least as good as any nearby elements, a global minimum is at least as good as every feasible element.Generally, unless the objective function is convex in a minimization problem, there may be several local minima.In a convex problem, if there is a local minimum that is interior (not on the edge of the set of feasible elements), it is also the global minimum, but a nonconvex problem may have more than one local minimum not all of which need be global minima.

asks for the maximum value of the objective function 2x, where x may be any real number. In this case, there is no such maximum as the objective function is unbounded, so the answer is "infinity" or "undefined".

The term "linear programming" for certain optimization cases was due to George B. Dantzig, although much of the theory had been introduced by Leonid Kantorovich in 1939. (Programming in this context does not refer to computer programming, but comes from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.) Dantzig published the Simplex algorithm in 1947, and also John von Neumann and other researches worked on the theoretical aspects of linear programming (like the theory of duality) around the same time.[7]

Adding more than one objective to an optimization problem adds complexity. For example, to optimize a structural design, one would desire a design that is both light and rigid. When two objectives conflict, a trade-off must be created. There may be one lightest design, one stiffest design, and an infinite number of designs that are some compromise of weight and rigidity. The set of trade-off designs that improve upon one criterion at the expense of another is known as the Pareto set. The curve created plotting weight against stiffness of the best designs is known as the Pareto frontier.

A design is judged to be "Pareto optimal" (equivalently, "Pareto efficient" or in the Pareto set) if it is not dominated by any other design: If it is worse than another design in some respects and no better in any respect, then it is dominated and is not Pareto optimal.

The choice among "Pareto optimal" solutions to determine the "favorite solution" is delegated to the decision maker. In other words, defining the problem as multi-objective optimization signals that some information is missing: desirable objectives are given but combinations of them are not rated relative to each other. In some cases, the missing information can be derived by interactive sessions with the decision maker.

Optimization problems are often multi-modal; that is, they possess multiple good solutions. They could all be globally good (same cost function value) or there could be a mix of globally good and locally good solutions. Obtaining all (or at least some of) the multiple solutions is the goal of a multi-modal optimizer.

Classical optimization techniques due to their iterative approach do not perform satisfactorily when they are used to obtain multiple solutions, since it is not guaranteed that different solutions will be obtained even with different starting points in multiple runs of the algorithm.

The satisfiability problem, also called the feasibility problem, is just the problem of finding any feasible solution at all without regard to objective value. This can be regarded as the special case of mathematical optimization where the objective value is the same for every solution, and thus any solution is optimal.

Many optimization algorithms need to start from a feasible point. One way to obtain such a point is to relax the feasibility conditions using a slack variable; with enough slack, any starting point is feasible. Then, minimize that slack variable until the slack is null or negative.

The extreme value theorem of Karl Weierstrass states that a continuous real-valued function on a compact set attains its maximum and minimum value. More generally, a lower semi-continuous function on a compact set attains its minimum; an upper semi-continuous function on a compact set attains its maximum point or view.

One of Fermat's theorems states that optima of unconstrained problems are found at stationary points, where the first derivative or the gradient of the objective function is zero (see first derivative test). More generally, they may be found at critical points, where the first derivative or gradient of the objective function is zero or is undefined, or on the boundary of the choice set. An equation (or set of equations) stating that the first derivative(s) equal(s) zero at an interior optimum is called a 'first-order condition' or a set of first-order conditions.

While the first derivative test identifies points that might be extrema, this test does not distinguish a point that is a minimum from one that is a maximum or one that is neither. When the objective function is twice differentiable, these cases can be distinguished by checking the second derivative or the matrix of second derivatives (called the Hessian matrix) in unconstrained problems, or the matrix of second derivatives of the objective function and the constraints called the bordered Hessian in constrained problems. The conditions that distinguish maxima, or minima, from other stationary points are called 'second-order conditions' (see 'Second derivative test'). If a candidate solution satisfies the first-order conditions, then the satisfaction of the second-order conditions as well is sufficient to establish at least local optimality.

For unconstrained problems with twice-differentiable functions, some critical points can be found by finding the points where the gradient of the objective function is zero (that is, the stationary points). More generally, a zero subgradient certifies that a local minimum has been found for minimization problems with convex functions and other locally Lipschitz functions, which meet in loss function minimization of the neural network. The positive-negative momentum estimation lets to avoid the local minimum and converges at the objective function global minimum.[8]

Further, critical points can be classified using the definiteness of the Hessian matrix: If the Hessian is positive definite at a critical point, then the point is a local minimum; if the Hessian matrix is negative definite, then the point is a local maximum; finally, if indefinite, then the point is some kind of saddle point.