Optimization With Python Book


Patrice Mieczkowski

Aug 5, 2024, 3:12:13 AM
to eginrapte
CTZhu's solution is elegant, but it might violate the positivity constraint on the third coordinate. For gamma = 0.2 this does not seem to be a problem in practice, but for other values of gamma you can easily run into trouble.

For other optimization problems with the same probability simplex constraints as your problem, but for which there is no analytical solution, it might be worth looking into projected gradient methods or similar. These methods leverage the fact that there is a fast algorithm for projecting an arbitrary point onto this set (see, for example, "projection onto the standard simplex").
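As a rough illustration (not from the original thread), here is a minimal sketch of the standard O(n log n) sorting-based projection onto the probability simplex, plus a bare-bones projected gradient loop. The function names, fixed step size, and iteration count are mine, chosen for brevity:

    import numpy as np

    def project_to_simplex(v):
        """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
        u = np.sort(v)[::-1]                 # sort in decreasing order
        css = np.cumsum(u)
        # largest index k with u[k] * (k+1) > css[k] - 1 (always >= 0)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def projected_gradient(grad, x0, step=0.1, n_iter=500):
        """Minimize a smooth function with gradient `grad` over the simplex."""
        x = project_to_simplex(np.asarray(x0, dtype=float))
        for _ in range(n_iter):
            x = project_to_simplex(x - step * grad(x))
        return x

A fixed step size is the simplest choice; in practice one would add a line search or a decreasing step schedule.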


The goal of optimization is to find the best solution to a problem out of a large set of possible solutions. (Sometimes you'll be satisfied with finding any feasible solution; OR-Tools can do that as well.)


Here's a typical optimization problem. Suppose that a shipping company delivers packages to its customers using a fleet of trucks. Every day, the company must assign packages to trucks, and then choose a route for each truck to deliver its packages. Each possible assignment of packages and routes has a cost, based on the total travel distance for the trucks, and possibly other factors as well. The problem is to choose the assignment of packages and routes that has the least cost.


One of the oldest and most widely used areas of optimization is linear optimization (or linear programming), in which the objective function and the constraints can be written as linear expressions. A simple example of this type of problem is sketched below.


The primary solver in OR-Tools for this type of problem is the linear optimization solver, which is actually a wrapper for several different libraries for linear and mixed-integer optimization, including third-party libraries.
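As a quick illustration (the data below is made up, not taken from the book), a tiny linear program solved with OR-Tools' GLOP backend might look like this:

    from ortools.linear_solver import pywraplp

    # Maximize 3x + 4y subject to a few linear constraints.
    solver = pywraplp.Solver.CreateSolver("GLOP")
    x = solver.NumVar(0, solver.infinity(), "x")
    y = solver.NumVar(0, solver.infinity(), "y")
    solver.Add(x + 2 * y <= 14)
    solver.Add(3 * x - y >= 0)
    solver.Add(x - y <= 2)
    solver.Maximize(3 * x + 4 * y)
    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        print("x =", x.solution_value(), "y =", y.solution_value())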


A mixed integer optimization problem is one in which some or all of the variables are required to be integers. An example is the assignment problem, in which a group of workers needs to be assigned to a set of tasks. For each worker and task, you define a variable whose value is 1 if the given worker is assigned to the given task, and 0 otherwise. In this case, the variables can only take on the values 0 or 1.
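Here is a hedged sketch of that 0-1 assignment model with a made-up cost matrix, using the same linear solver wrapper with the SCIP backend:

    from ortools.linear_solver import pywraplp

    costs = [[90, 80, 75], [35, 85, 55], [125, 95, 90]]  # hypothetical data
    n = len(costs)
    solver = pywraplp.Solver.CreateSolver("SCIP")
    x = {(w, t): solver.BoolVar(f"x_{w}_{t}")
         for w in range(n) for t in range(n)}
    for w in range(n):  # each worker does exactly one task
        solver.Add(solver.Sum([x[w, t] for t in range(n)]) == 1)
    for t in range(n):  # each task is done by exactly one worker
        solver.Add(solver.Sum([x[w, t] for w in range(n)]) == 1)
    solver.Minimize(solver.Sum([costs[w][t] * x[w, t]
                                for w in range(n) for t in range(n)]))
    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        for w in range(n):
            for t in range(n):
                if x[w, t].solution_value() > 0.5:
                    print(f"worker {w} -> task {t}")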


Constraint optimization, or constraint programming (CP), identifies feasible solutions out of a very large set of candidates, where the problem can be modeled in terms of arbitrary constraints. CP is based on feasibility (finding a feasible solution) rather than optimization (finding an optimal solution) and focuses on the constraints and variables rather than the objective function. However, CP can be used to solve optimization problems, simply by comparing the values of the objective function for all feasible solutions.
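A minimal feasibility sketch with OR-Tools' CP-SAT solver (the toy constraints here are mine, just to show the shape of the API):

    from ortools.sat.python import cp_model

    model = cp_model.CpModel()
    x = model.NewIntVar(0, 10, "x")
    y = model.NewIntVar(0, 10, "y")
    z = model.NewIntVar(0, 10, "z")
    model.AddAllDifferent([x, y, z])  # arbitrary, non-linear-style constraint
    model.Add(x + y == z)
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        print(solver.Value(x), solver.Value(y), solver.Value(z))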


Assignment problems involve assigning a group of agents (say, workers or machines) to a set of tasks, where there is a fixed cost for assigning each agent to a specific task. The problem is to find the assignment with the least total cost. Assignment problems are actually a special case of network flow problems.


Bin packing is the problem of packing a set of objects of different sizes into containers with different capacities. The goal is to pack as many of the objects as possible, subject to the capacities of the containers. A special case of this is the knapsack problem, in which there is just one container.
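One transparent way to model the knapsack special case is as a small MIP; the item values, weights, and capacity below are made up for illustration (OR-Tools also ships a dedicated knapsack solver, not shown here):

    from ortools.linear_solver import pywraplp

    values = [10, 30, 25, 50]   # hypothetical item values
    weights = [5, 10, 8, 15]    # hypothetical item weights
    capacity = 20
    solver = pywraplp.Solver.CreateSolver("SCIP")
    take = [solver.BoolVar(f"take_{i}") for i in range(len(values))]
    solver.Add(solver.Sum([weights[i] * take[i]
                           for i in range(len(values))]) <= capacity)
    solver.Maximize(solver.Sum([values[i] * take[i]
                                for i in range(len(values))]))
    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        print("packed:", [i for i in range(len(values))
                          if take[i].solution_value() > 0.5])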


Scheduling problems involve assigning resources to perform a set of tasks at specific times. An important example is the job shop problem, in which multiple jobs are processed on several machines. Each job consists of a sequence of tasks, which must be performed in a given order, and each task must be processed on a specific machine. The problem is to assign a schedule so that all jobs are completed in as short an interval of time as possible.
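A compact CP-SAT sketch of a two-job, two-machine job shop (the job data is invented; a real instance would be larger):

    from ortools.sat.python import cp_model

    # jobs[j] = list of (machine, duration) in required order -- toy data
    jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)]]
    horizon = sum(d for job in jobs for _, d in job)
    model = cp_model.CpModel()
    machine_intervals = {}
    ends = []
    for j, job in enumerate(jobs):
        prev_end = None
        for t, (machine, dur) in enumerate(job):
            start = model.NewIntVar(0, horizon, f"s_{j}_{t}")
            end = model.NewIntVar(0, horizon, f"e_{j}_{t}")
            interval = model.NewIntervalVar(start, dur, end, f"i_{j}_{t}")
            machine_intervals.setdefault(machine, []).append(interval)
            if prev_end is not None:       # tasks within a job are ordered
                model.Add(start >= prev_end)
            prev_end = end
        ends.append(prev_end)
    for intervals in machine_intervals.values():  # one task at a time
        model.AddNoOverlap(intervals)
    makespan = model.NewIntVar(0, horizon, "makespan")
    model.AddMaxEquality(makespan, ends)
    model.Minimize(makespan)
    solver = cp_model.CpSolver()
    if solver.Solve(model) == cp_model.OPTIMAL:
        print("makespan =", solver.Value(makespan))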


Routing problems involve finding the optimal routes for a fleet of vehicles to traverse a network, defined by a directed graph. The problem of assigning packages to delivery trucks, described in What is an optimization problem?, is one example of a routing problem. Another is the traveling salesperson problem.
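A small traveling-salesperson sketch with OR-Tools' routing library; the distance matrix is hypothetical:

    from ortools.constraint_solver import pywrapcp, routing_enums_pb2

    # Symmetric distances between 4 locations; the depot is node 0.
    dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
    manager = pywrapcp.RoutingIndexManager(len(dist), 1, 0)  # 1 vehicle
    routing = pywrapcp.RoutingModel(manager)

    def distance_callback(from_index, to_index):
        return dist[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

    transit = routing.RegisterTransitCallback(distance_callback)
    routing.SetArcCostEvaluatorOfAllVehicles(transit)
    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
    solution = routing.SolveWithParameters(params)
    if solution:
        index, route = routing.Start(0), []
        while not routing.IsEnd(index):
            route.append(manager.IndexToNode(index))
            index = solution.Value(routing.NextVar(index))
        print("route:", route)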


Many optimization problems can be represented by a directed graph consisting of nodes and directed arcs between them. For example, transportation problems, in which goods are shipped across a railway network, can be represented by a graph in which the arcs are rail lines and the nodes are distribution centers.


In the maximum flow problem, each arc has a maximum capacity that can be transported across it. The problem is to assign the amount of goods to be shipped across each arc so that the total quantity being transported is as large as possible.
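A sketch using OR-Tools' SimpleMaxFlow on a made-up network (older releases expose this as ortools.graph.pywrapgraph; newer ones move it to ortools.graph.python.max_flow with snake_case method names):

    from ortools.graph import pywrapgraph

    # Hypothetical arcs (tail, head, capacity); source is 0, sink is 4.
    arcs = [(0, 1, 20), (0, 2, 30), (1, 3, 10), (2, 3, 20), (1, 4, 5),
            (3, 4, 20)]
    mf = pywrapgraph.SimpleMaxFlow()
    for tail, head, cap in arcs:
        mf.AddArcWithCapacity(tail, head, cap)
    if mf.Solve(0, 4) == mf.OPTIMAL:
        print("max flow:", mf.OptimalFlow())
        for i in range(mf.NumArcs()):
            print(mf.Tail(i), "->", mf.Head(i), ":",
                  mf.Flow(i), "/", mf.Capacity(i))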




The minimize function provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions in scipy.optimize. To demonstrate the minimization function, consider the problem of minimizing the Rosenbrock function of \(N\) variables:

\[f(\mathbf{x}) = \sum_{i=1}^{N-1} 100\left(x_{i+1} - x_i^2\right)^2 + \left(1 - x_i\right)^2.\]

The minimum value of this function is 0, which is achieved when \(x_i = 1\).


Note that the Rosenbrock function and its derivatives are included in scipy.optimize. The implementations shown in the following sections provide examples of how to define an objective function as well as its Jacobian and Hessian functions. Objective functions in scipy.optimize expect a numpy array as their first parameter, which is to be optimized, and must return a float value. The exact calling signature must be f(x, *args), where x represents a numpy array and args a tuple of additional arguments supplied to the objective function.


The simplex algorithm is probably the simplest way to minimize a fairly well-behaved function. It requires only function evaluations and is a good choice for simple minimization problems. However, because it does not use any gradient evaluations, it may take longer to find the minimum.
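For example, minimizing the Rosenbrock function with the Nelder-Mead simplex method, along the lines of the scipy documentation:

    import numpy as np
    from scipy.optimize import minimize

    def rosen(x):
        """The Rosenbrock function."""
        return sum(100.0 * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

    x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
    res = minimize(rosen, x0, method="nelder-mead",
                   options={"xatol": 1e-8, "disp": True})
    print(res.x)  # approximately [1. 1. 1. 1. 1.]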


As an alternative to using the args parameter of minimize, simply wrap the objective function in a new function that accepts only x. This approach is also useful when it is necessary to pass additional parameters to the objective function as keyword arguments.
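A brief sketch of both styles, continuing from the snippet above (the parametrized objective here is my own toy variant):

    from functools import partial

    def rosen_scaled(x, a):
        """Rosenbrock-style objective with an extra parameter a."""
        return sum(a * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

    # Equivalent ways to fix a = 100.0:
    res1 = minimize(rosen_scaled, x0, args=(100.0,), method="nelder-mead")
    res2 = minimize(partial(rosen_scaled, a=100.0), x0, method="nelder-mead")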


In order to converge more quickly to the solution, this routine uses the gradient of the objective function. If the gradient is not given by the user, then it is estimated using first-differences. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method typically requires fewer function calls than the simplex algorithm even when the gradient must be estimated.
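Using BFGS with an explicit gradient (rosen and x0 as defined earlier; the gradient follows the scipy documentation):

    import numpy as np

    def rosen_der(x):
        """Gradient of the Rosenbrock function."""
        xm = x[1:-1]
        xm_m1 = x[:-2]
        xm_p1 = x[2:]
        der = np.zeros_like(x)
        der[1:-1] = (200 * (xm - xm_m1**2)
                     - 400 * (xm_p1 - xm**2) * xm - 2 * (1 - xm))
        der[0] = -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0])
        der[-1] = 200 * (x[-1] - x[-2]**2)
        return der

    res = minimize(rosen, x0, method="BFGS", jac=rosen_der,
                   options={"disp": True})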


Suppose an expensive helper function, expensive, is called 12 times: six times in the objective function and six times from the gradient. One way of reducing redundant calculations is to create a single function that returns both the objective function and the gradient.
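A self-contained sketch of this pattern (the objective here is a stand-in of my own, chosen so the shared quantity is obvious):

    import numpy as np
    from scipy.optimize import minimize

    def expensive(x):
        # Stand-in for a costly computation shared by f and its gradient.
        return np.sum(np.sin(x))

    def f_and_grad(x):
        e = expensive(x)          # computed once instead of twice
        f = e**2                  # objective f(x) = (sum sin x_i)^2
        g = 2 * e * np.cos(x)     # its gradient, reusing e
        return f, g

    res = minimize(f_and_grad, np.array([0.5, 0.3]),
                   method="BFGS", jac=True)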


When we call minimize, we specify jac=True to indicate that the provided function returns both the objective function and its gradient. While convenient, not all scipy.optimize functions support this feature; moreover, it is only for sharing calculations between the function and its gradient, whereas in some problems we will want to share calculations with the Hessian (the second derivative of the objective function) and constraints. A more general approach is to memoize the expensive parts of the calculation. In simple situations, this can be accomplished with the functools.lru_cache wrapper.
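One wrinkle is that numpy arrays are not hashable, so they cannot be cache keys directly; a sketch that keys the cache on the array's bytes (helper names are mine):

    from functools import lru_cache
    import numpy as np

    @lru_cache(maxsize=32)
    def _expensive_cached(x_bytes):
        x = np.frombuffer(x_bytes)        # reconstruct the float64 array
        return np.sum(np.sin(x))          # stand-in for the costly part

    def expensive(x):
        return _expensive_cached(np.asarray(x, dtype=float).tobytes())

With this in place, the objective and the gradient can each call expensive(x) independently, and the second call for the same x is a cache hit.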


The inverse of the Hessian is evaluated using the conjugate-gradient method. An example of employing this method to minimize the Rosenbrock function is given below. To take full advantage of the Newton-CG method, a function which computes the Hessian must be provided. The Hessian matrix itself does not need to be constructed; only a vector which is the product of the Hessian with an arbitrary vector needs to be available to the minimization routine. As a result, the user can provide either a function to compute the Hessian matrix, or a function to compute the product of the Hessian with an arbitrary vector.
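Newton-CG with a full Hessian (rosen, rosen_der, and x0 as defined earlier; the Hessian follows the scipy documentation):

    import numpy as np

    def rosen_hess(x):
        """Hessian of the Rosenbrock function."""
        x = np.asarray(x)
        H = np.diag(-400 * x[:-1], 1) - np.diag(400 * x[:-1], -1)
        diagonal = np.zeros_like(x)
        diagonal[0] = 1200 * x[0]**2 - 400 * x[1] + 2
        diagonal[-1] = 200
        diagonal[1:-1] = 202 + 1200 * x[1:-1]**2 - 400 * x[2:]
        return H + np.diag(diagonal)

    res = minimize(rosen, x0, method="Newton-CG",
                   jac=rosen_der, hess=rosen_hess,
                   options={"xtol": 1e-8, "disp": True})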


For larger minimization problems, storing the entire Hessian matrix can consume considerable time and memory. The Newton-CG algorithm only needs the product of the Hessian times an arbitrary vector. As a result, the user can supply code to compute this product rather than the full Hessian by giving a Hessian-product function (the hessp argument) which takes the minimization vector as the first argument and the arbitrary vector as the second argument (along with extra arguments passed to the function to be minimized). If possible, using Newton-CG with the Hessian product option is probably the fastest way to minimize the function.
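The Hessian-vector product for the Rosenbrock function, again following the scipy documentation:

    def rosen_hess_p(x, p):
        """Product of the Rosenbrock Hessian with an arbitrary vector p."""
        x = np.asarray(x)
        Hp = np.zeros_like(x)
        Hp[0] = (1200 * x[0]**2 - 400 * x[1] + 2) * p[0] - 400 * x[0] * p[1]
        Hp[1:-1] = (-400 * x[:-2] * p[:-2]
                    + (202 + 1200 * x[1:-1]**2 - 400 * x[2:]) * p[1:-1]
                    - 400 * x[1:-1] * p[2:])
        Hp[-1] = -400 * x[-2] * p[-2] + 200 * p[-1]
        return Hp

    res = minimize(rosen, x0, method="Newton-CG",
                   jac=rosen_der, hessp=rosen_hess_p,
                   options={"xtol": 1e-8, "disp": True})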


According to [NW] p. 170, the Newton-CG algorithm can be inefficient when the Hessian is ill-conditioned because of the poor quality of the search directions provided by the method in those situations. The method trust-ncg, according to the authors, deals more effectively with this problematic situation and will be described next.


The Newton-CG method is a line search method: it finds a direction of search minimizing a quadratic approximation of the function and then uses a line search algorithm to find the (nearly) optimal step size in that direction. An alternative approach is to, first, fix the step size limit \(\Delta\) and then find the optimal step \(\mathbf{p}\) inside the given trust-radius by solving the following quadratic subproblem:

\[\min_{\mathbf{p}} \; f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k) \cdot \mathbf{p} + \tfrac{1}{2} \mathbf{p}^T \mathbf{H}(\mathbf{x}_k) \mathbf{p} \quad \text{subject to } \|\mathbf{p}\| \le \Delta.\]

The solution is then updated as \(\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{p}\), and the trust radius \(\Delta\) is adjusted according to how well the quadratic model agrees with the actual function.
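Invoking the trust-ncg method reuses the same building blocks as Newton-CG (rosen, rosen_der, rosen_hess, and x0 as defined above):

    res = minimize(rosen, x0, method="trust-ncg",
                   jac=rosen_der, hess=rosen_hess,
                   options={"gtol": 1e-8, "disp": True})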


Similar to the trust-ncg method, the trust-krylov method is suitable for large-scale problems, as it uses the Hessian only as a linear operator by means of matrix-vector products. It solves the quadratic subproblem more accurately than the trust-ncg method.


This method wraps the [TRLIB] implementation of the [GLTR] method for solving exactly a trust-region subproblem restricted to a truncated Krylov subspace. For indefinite problems it is usually better to use this method, as it reduces the number of nonlinear iterations at the expense of a few more matrix-vector products per subproblem solve in comparison to the trust-ncg method.
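The call mirrors the trust-ncg one, with only the method name changed (same rosen, rosen_der, rosen_hess, and x0 as above):

    res = minimize(rosen, x0, method="trust-krylov",
                   jac=rosen_der, hess=rosen_hess,
                   options={"gtol": 1e-8, "disp": True})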
