Looking for tips on how to reduce NLP function evaluation time?


Adam Hall

Dec 7, 2021, 3:54:05 PM
to CasADi
I am running a constrained nonlinear MPC using the Opti framework and the IPOPT solver. I've been finding that the NLP function evaluation time is always 10 to 100 times higher than the actual IPOPT solve time, and I don't fully understand why it is so slow.

If I understand correctly, I can see that the majority of the time is spent calculating the Jacobian nlp_jac_g, as well as computing nlp_hess_l, but I don't know what nlp_hess_l is.


[Attached image: Screenshot from 2021-12-03 01-34-45.png]


Some things to note about my setup:
1. I am using the Opti stack because it's easy! However, is it generally slower?

2. My dynamics are an integration of nonlinear continuous-time dynamics using a CasADi integrator. I am using something like

discrete_dyn = cs.integrator('discrete_dynamics', 'cvodes', {'x': delta_x, 'p': delta_u, 'ode': continuous_dynamics}, {'tf': dt})

where continuous_dynamics is a CasADi function for the continuous-time dynamics.
I suspect this is the source of the longer computation times, since it doesn't seem like the cleanest way to get the discrete dynamics. These then become the dynamics rollout constraints using something like the following.

for i in range(horizon_length):
    next_state = discrete_dyn(x0=x_var[:, i], p=u_var[:,i])['xf']
    opti.subject_to(x_var[:, i + 1] == next_state)

Any thoughts on how to do this more efficiently, other than writing out the discrete dynamics explicitly?

3. I specify some of the state and input constraints using CasADi functions, e.g.,
opti.subject_to(state_constraint(x_var + <a numpy array>) < 0)
where state_constraint is a linear function of the form Ax - b. I'm assuming this is an okay way to represent linear constraints.

4. I am using mostly MX variables. I don't really understand when to use SX vs MX.

Any advice on how to reduce my function evaluation times would be greatly appreciated!

Aaron Ray

Dec 8, 2021, 3:38:32 PM
to CasADi
How large is the problem instance you are trying to solve? What is the target rate for running the MPC?

I agree that there is a good chance the slowdown comes from that integrator, as it's a somewhat complex variable-step-size integrator. It's probably best to avoid a variable-step-size integrator unless you actually need it for your problem. You should at least benchmark a simpler integration scheme, most easily a single Euler step (x_next = x + continuous_dynamics(x, u) * dt) to get a lower bound on the time, and ideally use something like RK4 (https://github.com/casadi/casadi/blob/8d0f80a4d0fe2054384bfb9748f7a0f6bae540ff/docs/examples/python/direct_multiple_shooting.py#L48).
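For concreteness, a minimal sketch of what the Euler variant could look like as a CasADi Function, reusing the continuous_dynamics function, the dimensions n and m, and the step dt from your post (names assumed):

import casadi as cs

x = cs.MX.sym('x', n)
u = cs.MX.sym('u', m)

# One explicit Euler step of the continuous-time dynamics
x_next = x + continuous_dynamics(x, u) * dt

# Same ['x0', 'p'] -> ['xf'] naming as the cvodes integrator, so it can be
# swapped into the existing rollout loop unchanged
euler_dyn = cs.Function('euler_dyn', [x, u], [x_next], ['x0', 'p'], ['xf'])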

I believe there is a small amount of Opti overhead, but the bigger issue it causes is that I don't think you can compile the resulting solver. If you use the lower-level solver interface, you can compile your solver along the lines of https://web.casadi.org/docs/#generating-c-code, and this gives a nice speedup. You can also get some speedups in general using SX instead of MX, but you lose a little bit of expressivity so I wouldn't worry about that for now.
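Roughly, the code-generation route looks like this (a sketch based on the docs link above; the compiler invocation is just an example):

import subprocess
import casadi as cs

# nlp = {'x': ..., 'f': ..., 'g': ...} built with the lower-level nlpsol interface
solver = cs.nlpsol('solver', 'ipopt', nlp)

# Dump the NLP function evaluations to C, compile them into a shared library,
# then rebuild the solver so the evaluations run from the compiled code
solver.generate_dependencies('nlp.c')
subprocess.run(['gcc', '-fPIC', '-shared', '-O3', 'nlp.c', '-o', 'nlp.so'], check=True)
solver = cs.nlpsol('solver', 'ipopt', './nlp.so')

Alternatively, I believe passing {'jit': True, 'compiler': 'shell'} in the nlpsol options gets you a similar effect automatically at construction time.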

Finally, at a high level, I should note that CasADi / IPOPT really shines when you want to use a collocation method instead of a multiple-shooting method. Shooting methods constrain the solution according to the discrete dynamics between two states (like you have). Collocation methods represent the trajectory with (piecewise) polynomials and constrain the polynomials to match the system dynamics at certain points. Collocation methods result in more variables, but a sparser Hessian, which can lead to performance gains, and interior-point methods (like IPOPT) are good at solving large, sparse problems. For denser problems especially, specialized MPC libraries like ACADOS are likely to be easier to tune for performance (at the expense of a more difficult, less general interface than CasADi has).
Multiple shooting: https://github.com/casadi/casadi/blob/master/docs/examples/python/direct_multiple_shooting.py#L48
Collocation: https://github.com/casadi/casadi/blob/master/docs/examples/python/direct_collocation.py

The extent to which you should worry about the code generation and problem transcription nuances depends heavily on how fast you need the code to run and how large your problem is.

Adam Hall

Dec 29, 2021, 5:41:34 PM
to CasADi

Thank you so much for your response! I am trying to implement a Gaussian process MPC (GPMPC) for a 2D quadrotor (vertical plane). It has 6 states and 2 inputs, with input and state constraints. It works, but it's just really slow. Generally, I have been limiting myself to a control frequency of 10 Hz with a horizon of 1 second (10 steps), but even this takes over a second to compute each step, which is nowhere close to real time. I've now noticed this is mostly due to the large number of kernel points I have been using. I don't require real time, but it would be nice if it weren't so slow, and the fact that it is so slow makes me think I am doing something wrong.

For just nonlinear MPC on the quadrotor (no Gaussian process) on a stabilization task (moving from rest at one point to a hover at another), changing the integration from a CasADi integrator (cvodes or rk) to a custom RK4 gave me a 10-20x speedup (still a multiple-shooting formulation)! I essentially discretized my continuous-time dynamics using

X = cs.SX.sym('X', n)
U = cs.SX.sym('U', m)

# Runge-Kutta 4 integration (f is the CasADi function for the continuous-time dynamics)
k1 = f(X, U)
k2 = f(X + dt/2*k1, U)
k3 = f(X + dt/2*k2, U)
k4 = f(X + dt*k3, U)
x_next = X + dt/6*(k1 + 2*k2 + 2*k3 + k4)

rk_dyn = cs.Function('rk_f', [X, U], [x_next], ['x0', 'p'], ['xf'])

which is similar to what is done in the multiple_shooting link you sent.
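Since rk_dyn exposes the same ['x0', 'p'] inputs and ['xf'] output as the integrator object, it drops straight into the rollout loop from my first post:

for i in range(horizon_length):
    next_state = rk_dyn(x0=x_var[:, i], p=u_var[:, i])['xf']
    opti.subject_to(x_var[:, i + 1] == next_state)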

Unfortunately, this doesn't really help my GPMPC application. Upon further exploration, I think the problem lies in just evaluating the Hessian and Jacobians of the GP. Using anything more than, say, 50 kernel points results in long (very much not real-time) computation times, and the approximations of the GP I have been trying result in poor performance due to the sensitivity of the state to the inputs (the quad we are simulating has a very small inertia, so small input differences lead to pretty large angular accelerations). I'm not really sure how to debug CasADi and understand what is causing the Hessian and Jacobian calculations to be so slow. Do you know how I might be able to debug this further? I know it uses autodiff, but maybe if I supplied analytical expressions for them it could be faster?
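One thing I have been considering is pulling the GP out of the NLP and timing its derivatives in isolation, something like the following rough sketch (gp_mean, the GP posterior mean as a CasADi Function, and its input dimension nz are placeholders for my actual setup):

import time
import numpy as np
import casadi as cs

z = cs.MX.sym('z', nz)
mu = gp_mean(z)  # GP posterior mean evaluated at the symbolic input

# Stand-alone Jacobian and Hessian functions of the GP output
jac_gp = cs.Function('jac_gp', [z], [cs.jacobian(mu, z)])
hess_gp = cs.Function('hess_gp', [z], [cs.hessian(cs.sum1(mu), z)[0]])

z0 = np.random.rand(nz)
t0 = time.perf_counter()
jac_gp(z0)
print('jacobian eval:', time.perf_counter() - t0)

t0 = time.perf_counter()
hess_gp(z0)
print('hessian eval :', time.perf_counter() - t0)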

Next, I tried implementing a direct collocation method for the nonlinear MPC. I chose to implement Hermite-Simpson collocation (both compressed and separated forms). For others looking into this, I mainly relied on the textbooks by Betts and Biegler.

I found Betts' book to be the most understandable. Also, for others seeing this who don't know anything about direct collocation (like me), I had trouble understanding exactly how the direct_collocation.py CasADi example worked. Essentially, it interpolates using Lagrange polynomials at Legendre-Gauss interpolation points (see Biegler's book, Table 10.1). This is a special form of polynomial interpolation using orthogonal polynomials, which allows for more accurate, and potentially exact, approximation without adding too much complexity. I chose the Hermite-Simpson method because it's simpler.

However, when I implemented this, I found the solutions went unstable really quickly, and I could only get half-decent results if I increased my control frequency to higher values (like 100 Hz or more, still with a 1 s+ time horizon), but then each iteration would take a second or longer to solve. Is this something people see often, or is there some major issue in my direct collocation scheme? Again, the quadrotor we are considering has a very small inertia, so tiny differences in the inputs create large angular accelerations (it is highly unstable).

For my current application, generating C code is a bit overkill, and although ACADOS looks helpful, I don't think we can use it at the moment.

Thanks again!

Dominik

Dec 29, 2021, 9:08:47 PM
to CasADi
Some time ago I also struggled with getting an MPC (6th-order Runge-Kutta and a horizon of 40) to run in real time. nlp_hess_l and nlp_jac_g were also taking way too much time in my case.
After trying many different things, the following actually cut the time from 900 ms to 60 ms for me.

opts_dict["expand"] = 1

Also, I was not using the Opti stack, but I am not sure whether that makes any difference in the end.
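With Opti, I believe the equivalent is to pass it as a plugin option when choosing the solver, something like:

p_opts = {'expand': True}   # options for the CasADi nlpsol wrapper
s_opts = {'max_iter': 500}  # example IPOPT options, just for illustration
opti.solver('ipopt', p_opts, s_opts)

I think expansion to SX only works if the expression graph contains no embedded external calls (e.g. a cvodes integrator), which is another reason to use an explicit RK discretization.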

Maybe that is helpful to you! :)

sandeep kumar r

Dec 29, 2021, 11:28:34 PM
to Dominik, CasADi
Dear Dominik,

You can use a Gauss-Newton approximation for the Hessian to reduce computational time; however, convergence will be linear. The Jacobian time can be reduced by using parallel calls to the RK45 integrator. Have a look at map() when implementing the shooting methods.
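As a rough sketch, reusing the rk_dyn function and the Opti variables from the earlier posts (the thread count here is only an example):

N = horizon_length

# Map the one-step dynamics over all N shooting intervals; 'thread' evaluates
# the copies in parallel
rk_map = rk_dyn.map(N, 'thread', 4)

# One matrix-valued call and one matrix-valued constraint replace the Python loop
x_next_all = rk_map(x_var[:, :N], u_var)
opti.subject_to(x_var[:, 1:] == x_next_all)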

Regards
Sandeep
