Question about the use of CasADi

1,895 views
Skip to first unread message

Pedro Tabacof

unread,
Nov 6, 2012, 1:59:35 PM11/6/12
to casadi...@googlegroups.com
Hello,

I'm not a user yet but CasADi seems very interesting for my purposes. I'm doing parameter estimation of ODEs through non-linear optimization via the least-squares approach, approximating the Jacobian with difference quotients (forward difference). My problem is that it's taking too long when there are many parameters (due to the numerical approximation of the Jacobian, I believe).

I've been searching for AD tools and this one seems very appropriate since it's already integrated with a good ODE solver. However, I'm not sure how it would be possible to get the sensitivity analysis of the output function with respect to the parameters (supposing the output is a nonlinear function of the states and inputs).

I think maybe I could use a quadrature function for this, defining it as g = (output_function(y, u, t) - experimental_output)², so its integration would be a continuous least-squares, and I could get the gradient d_g/d_pi to be exploited by the optimizer I'm using.

Does this make sense? Do any of you see a better other approach? 

Thanks a lot,
Pedro Tabacof.
State University of Campinas, Brazil.

Joel Andersson

unread,
Nov 6, 2012, 5:10:14 PM11/6/12
to Pedro Tabacof, casadi...@googlegroups.com
Hello Pedro!

I've already answered to the Forum, but I will repost the answer here
as we are in the process of moving to google groups.

If there are very many parameters you probably want to do adjoint
sensitivity analysis. But I think I need a bit more information to be
able to help out:

Could you maybe say a bit how large your dynamic system is now; how
many states and how many parameters. Is it an ODE or are there
algebraic states as well? Do you use an off-the-shelf integrator or
did you write one yourself? Same with the nonlinear optimization - did
you take an existing implementation or did you implement it yourself?

Greetings,
Joel

2012/11/6 Pedro Tabacof <tab...@gmail.com>:
> --
>
>



--
Joel Andersson, PhD Student
Electrical Engineering Department (ESAT-SCD), Room 05.11,
K.U.Leuven, Kasteelpark Arenberg 10 - bus 2446, 3001 Heverlee, Belgium
Phone: +32-16-321819
Mobile: +32-486-672874 (Belgium) / +34-63-4408800 (Spain) /
+46-707-360512 (Sweden)

Private address: Weidestraat 5, 3000 Leuven, Belgium

Pedro Tabacof

unread,
Nov 6, 2012, 6:17:29 PM11/6/12
to Joel Andersson, casadi...@googlegroups.com
Joel, thank you for the quick answer.

I'm only dealing with regular nonlinear ODEs with around 5-50 states and 10-50 parameters which I'm trying to estimate. I'm integrating them with LSODA [1](switches between Adams-Moulton and BDF automatically - not a good implementation though) and using CMinpack's Levemberg-Marquardt [2] as the nonlinear least-squares optimizer. Since I'm numerically calculating the Jacobians (d_error/d_parameters - error = (output - experimental_output)² ), it's not as fast as I need it to be. Also, I'm dealing with real data, so I have to estimate the initial values too.

My problem is that the experimental outputs available to me are not states themselves, but rather (non)linear functions of the states and inputs. If I understood correctly, FSA will give me the sensitivity with respect to the states (for every time I choose), so I would have to use the chain rule to get the output sensitivities [3], which would be fed into the least-squares algorithm the same way I'm doing now.

On the other hand, ASA will only give the sensitivities of an integrated function over time (is this right?), so I wouldn't be able to use the Levember-Marquardt algorithm, which requires a Jacobian containing every experimental error. I thought I could use a implementation that has no such requirement (such as SQP or Ipopt) to minimize the integral of the squared error over time, the error being (output - interpolation_experimental_output)². Does this make sense to you? I'm unsure about this since I've never seen anyone minimize a continuous error, interpolated from real data.

I hope was clear, but please tell me if I wasn't.

Pedro.

--
Pedro Tabacof,
Unicamp - Eng. de Computação 08.

Joel Andersson

unread,
Nov 6, 2012, 7:22:29 PM11/6/12
to Pedro Tabacof, casadi...@googlegroups.com
Hello Pedro!

For a problem your size I would go for a collocation method and solve
with one of the NLP solvers in CasADi. Ipopt will probably be the
fastest (I'd be surprised if the solution would take much longer than
a second), but SQP methods often do a better job when solving a
sequence of related problems (hotstarting). If you want to try out an
SQP method you might want to test Worhp - academic licenses are free,
but better start out with Ipopt. You will find information in the
users' guide how to implement it and you can then go on to modify one
of the examples.

Since you have a parameter estimation problem, a Gauss-Newton
approximation of the Hessian is probably going to work fine, and might
outperform everything else, but it's probably easier just to try out a
BFGS Hessian (the default for Ipopt) or an exact (CasADi-generated)
Hessian first.

A different approach is to use an ODE integrator like you do now.
CasADi comes with the CVodes which is basically an extension of the
code you are using now, but with support for forward and adjoint
sensitivitiy analysis (with auto-generated sensitivity equations and
derivative information by CasADi). I don't have very good experiences
with embedding variable order, variable step-size integrators into
optimization problems, however. In any case, for your problem size, a
collocation method will probably work fine and might be orders of
magnitudes faster. There is a class called "Simulator" in CasADi which
does what you want, namely calculating the sensitivities of an output
function that depends on the ODE solution. Only forward sensitivities
are supported for this function though (i.e. it does all the
chain-rule stuff for you). In any case, I would not attempt adjoint
sensitivity analysis. Your problem is too small to justify this. By
the way, adjoint sensitivity analysis can be used both sensitivities
of the (functions of) states and of quadratures and CasADi supports
both.

I hope this helps!
Joel

2012/11/7 Pedro Tabacof <tab...@gmail.com>:

Pedro Tabacof

unread,
Nov 6, 2012, 7:45:59 PM11/6/12
to Joel Andersson, casadi...@googlegroups.com
Hello Joel,

Thank you very much for the explanation and the suggestions. I just have a few questions left now:

1) If I used Ipopt for my problem, what would be the loss function exactly? This is what has been puzzling me the most, as all examples I looked were only about minimizing the input (optimal control).

2) Does the collocation method work without the need of an ODE solver? I don't know much about it, so if you could point me to a reference it would be very kind. If not, then I didn't quite understand the first sentence of the third paragraph ("A different approach is to use an ODE integrator like you do now.").

3) When do you think the use of ASA is justified?

Thanks again,
Pedro.

Joel Andersson

unread,
Nov 7, 2012, 12:50:32 PM11/7/12
to Pedro Tabacof, Joel Andersson, casadi...@googlegroups.com
Hello Pedro,


Den onsdagen den 7:e november 2012 skrev Pedro Tabacof<tab...@gmail.com>:
> Hello Joel,
> Thank you very much for the explanation and the suggestions. I just have a few questions left now:
> 1) If I used Ipopt for my problem, what would be the loss function exactly? This is what has been puzzling me the most, as all examples I looked were only about minimizing the input (optimal control).

I don't think all examples minimize the control effort. Be aware that most documentation is for Python. It will work just as well from c++ but you'll need to translate it yourself. Using a different objective function is straightforward.  You can use the same objective as now, i.e. minimizing the square deviation from a reference trajectory.


> 2) Does the collocation method work without the need of an ODE solver? I don't know much about it, so if you could point me to a reference it would be very kind. If not, then I didn't quite understand the first sentence of the third paragraph ("A different approach is to use an ODE integrator like you do now.").

In collocation you "write your own" ode solver which is typically a fixed step implicit runge-kutta method. You will find a very superficial introduction to collocation in the users guide. A better introduction can be found e.g. in the book "Nonlinear Optimization " by Lorenz Biegler from 2010.


> 3) When do you think the use of ASA is justified?
When you have many more parameters and states than outputs. For example when calculating the gradient of a Lagrangian or objective function.

Greetings, Joel

Pedro Tabacof

unread,
Nov 7, 2012, 1:27:30 PM11/7/12
to Joel Andersson, casadi...@googlegroups.com
Joel, thanks again for your help.

I've started using the Python version for Windows. Now I'm trying to use the Simulator to obtain the gradients with FSA and after that I will try to use the Ipopt interface (using the collocation method as you suggested) to do the minimization. So far everything has been smooth and intuitive, congratulations on your work.

I'd like bring something up because I don't know whether it's a bug or a feature: if I instantiate an Integrator, then a Simulator (using the Integrator from before as an initialization parameter), and I evaluate the Simulator and the Integrator (on this order), the result will not be the same as only evaluating the Integrator alone. I believe that the Integrator being used inside the Simulator is a shallow copy, is this what is intended?

I hope I'm not being much of a nuisance, perhaps in the future I can contribute with something more important.

Pedro.

Joel Andersson

unread,
Nov 7, 2012, 2:59:09 PM11/7/12
to Pedro Tabacof, casadi...@googlegroups.com
Dear Pedro,

You do not need to use any integrator (or "Simulator" instance) if you
intend to do collocation.

CasADi takes shallow copies by default as you noticed. This is desired
behavior. Most CasADi classes are reference counted.

Pedro Tabacof

unread,
Nov 8, 2012, 4:37:16 PM11/8/12
to Joel Andersson, casadi...@googlegroups.com
Joel, sorry for not being clear, but I'm actually trying to do two separate things.

There is something that has come up and I can't find any fitting example: how do I make a control input change over time if it's not a regular math function (e.g. it comes from experimental data)?

Pedro.

Joel Andersson

unread,
Nov 9, 2012, 5:50:34 AM11/9/12
to Pedro Tabacof, casadi...@googlegroups.com
Dear Pedro,

You can feed the measurement values to the optimization in two different ways:

1. You can include extra variables in the NLP corresponding to the
measurements or set points and then set the upper and lower bounds
equal to the parameter value.

2. You can use the parametric NLP functionality, see e.g. the
"parametric_nlp.cpp" example.

I hope this helps,
Joel


2012/11/8 Pedro Tabacof <tab...@gmail.com>:

Pedro Tabacof

unread,
Nov 9, 2012, 7:37:51 AM11/9/12
to Joel Andersson, casadi...@googlegroups.com
Hello Joel,

How exactly would I make the control signal change inside the systems dynamics or objective function?

Let's say I have an input temperature Tin which is 25ºC for the first 10 seconds and 30ºC for the next 10 seconds. The only way I could think of implementing this was writing: 
Tin = p1*(t < 10) + p2*(t > 10)*(t < 20); p1 = 25; p2 = 30

This actually works but the problem is that the problems I'm working on have varying inputs every second for a long time (over 1000 seconds), so if I used the technique above I'd end up with an unbelievably huge expression, impossible for a human to manage.

Ideally I would have Tin = interp(t, temp_vector, time_vector) which would return the interpolated Tin given time, however I don't know how to write a symbolic interpolation function (i.e. without using conditional clauses and loops).

Thank you,
Pedro. 

Joel Andersson

unread,
Nov 10, 2012, 5:48:58 AM11/10/12
to Pedro Tabacof, casadi...@googlegroups.com
Dear Pedro,

You can have a look at the functions pw_lin and pw_const in CasADi,
but it probably won't solve your problem. Such a function would not be
smooth and hence not suitable to use inside an integrator or
derivative-based optimizer. What you need to do is to reformulate your
problem.

I would recommend you read a book about dynamic optimization before
continuing. My suggestion is "Nonlinear Programming" by Larry Biegler:

http://www.amazon.com/Nonlinear-Programming-Algorithms-Applications-Optimization/dp/0898717027

Best regards,
Joel



2012/11/9 Pedro Tabacof <tab...@gmail.com>:

Pedro Tabacof

unread,
Dec 5, 2012, 1:59:42 PM12/5/12
to casadi...@googlegroups.com, Pedro Tabacof
Hello Joel, 

Sorry for the (very) late answer, but I had finals on the past couple of weeks, so I was very busy. I got myself a copy of Biegler's book and I'm starting to read it.

I do wish to use direct collocation as you recommended on the other thread, however I'm unsure on how CasADi would handle the many (1K-50K) experimental measures I have. You told me before to read Biegler's book and try to reformulate the problem, but I found it actually shows an example where direct collocation is used within parameter estimation using real data [1]. The only difference from my problem is that only the output varies over time in his case, while in mine both the output and input changes with time.

You said interpolating the experimental data would not be smooth, but what if I used a cubic splines interpolation? Would it be hard for me to implement it in CasADi (based on pw_lin)?

Thanks again,
Pedro.

[1] Zavala, Victor M., Carl D. Laird, and Lorenz T. Biegler. "Interior-point decomposition approaches for parallel solution of large-scale nonlinear parameter estimation problems." Chemical Engineering Science 63.19 (2008): 4834-4845.

Pedro Tabacof

unread,
Dec 6, 2012, 10:26:34 AM12/6/12
to casadi...@googlegroups.com, Pedro Tabacof
Just an update, I've figured out how to model the control inputs based on your suggestion of considering them as equal boundaries, but I still don't know how write an objective function that returns the least-squares between measured and simulated output.

Do you have a suggestion on how to do that?

Pedro.

Joel Andersson

unread,
Dec 6, 2012, 10:39:11 AM12/6/12
to casadi...@googlegroups.com, Pedro Tabacof
Hello Pedro,

you need to include your measurements as parameters in the NLP. The easiest way to do this is to enlarge the NLP variable vector to include the measurements at all the time points. Then set the upper and lower bounds equal to the measurement value to fix these parameters. There is no need for `pw_lin`.

As for the objective function, you could just write it as the sum of the square deviations from the measured  and simulated states. One way of doing this is simply to write a for loop:

J = 0
for all measurements
  J = J + (simulated_measurement - estimated_measurement)^2

You can also include weighting factors. A potentially more efficient way is to use the `inner_prod` function.

deviation = simulated_measurements - estimated_measurements
J = inner_prod(deviation,deviation)

Pedro Tabacof

unread,
Dec 6, 2012, 2:45:47 PM12/6/12
to casadi...@googlegroups.com, Pedro Tabacof
Joel, thanks again, you've been very helpful.

This might seem dumb, but how exactly do I evaluate one of the system states at a specific time point? Lets say I just have one state, x, how would you write the objective (least-squares) function if you have the measurements x1 and x2?

Pedro.

Joel Andersson

unread,
Dec 6, 2012, 2:51:12 PM12/6/12
to casadi...@googlegroups.com, Pedro Tabacof
You have to discretize your problem in time. If you use a collocation approach, you have access to the full state at a set of time points. It makes sense to have some of these points coincide with your measurements.

Best, 
Joel

Pedro Tabacof

unread,
Dec 17, 2012, 10:32:26 AM12/17/12
to casadi...@googlegroups.com, Pedro Tabacof
Hello Joel,

I managed to do what you said and I'm getting very interesting results. First I was using Mumps and it was consuming too much memory and taking too long, but then I changed to MA27 and it worked just fine.

Since you helped me a lot, I decided to turn my problem into an example so future users will be more at ease when trying to do parameter estimation based on experimental data. I'm working with a highly nonlinear stiff biochemical system and using measured data from Matlab simulations (see code commentaries fore more information) and estimating 25 parameters.

I had to change a little bit of the code of the vdp_collocation.py to add parameters as part of the NLP formulation and also change the objective function, which is now the least-squares error. The output is also different, printing both the parameter estimation results and the experimental data alongside the simulated output. I commented all of my changes and additions so the code is still easy to understand.

I would be glad if this could become part of the examples folder, so if you think I should make any changes to make code more adjusted to the other examples, just tell me so. I also attached the experimental data used because the code won't run otherwise (ACADO parameter estimations examples also have separate files for the data).

Pedro.
parameter_estimation_collocation.py
parameter_estimation_data.csv

Joel Andersson

unread,
Dec 18, 2012, 4:24:57 AM12/18/12
to casadi...@googlegroups.com
Hello Pedro!

I'm happy to hear it worked out for you. And thanks for the documented example. I will add it to the examples collection. The only thing I think we might change is the generation of the perturbed parameters. With a random guess the behaviour of the script isn't deterministic.

Greetings,
Joel

Pedro Tabacof

unread,
Dec 19, 2012, 10:35:46 AM12/19/12
to casadi...@googlegroups.com
Hello,

I've corrected a small bug and also made the initial parameter vector deterministic so the result is the same everywhere. I also added five more parameters (that used to be constants), making the problem much harder (since those parameters are particularly troublesome), but now at least it's a complete case study for the yeast fermentation system.

Something that is bothering me is that when I choose a smaller discretization interval (let's say nk = 1500 instead of nk = 300), it consumes much more memory and the search is much slower (more than five times). If I keep decreasing it, the linear solver ends up crashing because it says there is not enough memory... Do you think that changing to a different linear solver could help? From Mumps to MA27 there was a significant improvement already. What if started using the C++ version directly, do you think it would consume less memory?

Thanks,
Pedro.
parameter_estimation_collocation.py

Joel Andersson

unread,
Dec 19, 2012, 2:35:20 PM12/19/12
to casadi...@googlegroups.com
Hello,

it sounds to me that your problem is tied to the linear solver in IPOPT. CasADi calls IPOPT via the C++ interface so I don't see how calling IPOPT directly could make any difference. MA27 is over 30 years old and there are other linear solvers around that are better at treating larger systems. Several such solvers have been interfaced to IPOPT. Please look at the IPOPT documentation and write to the IPOPT mailing list if you have problems there. 

On a side note: if you have a bugfix or an addition to CasADi (like here), you can always fork CasADi and make a pull request.

Joel
Reply all
Reply to author
Forward
0 new messages