Code Generation

Chen Yutao

unread,

Sep 28, 2016, 7:33:09 AM9/28/16

to CasADi

Hi all,

I've asked similar questions but I need more suggestions about the code generation of Casadi.

I'm using MPC to control a very complex model with a long prediction horizon. The model is not big (17 states and 3 controls) but is very complex (a lot of nonlinear functions, including sin, cos, exp, power, atan and etc).

If I'm right, c source cannot be generated using general integrator (CVodes, implicit Runge Kutta) in Casadi. I choose to use explicit Runge Kutta integrator, which is explained in Casadi examples.

The problem is, if I use SX as the basic variable type, the compilation time is super long and the memory required is super big (often > 10Gb) . If I use MX as the basic type, the compiler reports an error saying that there are too many loops in the c source file.

Is there any suggestions on how can I improve this problem? either to change the integrator, or to use some tricks for compiling

Thanks a lot

Yutao

Chen Yutao

unread,

Sep 28, 2016, 7:35:36 AM9/28/16

to CasADi

Forget to say, I want to generate the Jacobian of the integration funciton. The function evaluation itself is easy to compile

Joel Andersson

unread,

Sep 28, 2016, 9:39:21 AM9/28/16

to CasADi

Hi Yutao,

Try to use a combination of SX and MX. That is, create a function using SX for the ODE right hand side. Then another function - using MX - for one step of the RK4 scheme (i.e. with four calls to the ODE rhs function). If your time horizon is very long, say 10000, you can create one function for 100 steps and then have this called 100 times. This will limit the memory use when calculating derivatives.

Best regards,

Joel

Chen Yutao

unread,

Sep 28, 2016, 10:53:07 AM9/28/16

to CasADi

Hi Joel,

Thank you for the advice, I'm trying your suggestion and it works fine by now. But indeed the compiler has worked for more than two hours.

I notice that there is something mapaccum/map to reduce the compilation time. How I can use that?

Best,

Yutao

在 2016年9月28日星期三 UTC+2下午3:39:21，Joel Andersson写道：

Joris Gillis

unread,

Sep 28, 2016, 10:59:58 AM9/28/16

to CasADi

Dear Chen,

Writing documentation for mapaccum/map is on my to do list.
At the moment, you may get inspiration for use from the sysid example https://github.com/casadi/casadi/blob/master/docs/examples/python/sysid.py.
It helps if you try things out interactively, doing printDimensions() on the output of calls to Function.map

Best regards,
Joris

Chen Yutao

unread,

Sep 28, 2016, 11:23:18 AM9/28/16

to CasADi

Hi Joris,

Thank you for the information. If I understand right, the mapaccum/map is useful to map the function to a lot of instances.

In my problem, I need more than 2 hours to compile the c code even for one shooting point (I even do not formulate a multiple shooting structure yet). Actually I'm just generating an integration function and its jacobian function by now. Hence in this simple task, it seems that the function to be compiled is too complicate, since it has nothing to do with prediction horizon or the number of samples.

Best,

Yutao

在 2016年9月28日星期三 UTC+2下午4:59:58，Joris Gillis写道：

Joel Andersson

unread,

Sep 28, 2016, 11:51:05 AM9/28/16

to CasADi

Hi Yutao,

The time it will take to compile the code is *highly* dependent on how the symbolic expressions were constructed. I'm confident that it's possible to get those two hours down to minutes or maybe even seconds just by structuring the code differently. I have yet to find a single example where that was not the case.

I agree that mapaccum/map might be useful here, but I do believe that for order-of-magnitude improvements, you'll need to structure the code into multiple Function instances using the MX type (possibly in combination with SX).

Best regards,

Joel

Joel Andersson

unread,

Sep 28, 2016, 12:05:38 PM9/28/16

to CasADi

Hi again,

Just to be clear: I don't believe that you're doing it the intended way if compilation takes two hours. Something is going wrong somewhere. And I don't think using map will resolve that.

Joel

Chen Yutao

unread,

Sep 28, 2016, 12:46:49 PM9/28/16

to CasADi

Hi Joel,

I'm following your suggestion, using SX to formulate the RHS ODE function. But there are hundreds of intermediate states, expressed by the SX variable in the end. Then I formulate a MX function that calculates one step of the integration. This MX function is called multiple times to get the final state, as shown in "sysid.m". In the end, the Jacobian function of the integration function is generated and compiled.

After 2 hours, the compilation is done but the output of the mex function is NaN. When I do everything using SX, the compilation takes several hours and 14Gb memory, but it gives correct output!

Maybe I need to use something that treats intermediate state intelligently to accelerate the compilation.

Best,

Yutao

在 2016年9月28日星期三 UTC+2下午6:05:38，Joel Andersson写道：

Joel Andersson

unread,

Sep 28, 2016, 1:17:13 PM9/28/16

to CasADi

Hi,

If you're following the sysid.m example, did you include the line "one_sample = one_sample.expand();"? That would translate the entire graph into SX which is not what you want if you want short compilation time. So make sure that this line is *not* included.

Joel

Chen Yutao

unread,

Sep 28, 2016, 2:10:35 PM9/28/16

to CasADi

Hi,

I do not include that for sure. If I use pure SX the compilation takes >10Gb memory. Now it occupies less than 800Mb, however, still takes 2 hours

在 2016年9月28日星期三 UTC+2下午7:17:13，Joel Andersson写道：

Joel Andersson

unread,

Sep 28, 2016, 2:22:11 PM9/28/16

to CasADi

Still, something must be wrong. Did you try to do code generation for the function without including the calculation of the Jacobian? Joel

Chen Yutao

unread,

Sep 29, 2016, 3:39:06 AM9/29/16

to CasADi

Yes, the explicit integration function F is easy to compile, but F.jacobian is not.

What's more, by using MX and SX together, the compiler occupies much less memory but the output of the mex function is NaN.

By now, the only way I can imagine is to simplify or improve the model structure

Thanks a lot,

Yutao

在 2016年9月28日星期三 UTC+2下午8:22:11，Joel Andersson写道：

Joel Andersson

unread,

Sep 29, 2016, 9:45:44 AM9/29/16

to CasADi

If the non-differentiated function is easy to compile, it's most likely one of two things:

1. The Jacobian is large but not sparse, or has a sparsity pattern that cannot be efficiently handled

2. One of the intermediate functions is too large

If you're getting NaN, it could be because of a bug, but it could also be a modeling error. You'd need to isolate it to find out why.

If you look at the generated code, you should be able to find out which function is causing the blowup. The names will be related to the names you gave to the corresponding functions. If you know which function is causing problems, you can reformulate it or set some options that will limit memory usage (e.g. disabling the reverse mode algorithmic differentiation).

Also, you can use the command "spy_matlab" to generate a Matlab script for visualizing the Jacobian sparsity pattern:

J = jacobian(f, x);
J.sparsity(). spy_matlab('jsp.m');

Joel

Chen Yutao

unread,

Sep 30, 2016, 6:10:29 AM9/30/16

to CasADi

Thank you Joel,

The Jacobian is small (as I said only 17 states and 3 controls) but obviously it's dense. Even for this its code generation is very painful, needless to say if I formulate a multiple shooting problem.

Now I think the intermediate functions are too large, hence if SX is used, the chain rule for computing the Jacobian contains millions of steps; if MX is used, too many nested loops may make the output blowup.

So I think it's not a problem of CasAdi, but I expect two new developments:

1. a smart treatment of the intermediate functions

2. code generation of implicit integrators

Best,

Yutao

在 2016年9月29日星期四 UTC+2下午3:45:44，Joel Andersson写道：

Joel Andersson

unread,

Sep 30, 2016, 12:21:14 PM9/30/16

to CasADi

Hi,

OK. In general, there is no reason why you would have to end up with so much code. Unless the code *you* are writing consists of millions of lines, there is no reason why the generated code would consist of millions of lines. I'm not sure about the "too many nested functions" thing. It might be compiler specific.

Best regards,

Joel

Chen Yutao

unread,

Oct 5, 2016, 9:47:10 AM10/5/16

to CasADi

Hi again

After several days of testing, I still don't understand how to solve this problem.

I follow the example "sysid.m", except that the ode_rhs is defined by SX inputs. After that, I create new MX variables but called by ode_rhs for one step integration. Hence, this one step integration function is a MX function, which is called four times to finally generate the RK4 scheme.

The compilation of the jacobian of the RK4 function (MX) takes about half an hour, which improved a lot. However, the output is NaN. More specifically, the sparsity of the jacobian is correct, but all the nonzeros are NaN. I tried to check what goes wrong in the c file, it is clear that there are ode_rhs and one step functions included. But it's too hard to figure out something among 35634 lines of codes (the generated jacobian c file). If only SX is used to generate this file, there would be about 85000 lines of codes.

Hence, at this point, can I really do something? Since there is 5 2D interpolation variables in the model, I'm wondering if I can simplify it using Casadi builtin functions to reduce the number of expressions.

Best,

Yutao

Joel Andersson

unread,

Oct 5, 2016, 10:45:12 AM10/5/16

to CasADi

Do you think you can post the code you use to construct the expressions? Just the central parts, it doesn't need to be complete.

Joel

Chen Yutao

unread,

Oct 6, 2016, 4:28:42 AM10/6/16

to CasADi

Hi Joel,

is it possible that we send you the code for constructing the expressions in private?

we don't want our code exposed in this forum

Best,

在 2016年10月5日星期三 UTC+2下午4:45:12，Joel Andersson写道：

Joel Andersson

unread,

Oct 6, 2016, 9:06:41 AM10/6/16

to CasADi

Hi Yutao,
Sorry, but for support that isn't via the the open forum, we would insist on either a paid consulting agreement or an academic collaboration. You can contact us about either.
Joel

Message has been deleted

Chen Yutao

unread,

Oct 10, 2016, 1:13:24 PM10/10/16

to CasADi

Hi there,

I am able to reproduce the NaN problem using a toy example- chain of masses.

The model expressions are given here (SX mixed with MX), the output put of function Gw_RK4 is correct when I call it in casadi environment but is NaN when I compile its c code generated by casadi. Also the compilation takes 1 hour

Do I made any mistakes?

Best,

Yutao

import casadi.*

n=3;
nx=n*3+(n-1)*3;
nu=3;

p0x=0;p0y=0;p0z=0;

fv=SX.sym('fv',nx+nu,1);

k=0.1;
lr=0.55;
m=0.45;
g=9.81;

xend=1;yend=0;zend=0;

px=fv(1:n,1);
py=fv(n+1:2*n,1);
pz=fv(2*n+1:3*n,1);
vx=fv(3*n+1:4*n-1,1);
vy=fv(4*n:5*n-2,1);
vz=fv(5*n-1:6*n-3,1);
ux=fv(nx+1);
uy=fv(nx+2);
uz=fv(nx+3);

norm=SX(n,1);
Fx=SX(n,1);
Fy=SX(n,1);
Fz=SX(n,1);
ax=SX(n-1,1);
ay=SX(n-1,1);
az=SX(n-1,1);

norm(1)= ((px(1)-p0x)^2+(py(1)-p0y)^2+(pz(1)-p0z)^2)^0.5 ;
Fx(1)=(px(1)-p0x)*k*(n-lr/norm(1));
Fy(1)=(py(1)-p0y)*k*(n-lr/norm(1));
Fz(1)=(pz(1)-p0z)*k*(n-lr/norm(1));
for i=2:n
    norm(i)= ((px(i)-px(i-1))^2+(py(i)-py(i-1))^2+(pz(i)-pz(i-1))^2)^0.5 ;
    Fx(i)= (px(i)-px(i-1))*k*(n-lr/norm(i)) ;
    Fy(i)= (py(i)-py(i-1))*k*(n-lr/norm(i)) ;
    Fz(i)= (pz(i)-pz(i-1))*k*(n-lr/norm(i)) ;
    ax(i-1,1)= (Fx(i)-Fx(i-1))*n/m ;
    ay(i-1,1)= (Fy(i)-Fy(i-1))*n/m ;
    az(i-1,1)= (Fz(i)-Fz(i-1))*n/m-g ;
end

xdot=[vx;ux;vy;uy;vz;uz;ax;ay;az];

M = 4; 
DT = Ts/M; 

X=fv(1:nx);
U=fv(nx+1:nx+nu);
f = Function('f', {X,U}, {xdot},{'X','U'},{'xdot'});
  
fv_MX=MX.sym('fv_MX',nx+nu,1);
X=fv_MX(1:nx);
U=fv_MX(nx+1:nx+nu);
k1=f(X,U);
k2 = f(X + DT/2 * k1, U);
k3 = f(X + DT/2 * k2, U);
k4 = f(X + DT * k3, U);
step=X+DT/6*(k1 +2*k2 +2*k3 +k4);

onestep=Function('onestep',{fv_MX},{step},{'fv_MX'},{'step'});
X=fv_MX(1:nx);
U=fv_MX(nx+1:nx+nu);
for j=1:M
    X=onestep([X;U]);
end

F_RK4=Function('F_RK4', {fv_MX}, {X}, {'fv_MX'}, {'xf'});
Gw_RK4= F_RK4.jacobian('fv_MX','xf');

Joel Andersson

unread,

Oct 10, 2016, 3:27:12 PM10/10/16

to CasADi

Hi Yutao,

In your code, you are taking a square root of a number. Are you that the argument of the square root never becomes zero? The square root function has an undefined value at zero, which may or may not result in a NaN in CasADi.

About compilation time: When I run the script on my computer, the C code for the Jacobian (i.e. Gw_RK4) takes 0.16 seconds to compile with clang. If I turn on -O3 code optimization, it takes 3.8 seconds. I'm not sure how it can take 1 hour on your computer.

Best regards,

Joel

Chen Yutao

unread,

Oct 10, 2016, 5:42:56 PM10/10/16

to CasADi

Hi, Joel,

at least the input I give will not produce NaN for zero square root, because the output is correct if I run the function in casadi environment

I'm using Windows 10 with VS 2013, is there any problem with this compiler? It's amazing that the compilation time is so short in you computer, I have only this kind of compilation time when I use SX functions for very small models

Best,

Yutao

在 2016年10月10日星期一 UTC+2下午9:27:12，Joel Andersson写道：

Chen Yutao

unread,

Oct 11, 2016, 4:27:32 AM10/11/16

to CasADi

Also,

The output is correct when I use pure SX to create the function and then compile it. For SX mixed with MX case, I tried to turn off optimization flag in Matlab by adding -g, the compilation takes less than 1 second, but NaN output is also observed, except for one correct value at this time.

I'm not sure if this is problem of compiler of Matla. I'm using Matlab 2016a. Is this version compatible with CasADi?

Joel Andersson

unread,

Oct 11, 2016, 9:22:35 AM10/11/16

to CasADi

Hi,

You are never guaranteed to get a NaN in CasADi. If you have an expression such as "x/x", CasADi might try to simplify it to 1. But if it isn't simplified, and evaluated with x=0, you'll have 0/0 = NaN. So probably you are just "lucky" when you don't get NaN. SX does much more simplifications than MX, which is probably what you are observing.

Try adding a small number to the argument of the square root term to keep it strictly positive, e.g. sqrt(z+1e-100) instead of sqrt(z), where z>=0 is some expression.

Joel

Chen Yutao

unread,

Oct 11, 2016, 12:42:56 PM10/11/16

to CasADi

Hi Joel,

I tried to follow your suggestion and there is no difference if I use 1e-20 in sqrt and denominator. However, when I use 1e-8, part of the output matrix is not NaN, but is not correct. The other part is still NaN.

As far as I know, if I do not compile the funcion (SX & MX), it is executed in casadi environment but in a same way of c code. Hence, I still can not explain that why the output in casadi environment is correct without any small argument added but the output matrix is NaN for c code.

The compilation time is also not understandable.

Something wrong with compiler in windows? or with the latest Matlab version

Yutao

Joel Andersson

unread,

Oct 11, 2016, 12:50:22 PM10/11/16

to CasADi

Hi,

When you evaluate it in from CasADi, did you try to set the option "jit" to true for the Function instance? That will involve CasADi's just-in-time compiler (normally Clang).

Joel

Chen Yutao

unread,

Oct 12, 2016, 11:18:10 AM10/12/16

to CasADi

Hi,

this is worth a try but indeed we would like to use the generated mex function in the simulink or even in other projects.

Is it possible to use 'jit' in this case?

Yutao

Joel Andersson

unread,

Oct 12, 2016, 11:51:25 AM10/12/16

to CasADi

I only suggested this in order to determine if it's really your compiler. But I don't think it is. I think there is something with the equations you have entered. Somewhere, you probably do a division by zero or a square root of zero or the like.

Actually, I have another idea of what could explain the different behaviour: you write "(...)^0.5". Try replacing that with "sqrt(...)". Iirc, that's a simplification that is done by SX but not by MX.

Joel

Chen Yutao

unread,

Oct 12, 2016, 12:36:54 PM10/12/16

to CasADi

Ok I will try this replacement.

But, I've mentioned that the computation in casadi environment should be the same with c code execution. Am I right?

If so, why the outputs are different given the same input, and the same function structure?

If not, then something happened when compile the function.

Yutao

Joel Andersson

unread,

Oct 12, 2016, 1:50:01 PM10/12/16

to CasADi

Yes, in principle it should be the same. If the difference is due to a suspected bug, you would need to simplify the expressions to a minimal failing example so that we could figure out why there is a difference. It's possible that it's due to a bug in CasADi.

Joel

Reply all

Reply to author

Forward