I managed to convert all my code to use functions and maps, but I'm still getting large C code files. I did some analysis to deug this. I'll try to explain what I did:
In my code I use 7 'maps' with various numbers of inputs and 1 output, essentially for each discrete time index i:
1) and 2) compute inputs U[i] from parameters p[i] and optimisation variables o[i]
3) compute states X[i] from U[:] (i.e. function of all optimisation variables) and X0 (X0 is an element of p) as follows:
int intervals = 12;
MX A = MX::sym("A", n_statesUsed, n_states);
MX B = MX::sym("B", n_statesUsed, n_inputs*intervals);
MX b = MX::sym("b", n_statesUsed, 1);
MX X0 = MX::sym("X0", n_states,1);
MX U = MX::sym("U", n_inputs*intervals,1);
MX Xexp = mtimes(A, X0) + mtimes(B, U) + b;
Function XFun("XFun", {A, B, b, X0, U}, {Xexp});
Function XMap = XFun.map(intervals);
4) compute inequality constraints G[i] from X[i], U[i], p[i], o[i]
5 and 6) compute 2 sets of equality constraints H1[i], H2[i] from X[i], U[i], p[i], o[i]
7) compute cost C[i] from X[i], U[i], p[i], o[i]
using these I compute constraints and cost:
constraints = vertcat(vertcat(G[:]), vertcat(H1[:]), vertcat(H2[:]))
cost += C[i] // for all i
finally:
MXDict nlp = {{"x", o}, {"f", cost}, {"g", constraints}, {"p",p}};
Function solver = nlpsol("solver", "ipopt", nlp);
solver.generate_dependencies("nlp.c");
Some vector lengths:
X0: 276
p: 276+176*intervals
o: 16*intervals
G: 8*intervals
H1: 44*intervals
H2: 0
the expressions contained by U, G, H are not trivial bot not incredibly complex either. They come from a Modelica model that I am parsing through JModelica and they involve some min/max/if-else/exp/power and other expressions.
When compiling this into C-code for intervals=12 I get a file size of 1.4 GB when using an implementation using maps and 410 MB when using an earlier version of my code without functions/maps: i.e. U,G,H,X are all inlined into 'constraints' and 'cost' using MX::substitute(). This seems odd since the second implementation contains a lot of code duplication. Either way GCC won't compile the resulting file.
When translating a simpler variant of the same problem with:
X0: 276
p: 276+176*intervals
o: 16*intervals
G: 8*intervals
H1: 0
H2: 0
and U and C essentially consisting of vertcat() only, i.e. they contain no equations, the C code sizes are respectively 920MB and 9MB with/without maps. This seems to suggest that the XFun/XMap implementation is somehow not very efficient.
Since looking at the code for smaller problems is more manageable, I looked at a problem with:
X0: 1
p: 1 +1*intervals
o: 1*intervals
G: 1* intervals
H1: 0
H2: 0
U contains just a few equations, C = sum(o). intervals=12
In the c code in 'fwd24_XFun_12' I find an array of size 6912 that is passed onto 'fwd24_XFun_12_map' for the 7th function input ("Input 7 (fwd_i1), part 0 (x_7)"). The XMap inputs have following sizes:
Atot: (1, 12)
Btot: (1, 288)
btot: (1, 12)
X0tot: (1, 12)
Utot: (24, 12)
fwd_i1 seems to correspond to B and 6912 = 24*288. One 12th of Btot is multiplied with one 12th of Utot, which leads to 24*24 operations. This happens 12 times, which in total leads to 12*24*24=6912 operations. I figure this is related to the used AD algorithm.
Anyway, 'fwd24_XFun_12' is in turn called from 'nlp_jac_g', which simply passes a zero(6912,0.) vector as an input argument. Eventually this array is used in 12 loops over function 'fwd24_XFun', each containing 24 pieces of code looping over a vector of size 24. The first of these pieces of code is pasted here:
w6 = 0.;
/* #12: Input 7 (fwd_i1), part 0 (f0_1) */
if (arg[7])
copy(arg[7], 24, w5);
else
fill(w5, 24, 0);
/* #13: Input 4 (der_i4), part 0 (U) */
if (arg[4])
copy(arg[4], 24, w7);
else
fill(w7, 24, 0);
/* #14: @6 = mac(@5,@7,@6) */
for (i=0, rr=(&w6); i<1; ++i) for (j=0; j<1; ++j, ++rr) for (k=0, ss=w5+j, tt=w7+i*24; k<24; ++k) *rr += ss[k*1]**tt++;
This essentially seems to compute the first term of d(B*U) = dB *U + B*d(U), which makes sense if it weren't for the fact that B is a constant and therefore dB = 0. This computation could therefore be avoided?
Also, is it really required to copy arg[4] into w7, can't you just pass the address of arg[4] to the for loop?
Another odd thing is that there are 24 copies of the code that I pasted above. I would expect you only need 1? If this is a bug (which I doubt) then it would explain the huge codes I get when the size of U becomes larger: i.e. 276+176*intervals = 2388. This could lead to vectors of 2388*2388*12= 68 million entries and 2388 times more code than required.
One other concern I have: in all my problems U is a mix of optimisation variables o and parameters p. There are significantly more parameters p (2388) than optimisation variables (16). For parameters we don't care about the derivatives since dU=dp=0, but they seem to be computed anyway? So I suppose I should split up my matrix B into a part for 'p' and a part for 'o' to get this evaluated efficiently? Or won't this help? Can't this be automated somehow?
Either way I'll give it a try. I'll try to also look up a reference for how DA works, that should help me figure it out. Any advice to help debug this would be appreciated, or a good reference for DA would be cool too. :)
Thanks in advance, good weekend!
Filip