wiki for ODE sensitivities with cvodes / odeSD


Sebastian Weber

Oct 5, 2015, 3:52:17 AM
to stan development mailing list
Hi Bob!

Yes, sure, we can move this to stan-dev. Let me answer inline:

> Can we move this discussion to stan-dev?

Sure.

>
> Yes, I think we should consider CVODES.

Great.

>
> The main issues with inclusion are getting the builds to work on
>
> * every platform (Mac, Windows, Linux)
> * through every compiler (g++, clang++, MSVC, mingw, Intel, ...?)
> * through every interface's installation (CmdStan, RStan, PyStan)
>
> Ideally that would mean with only using make so we don't have to add on
> any more build dependencies.

The current status is that CVODES builds with g++, clang++, and Intel. Moreover, only make is used to perform the build and install. For RStan I figured out how to get R to build CVODES - Ben would need to look into what is best there (dynamic vs. static linking). For Python there are a couple of packages which make CVODES available for use in Python, so I suppose this is doable as well (a nice one with a neat C++ interface is pycvodes).

> Of course, we also need it to pass our library tests. And it needs to be
> demonstrably better than whatever is already implemented.
>
> I'm all sorts of confused about which systems we're talking about and what
> they're capable of, along at least the following dimensions:
>
> * switching between stiff and non-stiff regimes

Yes, CVODES can switch between stiff and non-stiff, but not automatically - the user has to say upfront what kind of problem it is. Here LSODA is nice in that it switches heuristically between stiff (BDF) and non-stiff (Adams).
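As a concrete sketch (my reading of the Sundials 2.x API - names should be double-checked against the version we would ship), the method is fixed when the solver memory is created:

void *cvode_mem_stiff    = CVodeCreate(CV_BDF,   CV_NEWTON);     /* stiff: BDF + Newton */
void *cvode_mem_nonstiff = CVodeCreate(CV_ADAMS, CV_FUNCTIONAL); /* non-stiff: Adams */

so an LSODA-style heuristic switch would mean tearing the solver down and re-creating it.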

>
> * using finite diffs for Jacobians to handle stiffness

If you do not give CVODES the Jacobian, it will use finite diffs. Same for LSODA.

>
> * allowing us to pass in system Jacobians for stiffness

Yup, that's possible with CVODES. I guess (but don't know) the same is true for LSODA.

>
> * what kind of sparseness is implied by the system Jacobians
> (I think all the parameter w.r.t. other parameter components
> are zero)
> and how the solvers deal with them

CVODES has support for sparse structures. To my knowledge LSODA does not, but LSODAS does. Note that CVODES = CVODE with sensitivity support (sparseness handling is always included and only has to be switched on), while LSODAS = LSODA with sparseness support.

>
> * what underlying ODE solver algorithms are being used

CVODES: BDF for stiff, Adams for non-stiff both with variable order
LSODA: same as CVODES, but with fixed order to my knowledge.

>
> * how CVODE relates to CVODES relates to LSODA and whether there
> are other acronyms I should be throwing around here

This is the important one:

CVODE: ODE integrator which solves the problem type y' = f(t,y) ; y(t0) = y0
So you specify a y0 and the RHS of the ODE and you get y(t) calculated. NO SPECIAL SENSITIVITY SUPPORT HERE. In order to get the gradients dy(t)/dp for K parameters p, you have to give the integrator the coupled_ode system as its RHS. The Jacobian (as seen from CVODE) therefore becomes very large, and to get at least some useful performance you must take advantage of the sparseness of that large Jacobian.
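To make the blow-up concrete (my own notation, not CVODE's): for K parameters the coupled system you hand to a plain solver is

y'   = f(t, y, p)                               (the N original states)
s_k' = (df/dy) s_k + df/dp_k   for k = 1..K     (K extra blocks of N states each)

so the solver sees N*(K+1) states, and the Jacobian it works with is an N*(K+1) x N*(K+1) matrix that is mostly structural zeros.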

CVODES: ODE integrator which solves the problem type y' = f(t,y,p) ; y(t0) = y0(p). The dependency on the parameters p is made explicit within CVODES! The integrator takes as input the RHS of the ODE (just as CVODE does), but in addition you can ask for the solution of the sensitivity equations. So you give CVODES the RHS, which is of size N, and you give CVODES another rhs_sensitivity function for each parameter p for which you want the gradient dy(t)/dp solved along with y(t); the rhs_sensitivity function must internally use the Jacobian of the original system.

You also give CVODES a Jacobian, but it is the Jacobian of the original system (with N states only). CVODES then automatically takes advantage of the sparse structure of the problem for you - there is no need any more to program a sparse Jacobian for the coupled system, as all of this is taken care of within CVODES. This is why I refer to CVODES as an "integrated ODE-sensitivity" solver.

Another advantage is that CVODES provides a second approach to solving the sensitivity problem, which according to the docs scales much better when N or K gets large.

LSODA: Essentially like CVODE, with the convenience of switching between stiff/non-stiff automatically. Solves the problem y' = f(t,y); y(t0) = y0. Hence, NO support for solving the sensitivity system in an integrated way; you have to code the sparseness yourself (which requires using LSODAS). Note that CVODE is the direct successor of LSODA.

>
> * whether our current non-stiff solver is OK or you want to
> swap that out with a different Sundials component

Solving non-stiff problems with an enlarged coupled ODE system is not such a big performance problem, because the Jacobian of the coupled system is not needed. The need to get rid of the boost odeint solver is now much relaxed since the boost solver is now robust (meaning it throws an exception instead of making Stan hang). However, CVODES should still perform better for non-stiff problems for two reasons: (i) it uses the Adams stepper, which supposedly evaluates the RHS less frequently, and (ii) with the boost solver you cannot take advantage of the alternative way of solving the sensitivity problem.

>
> * availability of interpolation for "dense" solutions

I don't think this is important. My understanding is that it matters only if the internal step size dt is much larger than the typical spacing between observations.

>
> * whether we can throttle the stepper so as not to go into what
> feels like an infinite loop

CVODES gives up whenever maxSteps is reached. So yes, CVODES has this important robustness property, which lets us discard (extreme) draws.

>
> * ability to control absolute and relative tolerances

Sure, every solver can do this. From my performance tests, what you set there does not really matter much.

>
> * "integrated ODE+sensitivity solver" (Sebastian brought this up, but
> I don't know what it means)

CVODES = integrated ODE+sensitivity solver; it solves y' = f(t,y,p), y(t0) = y0(p) and has built-in support for getting the gradients dy(t)/dp while automatically taking advantage of the sparseness of the Jacobian.

CVODE and LSODA = solvers for y' = f(t,y), y(t0) = y0; to get the sensitivities you must give them the coupled ODE system as the RHS and you have to program sparse Jacobians yourself (and even then the performance is still a factor of 5-500 lower in comparison to CVODES).

>
> * ability to provide (sparse) user-defined Jacobians
> -- support for sparse matrix Jacobian vs. dense

This removes the need for AD, which I found very important. If the user could even specify the sparsity, that would be better still, of course - but the big gain is removing AD (an 8x speedup of the gradient evaluation in the N=3, K=7 case).
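For reference - if I read the Sundials 2.x manual correctly (signatures from my reading of the docs, so please double-check) - the user-supplied Jacobian is just another callback attached to the dense linear solver:

/* writes df/dy of the original N-state system into the dense matrix J */
static int Jac(long int N, realtype t, N_Vector y, N_Vector fy,
               DlsMat J, void *user_data,
               N_Vector tmp1, N_Vector tmp2, N_Vector tmp3);

/* attached after choosing the dense linear solver: */
CVDense(cvode_mem, N);
CVDlsSetDenseJacFn(cvode_mem, Jac);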

>
> * ability to provide user-defined sensitivities

Isn't that the same as the user providing the Jacobian? Or do you mean that users provide analytic dy(t)/dp functions?

Sebastian

>
> - Bob
>
> > On Oct 2, 2015, at 6:13 AM, Weber, Sebastian
> >
> > Hi Bob & Daniel!
> >
> >
> >
> > I think that we should consolidate in a wiki all the knowledge accumulated
> > for solving ODE sensitivities - otherwise I will start to forget about things
> > again, and I have some ideas in my head which it would be nice to write down.
> > I can try to start a wiki page on that.
> >
> >
> >
> > Now, before I start that - where are we with a possible inclusion of
> > CVODES? I mean, some of my ideas depend on the feasibility of including it. In
> > other words, do you consider this feasible given that we take some time to
> > carefully think it through? If you agree that this route is worthwhile to
> > explore, then it makes sense to start documenting things. It would also be
> > desirable to define the criteria needed to take on CVODES, i.e. under
> > what circumstances we (or rather Daniel, who decides) would be good to
> > seriously consider an inclusion.
> >
> >
> >
> > FYI, the current state with deploying cvode: It can now be installed with a
> simple make and it uses only g++/clang++/icpc. I also already tried to roll this
> into Rstan and my first attempts look very promising, i.e. R just builds cvode
> during installation. For PyStan: I am not a Python expert, but there are a
> couple of packages out there which make CVODES available to Python users.
> >
> >
> >
> > To make things even more fun, I looked into odeSD, which is presumably
> > the fastest ODE sensitivity integrator (according to
> > http://www.biomedcentral.com/content/pdf/1752-0509-6-46.pdf).
> >
> > The good about odeSD:
> >
> > - It is just one ~1000-line C code file which contains a single function ->
> > there is a chance to convert this into header-only C++ code, I think
> >
> > - About 2x faster than cvodes according to the benchmarks; its main
> > speedup comes from requiring fewer calls to the system Jacobian, which is
> > good news as autodiff is so expensive
> >
> > - The authors are friendly and would consider dual-licensing odeSD
> under the BSD as well (I asked)
> >
> > - Compiles under windows
> >
> > The bad:
> >
> > - LGPL – we have to wait for dual-licensing if one were to modify it
> >
> > - By far not as proven in the wild; cvodes has been much more used
> >
> > - The code looks scary to me – lots of hard coded constants, but
> possibly that’s normal for numerics code
> >
> > - Dependency on BLAS and LAPACK - here one could try to use Eigen's
> > implementation, but this requires a Fortran compiler; or one changes the
> > BLAS calls to Eigen calls - no idea how straightforward this is. Since Eigen
> > implements BLAS, there would be a good template available to do that; lots of
> > work possibly.
> >
> > - Interface looks a bit messy, but they provide examples
> >
> > - Not as many techniques to solve ODE sensitivities as in CVODES
> >
> >
> >
> > BTW, to me the need for a stiff solver is much reduced once the robust
> > patch is in Stan along with the patch from Michael. In fact, if I were to decide,
> > I would opt for user gradients first at this point, since many systems are
> > non-stiff and these would be made a lot faster by replacing AD with user
> > gradients. For non-stiff systems there is not a huge gain with CVODES, at
> > least that is what I expect (whereas for stiff systems there is almost no
> > way around CVODES as the Jacobian explodes).
> >
> >
> >
> > Best,
> >
> >
> >
> > Sebastian Weber

Bob Carpenter

Oct 7, 2015, 4:31:28 PM
to stan...@googlegroups.com
Thanks, that was super helpful. It got lost in my stack
of e-mail and I'm just catching up.

- Bob

Bob Carpenter

Oct 16, 2015, 1:39:09 AM
to stan...@googlegroups.com
Thanks for all the answers.

I still don't get how the integrated thing in CVODES works.

I don't understand how I would specify

y' = f(t, y, p);

do I literally give it that function f? Does CVODES use finite differences
for the sensitivities? I didn't follow what you meant
by "rhs_sensitivity function". Is that another C function pointer or
something you pass in? Is there a worked example in their docs somewhere?

And the "other" approach you talk about is the one that goes
a single parameter at a time?

- Bob

P.S. Just a clarification. LSODA is variable order. There's
a really great paper here on how it works:

https://computation.llnl.gov/casc/nsde/pubs/u113855.pdf

LSODA also has support for banded Jacobians --- don't know
if that'll help us much since the structure isn't classically
banded in the coupled system.

And here's the description where I got lost:

Sebastian Weber

Oct 16, 2015, 12:31:27 PM
to stan development mailing list
Hi Bob!

Let me try to answer:
>
> I don't understand how I would specify
>
> y' = f(t, y, p);
>
> do I literally give it that function f? Does CVODES use finite differences
> for the sensitivities? I didn't follow what you meant
> by "rhs_sensitivity function". Is that another C function pointer or
> something you pass in? Is there a worked example in their docs somewhere?

Yes, you give CVODES a function with signature (this is the rhs of the ODE)

static int f(realtype t, N_Vector y, N_Vector ydot, void *user_data);

and user_data is a data structure expected to contain a field p holding the values of the parameters for which you want the sensitivities. To calculate the sensitivities, CVODES uses finite differences by default, but you can give CVODES a function (this is what I referred to as rhs_sensitivity) with the signature

static int fS(int Ns, realtype t, N_Vector y, N_Vector ydot,
              int iS, N_Vector yS, N_Vector ySdot,
              void *user_data, N_Vector tmp1, N_Vector tmp2);

and this function is expected to write the derivatives for the sensitivities into ySdot (much like what you currently write into the coupled_sys structure inside the coupled_system functor).
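In other words (my own summary, same notation as before): for sensitivity index iS the callback has to compute

ySdot = (df/dy)(t, y) * yS + df/dp_iS

i.e. the N x N Jacobian of the original system times the current sensitivity vector yS, plus the partial derivative of the RHS with respect to parameter number iS.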

I strongly recommend looking at the example examples/serial/cvsRoberts_FSA_dns.c, which is in the CVODES archive. This example is also documented in the cvs_examples.pdf guide (page 9).
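For orientation, the call sequence in that example is roughly the following (my reading of the Sundials 2.x API; exact names and arguments should be checked against the manual of whatever version we would ship):

void *cvode_mem = CVodeCreate(CV_BDF, CV_NEWTON);      /* stiff variant */
CVodeInit(cvode_mem, f, t0, y);                        /* rhs f as above */
CVodeSStolerances(cvode_mem, reltol, abstol);
CVodeSetUserData(cvode_mem, &data);                    /* carries the parameters p */
CVDense(cvode_mem, N);                                 /* dense linear solver */
CVDlsSetDenseJacFn(cvode_mem, Jac);                    /* optional user Jacobian of the N-state system */
CVodeSensInit1(cvode_mem, K, CV_STAGGERED, fS, yS);    /* fS as above, yS = K sensitivity vectors */
CVodeSensEEtolerances(cvode_mem);
CVode(cvode_mem, tout, y, &t, CV_NORMAL);              /* advance solution and sensitivities */
CVodeGetSens(cvode_mem, &t, yS);                       /* dy(tout)/dp, one N_Vector per parameter */

where yS is an array of K N_Vectors (e.g. from N_VCloneVectorArray_Serial(K, y)).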

> And the "other" approach you talk about is the one that goes
> a single parameter at a time?

No, the other approach works in a different way and is much more efficient whenever N (states) or K (parameters) is large. The idea is simple: we do not actually care about the derivatives of the solution wrt the parameters; these are just a utility to calculate what is really needed - the derivatives of the log-lik wrt the parameters.

So, considering the log-lik as a cost function, the other approach allows you to calculate directly the derivatives of the log-lik wrt the K parameters. The downside is that you must then pass the cost function into CVODES, which to my understanding is only doable in Stan if you call this inside a _log function. The upside is that the respective problem to solve (CVODES talks of a "dual" problem) is then of order O(K); to my understanding N does not matter any more here! Hence, the scaling is much better whenever N is large.

I hope this helps.

Maybe we should have a phone call next week to clarify matters further?

Sebastian

>
> - Bob
>
> P.S. Just a clarification. LSODA is variable order. There's
> a really great paper here on how it works:
>
> https://computation.llnl.gov/casc/nsde/pubs/u113855.pdf
>
> LSODA also has support for banded Jacobians --- don't know
> if that'll help us much since the structure isn't classically
> banded in the coupled system.

Thanks, I think I confused it with LSODE.