[sundials-users] Runtime Increase with CVODES upgrade (2.5 -> 6.2)

Ravi teja Nallapu

Jul 12, 2022, 11:24:50 AM

to SUNDIAL...@listserv.llnl.gov

Dear Sundials Developers,

My team uses SUNDIALS to propagate the orbital motion of satellites by solving their dynamics at various degrees of fidelity. For a long time we used SUNDIALS 2.5, and we have very recently upgraded to version 6.2. When benchmarking, we noticed a minor increase in runtime with the newer version (~1.02x) and wanted to check with you whether this is expected or anomalous behavior. We acknowledge that this is a long shot, and totally understand if it cannot be readily explained.

At a top level, our C++ workflow uses the CVODES solver (with the serial N_Vector implementation), and the non-stiff orbital dynamics of the satellite are solved with the Adams-Moulton method. The solver is configured with a functional nonlinear solver (the fixed-point solver in 6.2). The relevant changes between our old and new C++ implementations are shown below:

Old:
cvode_mem = CVodeCreate(CV_ADAMS, CV_FUNCTIONAL); // Create Memory
state = N_VNew_Serial(state0.n_elem); // State Vector
CVodeRootInit(cvode_mem, state.n_elem, roots_func); // Initialize root finder

New:
SUNContext_Create(NULL, &context); // Create context
cvode_mem = CVodeCreate(CV_ADAMS, context); // Create Memory
state = N_VNew_Serial(state0.n_elem, context); // State Vector
NLS = SUNNonlinSol_FixedPoint(state, 0, context);
CVodeSetNonlinearSolver(cvode_mem, NLS); // Define and set solver
CVodeRootInit(cvode_mem, state.n_elem, roots_func); // Initialize root finder

The other steps, such as initializing the ODE, setting tolerances, and stepping through the solver, are essentially the same. Therefore, I just wanted to ask:

1. Was there any obvious change in the architecture that could slow down the newer versions?
2. Are there any ways to speed up the runtime of our ODE-solving workflow?

Would love to hear your thoughts and comments on the problem. Looking forward to your response,

Sincerely,

Ravi teja Nallapu, Ph.D.

Orbit R&D Engineer,

Planet,

San Francisco, CA

Email: ravi.n...@planet.com

To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV

Danilo Tomasoni

Jul 15, 2022, 11:09:04 AM

to SUNDIAL...@listserv.llnl.gov

Hello Ravi,

I'm a SUNDIALS user like you, and I can confirm that run time increases when going from version 2.5 to version 3 and later.

When I upgraded to CVODE version 3 I noticed this, and when I asked about it, someone on this list told me that it was due to new layers of abstraction in the code, which on the one hand allow more flexibility but on the other hand take a positive, but constant, amount of time to handle.

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI)

Piazza Manifattura 1, 38068 Rovereto (TN), Italy

toma...@cosbi.eu

http://www.cosbi.eu


From: sundial...@llnl.gov <sundial...@llnl.gov> on behalf of Ravi teja Nallapu <000027aed0c994c...@LISTSERV.LLNL.GOV>
Sent: Tuesday, July 12, 2022 01:32
To: SUNDIAL...@LISTSERV.LLNL.GOV <SUNDIAL...@LISTSERV.LLNL.GOV>
Subject: [sundials-users] Runtime Increase with CVODES upgrade (2.5 -> 6.2)


Balos, Cody Joe

Jul 15, 2022, 11:22:31 AM

to SUNDIAL...@listserv.llnl.gov

Hi Ravi,

As Danilo pointed out, SUNDIALS was refactored in version 3 to extract and encapsulate some common code from the integrators. As such, it is not surprising that version 3+ is slightly slower than version 2 in an exact head-to-head comparison. Like Danilo said, this overhead from the abstraction layer is constant with respect to the problem size. With regard to speeding up 6.2 to be on par with 2.5, I first have a few questions for you:

1. Do version 2.5 and 6.2 use the same number of time steps for your problem? What about the number of nonlinear iterations? If the integrator statistics are exactly the same, then it is likely the abstraction layer overhead that you are seeing.
2. A lot of the abstraction layer overhead is in the "setup", i.e. before calling CVode(...). Can you time just the CVode(...) call with 2.5 and 6.2 and report the difference?
3. Can you tell us a little bit more about your problem setup? How big is the ODE system? Are you solving many of these systems sequentially?

Regards,

Cody
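
A minimal sketch of how the CVode(...) call could be timed in isolation, as asked in the second question above. The helper names time_seconds and best_of are hypothetical and not part of SUNDIALS; the sketch only assumes a C++11 compiler and uses std::chrono::steady_clock, which is monotonic.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so system clock adjustments do not affect it.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Repeats the measurement and keeps the fastest run, which is usually the
// least noisy estimate when differences are in the millisecond range.
template <typename F>
double best_of(int repeats, F&& work) {
    double best = time_seconds(work);
    for (int i = 1; i < repeats; ++i) {
        const double t = time_seconds(work);
        if (t < best) best = t;
    }
    return best;
}
```

In the workflow described in this thread, the callable would wrap only the integration call, for example a lambda containing `CVode(cvode_mem, tout, state, &t_reached, CV_NORMAL)`, with the setup calls kept outside of it. Note that best_of reruns the callable, so the integrator would have to be reinitialized inside it for repeated measurements to be comparable.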

From: sundials-users <sundial...@llnl.gov> on behalf of Danilo Tomasoni <toma...@COSBI.EU>
Sent: Thursday, July 14, 2022 1:13 AM
To: sundials-users <sundial...@llnl.gov>
Subject: [sundials-users] R: [sundials-users] Runtime Increase with CVODES upgrade (2.5 -> 6.2)

Ravi teja Nallapu

Jul 18, 2022, 3:01:55 PM

to SUNDIAL...@listserv.llnl.gov

Hello Cody and Danilo,

Thank you very much for confirming this. Cody, with regards to the questions, please find the responses below:

1. Do version 2.5 and 6.2 use the same number of time steps for your problem? What about the number of nonlinear iterations? If the integrator statistics are exactly the same, then it is likely the abstraction layer overhead that you are seeing.
Response:
Yes, for benchmarking purposes, we set up the integrators with the same simulation timespans. Since the approach to compute step sizes during each iteration has not changed between the 2 versions, both solvers are taking the same step counts.
Regarding the solver, both architectures use the same nonlinear solver (CV_FUNCTIONAL in 2.5 and SUNNonlinSol_FixedPoint in 6.2), and their default settings (number of acceleration vectors, etc.) have not been changed. So I believe the solvers are operating in the same manner.

2. A lot of the abstraction layer overhead is in the "setup", i.e. before calling CVode(...). Can you time just the CVode(...) call with 2.5 and 6.2 and report the difference?
Response:
We timed the CVode execution in both versions and found that the CVode() call in 6.2 is about 0-0.2 seconds slower than its 2.5 counterpart. This is specifically the delay we are trying to mitigate. The initialization time was of the same order of magnitude in both versions (~0.0002 seconds).

3. Can you tell us a little bit more about your problem setup? How big is the ODE system? Are you solving many of these systems sequentially?
Response:
Our most standard workflow involves solving a 6x1 ODE system. We solve for a satellite's dynamics using Newton's second law, and our state space includes the 3-dimensional positions and velocities. The accelerations we factor into the dynamics change the fidelity of the modeling and also make the ODEs implicit. The specific formulation we implement is called Cowell's formulation (Sectional link to a Google Book text). Currently our ODE systems are all solved sequentially, but we would appreciate any pointers you may have on parallelized implementations with SUNDIALS.

Once again, thank you very much for your time. Please let me know if you have any comments/ require more information. Very much looking forward to the responses.

Sincerely,
Ravi teja Nallapu, Ph.D.
Orbit R&D Engineer,
Planet,
San Francisco, CA
Email: ravi.n...@planet.com
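
For readers unfamiliar with the 6x1 system described above, the following is a rough sketch of a Cowell-type right-hand side restricted to the two-body term. It is not the poster's code: the names State, kMuEarth and cowell_rhs, the km and km/s units, and the value of the gravitational parameter are all assumptions made for illustration.

```cpp
#include <array>
#include <cmath>

// State layout (assumed): [x, y, z, vx, vy, vz] in km and km/s.
using State = std::array<double, 6>;

// Earth's gravitational parameter in km^3/s^2 (assumed value for illustration).
constexpr double kMuEarth = 398600.4418;

// Cowell's formulation: dr/dt = v, dv/dt = -mu * r / |r|^3 + a_pert.
// Only the two-body term is included here; perturbing accelerations
// (drag, non-spherical gravity, third bodies, ...) would be added to the
// last three components to raise the fidelity of the model.
State cowell_rhs(const State& y, double mu = kMuEarth) {
    const double r2 = y[0] * y[0] + y[1] * y[1] + y[2] * y[2];
    const double r3 = r2 * std::sqrt(r2);
    return {y[3], y[4], y[5],
            -mu * y[0] / r3, -mu * y[1] / r3, -mu * y[2] / r3};
}
```

With CVODES, the same arithmetic would live inside the user-supplied right-hand-side callback, reading from and writing to the data arrays of the serial N_Vector arguments.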


Balos, Cody Joe

Jul 18, 2022, 3:28:57 PM

to SUNDIAL...@listserv.llnl.gov

Hi Ravi,

So is the 0-0.2 seconds difference in the CVode call with 2.5 and 6.2 for solving one single 6x1 ODE system? Or, is it cumulative over solving several of the 6x1 ODE systems? I assume it’s the former and that the accumulation of 0-0.2 seconds across many solves is the issue for your application.

In either case, I think the easiest way to make up the time difference is to see if we can reduce the number of time steps or nonlinear iterations to make things faster. Can you provide a sample of how many time steps, function evaluations, and nonlinear iterations are being used? Also it would be good to know how many error test failures and nonlinear convergence failures there were.

Specific to the case I am assuming (many 6x1 ODE systems to solve), if you can parallelize across the systems that may also be an option to think about.

Thanks,

Cody
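
One possible shape for parallelizing across independent 6x1 systems, sketched with standard C++ futures only. propagate_one is a hypothetical stand-in with placeholder work; in a real workflow it would create its own integrator, run it, and free it. As far as I understand the SUNDIALS 6 documentation, each thread should use its own SUNContext and its own integrator memory, so nothing is shared between tasks here.

```cpp
#include <array>
#include <future>
#include <vector>

using State = std::array<double, 6>;

// Hypothetical stand-in for one complete propagation. A real version would
// create a SUNContext, a CVODES integrator and its vectors locally,
// integrate to the final time, and destroy them before returning.
State propagate_one(const State& initial) {
    State out = initial;
    for (double& v : out) v *= 2.0;  // placeholder work, not real dynamics
    return out;
}

// Launches one asynchronous task per system and collects results in order.
std::vector<State> propagate_all(const std::vector<State>& initials) {
    std::vector<std::future<State>> jobs;
    jobs.reserve(initials.size());
    for (const State& s : initials) {
        jobs.push_back(std::async(std::launch::async,
                                  [&s] { return propagate_one(s); }));
    }
    std::vector<State> results;
    results.reserve(jobs.size());
    for (auto& job : jobs) results.push_back(job.get());
    return results;
}
```

One task per system is only reasonable for a small number of systems. For many systems, batching them over a fixed number of worker threads would avoid thread-creation overhead, which could otherwise exceed the 0-0.2 second difference discussed above.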

Ravi teja Nallapu

Jul 18, 2022, 8:29:15 PM

to SUNDIAL...@listserv.llnl.gov

Hello Cody,

Thank you very much for the response. Yes, your understanding is correct: the delay was observed when comparing the two versions solving the same ODE system. The concern is more about how to mitigate the 0-0.2 second delay per 6x1 system, which would compound when solving complex systems that require multiple such ODEs.

Regarding the time step and nonlinear solver statistics, I just wanted to check whether there is a coding template I can use to collect them for the two versions. Please let me know.

Sincerely,
Ravi Nallapu

Balos, Cody Joe

Jul 20, 2022, 11:18:35 AM

to SUNDIAL...@listserv.llnl.gov

Hi Ravi,

You can use the functions CVodeGetNumSteps, CVodeGetNumRhsEvals, CVodeGetNumNonlinSolvIters, CVodeGetNumNonlinSolvConvFails, and CVodeGetNumErrTestFails to get the number of time steps, right-hand-side evaluations, nonlinear solver iterations, nonlinear solver convergence failures, and error test failures, respectively.
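
A possible template for collecting these counters, offered as a sketch rather than an official SUNDIALS example. The SolverStats struct, the helper names, and the HAVE_SUNDIALS macro are hypothetical; the five getters are the ones named above, each taking the integrator memory and a pointer to a long int. The guarded part only compiles when building against SUNDIALS with that macro defined.

```cpp
#include <cstdio>

// Counters reported by the CVODE(S) optional-output functions.
struct SolverStats {
    long int steps;              // CVodeGetNumSteps
    long int rhs_evals;          // CVodeGetNumRhsEvals
    long int nonlin_iters;       // CVodeGetNumNonlinSolvIters
    long int nonlin_conv_fails;  // CVodeGetNumNonlinSolvConvFails
    long int err_test_fails;     // CVodeGetNumErrTestFails
};

#ifdef HAVE_SUNDIALS  // hypothetical macro: define it when linking SUNDIALS
#include <cvodes/cvodes.h>

// Fills `s` after the integration has finished.
// Returns false if any getter reports an error.
bool collect_stats(void* cvode_mem, SolverStats& s) {
    return CVodeGetNumSteps(cvode_mem, &s.steps) == CV_SUCCESS &&
           CVodeGetNumRhsEvals(cvode_mem, &s.rhs_evals) == CV_SUCCESS &&
           CVodeGetNumNonlinSolvIters(cvode_mem, &s.nonlin_iters) == CV_SUCCESS &&
           CVodeGetNumNonlinSolvConvFails(cvode_mem, &s.nonlin_conv_fails) == CV_SUCCESS &&
           CVodeGetNumErrTestFails(cvode_mem, &s.err_test_fails) == CV_SUCCESS;
}
#endif

// True when two runs did exactly the same amount of integrator work.
bool same_stats(const SolverStats& a, const SolverStats& b) {
    return a.steps == b.steps && a.rhs_evals == b.rhs_evals &&
           a.nonlin_iters == b.nonlin_iters &&
           a.nonlin_conv_fails == b.nonlin_conv_fails &&
           a.err_test_fails == b.err_test_fails;
}

// Prints one labelled line per run for side-by-side comparison.
void print_stats(const char* label, const SolverStats& s) {
    std::printf("%s: steps=%ld rhs=%ld nl_iters=%ld nl_fails=%ld err_fails=%ld\n",
                label, s.steps, s.rhs_evals, s.nonlin_iters,
                s.nonlin_conv_fails, s.err_test_fails);
}
```

Calling collect_stats once per version after the final CVode call and passing both results to same_stats would show whether the two versions did identical integrator work.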

Ravi teja Nallapu

Aug 1, 2022, 4:56:44 PM

to SUNDIAL...@listserv.llnl.gov

Hello Cody,

Apologies for the delay in getting back. I compared the solver statistics of the two versions and found them to have the exact same values. For instance, here are the statistics for a typical solution over a simulated time span of 1 day:

Version 6.2

1. CVodeGetNumSteps result: 8,378

2. CVodeGetNumRhsEvals result: 12,034

3. CVodeGetNumErrTestFails result: 29

4. CVodeGetNumNonlinSolvIters result: 12,033

5. CVodeGetNumNonlinSolvConvFails result: 0

6. Runtime: 0.31 seconds

Version 2.5

1. CVodeGetNumSteps result: 8,378

2. CVodeGetNumRhsEvals result: 12,034

3. CVodeGetNumErrTestFails result: 29

4. CVodeGetNumNonlinSolvIters result: 12,033

5. CVodeGetNumNonlinSolvConvFails result: 0

6. Runtime: 0.30 seconds

Please let me know if you need additional information.

Sincerely,

Ravi Nallapu

Balos, Cody Joe

Aug 22, 2022, 3:46:36 PM

to SUNDIAL...@listserv.llnl.gov

Hi Ravi,

Since the integrator stats are the same and the runtime difference is so small, I have been thinking of ways we can squeeze out a little more performance. The attached patch for SUNDIALS v6.3.0 hardcodes (via a macro) the serial vector into CVODE rather than using the generic N_Vector interface, bypassing the function pointer loads and dereferences. I have found that it speeds things up by O(10^-3) to O(10^-2) seconds in a few of the CVODE examples. For most applications this would not be worth doing, but in your specific case, where the runtime difference is so small, this might be a worthwhile improvement.

Similarly, the fixed point nonlinear solver could also be hardcoded in for a tiny bit more performance improvement.

Best,

Cody
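
The idea behind the patch can be illustrated with a heavily simplified sketch. This is not the attached patch and not the real N_Vector data structure: GenericVec, VecOps, the LINEAR_SUM macro, and the HARDCODE_SERIAL switch are hypothetical names chosen to show the difference between calling through a table of function pointers and calling the serial routine directly.

```cpp
#include <cstddef>
#include <vector>

struct VecOps;  // table of operations, defined below

// Simplified stand-in for a generic vector: data plus a pointer to its ops.
struct GenericVec {
    std::vector<double> data;
    const VecOps* ops;
};

struct VecOps {
    void (*linear_sum)(double a, const GenericVec& x,
                       double b, const GenericVec& y, GenericVec& z);
};

// Serial kernel: z = a*x + b*y. Calling this directly lets the compiler inline it.
inline void serial_linear_sum(double a, const std::vector<double>& x,
                              double b, const std::vector<double>& y,
                              std::vector<double>& z) {
    for (std::size_t i = 0; i < z.size(); ++i) z[i] = a * x[i] + b * y[i];
}

// Adapter stored in the ops table for the serial implementation.
static void serial_linear_sum_generic(double a, const GenericVec& x,
                                      double b, const GenericVec& y,
                                      GenericVec& z) {
    serial_linear_sum(a, x.data, b, y.data, z.data);
}

static const VecOps kSerialOps = {serial_linear_sum_generic};

// Generic path: load the function pointer from the ops table, then call it.
inline void linear_sum_generic(double a, const GenericVec& x,
                               double b, const GenericVec& y, GenericVec& z) {
    z.ops->linear_sum(a, x, b, y, z);
}

// Compile-time switch between the two paths (hypothetical macro name).
#ifdef HARDCODE_SERIAL
#define LINEAR_SUM(a, x, b, y, z) \
    serial_linear_sum((a), (x).data, (b), (y).data, (z).data)
#else
#define LINEAR_SUM(a, x, b, y, z) linear_sum_generic((a), (x), (b), (y), (z))
#endif
```

Both paths compute the same result. For a 6-element vector the arithmetic is so cheap that the pointer load and indirect call can be a noticeable share of each operation, which may be why hardcoding helps in this particular case.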

From: sundials-users <sundial...@llnl.gov> on behalf of Ravi teja Nallapu <000027aed0c994c...@LISTSERV.LLNL.GOV>
Sent: Monday, August 1, 2022 1:16 PM

hardcode_serial_vector.patch

Ravi teja Nallapu

Aug 22, 2022, 4:54:11 PM

to SUNDIAL...@listserv.llnl.gov

Hello Cody,

That is awesome, thank you very much!! Will send out a final update once we’ve tested the new release.