[sundials-users] dlsodis to sudials conversion

Mittleider (US), Dave N

unread,

May 24, 2022, 4:50:07 PM5/24/22

to SUNDIAL...@listserv.llnl.gov

Hello

I’ve been using the legacy LSODE (specifically DLSODIS) group of solvers to create a real-time simulation of a stiff problem. The model is only 53 states so I thought DLSODIS would likely be fast enough. I calculate a simplified version of a fairly complex (but sparse) Jacobian (seems to be good enough) and a fairly complex (but sparse) inertial matrix and right hand forcing function. The model is unfortunately too slow by a factor of approximately 5 after many modifications to speed it up. Is there someone who could give me some advice on what improvement the Sundials methods could give me and whether or not a increase in throughput of 5x can reasonably be expected? I saw in a online discussion that Sundials has a parallel processing capability. Does this mean that it has GPU acceleration using CUDA? This seems my best bet. If this sounds like I’m asking too much from this kind of solver is there a recommended single pass solver (or other alternative) for stiff systems where I can just simplify my model until the stiff part is stable? Thank you for your help.

Dave Mittleider

To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV

Hindmarsh, Alan Carleton

unread,

May 31, 2022, 11:21:55 AM5/31/22

to SUNDIAL...@listserv.llnl.gov

Hello David Mittleider,

Here are a few offhand thoughts:

First, a problem of size 53 is most likely not

large enough to benefit from a sparse treatment

of the Jacobian. If you are getting a solution

with DLSODIS, I suggest trying DLSODI with

the dense Jacobian treatment. You seem to have

a good approximate Jacobian in closed form,

so that can be supplied to DLSODE (in dense

form).

You say DLSODIS is too slow by a factor of 5.

What is that compared with? What solver is

giving you results 5 times faster?

Size 53 is almost certainly too small to take

advantage of parallelism. Also, parallel solution

of a stiff system involves a more complicated

treatment of the Jacobian and related linear

systems. The dense and sparse solvers are

not usable in a parallel environment.

If you would like to try SUNDIALS, I suggest

IDA, with the dense linear solver, and the same

user-supplied Jacobian you have. It should

perform similarly to DLSODI. But there is no

reason to expect a major difference in speed.

If you can identify a stiff and nonstiff parts of

the right-hand side function (additively), you

might also try ARKODE.

-Alan H

From: sundials-users <sundial...@llnl.gov> on behalf of Mittleider (US), Dave N <dave.n.m...@BOEING.COM>
Sent: Tuesday, May 24, 2022 1:46 PM
To: sundials-users <sundial...@llnl.gov>
Subject: [sundials-users] dlsodis to sudials conversion

Balos, Cody Joe

unread,

May 31, 2022, 11:49:22 AM5/31/22

to SUNDIAL...@listserv.llnl.gov

In addition to what Alan said, here are a few more thoughts:

Generally the GPU acceleration in SUNDIALS requires a much larger problem, or the ability to group many small (independent) systems together to really be beneficial. This is because the overhead of launching kernels on the GPU is typically on the order of several microseconds no matter the problem size. A costly, but highly parallelizable, right-hand-side function and Jacobian evaluation function may benefit from GPU acceleration, but again with only 53 states it is unlikely that the GPU acceleration will be beneficial here.

SUNDIALS has both an OpenMP and a PThreads N_Vector that can be used for threaded parallelism and can be paired with the dense and sparse solvers, but the linear solvers themselves are not threaded. You could then use threads in your right-hand-side and Jacobian evaluation too. However, with 53 states, again you will be limited with the amount of speedup you can obtain (if any).

You mentioned this is for a real-time simulation. So are your speed requirements coming from needing to evaluate the problem within some sort of real-time control context? I would assume this means there is not a way for you to evaluate multiple systems at a time (e.g. if you have data is streaming in), but if you could, this may be one route to achieving a bigger speedup.

Cheers,

Cody

--

Cody Balos

Computer Scientist

Center for Applied Scientific Computing

Applications, Simulations, & Quality

Lawrence Livermore National Laboratory

Mittleider (US), Dave N

unread,

May 31, 2022, 12:10:20 PM5/31/22

to SUNDIAL...@listserv.llnl.gov

Mr Hindmarsh

Thankyou for getting back to me. The 5x speed I referred to was the need to run faster than real-time to leave time so other processes (displays etc) can also be run in the simulation as part of a real-time process. The idea I was trying to accomplish was to have a flexible solver that I could use with a variety of complex/stiff systems but still have a good throughput rate. Even though the solver is complex compared to a single step solver I had convinced myself (for no particular reason other than wishful thinking) that if the problem is small enough that the throughput would still be good and I could take advantage of the really good stability properties of the solver with stiff systems. The matrices are quite sparse because a number of the equations are just second order so have a very simple second state that expands the size of the first order system model used by the solver. I guess if the solver itself can’t be parallelized I’m probably not going to get this to work. I will try something else. Thanks again for your help.

Dave Mittleider

From: sundial...@llnl.gov <sundial...@llnl.gov> On Behalf Of Hindmarsh, Alan Carleton
Sent: Saturday, May 28, 2022 2:15 PM
To: SUNDIAL...@LISTSERV.LLNL.GOV
Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion

EXT email: be mindful of links/attachments.

Mittleider (US), Dave N

unread,

May 31, 2022, 4:58:22 PM5/31/22

to SUNDIAL...@listserv.llnl.gov

Cody

Thanks for the response. I just wrote an email to a reference Alan gave me (Raymond Spiteri) so I’m including it here so you can see a little more of where I’m coming from.

Anyway, the model is just a typical helicopter (think Apache or AH-6) where I’m trying to be much more complete in the various couplings and system details. We have lots of modeling like this but they are meant for detailed off-line studies so have no real-time application. I’m trying to improve the fidelity of the modeling while retaining a real-time capability primarily to support piloted evaluations that often have started with the pilot first complaining that the simulation doesn’t represent very well a detail of the helicopter characteristics that we are attempting to evaluate. In bringing in structural and control system compliance modeling along with some detailed aerodynamics and the normal aircraft rigid body modes some of this missing information has been shown to significantly alter the ability of the model to duplicate some of these characteristics without resorting to fudge factors and other things that turn the simulation into a cartoon rather than providing real engineering information. The model I’m currently working with (53 states) is about as simple as one can get but I’m hoping once I get this working to add a lot more to it. Unfortunately I’m already stuck with regard to throughput. I had thought these models are really small compared to the kinds of systems the LSODE integrators were normally used for (i.e nuclear bombs) so speed wouldn’t be an issue. Clearly I was wrong. So my question is do you have a suggestion for some single pass integration package that does a good job on stiff systems? If I can start with something state-of-the-art then I can play around with it to see what I can get away with in model detail.

If you’re interested, I’ve attached the system in a hover at one time step. I’m normally getting a cycle time around .002 seconds and I need it to be no more than about .0005 seconds or less.

Again, thanks again for the explanation of how SUNDIALS works.

Dave Mittleider

From: sundial...@llnl.gov <sundial...@llnl.gov> On Behalf Of Balos, Cody Joe
Sent: Tuesday, May 31, 2022 8:49 AM
To: SUNDIAL...@LISTSERV.LLNL.GOV
Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion

EXT email: be mindful of links/attachments.

In addition to what Alan said, here are a few more thoughts:

system_data.txt

Mehmet Erol Sanliturk

unread,

Jun 1, 2022, 11:39:58 AM6/1/22

to SUNDIAL...@listserv.llnl.gov

I am not able to understand your problem properly , but

if you consider integration routines , I can suggest

https://gams.nist.gov/

https://gams.nist.gov/cgi-bin/serve.cgi

https://gams.nist.gov/cgi-bin/serve.cgi/Class/H

https://gams.nist.gov/cgi-bin/serve.cgi/Class/H2

Class H2: Quadrature (numerical evaluation of definite integrals)

By continuing from these pages you may find suitable packages

perhaps useful for your needs , such as

https://gams.nist.gov/cgi-bin/serve.cgi/Class/H2a1a1

Class H2a1a1: Automatic 1-D finite interval quadrature
(user need only specify required accuracy),
integrand available via user-defined procedure

In the above page , or in other pages , you may find very

good integrators .

I am using

Package QUADPACK (Downloadable)

in my programs . It is wonderful for my needs .

https://gams.nist.gov/cgi-bin/serve.cgi/Package/QUADPACK

Package QUADPACK

By inspecting other pages , you may find more useful information with

respect to your needs .

These mostly are in Fortran .

Either you may continue with Fortran as a whole computing package

for your needs

OR

you may use F2C to convert useful routines to C and use these C routines ,

( you may find repositories about converted routines to C )

OR

you may generate Libraries from Fortran routines and use these libraries

to link into your C programs ,

OR

any other ways suitable for you which I do not know .

These are my suggestions I can supply now .

With my best wishes .

Mehmet Erol Sanliturk

Alan C. Hindmarsh

unread,

Jun 1, 2022, 6:51:23 PM6/1/22

to SUNDIAL...@listserv.llnl.gov

Hello again David M,

I understand much better now what you are trying to do.
It may well be that no solver can achieve the speed you
need for the real-time capability you want, but here is
my best suggestion for trying to obtain one that can:

First, do an experiment with DLSODI vs DLSODIS. I take it you are
getting satisfactory results with DLSODIS as far as solution quality,
if not speed. I suspect that much of the speed loss is in the overhead
of the sparse matrix solver routines in DLSODIS, which are really
meant for much larger systems. So it would be valuable to know if you
can also get good solutions with DLSODI, run with the dense treatment
of the Jacobian.

I'm not clear on what your simplified Jacobian is. Are you referring
to the true Jacobian of the ODE system, or an approximation to it that
is simpler? In either case, I suggest running DLSODI with the
internal (difference-quotient) Jacobian option (mf = 22) first, and
then with a user-supplied Jacobian, or your approximation to it if
that is different (mf = 21). Those two runs should give the same
solutions, and mf = 21 should be faster, unless there are bugs in your
user-supplied Jacobian routine. If you are supplying an approximate
Jacobian that differs from the true one, you might need to experiment
further with that approximation. There will be a tradeoff between
speed of the Jacobian evaluation and speed of convergence of the
internal Newton iterations that use the Jacobian.

If, as I suspect, you find that DLSODI with mf = 21 is running faster
than DLSODIS, then you can pursue ways to speed up DLSODI further.
First, it uses dense linear solvers from LINPACK, and you may have
access to vendor-supplied versions of those that are faster.

For either DLSODI or DLSODIS, you can experiment with the tolerances
to see if the solver gives acceptable solutions (at lower cost) with
looser tolerances. There is a limit to how much you can loosen,
before getting solutions that are either too inaccurate, or even
unstable.

Once you have settled all the issues raised above, I suggest
giving the SUNDIALS solver IDA a try, if you are willing to
provide the required C routines. IDA, with the dense linear
solver option and user-supplied Jacobian, should perform
similarly to DLSODI, but it uses a different version of the
underlying stiff methods, so may (or may not) be faster.

One other question: In your system, written as A(t,y) dy/dt = g(t,y),
is your matrix A always non-singular? If it is singular at t = 0, how
are you providing the initial value of dy/dt? There may be some
possible speedup there. If A is always non-singular, is it possible
to form A^{-1} g analytically? If so, that would give you a DLSODE
problem instead, and potentially greater efficiency, except that
then the Jacobian is (d/dy)(A^{-1} g), which you may or may not be
able to supply analytically.

I hope this helps. Best of luck.

Alan H

> From: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov>> on behalf of Mittleider (US), Dave N <dave.n.m...@BOEING.COM<mailto:dave.n.m...@BOEING.COM>>

> Sent: Tuesday, May 24, 2022 1:46 PM

> To: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov>>

> Subject: [sundials-users] dlsodis to sudials conversion
>
>
> Hello
>
> I've been using the legacy LSODE (specifically DLSODIS) group of solvers to create a real-time simulation of a stiff problem. The model is only 53 states so I thought DLSODIS would likely be fast enough. I calculate a simplified version of a fairly complex (but sparse) Jacobian (seems to be good enough) and a fairly complex (but sparse) inertial matrix and right hand forcing function. The model is unfortunately too slow by a factor of approximately 5 after many modifications to speed it up. Is there someone who could give me some advice on what improvement the Sundials methods could give me and whether or not a increase in throughput of 5x can reasonably be expected? I saw in a online discussion that Sundials has a parallel processing capability. Does this mean that it has GPU acceleration using CUDA? This seems my best bet. If this sounds like I'm asking too much from this kind of solver is there a recommended single pass solver (or other alternative) for stiff systems where I can just simplify my model until the stiff part is stable? Thank you for your help.
>
>
>
> Dave Mittleider
>
> ________________________________
>
> To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV
>
> ________________________________
>
> To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV
>

> ############################

>
> To unsubscribe from the SUNDIALS-USERS list:
> write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV
>

############################

Mittleider (US), Dave N

unread,

Jun 2, 2022, 10:53:59 AM6/2/22

to SUNDIAL...@listserv.llnl.gov

Alan and Ray
Thank you for the suggestions. FYI, the A matrix is a complex combination of rotating inertias that are a function of blade lag, flap, azimuth angles and various changes due to the controls etc etc. The A matrix also includes inertial terms for rotating and oscillating airmasses, airframe rigid body inertias, and simple drivetrain terms. The forcing side of the equations aren't really equations at all except for some final summations of components with numerous subroutines for aerodynamics, landing gear forces, and other external forces feeding them. In general the models are sort of simple at hover or sitting on the ground but get very complicated and more coupled as speed is increased. I have taken your suggestion Ray and played around with the order of the solver and it has made a difference in the right direction. I really don't understand that so I am putting together a list of questions that at some point I'm going to dump on you so be prepared!!! At the moment I'm trying the RADAU5 solver that you suggest I try just to see what it does. Right now it is running slow as molasses even in Release mode so there must be something I'm doing wrong. I'm working through it and once I have some results I'll put them all together and show you what I've learned.

Regards
Dave

Hindmarsh, Alan Carleton

unread,

Jun 8, 2022, 12:08:15 PM6/8/22

to SUNDIAL...@listserv.llnl.gov

[resending, as this may not have been sent 6/1]

provide the required C routines. IDA, with the dense linear

solver option and user-supplied Jacobian, should perform
similarly to DLSODI, but it uses a different version of the
underlying stiff methods, so may (or may not) be faster.

One other question: In your system, written as A(t,y) dy/dt = g(t,y),
is your matrix A always non-singular? If it is singular at t = 0, how
are you providing the initial value of dy/dt? There may be some
possible speedup there. If A is always non-singular, is it possible
to form A^{-1} g analytically? If so, that would give you a DLSODE
problem instead, and potentially greater efficiency, except that
then the Jacobian is (d/dy)(A^{-1} g), which you may or may not be
able to supply analytically.

I hope this helps. Best of luck.

Alan H

On 5/31/22 9:08 AM, Mittleider (US), Dave N wrote:

Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion


EXT email: be mindful of links/attachments.




Hello David Mittleider,

Here are a few offhand thoughts:

First, a problem of size 53 is most likely not
large enough to benefit from a sparse treatment
of the Jacobian.   If you are getting a solution
with DLSODIS, I suggest trying DLSODI with
the dense Jacobian treatment.  You seem to have
a good approximate Jacobian in closed form,
so that can be supplied to DLSODE (in dense
form).

You say DLSODIS is too slow by a factor of 5.
What is that compared with?  What solver is
giving you results 5 times faster?

Size 53 is almost certainly too small to take
advantage of parallelism.  Also, parallel solution
of a stiff system involves a more complicated
treatment of the Jacobian and related linear
systems.  The dense and sparse solvers are
not usable in a parallel environment.

If you would like to try SUNDIALS, I suggest
IDA, with the dense linear solver, and the same
user-supplied Jacobian you have.  It should
perform similarly to DLSODI.  But there is no
reason to expect a major difference in speed.
If you can identify a stiff and nonstiff parts of
the right-hand side function (additively), you
might also try ARKODE.

-Alan H

________________________________
From: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov>> on behalf of Mittleider (US), Dave N <dave.n.m...@BOEING.COM<mailto:dave.n.m...@BOEING.COM>>
Sent: Tuesday, May 24, 2022 1:46 PM
To: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov>>
Subject: [sundials-users] dlsodis to sudials conversion


Hello

                I've been using the legacy LSODE (specifically DLSODIS) group of solvers to create a real-time simulation of a stiff problem. The model is only 53 states so I thought DLSODIS would likely be fast enough. I calculate a simplified version of a fairly complex (but sparse) Jacobian (seems to be good enough) and a fairly complex (but sparse) inertial matrix and right hand forcing function. The model is unfortunately too slow by a factor of approximately 5 after many modifications to speed it up. Is there someone who could give me some advice on what improvement the Sundials methods could give me and whether or not a increase in throughput of 5x can reasonably be expected? I saw in a online discussion that Sundials has a parallel processing capability. Does this mean that it has GPU acceleration using CUDA? This seems my best bet. If this sounds like I'm asking too much from this kind of solver is there a recommended single pass solver (or other alternative) for stiff systems where I can just simplify my model until the stiff part is stable? Thank you for your help.



Dave Mittleider

________________________________

To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV


________________________________

To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV


  ############################

  To unsubscribe from the SUNDIALS-USERS list:
  write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV

Mehmet Erol Sanliturk

unread,

Jun 8, 2022, 12:52:47 PM6/8/22

to SUNDIAL...@listserv.llnl.gov

I am resending this message , because it did not appear in the archive .

It seems that there is a ( or perhaps more ) missing step(s) in the

mailing list system implied by

Hindmarsh, Alan Carleton
7:08 PM (1 hour ago)
to SUNDIAL...@listserv.llnl.gov

[resending, as this may not have been sent 6/1]

in

https://groups.google.com/g/sundials-users/c/fu2m2jert2k

[sundials-users] dlsodis to sudials conversion

Mehmet Erol Sanliturk

---------- Forwarded message ---------
From: Mehmet Erol Sanliturk <m.e.sa...@gmail.com>
Date: Wed, Jun 1, 2022 at 8:56 PM
Subject: dlsodis to sudials conversion
To: <SUNDIAL...@listserv.llnl.gov>

For stiff problems , you may use

https://www.google.com/search?q=bailey+arbitrary+precision&sxsrf=ALiCzsbKBOXjx53rytyf_5XSAuNzR3uASA%3A1654105113836&source=hp&ei=GaSXYse5ML2Pxc8Pxs66kA8&iflsig=AJiK0e8AAAAAYpeyKaoCbOWvMlHFdQNno09imdZznwQD&oq=bailey+&gs_lcp=Cgdnd3Mtd2l6EAEYADIECCMQJzIFCAAQywEyBQguEMsBMgUIABDLATIFCAAQgAQyBQgAEMsBMgUIABDLATIFCAAQywEyBQgAEMsBMgUIABDLAToLCC4QgAQQxwEQowI6BQguEIAEOggILhCABBDUAjoLCC4QgAQQxwEQ0QNQAFjNF2CKLWgAcAB4AIABkQGIAYQHkgEDMC43mAEAoAEB&sclient=gws-wiz

bailey arbitrary precision

https://www.davidhbailey.com/dhbsoftware/

High-Precision Software Directory
Updated 18 May 2022

In that page you can find a suitable arbitrary precision arithmetic computation package .

When you use arbitrary precision arithmetic , you may compare results with computer precision arithmetic ,

and select one which is suitable for you .

https://www.davidhbailey.com/dhbpapers/mpfun2020.pdf

A technical paper describing the package and providing details for usage is available here: MPFUN2020 technical paper.

I could not remember the related entry now ,

but there should be a

program from Bailey being a COMPILER to compile an

ordinary FORTRAN program and execute it in

arbitrary precision arithmetic .

If you study related programs it is very likely that you will find it .

( I am not sufficiently well to study the issue more closely

to help you more . )

Obviously you may find many C or C++

arbitrary precision arithmetic packages .

With my best wishes for all ,

Mehmet Erol Sanliturk

Mittleider (US), Dave N

unread,

Jun 8, 2022, 12:52:48 PM6/8/22

to SUNDIAL...@listserv.llnl.gov

Ray and Alan

I’ve run some cases with the order reduced to 2 and 1, with DLSODIS replaced by RADAU5, and DLSODIS replaced by DLSODI.

Also, Alan asked a couple of questions in the last email that I appear to have not answered so to set the record straight on what the system is:

When I talk about the simplified Jacobian I’m referring to taking the analytical Jacobian and deleting all the terms that are normally minor off-axis details in the coupled blade and airframe equations. This reduces the Jacobian from being two multi-megabyte files to one, < 100K file, with only the necessary terms. There may be something to gain by including some of the terms I threw out, but for now I believe I’ve got the primary on-axis terms and the other terms are negligible.
The model has been run a number of times when debugging, with and without, the internal Jacobian being generated. Even got to the point where I was plotting the internally generated derivatives against the analytical Jacobian so the speed difference that is obtained with the analytical Jacobian has been characterized. With this model the difference is significant.
I believe that the A matrix is always non-singular.
Because the model starts up mis-trimmed I don’t bother trying to set initial conditions very accurately except for a rough guess at flap/lag angles, pitch/roll angles, sideslip angle, and a guess at control positions. The model runs for a few seconds while holding the pitch and roll angles and control positions so the blades can get to an equilibrium flapping. By that I mean that the rotor disc angles are constant but the blades themselves flap as required to keep the disc angles constant. Then control derivatives are calculated and used to trim the pitch,roll,yaw,ax,ay,az accelerations. The flapping blades are moved to form a disc that provides trimmed moments and forces with all the other fuselage, wings, and empennage details. I freeze the DLSODIS output vector components associated with the required trim states that need to be adjusted for trim and the terms in RWORK that pair with them. During the trim cycle I move the held value for each of the DLSODIS states until I get the desired zero body accelerations.

And now for some results and lots of questions.

For a default 5^th order integration the times were generally around .002 seconds real clock time for a .002 second simulation time step with occasional jumps to .01 seconds. Reducing the order in DLSODIS from 5->2 seemed to speed the model up so I changed 5->1. For an order of (1) the cycling time looks like the following:

The time function I’m using apparently has a .001 second resolution so here we are seeing the cycle time has now been reduced to approximately .0005 sec which is what I have been looking for. The occasional jumps to .003 seconds are likely not a problem because I’m really trying to fit a group of cycles into a outer .016 second cycle time the simulation hardware operates with. So an occasional blip should just get averaged out with the lower values around it.

I really don’t understand this. I thought for a stiff system one needs a 5^th order solution. But instead the model runs with what is a backward Euler integration. So what does that mean? What is this system?

In changing DLSODIS to DLSODI the timing changed to the following

Here the average is more like .001 seconds but at least the peaks are somewhere below .002 seconds. This makes me think that the sparse methods are applicable to this model even with the overhead needed to make it work.

In changing DLSODIS to RADAU5 I could never get the model to run stable. I tried every combination of tolerances and minor adjustments to physical properties (mainly damping and stiffnesses in some of the elastic terms) and nothing worked. I did improve the instability and made the exponential departure a little slower but not enough to fix it.

I still have to run the model in some more critical conditions, mainly high speed and agility maneuvers and hard landings for shock effects to see if the model is stable enough to be useful. But I guess I’d like to know what this model is from a numerical stability standpoint. What does it mean when a stiff model runs stable with a first order integrator. Does it mean it really isn’t stiff? Does it mean that some other part of the DLSODIS method (i.e. iterative improvement, or ???) is really what is stabilizing the model?

Since I want to add more detail to the model which will slow it down, with a model that has the character shown here, are there some additional steps/methods that could be done to get additional speed out of it?

Regards

Dave

P.S. after some comments from other users I have the following additions:

With regard to the Jacobian one of reasons I started looking for help was because when I made my earlier large change to the Jacobian where I dropped many small terms the cycle time hardly changed at all. That was one of the reasons I came looking for help was because it was clear the bottle neck was somewhere other than in my routines since I was fairly sure the derivatives were right. I had gone back and forth a number of times comparing those calculated internally and the analytically derived ones. I originally found errors in both but after working cycle organization and my ability to calculate long chain derivatives and integrate them properly over the blades and around the disc I finally got them all to match. Now that the bottle neck in the solver seems to be addressed, I’m going to see if I can create a table lookup for the derivatives so I only have to calculate the derivatives based on (rotor azimuth, speed ?, G’s ?) occasionally and the rest of the time just read from a table. It may be possible to calculate at the beginning of a run and use table lookup depending on azimuth for the rest of the run. The derivatives come in two parts, inertial and aero with the aero being by far the most complex. At the same time the aero seems fairly consistent so there may be a way to very simply create tables from the output. That is my hope.

OK, so I thought I got all the bugs out of RADAU5 but apparently I didn’t since it should work like DLSODIS as a backward euler, I’ll go back and see if I can get it working. Compare the output to DLSODIS to try and find why it’s not stable.

I have brought the idea of the sparsity index lookup tables into a lot of what I’ve done but lately I commented a lot of it out just out of an abundance of caution that I wasn’t getting it right somewhere and I wanted to run without it until I was sure I wasn’t causing problems that were hiding other issues. Now that we have a model that seems to do what is expected (i.e. speed up when logical steps are applied) I’m going to go back and reintroduce the sparsity index lookups and verify the answers don’t change. These calculations are relatively few compared to what is going on in the solver but at this point every little bit helps

Alan C. Hindmarsh

unread,

Jun 8, 2022, 4:42:09 PM6/8/22

to SUNDIAL...@listserv.llnl.gov

Hi David,

Several comments on all the various emails ...

Your development of the approximate Jacobian sounds exactly like
what I recommend for cases like this: Get the elements that are
most important numerically and drop the rest, especially if they
are costly. I am curious: In supplying (d/dy)(g - A s) in the
JAC routine, does the dropping of elements occur in dg/dy, or
in d(As)/dy, or both? Finally, I recommend that you check on
the effectiveness of this Jacobian by monitoring the performance
counters nst, nre, nje, etc. at the end of your runs.
The Jacobian is working well if nje/nst (J evaluations per step)
is small. If you do more dropping and see this ratio go up,
you have probably done too much.

Good to know A is always nonsingular. This means that you have
other options for solvers, namely LSODE, VODE, or CVODE with
dy/dt = f = A^{-1}g. But that involves a tradeoff that may or
may not improve the speed.

When you set initial conditions for LSODI/LSODIS, the value of y(0)
is not critical, but ydot(0) should be set accurately by solving
A ydot = g at t = 0, or letting the solver do this. Otherwise the
solver is dealing with a nonzero residual from the start.

About stiffness and orders, Ray is right -- BDF is stiffly stable at
all orders 1 to 5. The higher orders are helpful if your accuracy
needs are greater, but I'm guessing yours are not. Also, restricting
to order 3 (or 2 in extreme cases) is helpful to avoid a loss of
stability if there are highly oscillatory modes present in the system.
I trust you use the MAXORD input for that. If you monitor the method
order nqu with and without the order restricted, you can see where the
differences occur and (presumably) verify that the intervals with
higher order are more costly.

So DLSODIS runs faster that DLSODI (all else being equal)?
That is a bit surprising to me, because it means that the sparse
matrix operations (with lots of index lists etc.) are still
competitive with the N^2 and N^3 matrix operations for the dense LU
factor/backsolve method, even for N = 53. But that all depends on the
amount of sparsity, and yours must be enough to make it faster. What
is your value of NNZ/N^2 by the way?

I doubt that you will ever succeed in getting RADAU5 to run as fast as
LSODI or LSODIS, but that's my bias.

I do not think your problem calls for arbitrary precision software.
That would raise the cost with little or no benefit.

-Alan

On 6/8/22 9:13 AM, Mittleider (US), Dave N wrote:
> Ray and Alan
> I've run some cases with the order reduced to 2 and 1, with DLSODIS replaced by RADAU5, and DLSODIS replaced by DLSODI.
>
> Also, Alan asked a couple of questions in the last email that I appear to have not answered so to set the record straight on what the system is:
>

> 1. When I talk about the simplified Jacobian I'm referring to taking the analytical Jacobian and deleting all the terms that are normally minor off-axis details in the coupled blade and airframe equations. This reduces the Jacobian from being two multi-megabyte files to one, < 100K file, with only the necessary terms. There may be something to gain by including some of the terms I threw out, but for now I believe I've got the primary on-axis terms and the other terms are negligible.
> 2. The model has been run a number of times when debugging, with and without, the internal Jacobian being generated. Even got to the point where I was plotting the internally generated derivatives against the analytical Jacobian so the speed difference that is obtained with the analytical Jacobian has been characterized. With this model the difference is significant.
> 3. I believe that the A matrix is always non-singular.
> 4. Because the model starts up mis-trimmed I don't bother trying to set initial conditions very accurately except for a rough guess at flap/lag angles, pitch/roll angles, sideslip angle, and a guess at control positions. The model runs for a few seconds while holding the pitch and roll angles and control positions so the blades can get to an equilibrium flapping. By that I mean that the rotor disc angles are constant but the blades themselves flap as required to keep the disc angles constant. Then control derivatives are calculated and used to trim the pitch,roll,yaw,ax,ay,az accelerations. The flapping blades are moved to form a disc that provides trimmed moments and forces with all the other fuselage, wings, and empennage details. I freeze the DLSODIS output vector components associated with the required trim states that need to be adjusted for trim and the terms in RWORK that pair with them. During the trim cycle I move the held value for each of the DLSODIS states until I get the desired zero body accelerations.

>
> And now for some results and lots of questions.
>
> For a default 5th order integration the times were generally around .002 seconds real clock time for a .002 second simulation time step with occasional jumps to .01 seconds. Reducing the order in DLSODIS from 5->2 seemed to speed the model up so I changed 5->1. For an order of (1) the cycling time looks like the following:
>

> [cid:image0...@01D87B18.02FBD640]

> The time function I'm using apparently has a .001 second resolution so here we are seeing the cycle time has now been reduced to approximately .0005 sec which is what I have been looking for. The occasional jumps to .003 seconds are likely not a problem because I'm really trying to fit a group of cycles into a outer .016 second cycle time the simulation hardware operates with. So an occasional blip should just get averaged out with the lower values around it.
>
> I really don't understand this. I thought for a stiff system one needs a 5th order solution. But instead the model runs with what is a backward Euler integration. So what does that mean? What is this system?
>
> In changing DLSODIS to DLSODI the timing changed to the following

> [cid:image0...@01D87B18.02FBD640]

> To: SUNDIAL...@LISTSERV.LLNL.GOV<mailto:SUNDIAL...@LISTSERV.LLNL.GOV>

> From: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov><mailto:sundial...@llnl.gov><mailto:sundial...@llnl.gov>> on behalf of Mittleider (US), Dave N <dave.n.m...@BOEING.COM<mailto:dave.n.m...@BOEING.COM><mailto:dave.n.m...@BOEING.COM><mailto:dave.n.m...@BOEING.COM>>
>
> Sent: Tuesday, May 24, 2022 1:46 PM
>

> To: sundials-users <sundial...@llnl.gov<mailto:sundial...@llnl.gov><mailto:sundial...@llnl.gov><mailto:sundial...@llnl.gov>>

Reply all

Reply to author

Forward