Hello
I’ve been using the legacy LSODE group of solvers (specifically DLSODIS) to create a real-time simulation of a stiff problem. The model has only 53 states, so I thought DLSODIS would likely be fast enough. I calculate a simplified version of a fairly complex (but sparse) Jacobian (it seems to be good enough), as well as a fairly complex (but sparse) inertial matrix and right-hand-side forcing function. Unfortunately, even after many modifications to speed it up, the model is still too slow by a factor of approximately 5. Could someone give me some advice on what improvement the SUNDIALS methods could give me, and whether an increase in throughput of 5x can reasonably be expected? I saw in an online discussion that SUNDIALS has a parallel processing capability. Does this mean that it has GPU acceleration using CUDA? That seems like my best bet. If this sounds like I’m asking too much from this kind of solver, is there a recommended single-pass solver (or other alternative) for stiff systems, where I can just simplify my model until the stiff part is stable? Thank you for your help.
Dave Mittleider
To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV
In addition to what Alan said, here are a few more thoughts:
Generally the GPU acceleration in SUNDIALS requires a much larger problem, or the ability to group many small (independent) systems together to really be beneficial. This is because the overhead of launching kernels on the GPU is typically on the order of several microseconds no matter the problem size. A costly, but highly parallelizable, right-hand-side function and Jacobian evaluation function may benefit from GPU acceleration, but again with only 53 states it is unlikely that the GPU acceleration will be beneficial here.
SUNDIALS has both an OpenMP and a PThreads N_Vector that can be used for threaded parallelism and can be paired with the dense and sparse solvers, but the linear solvers themselves are not threaded. You could then use threads in your right-hand-side and Jacobian evaluation functions too. However, with 53 states, you will again be limited in the amount of speedup you can obtain (if any).
You mentioned this is for a real-time simulation. Are your speed requirements coming from needing to evaluate the problem within some sort of real-time control context? I would assume this means there is no way for you to evaluate multiple systems at a time (e.g., if you have data streaming in), but if you could, that may be one route to achieving a bigger speedup.
Cheers,
Cody
--
Cody Balos
Computer Scientist
Center for Applied Scientific Computing
Applications, Simulations, & Quality
Lawrence Livermore National Laboratory
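A back-of-the-envelope sketch of Cody's point that GPU kernel-launch latency is paid regardless of problem size: it computes the fraction of a per-step wall-clock budget consumed by launch overhead alone. The 0.0005 s budget is the target Dave gives later in the thread; the 5 µs per launch and the launch counts per step are hypothetical placeholders, not measured SUNDIALS figures, and host/device data transfers are ignored.

```python
def launch_overhead_fraction(overhead_per_launch_s: float,
                             launches_per_step: int,
                             step_budget_s: float) -> float:
    """Fraction of a per-step wall-clock budget consumed by kernel-launch latency alone.

    The latency is roughly independent of problem size, so a tiny system pays it
    in full no matter how little arithmetic each kernel does.
    """
    return overhead_per_launch_s * launches_per_step / step_budget_s


if __name__ == "__main__":
    budget = 0.0005       # target cycle time from later in the thread (s)
    latency = 5e-6        # hypothetical per-launch latency (s)
    for launches in (10, 40, 100):  # hypothetical kernel launches per step
        frac = launch_overhead_fraction(latency, launches, budget)
        print(f"{launches:3d} launches/step: {frac:.0%} of the budget before any useful math")
```

Under these assumed numbers, a few dozen launches per step would already eat a large share of the budget before any useful work on a 53-state system is done.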
Mr Hindmarsh
Thank you for getting back to me. The 5x speedup I referred to is needed so the model runs faster than real time, leaving time for other processes (displays, etc.) that also run in the simulation as part of a real-time process. What I was trying to accomplish was a flexible solver that I could use with a variety of complex/stiff systems while still getting a good throughput rate. Even though the solver is complex compared to a single-step solver, I had convinced myself (for no particular reason other than wishful thinking) that if the problem were small enough, the throughput would still be good and I could take advantage of the solver’s very good stability properties on stiff systems. The matrices are quite sparse because a number of the equations are just second order, so each has a very simple second state that expands the size of the first-order system the solver works with. I guess if the solver itself can’t be parallelized, I’m probably not going to get this to work. I will try something else. Thanks again for your help.
Dave Mittleider
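The second-order structure Dave describes can be sketched as follows: a system M q'' = f(q, q') becomes A y' = g(y) with y = [q, v], where the upper block of A is the identity (the trivial rows q' = v) and the lower block is the inertial matrix. This is the linearly implicit form A y' = g that DLSODI and DLSODIS accept. The function name `first_order_form` and the single-DOF example are hypothetical; a real model would also let M and f depend on time and state.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def first_order_form(M: Matrix,
                     f: Callable[[Vector, Vector], Vector]
                     ) -> Tuple[Callable[[Vector], Matrix], Callable[[Vector], Vector]]:
    """Rewrite M q'' = f(q, v) as A y' = g(y) with y = [q, v] and v = q'."""
    n = len(M)

    def A(y: Vector) -> Matrix:
        # M is constant in this sketch, so y is unused.
        # Upper rows: identity on q (the trivial equations q' = v).
        top = [[1.0 if j == i else 0.0 for j in range(2 * n)] for i in range(n)]
        # Lower rows: [0 | M], the inertial matrix acting on v'.
        bottom = [[0.0] * n + list(M[i]) for i in range(n)]
        return top + bottom

    def g(y: Vector) -> Vector:
        q, v = y[:n], y[n:]
        # Right-hand side [v, f(q, v)] built by list concatenation.
        return list(v) + list(f(q, v))

    return A, g


if __name__ == "__main__":
    # Hypothetical single-DOF mass-spring-damper: 2 q'' = -1000 q - 3 q'.
    A, g = first_order_form([[2.0]], lambda q, v: [-1000.0 * q[0] - 3.0 * v[0]])
    print(A([1.0, 0.0]))   # [[1.0, 0.0], [0.0, 2.0]]
    print(g([1.0, 0.5]))   # [0.5, -1001.5]
```

Each of the q' = v rows of dg/dy contains a single 1, which is one reason the Jacobian stays very sparse even though the state vector doubles in size.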
From: sundial...@llnl.gov <sundial...@llnl.gov>
On Behalf Of Hindmarsh, Alan Carleton
Sent: Saturday, May 28, 2022 2:15 PM
To: SUNDIAL...@LISTSERV.LLNL.GOV
Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion
Cody
Thanks for the response. I just wrote an email to a reference Alan gave me (Raymond Spiteri), so I’m including it here so you can see a little more of where I’m coming from.
Anyway, the model is just a typical helicopter (think Apache or AH-6) where I’m trying to be much more complete in the various couplings and system details. We have lots of models like this, but they are meant for detailed off-line studies and have no real-time application. I’m trying to improve the fidelity of the modeling while retaining a real-time capability, primarily to support piloted evaluations, which often start with the pilot complaining that the simulation doesn’t represent some detail of the helicopter characteristics we are attempting to evaluate very well. When we bring in structural and control-system compliance modeling, along with some detailed aerodynamics and the normal aircraft rigid-body modes, some of this previously missing information has been shown to significantly affect the model’s ability to reproduce these characteristics without resorting to fudge factors and other things that turn the simulation into a cartoon rather than providing real engineering information. The model I’m currently working with (53 states) is about as simple as one can get, but once I get it working I’m hoping to add a lot more to it. Unfortunately, I’m already stuck with regard to throughput. I had thought these models were really small compared to the kinds of systems the LSODE integrators were normally used for (e.g., nuclear bombs), so speed wouldn’t be an issue. Clearly I was wrong. So my question is: do you have a suggestion for a single-pass integration package that does a good job on stiff systems? If I can start with something state-of-the-art, then I can play around with it to see what I can get away with in model detail.
If you’re interested, I’ve attached the system in a hover at one time step. I’m normally getting a cycle time of around .002 seconds, and I need it to be no more than about .0005 seconds.
Thanks again for the explanation of how SUNDIALS works.
Dave Mittleider
From: sundial...@llnl.gov <sundial...@llnl.gov>
On Behalf Of Balos, Cody Joe
Sent: Tuesday, May 31, 2022 8:49 AM
To: SUNDIAL...@LISTSERV.LLNL.GOV
Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion
Subject: [EXTERNAL] Re: [sundials-users] dlsodis to sudials conversion

Hello David Mittleider,

Here are a few offhand thoughts:

First, a problem of size 53 is most likely not large enough to benefit from a sparse treatment of the Jacobian. If you are getting a solution with DLSODIS, I suggest trying DLSODI with the dense Jacobian treatment. You seem to have a good approximate Jacobian in closed form, so that can be supplied to DLSODI (in dense form).

You say DLSODIS is too slow by a factor of 5. What is that compared with? What solver is giving you results 5 times faster?

Size 53 is almost certainly too small to take advantage of parallelism. Also, parallel solution of a stiff system involves a more complicated treatment of the Jacobian and related linear systems. The dense and sparse solvers are not usable in a parallel environment.

If you would like to try SUNDIALS, I suggest IDA, with the dense linear solver and the same user-supplied Jacobian you have. It should perform similarly to DLSODI, so there is no reason to expect a major difference in speed. If you can identify stiff and nonstiff parts of the right-hand-side function (additively), you might also try ARKODE.

-Alan H

________________________________
From: sundials-users <sundial...@llnl.gov> on behalf of Mittleider (US), Dave N <dave.n.m...@BOEING.COM>
Sent: Tuesday, May 24, 2022 1:46 PM
To: sundials-users <sundial...@llnl.gov>
Subject: [sundials-users] dlsodis to sudials conversion
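To make Alan's dense suggestion concrete, here is a minimal sketch of what one backward Euler (BDF order 1) step involves for the linearly implicit form A y' = g: each Newton iteration forms the dense matrix A - h·∂g/∂y and solves one linear system, which for n = 53 costs on the order of (2/3)·53³ ≈ 10⁵ floating-point operations per factorization. This is an illustration under stated assumptions, not DLSODI's internal algorithm: A is frozen at the start of the step (so ∂A/∂y terms are dropped), the Jacobian is supplied by the caller, and plain Gaussian elimination stands in for a proper LU routine. Production solvers such as LSODE also reuse the factored iteration matrix across iterations and several steps.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve_dense(M: Matrix, b: Vector) -> Vector:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [list(row) + [b[i]] for i, row in enumerate(M)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))  # pivot row
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def backward_euler_step(A: Callable[[float, Vector], Matrix],
                        g: Callable[[float, Vector], Vector],
                        dg_dy: Callable[[float, Vector], Matrix],
                        t: float, y0: Vector, h: float,
                        newton_iters: int = 3) -> Vector:
    """One BDF1 step for A y' = g: solve A (y1 - y0) - h g(t + h, y1) = 0."""
    n = len(y0)
    t1 = t + h
    Am = A(t, y0)                 # inertia matrix frozen over the step
    y1 = list(y0)                 # predictor: previous state
    for _ in range(newton_iters):
        gv = g(t1, y1)
        J = dg_dy(t1, y1)
        # Residual F(y1) = A (y1 - y0) - h g(t1, y1)
        F = [sum(Am[i][j] * (y1[j] - y0[j]) for j in range(n)) - h * gv[i]
             for i in range(n)]
        # Dense Newton (iteration) matrix: A - h dg/dy
        N = [[Am[i][j] - h * J[i][j] for j in range(n)] for i in range(n)]
        dy = solve_dense(N, F)
        y1 = [y1[i] - dy[i] for i in range(n)]
    return y1


if __name__ == "__main__":
    lam = -1.0e4   # hypothetical stiff scalar mode: y' = lam * y
    y1 = backward_euler_step(lambda t, y: [[1.0]],
                             lambda t, y: [lam * y[0]],
                             lambda t, y: [[lam]],
                             t=0.0, y0=[1.0], h=0.002)
    print(y1)      # close to 1 / (1 - h * lam) = 1/21
```

For a linear right-hand side with constant A, the first Newton iteration already lands on the exact backward Euler solution; the extra iterations only matter when g is nonlinear.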
Ray and Alan
I’ve run some cases with the maximum order reduced to 2 and to 1, with DLSODIS replaced by RADAU5, and with DLSODIS replaced by DLSODI.
Also, Alan asked a couple of questions in the last email that I appear not to have answered, so to set the record straight, here is what the system is:
And now for some results and lots of questions.
With the default maximum order of 5, the times were generally around .002 seconds of real clock time for a .002-second simulation time step, with occasional jumps to .01 seconds. Reducing the maximum order in DLSODIS from 5 to 2 seemed to speed the model up, so I reduced it from 5 to 1. With an order of 1, the cycle time looks like the following:
The time function I’m using apparently has a .001-second resolution, so here we see that the cycle time has been reduced to approximately .0005 seconds, which is what I have been looking for. The occasional jumps to .003 seconds are likely not a problem, because I’m really trying to fit a group of cycles into an outer .016-second cycle time that the simulation hardware operates with, so an occasional blip should just get averaged out with the lower values around it.
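The frame arithmetic can be checked with a short sketch. With a .002 s simulation step, an outer .016 s frame holds 8 integration steps; at .002 s of compute per step that is the entire frame, while at .0005 s per step it is about a quarter of it. The sketch also shows one way to average over many cycles with a high-resolution monotonic clock (Python's `time.perf_counter`), which avoids relying on a .001 s timer resolution. The function names are illustrative, and the thread does not say which timer the simulation actually uses.

```python
import time
from typing import Callable


def steps_per_frame(frame_s: float, sim_step_s: float) -> int:
    """Whole simulation steps per outer frame (rounded to absorb float error)."""
    return round(frame_s / sim_step_s)


def frame_utilization(frame_s: float, sim_step_s: float, cpu_per_step_s: float) -> float:
    """Fraction of the outer frame spent on integration alone."""
    return steps_per_frame(frame_s, sim_step_s) * cpu_per_step_s / frame_s


def mean_cycle_time(step_fn: Callable[[], None], cycles: int) -> float:
    """Average wall-clock time per call, timed over many calls at once."""
    start = time.perf_counter()
    for _ in range(cycles):
        step_fn()
    return (time.perf_counter() - start) / cycles


if __name__ == "__main__":
    print(steps_per_frame(0.016, 0.002))             # 8
    print(frame_utilization(0.016, 0.002, 0.002))    # the whole frame
    print(frame_utilization(0.016, 0.002, 0.0005))   # about a quarter
```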
I really don’t understand this. I thought a stiff system needed a 5th-order solution, but instead the model runs with what is effectively backward Euler integration. So what does that mean? What kind of system is this?
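The behavior asked about here can be illustrated on the scalar test equation y' = λy. With z = hλ, forward Euler multiplies the solution by 1 + z each step and backward Euler by 1/(1 - z); the backward Euler factor has magnitude below 1 whenever Re(z) < 0, however large |z| is, so stability on a stiff mode does not require high order. High order buys accuracy on the slow, resolved modes rather than stability on the fast ones. In the sketch, λ = -10⁴ s⁻¹ is a hypothetical fast, well-damped mode; h = .002 s is the step used in the thread.

```python
def forward_euler_factor(z: complex) -> complex:
    """Per-step multiplier of forward Euler on y' = lam*y, with z = h*lam."""
    return 1 + z


def backward_euler_factor(z: complex) -> complex:
    """Per-step multiplier of backward Euler: y_{n+1} = y_n / (1 - z)."""
    return 1 / (1 - z)


def integrate(factor, z: complex, steps: int, y0: complex = 1.0) -> complex:
    """Apply a one-step multiplier `steps` times starting from y0."""
    y = y0
    for _ in range(steps):
        y *= factor(z)
    return y


if __name__ == "__main__":
    h = 0.002          # simulation step used in the thread (s)
    lam = -1.0e4       # hypothetical fast, well-damped mode (1/s)
    z = h * lam        # z is about -20
    print(abs(integrate(forward_euler_factor, z, 10)))   # grows like 19**10
    print(abs(integrate(backward_euler_factor, z, 10)))  # decays like 21**-10
```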
When I changed DLSODIS to DLSODI, the timing changed to the following:
Here the average is more like .001 seconds but at least the peaks are somewhere below .002 seconds. This makes me think that the sparse methods are applicable to this model even with the overhead needed to make it work.
When I changed DLSODIS to RADAU5, I could never get the model to run stably. I tried every combination of tolerances and minor adjustments to physical properties (mainly damping and stiffness in some of the elastic terms), and nothing worked. I did reduce the instability and make the exponential divergence a little slower, but not enough to fix it.
I still have to run the model in some more critical conditions, mainly high-speed and agility maneuvers and hard landings for shock effects, to see if the model is stable enough to be useful. But I’d like to know what this model is from a numerical stability standpoint. What does it mean when a stiff model runs stably with a first-order integrator? Does it mean it really isn’t stiff? Does it mean that some other part of the DLSODIS method (e.g., iterative improvement, or ???) is really what is stabilizing the model?
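One way to look at this stability question is to compute hλ for individual modes. The sketch below treats one damped second-order mode m x'' + c x' + k x = 0; the numbers are hypothetical and not taken from Dave's model. A stiff, lightly damped elastic mode has hλ with a tiny negative real part and a large imaginary part. Forward Euler's factor then exceeds 1 (an explicit method would go unstable), while backward Euler's factor is well below 1 and far smaller than the physical per-step decay exp(Re hλ): backward Euler stays stable partly by adding strong numerical damping to such modes. A related known fact: BDF orders 1 and 2 are A-stable, but orders 3 to 5 are not, so eigenvalues near the imaginary axis can destabilize the higher orders (CVODE's optional stability limit detection, `CVodeSetStabLimDet`, targets this case).

```python
import cmath
from typing import List, Tuple


def mode_eigenvalues(m: float, c: float, k: float) -> Tuple[complex, complex]:
    """Roots of m*s**2 + c*s + k = 0, i.e. the eigenvalues of the mode's first-order form."""
    disc = cmath.sqrt(c * c - 4.0 * m * k)
    return (-c + disc) / (2.0 * m), (-c - disc) / (2.0 * m)


def step_report(m: float, c: float, k: float,
                h: float) -> List[Tuple[complex, float, float, float]]:
    """Per eigenvalue: (h*lambda, |forward Euler factor|, |backward Euler factor|, exact decay)."""
    rows = []
    for lam in mode_eigenvalues(m, c, k):
        z = h * lam
        rows.append((z, abs(1 + z), abs(1 / (1 - z)), abs(cmath.exp(z))))
    return rows


if __name__ == "__main__":
    # Hypothetical stiff, lightly damped elastic mode (illustrative numbers only).
    for z, fe, be, exact in step_report(m=1.0, c=2.0, k=1.0e6, h=0.002):
        print(f"h*lambda = {z:.4f}  |FE| = {fe:.3f}  |BE| = {be:.3f}  exact = {exact:.3f}")
```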
Since I want to add more detail to the model which will slow it down, with a model that has the character shown here, are there some additional steps/methods that could be done to get additional speed out of it?
Regards
Dave
P.S. After some comments from other users, I have the following additions:
With regard to the Jacobian: one of the reasons I started looking for help was that when I made my earlier large change to the Jacobian, dropping many small terms, the cycle time hardly changed at all. That made it clear the bottleneck was somewhere other than in my routines, since I was fairly sure the derivatives were right. I had gone back and forth a number of times comparing the internally calculated derivatives with the analytically derived ones. I originally found errors in both, but after working on the cycle organization and on my ability to calculate long-chain derivatives and integrate them properly over the blades and around the disc, I finally got them all to match. Now that the bottleneck in the solver seems to be addressed, I’m going to see if I can create a table lookup for the derivatives, so that I only have to calculate them occasionally based on (rotor azimuth, speed?, G’s?) and the rest of the time just read from a table. It may be possible to calculate them at the beginning of a run and use a table lookup by azimuth for the rest of the run. The derivatives come in two parts, inertial and aero, with the aero being by far the most complex. At the same time, the aero seems fairly consistent, so there may be a way to create tables from the output very simply. That is my hope.
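A minimal sketch of the azimuth-scheduled lookup described above, assuming the Jacobian depends mainly on rotor azimuth and is periodic over one revolution. `compute` stands in for the expensive analytic derivative routine, and the grid size is arbitrary; additional scheduling variables (rotor speed, load factor) would add table dimensions. In a Newton-based implicit solver, an approximate Jacobian mostly affects how quickly the corrector iteration converges rather than the converged answer, provided the iteration still converges, which is why this kind of approximation is often tolerable for the Jacobian but not for the right-hand side.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


class AzimuthTable:
    """Periodic table of precomputed matrices, linearly interpolated in azimuth."""

    def __init__(self, compute: Callable[[float], Matrix], n_points: int):
        self.n = n_points
        self.dpsi = 2.0 * math.pi / n_points
        # Evaluate the expensive function once per grid azimuth, up front.
        self.tables = [compute(i * self.dpsi) for i in range(n_points)]

    def lookup(self, psi: float) -> Matrix:
        """Interpolate between the two grid points bracketing psi (radians)."""
        x = (psi % (2.0 * math.pi)) / self.dpsi
        i0 = int(x) % self.n
        i1 = (i0 + 1) % self.n          # wrap around at 2*pi
        w = x - int(x)                  # weight of the upper grid point
        a, b = self.tables[i0], self.tables[i1]
        return [[(1.0 - w) * a[r][c] + w * b[r][c] for c in range(len(a[r]))]
                for r in range(len(a))]


if __name__ == "__main__":
    # Hypothetical stand-in for the analytic derivative routine.
    table = AzimuthTable(lambda psi: [[math.cos(psi), 0.0], [0.0, math.sin(psi)]], 360)
    print(table.lookup(0.5))   # close to [[cos 0.5, 0], [0, sin 0.5]]
```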
OK, so I thought I got all the bugs out of RADAU5, but apparently I didn’t, since it should work like DLSODIS running as backward Euler. I’ll go back and see if I can get it working, comparing its output to DLSODIS to try to find why it’s not stable.
I have brought the idea of sparsity-index lookup tables into a lot of what I’ve done, but lately I commented much of it out, out of an abundance of caution that I wasn’t getting it right somewhere; I wanted to run without it until I was sure I wasn’t causing problems that were hiding other issues. Now that we have a model that seems to do what is expected (i.e., it speeds up when logical steps are applied), I’m going to go back, reintroduce the sparsity-index lookups, and verify that the answers don’t change. These calculations are relatively few compared to what is going on in the solver, but at this point every little bit helps.
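The sparsity-index idea can be sketched as a two-phase scheme: build the compressed sparse row (CSR) pattern and a slot index for every Jacobian contribution once, then on each cycle write values straight into their precomputed slots with no searching. A dense expansion helper makes the "verify the answers don't change" check easy. CSR and all names here are illustrative; the index layout DLSODIS itself expects may differ, so this shows the technique rather than that solver's format.

```python
from typing import List, Sequence, Tuple


def build_pattern(entries: Sequence[Tuple[int, int]], n: int):
    """Once per run: CSR row pointers, column indices, and a slot per contribution.

    `entries` lists (row, col) in the order contributions are generated each cycle;
    repeated (row, col) pairs share one slot and are summed.
    """
    cols_by_row = [set() for _ in range(n)]
    for r, c in entries:
        cols_by_row[r].add(c)
    indptr, indices, pos = [0], [], {}
    for r in range(n):
        for c in sorted(cols_by_row[r]):
            pos[(r, c)] = len(indices)
            indices.append(c)
        indptr.append(len(indices))
    slots = [pos[e] for e in entries]   # dictionary lookups happen only here
    return indptr, indices, slots


def fill_values(slots: List[int], nnz: int, contributions: Sequence[float]) -> List[float]:
    """Every cycle: scatter contributions into their precomputed slots."""
    values = [0.0] * nnz
    for s, v in zip(slots, contributions):
        values[s] += v
    return values


def csr_to_dense(indptr, indices, values, n: int) -> List[List[float]]:
    """Expand to dense, e.g. to compare against a dense reference Jacobian."""
    dense = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for k in range(indptr[r], indptr[r + 1]):
            dense[r][indices[k]] = values[k]
    return dense


if __name__ == "__main__":
    entries = [(0, 0), (1, 1), (0, 1), (1, 1)]       # hypothetical pattern
    indptr, indices, slots = build_pattern(entries, 2)
    vals = fill_values(slots, len(indices), [1.0, 2.0, 3.0, 4.0])
    print(csr_to_dense(indptr, indices, vals, 2))   # [[1.0, 3.0], [0.0, 6.0]]
```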