Dump and restore: shift in time by dt init

693 views
Skip to first unread message

Anthony Wachs

unread,
Sep 6, 2019, 7:26:09 PM9/6/19
to basilisk-fr
Hi everyone,
This is a long post about dumping and restoring simulations. Either I am doing something crazy wrong, or there is a little problem with time when restarting a simulation.

In the below, I assume that I understood properly that in Basilisk when we compute from t=t1 to tnext=t1+dt, the values of t is t1 until all events have been executed and eventually t is updated to tnext=t1+dt. This can easily be checked by outputting t over the events (which is what I did by adding lines to the attached centered.h)

Attached you can find the following files:
- cavity2D.c: a simple 2D lid driven cavity to illustrate/reproduce the problem and that you can run yourself. The problem happened in a much more advanced Basilisk code.
- navier-stokes/centered with many outputs of time and a new function dtnext_aw
- grid/events.h with at the bottom a new function dtnext_aw
- 3 logs of runs of cavity2D compiled with -events:
* basilisk-from_t=0.log: from a simulation from t=0 to t=0.01 and we dump the fields,

* basilisk-reload_with_dtnext.log: from a simulation from t=0.01 to 0.02 that is restarted from the dump file and using dtnext in line 216 of the attached centered.h. The log shows a shift in time by exactly dt = 0.1*mydt/1.1 as computed by timestep in timestep.h as previous is always 0 at the 1st call to the timestep function, and the simulation actually restarts from t=0.01 + dt. Note that with mydt=1e-4, you have dt = 0.1*mydt/1.1 ~  9.09e-6 which is exactly the shift you see in the log, in fact in the log file you see for the 1st actual time step:
events (i = 115, t = 0.0100091)

* basilisk-reload_with_dtnext_aw.log: from a simulation from t=0.01 to 0.02 that is restarted from the dump file and using dtnext_aw in line 217 of the attached centered.h. The log does not show any shift in time and indeed we have om the log file:
events (i = 115, t = 0.01)


Now some comments:
1) Major issue
When the code restarts from dump, it executes in "events(i=0,t=0)"  the events defaults, init, properties, stability and then is done for this sequence of events. And increments the time t by dt.
The problem is that stability, and precisely dtnext returns a non zero dt, and hence the time is incremented by dt. However, nothing has been solved, this is simply an initialization of the fields and properties. You can see that the other events in centered.h are not executed, while they are when we start from t=0 without reloading from a dump file (compare the logs file).
So the problem is that the code does not solve anything, but still increments the time, I believe that (i) either this should not happen or (ii) the other events should also be executed as when we start from t=0.

After searching which method does what for 2 days, I eventually came up with my own version of dtnext, called temporarily dtnext_aw, that essentially returns 0 if the simulation is a restart from a dump file, such that over "events(i=0,t=0)" the time t is not incremented.
This seemed to fix the problem but am not sure if this is in the "Basilisk spirit". Maybe Stephane can comment.

2) Minor issue 1
I also had problems with stopping a simulation at exactly a prescribed time. And sometimes Basilisk would compute an additional time step that I do not want to happen.
I ended up thinking that this is a classical problem of comparison of doubles and to avoid this I implemented a condition of the form "t - maxtime < RoundDoubleCoef * dt". Doing this, the code stops exactly when I want it too.
Any comment ?

3) Minor issue 2
Strangely the variable "t" is not assigned the right value in the event "stability" over "events(i=0,t=0)" for a restarted simulation only. If you look closely into the log files, you will see the following sequence of outputs:

INIT TIME 0.010000
Before stab 1.000000e-02 1.200000e-02 1.000000e+00
stab t = 0.000000e+00 0 1.200000e-02
stab t after dtnext = 0.000000e+00 9.090909e-06 0
After stab 1.000000e-02 1.000909e-02 9.090909e-06

The first value of always the time of restart. Here it is supposed to be 0.01, but in stability it is 0 (in red). This is not the case over the subsequent time steps (check the log files). I thought "t" is a "global" variable that has the same value everywhere in the code.
This seems to be the case except in stability. Any clue about why this is happening ?

4) Minor issue 3
The value of the last dt used before a simulation stops is not dumped. So when a simulation is restarted, it has no knowledge of the value of dt over the previous time step. In fact, "previous" is always initialize to 0 in the 1st call of timestep. This can lead to big jump of the value of dt between the last dt over the previous run and the dt over the 1st time step of the restarted simulation.
There would be an interest in dumping the last value of dt by for instance adding a variable dt to the struct DumpHeader and then reading it over a restart and initialize "previous" in the timestep function with this value read. After all, the equation dtmax = (previous + 0.1*dtmax)/1.1 in timestep is meant to limit the change of dt between 2 consecutive time steps, so it would be great if this would work too between 2 consecutive time steps of 2 consecutive simulations.


Once again, in case I am doing anything awfully wrong, please let me know. Otherwise, happy to read your comments/advice.
Cheers,
Anthony

basilisk-from_t=0.log
basilisk-reload_with_dtnext.log
basilisk-reload_with_dtnext_aw.log
cavity2D.c
centered.h
events.h

Stephane Popinet

unread,
Sep 12, 2019, 6:14:17 AM9/12/19
to basil...@googlegroups.com
Hi Anthony,

Thanks for the bug report.

> 1) Major issue
> So the problem is that the code does not solve anything, but still
> increments the time, I believe that (i) either this should not happen or
> (ii) the other events should also be executed as when we start from t=0.

I agree, although I would not necessarily qualify this as a "major
issue" since the initial timestep is very small, and thus so is the
error in restart time.

Could you please try the following fix:

navier-stokes/centered.h:199
- event ("stability");
+ tnext = t;

and tell me if you are happy with it.

> After searching which method does what for 2 days, I eventually came up
> with my own version of dtnext, called temporarily dtnext_aw, that
> essentially returns 0 if the simulation is a restart from a dump file,
> such that over "events(i=0,t=0)" the time t is not incremented.
> This seemed to fix the problem but am not sure if this is in the
> "Basilisk spirit". Maybe Stephane can comment.

This is indeed not in the "Basilisk spirit" since the issue here is
solver-dependent (i.e. how the timestepping is done in
navier-stokes/centered.h), not really a problem with events and/or
dtnext. If the fix above works, it is clearly better.

> 2) Minor issue 1
> I also had problems with stopping a simulation at exactly a prescribed
> time. And sometimes Basilisk would compute an additional time step that
> I do not want to happen.
> I ended up thinking that this is a classical problem of comparison of
> doubles and to avoid this I implemented a condition of the form "t -
> maxtime < RoundDoubleCoef * dt". Doing this, the code stops exactly when
> I want it too.
> Any comment ?

This is not "really" a bug since you would have the same issue with any
for loop etc. Floating-point comparisons are always tricky. That said,
one could think of modifying events so that this does what one expects
(rather than what is "floating-point" correct).

Generally speaking, as you probably painfully found out, the "events"
code is one of the crappiest part of Basilisk and would benefit from a
complete rewrite.

> 3) Minor issue 2
> Strangely the variable "t" is not assigned the right value in the event
> "stability" over "events(i=0,t=0)" for a restarted simulation only.
> This seems to be the case except in stability. Any clue about why this
> is happening ?

Not sure. I haven't looked closely enough.

> 4) Minor issue 3
> The value of the last dt used before a simulation stops is not dumped.

This is actually part of a wider problem which will be addressed at some
point: dump files do not generally store enough info. In particular they
do not store any face/vertex fields and boundary values. This makes
restarting much more complicated since each solver must ensure that
these fields/values are properly reconstructed after a restore().

cheers,

Stephane

Anthony Wachs

unread,
Oct 1, 2019, 5:30:06 PM10/1/19
to basilisk-fr
Hi Stephane,
I have looked at the problem of time of restart and initial dt over a restart. Please see my answers below.

Could you please try the following fix:

navier-stokes/centered.h:199
-  event ("stability");
+  tnext = t;

and tell me if you are happy with it.

Yes this works and is indeed simpler than my solution. However, for a run starting from t=0, i.e. without any restart from a dump file, this will change the computation of the initial time step from (0.1/1.1)*dtmax to ((0.1/1.1+0.1)/1.1)*dtmax.
In fact, setting t = tnext makes the condition tnext > t in the function dtnext not satisfied and tnext is set to t + dt, but dt is computed by calling 2 times the function timestep, which results in the expression ((0.1/1.1+0.1)/1.1)*dtmax.
Since 0.1/1.1 ~ 0.1 and (0.1/1.1+0.1)/1.1 ~ 0.2, the difference is that the first time step of a simulation starting from t=0 will be ~0.2*dtmax instead of ~0.1*dtmax.
This is not a problem at all.
Could you please add the correction you suggested to the next release of Basilisk ? Thank you in advance.
 
> 4) Minor issue 3
> The value of the last dt used before a simulation stops is not dumped.

This is actually part of a wider problem which will be addressed at some
point: dump files do not generally store enough info. In particular they
do not store any face/vertex fields and boundary values. This makes
restarting much more complicated since each solver must ensure that
these fields/values are properly reconstructed after a restore().

I am attaching a very basic fix to ensure the continuation of the computation of dt over a restart.
Basically, the modifications are as follows:
- 2 new global variables are created in common.h: dtinit and trestart.
dtinit is the last time step in the previous simulation. trestart is more for convenience purpose.
- in the output.h file: dt is added to the dump file and then read from the dump file in case of restart. The value of trestart is also assigned there in case of restart.
- the function timestep in the file timestep.h initializes the variable previous to dtinit using a static counter.

The 3 modified files are attached. All modifications are embedded in comments of the form:
// --------------------------------
// Modification A. Wachs 11/09/2019

// End Modification
// --------------------------------

In the type of simulations we run, we have noticed a non negligible effect of abruptly changing the time step magnitude from the last time step of one simulation to the first time step of a restarted simulation, while limiting the variation of dt between 2 time steps (which again is what the function timestep already does with dtmax = (previous + 0.1*dtmax)/1.1) has made the restart procedure totally transparent.
Again, my modifications might not be general enough (if timestep is called by multiple modules, the static counter will not do the right job), but when only navier-stokes is solved, they work.
In general, it may be helpful to have a boolean as a global variable that is set to true (restarted simulation) or false (standard simulation from t=0) and use this boolean throughout the whole code to do anything specific to a restarted simulation. Just food for thought.

Best,
Anthony
common.h
output.h
timestep.h
Reply all
Reply to author
Forward
0 new messages