Hi everyone,
This is a long post about dumping and restoring simulations. Either I am doing something crazy wrong, or there is a little problem with time when restarting a simulation.
In what follows, I assume that I understood correctly that in Basilisk, when we compute from t=t1 to tnext=t1+dt, the value of t is t1 until all events have been executed, and only then is t updated to tnext=t1+dt. This can easily be checked by outputting t over the events (which is what I did by adding lines to the attached centered.h).
Attached you can find the following files:
- cavity2D.c: a simple 2D lid-driven cavity to illustrate/reproduce the problem, which you can run yourself. The problem originally appeared in a much more advanced Basilisk code.
- navier-stokes/centered.h with many outputs of the time and a new function dtnext_aw,
- grid/events.h with a new function dtnext_aw at the bottom,
- 3 logs of runs of cavity2D compiled with -events:
* basilisk-from_t=0.log: a simulation from t=0 to t=0.01, at the end of which the fields are dumped,
* basilisk-reload_with_dtnext.log: a simulation from t=0.01 to t=0.02 restarted from the dump file and using dtnext on line 216 of the attached centered.h. The log shows a shift in time of exactly dt = 0.1*mydt/1.1, as computed by timestep in timestep.h (previous is always 0 on the first call to the timestep function), so the simulation actually restarts from t = 0.01 + dt. Note that with mydt = 1e-4, you get dt = 0.1*mydt/1.1 ~ 9.09e-6, which is exactly the shift seen in the log; indeed, the first actual time step in the log file reads:
events (i = 115, t = 0.0100091)
* basilisk-reload_with_dtnext_aw.log: a simulation from t=0.01 to t=0.02 restarted from the dump file and using dtnext_aw on line 217 of the attached centered.h. The log does not show any shift in time; indeed, we have in the log file:
events (i = 115, t = 0.01)
Now some comments:
1) Major issue
When the code restarts from a dump, "events (i = 0, t = 0)" executes the events defaults, init, properties and stability, is then done with this sequence of events, and increments the time t by dt.
The problem is that stability, and more precisely dtnext, returns a non-zero dt, so the time is incremented by dt. However, nothing has been solved: this is simply an initialization of the fields and properties. You can see that the other events in centered.h are not executed, whereas they are when we start from t=0 without reloading from a dump file (compare the log files).
So the problem is that the code does not solve anything but still increments the time. I believe that either (i) this should not happen, or (ii) the other events should also be executed, as when we start from t=0.
After two days of tracing which function does what, I eventually came up with my own version of dtnext, temporarily called dtnext_aw, which essentially returns 0 if the simulation is restarted from a dump file, so that over "events (i = 0, t = 0)" the time t is not incremented.
This seems to fix the problem, but I am not sure it is in the "Basilisk spirit". Maybe Stephane can comment.
2) Minor issue 1
I also had problems stopping a simulation at exactly a prescribed time: Basilisk would sometimes compute an additional time step that I did not want. I ended up concluding that this is a classical problem of comparing doubles, and to avoid it I implemented a condition of the form "t - maxtime < RoundDoubleCoef * dt". With this, the code stops exactly when I want it to.
Any comments?
3) Minor issue 2
Strangely, the variable "t" is not assigned the right value in the event "stability" over "events (i = 0, t = 0)", and only for a restarted simulation. If you look closely at the log files, you will see the following sequence of outputs:
INIT TIME 0.010000
Before stab 1.000000e-02 1.200000e-02 1.000000e+00
stab t = 0.000000e+00 0 1.200000e-02
stab t after dtnext = 0.000000e+00 9.090909e-06 0
After stab 1.000000e-02 1.000909e-02 9.090909e-06
The first value is always the restart time. Here it is supposed to be 0.01, but in stability it is 0 (in red). This is not the case over the subsequent time steps (check the log files). I thought "t" was a "global" variable with the same value everywhere in the code.
This seems to be the case everywhere except in stability. Any clue about why this is happening?
4) Minor issue 3
The value of the last dt used before a simulation stops is not dumped, so when a simulation is restarted it has no knowledge of the dt of the previous time step. In fact, "previous" is always initialized to 0 on the first call to timestep. This can lead to a big jump in dt between the last dt of the previous run and the dt of the first time step of the restarted simulation.
It would be useful to dump the last value of dt, for instance by adding a dt field to the struct DumpHeader, reading it back on restart and initializing "previous" in the timestep function with this value. After all, the formula dtmax = (previous + 0.1*dtmax)/1.1 in timestep is meant to limit the change of dt between two consecutive time steps, so it would be great if this also worked between the last step of one run and the first step of the next.
Once again, in case I am doing something awfully wrong, please let me know. Otherwise, I am happy to read your comments/advice.
Cheers,
Anthony