Problem with restarting


Nadeem Malik

Aug 14, 2020, 1:56:28 PM
to Nek5000

Hello Neks,

I tried to continue my previous run from a restart file, but it began running from the start again, overwriting all my previous files.
Obviously I am missing something. I would greatly appreciate it if this could be sorted out asap.

(1) I ran e3q with the following parameters, printing out some velocity and vorticity files every 5000 steps:

#
# nek parameter file
#
[GENERAL]
stopAt = numsteps
numsteps = 5000
dt = 5.0e-04
timeStepper = bdf3
targetCfl = 0.5

TIMESTEP
writeInterval = 5000
.....

This ran OK, and at the end of 5000 timesteps my directory contains the files:
build.log
e3q0.f00001
e3q.box
e3q.f
e3q.ma2
e3q.par
e3q.re2
e3q.usr
jobscript
makefile
nek5000
obj
SIZE
V_circulation.dat
vele3q0.f00001
vele3q0.f00002
vele3q0.f00003
vele3q0.f00004
vele3q0.f00005
vele3q0.f00006
vele3q.nek5000
vore3q0.f00001
vore3q0.f00002
vore3q0.f00003
vore3q0.f00004
vore3q0.f00005
vore3q0.f00006
vore3q.nek5000


(2) Now I want to continue from the above for another 1000 timesteps (say), so I submitted exactly the same job, except for the new *.par entries:

#
# nek parameter file
#
[GENERAL]
stopAt = numsteps
numsteps = 6000                      ! THIS IS THE ONLY DIFFERENCE
dt = 5.0e-04
timeStepper = bdf3
targetCfl = 0.5

TIMESTEP
writeInterval = 5000
.....

As you can see, the only difference from the initial run is numsteps=6000.

When I submit this, it starts from the beginning isteps=1, simply overwriting the previous files -- no progress.

Obviously, there must be a flag for restart and reading the dump file (which is the dump file anyway ?).
Or do I have to explicitly printout the dump file for restart inside the *.usr file in my first run?

Thanks
Nadeem  

YuHsiang Lan

Aug 14, 2020, 2:13:01 PM
to Nek5000
Hi Nadeem,

Nek5000 always starts from istep=1, but if you restart from a file, it will continue the "time" from the solution.
Yes, Nek numbers its dump files from f00001 again, so you have to move the old files to another location to avoid overwriting them.

Here is my workflow:

  mkdir run1
  mv *0.f0* run1
  cp run1/e3q0.f00006 r1.fld 

Note:
  - e3q0.f00006 is the latest one; make sure the simulation didn't stop in the middle of dumping files, so that the file is complete.
  - I like to change the extension to ".fld" so that I can mv *0.f0* without moving the restart file.
  - A more space-efficient way is to "link" the file rather than "cp" it. Personally, I find "cp" more robust.

Then set the restart file in par file by adding
  startFrom = r1.fld
under [general]
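
As a minimal sketch, the steps above could be wrapped in a small shell function. The function name `stage_restart` and its arguments are hypothetical and not part of Nek5000; the sketch assumes the field files follow the `<case>0.f#####` naming and that the highest-numbered file was written completely.

```shell
# Hypothetical helper (not part of Nek5000): archive the field files of a
# finished run and stage the newest one as the restart file.
# usage: stage_restart <case> <run_dir> [restart_name]
stage_restart() {
    case_name=$1
    run_dir=$2
    restart=${3:-r1.fld}

    # Pick the highest-numbered dump; globs expand in sorted order and the
    # indices are zero-padded, so the last match is the newest file.
    latest=""
    for f in "${case_name}"0.f[0-9][0-9][0-9][0-9][0-9]; do
        if [ -e "$f" ]; then
            latest=$f
        fi
    done
    if [ -z "$latest" ]; then
        echo "stage_restart: no field files found for case $case_name" >&2
        return 1
    fi

    mkdir -p "$run_dir" || return 1
    # Same pattern as in the workflow above: moves every *0.f0* file.
    mv ./*0.f0* "$run_dir"/ || return 1
    # Copy (rather than link) so the restart file is independent of the archive.
    cp "$run_dir/$latest" "$restart"
}
```

Calling `stage_restart e3q run1` in the case directory would then leave `r1.fld` ready to be referenced by `startFrom = r1.fld`.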


Hope this helps,
Yu-Hsiang
--

Philipp Schlatter

Aug 14, 2020, 2:17:32 PM
to nek...@googlegroups.com

Hi,

Probably important for your case (as I assume you want to do high-fidelity simulations): make sure to do a "full" restart based on 3 consecutive velocity fields so as not to reduce the temporal order. I am not sure whether there is an example that supports that; if not, we can help you set it up.
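
To see why three consecutive fields are needed, it may help to recall the textbook third-order backward-difference formula (BDF3), the scheme selected by `timeStepper = bdf3`. It is written here for a generic evolution equation du/dt = f(u), with u^n the solution at step n and Δt the time step; this is a sketch of the general formula, not Nek5000's exact discretization:

```latex
\frac{1}{\Delta t}\left(\frac{11}{6}\,u^{n+1} - 3\,u^{n} + \frac{3}{2}\,u^{n-1} - \frac{1}{3}\,u^{n-2}\right) = f\!\left(u^{n+1}\right)
```

The update uses u^n, u^{n-1}, and u^{n-2}. A restart from a single field supplies only u^n, so the first steps after such a restart have to use lower-order formulas until the history is rebuilt.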


Best,

Philipp

--
You received this message because you are subscribed to the Google Groups "Nek5000" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nek5000+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nek5000/e7ad923f-03b6-4dc4-a6a9-023ad768f9c3o%40googlegroups.com.

Stefan K.

Aug 14, 2020, 2:21:31 PM
to Philipp Schlatter, Nek5000

nadeem...@cantab.net

Aug 14, 2020, 3:08:08 PM
to Philipp Schlatter, nek...@googlegroups.com

Yes, this is important for me.

 

How do I adjust the *.usr file (attached) to do this?

 

Thanks

Nadeem

e3q.usr

Nadeem Malik

Aug 18, 2020, 9:23:17 PM
to Nek5000


I have looked at the link, see at the end (below). It is not quite clear to me.

In run 1, I first need to dump the last 3 time steps, say istep=998, 999, and 1000. How do I do this?
If I assume that these last 3 dump files are labelled run1.f00008, run1.f00009, and run1.f00010 respectively, what do I do next?

From Stefan's workflow example, should I do;

 mkdir run2 
 mv *0.f0* run2 
 cp run2/run1.f00008 r1.fld 
 cp run2/run1.f00009 r2.fld 
 cp run2/run1.f00010 r3.fld 
 
Next, should I update the *.par file to,

[GENERAL]
startFrom = r1.fld 
stopAt = numsteps
numsteps =1000
dt = 2.50e-04
timeStepper = bdf3
targetCfl = 0.5

TIMESTEP
writeInterval = 1000

... but what about r2.fld and r3.fld?

Next, how do I modify userchk? Should I un-comment the commented lines?


      subroutine userchk()
      include 'SIZE'
      include 'TOTAL'

      common /SCRNS/ w1 (lx1*ly1*lz1*lelv),
     &               w2 (lx1*ly1*lz1*lelv),
     &               omg(lx1*ly1*lz1*lelv,ldim)
      character*80 fnames(3)

      n = nx1*ny1*nz1*nelv

c     if (.false.) then
c        call blank(fnames,size(fnames)*80)
c        fnames(1) ='rs6tgv0.f00001'
c        fnames(2) ='rs6tgv0.f00002'
c        fnames(3) ='rs6tgv0.f00003'
c        call full_restart(fnames,3) ! replace istep=0,1,..
c     endif
c
c     iostep_full = iostep
c     call full_restart_save(iostep_full)

      if (mod(istep,50).ne.0) return

      sum_e1 = 0.
      sum_e2 = 0.
      call curl(omg,vx,vy,vz,.false.,w1,w2)
      do i = 1,n
         vv = vx(i,1,1,1)**2 + vy(i,1,1,1)**2 + vz(i,1,1,1)**2
         oo = omg(i,1)**2 + omg(i,2)**2 + omg(i,3)**2
         sum_e1 = sum_e1 + vv*bm1(i,1,1,1)
         sum_e2 = sum_e2 + oo*bm1(i,1,1,1)
      enddo
      e1 = 0.5 * glsum(sum_e1,1) / volvm1
      e2 = 0.5 * glsum(sum_e2,1) / volvm1
      if (nid.eq.0) write(6,2) time, e1, e2
    2 format(1p3e13.4,' monitor')

      return
      end


Thanks
Nadeem


YuHsiang Lan

Aug 19, 2020, 12:06:25 AM
to Nek5000
Hi Nadeem,

There are a lot of details here. I recommend testing this before applying it to large computations.

- First, you need to dump the full-restart files.
  They are a little different from the "normal" Nek field files: they are in double precision, and sometimes the pressure is additionally interpolated onto the pN grid (if needed, meaning you are using PnPn-2).

  This can be achieved by simply putting the following into userchk:

      iostep_full = iostep
      call full_restart_save(iostep_full)

  To save space, Nek dumps the restart files alternately into two sets, [1-3] and [4-6].
  As a result, you will get 3 or 6 files named rs6<case>0.f0000[1-6] (assuming you are using bdf2+ext3 or bdf3+ext3).

  Let's suppose you have the files rs6e3q0.f0000[1-3], which were dumped at timesteps iostep, iostep+1, and iostep+2.
  You can double-check the time/timestep of each field file by reading its header: `head -1 rs6*0.f0*`
  I recommend backing up those 3 or 6 files for every job.

- The second and all successive runs should include the following at the beginning of userchk:

      character*80 fnames(3)

      call blank(fnames,size(fnames)*80)
      fnames(1) ='rs6tgv0.f00001'
      fnames(2) ='rs6tgv0.f00002'
      fnames(3) ='rs6tgv0.f00003'
      call full_restart(fnames,3) ! replace istep=0,1,..

  Replace the fnames entries with the full-restart files you got from the previous run.
  At the zeroth, first, and second timestep (again, assuming you have 3 saved states), userchk will read the file named in fnames(1), fnames(2), and fnames(3), respectively.
  That does the trick of flushing out the history for the bdf+ext timestepping. The solution should be recovered after the third or fourth timestep.

 

- Some notes:
  - You probably won't recover all 16 digits. That is achievable only under certain conditions.
    Reasons: some noise from MPI communication, some extra history (up to 20+ states) used in the residual projection, or possibly an implementation detail that introduces more lag in time.
  - How to test this?
    It is simple, but it is easy to make mistakes.
    - Reference job: just run steps 1 to 20.
    - Test job:
      - run1: run steps 1 to 20, but set iostep=10; it will dump full-restart files at steps 10, 11, and 12.
      - run2: full-restart from run1 and run 10 steps.
        run1  istep=10   istep=11   istep=12
        run2  istep=0    istep=1    istep=2
      Then compare the results of the reference job (20th step) and run2 (10th step).
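
For the final comparison, a small shell function can report the largest relative difference in one column of two output files. This is only a sketch: `max_rel_diff` is a hypothetical helper, and it assumes both files contain the same number of lines and the same number of whitespace-separated numeric columns.

```shell
# Hypothetical helper: largest relative difference in one column of two files.
# usage: max_rel_diff <ref_file> <test_file> <column>
max_rel_diff() {
    # paste puts matching lines side by side, so the columns of the second
    # file follow the columns of the first on each line.
    paste "$1" "$2" | awk -v c="$3" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            n = NF / 2                       # columns per file
            a = $c                           # reference value
            b = $(c + n)                     # value from the test run
            d = a - b; if (d < 0) d = -d     # absolute difference
            s = (a < 0) ? -a : a             # magnitude of the reference
            if (s > 0 && d / s > m) m = d / s
        }
        END { printf "%.3e\n", m }'
}
```

For example, `max_rel_diff ref.dat run2.dat 5` prints the largest relative difference found in the fifth column. A value close to machine precision would indicate a clean restart; a much larger value suggests that part of the time-stepping history was not restored.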

YuHsiang Lan

Aug 19, 2020, 12:12:27 AM
to Nek5000

Philipp Schlatter

Aug 19, 2020, 10:22:13 AM
to Nek5000

For completeness I wanted to mention another alternative. You could go via our KTH Framework. The ext_cyl_DNS example under https://github.com/KTH-Nek5000/KTH_Examples uses our full restart, which can be controlled fully via the input par file. The documentation for all of that is here (doxygen):

https://kth-nek5000.github.io/KTH_Framework/index.html
More specifically:
https://kth-nek5000.github.io/KTH_Framework/group__chkpoint.html
and
https://kth-nek5000.github.io/KTH_Framework/group__chkpoint__mstep.html

Philipp


nadeem...@cantab.net

Aug 19, 2020, 5:20:04 PM
to Philipp Schlatter, Nek5000

Thanks. I will test these out .. and get back if needed.

 

Regards

Nadeem

 

Nadeem Malik

Aug 22, 2020, 11:15:49 AM
to Nek5000
Hi neks,

I have tried a couple of restart options, but no luck so far.

First, my *.par file looks like:

[GENERAL]
startFrom = r1.fld
stopAt = numsteps
numsteps =50
dt = 2.50e-04
timeStepper = bdf3
targetCfl = 0.5

userparam01 = 25
userparam02 = 50
userparam03 = 5

Second, the relevant portion of my *.usr file is:

       character*80 fnames(3)

      ifield = 1  ! for outpost
      n    = nx1*ny1*nz1*nelv
      n2   = nx2*ny2*nz2*nelv

      visc = param(2)
      irate1 = uparam(1)
      irate2 = uparam(2)
      iostep = uparam(3)

      call blank(fnames,size(fnames)*80)
      fnames(1) ='rs1.fld'
      fnames(2) ='rs2.fld'
      fnames(3) ='rs3.fld'
      call full_restart(fnames,3) ! replace istep=0,1,..
....
....

However,  when I run, the error message is:

 call userchk
 Reading checkpoint data
/work/06396/tg856952/stampede2/WorkNew/e3q_0/24/rs1.fld
byte_read() :: fopen failure2!
ERROR: Error reading restart header in mfi_prepare  ierr=  1

an error occured: dying ...

TACC:  MPI job exited with code: 1
TACC:  Shutdown complete. Exiting.


Can anyone help please.
Thanks
Nadeem


nadeem...@cantab.net

Aug 22, 2020, 11:23:38 AM
to Nadeem Malik, Nek5000

Sorry, that should be ..

 

[GENERAL]

startFrom = rs1.fld           <-- correction

stopAt = numsteps

….

 

Same error message.

 


Dr Nadeem Malik

Aug 22, 2020, 3:15:29 PM
to Nadeem Malik, Nek5000

In fact, I think that I have now sorted most of the issues. Just a couple of fine points.

First, I have been able to dump 3 files and restart by the following workflow:

(1)  My *.usr contains the following:

       integer e,eg,ex,ey,ez,f,irate1,irate2,irest
       character*80 fnames(3)


      irate1 = uparam(1)
      irate2 = uparam(2)
      irest  = uparam(3)
      iostep = uparam(4)

      if (irest.eq.0) then
          iostep_full = iostep
          call full_restart_save(iostep_full)
      else

         call blank(fnames,size(fnames)*80)
         fnames(1) ='rs1.fld'
         fnames(2) ='rs2.fld'
         fnames(3) ='rs3.fld'
         call full_restart(fnames,3) ! replace istep=0,1,..
      end if
...

(2) My first run dumps steps 10, 11, and 12, so my *.par contains:

[GENERAL]
#startFrom = rs1.fld
stopAt = numsteps
numsteps = 20

dt = 2.50e-04
timeStepper = bdf3
targetCfl = 0.5

userparam01 = 10
userparam02 = 20
userparam03 = 0
userparam04 = 10

(3) I see 3 files named rs_e3q0.f00001, *02, and *03; I rename these to rs1.fld, rs2.fld, and rs3.fld.
Then I rerun for a further 10 steps starting from 10 (or is it 11?), with the new *.par entries:

[GENERAL]
startFrom = rs1.fld
stopAt = numsteps
numsteps = 10

dt = 2.50e-04
timeStepper = bdf3
targetCfl = 0.5

userparam01 = 10
userparam02 = 10
userparam03 = 1
userparam04 = 8

This seems to restart and run, and produce new files. Some questions remain:

(i) How accurate is the restart? When I compare the final output files, which contain some computed quantities, I see a small difference
between the straight 20 steps and the restarted 20 steps:

Straight runs, 20 steps:
  321   5.00000000E-03   9.50012809E-03   2.45792784E+04   2.60201618E+02   3.93944173E+03
  325   5.00000000E-03   9.93029522E-03   2.56922315E+04   2.25725041E+02   3.93972916E+03
  329   5.00000000E-03   1.02087679E-02   2.64127121E+04   2.01700075E+02   3.94021456E+03
  333   5.00000000E-03   1.02852992E-02   2.66107184E+04   1.94992163E+02   3.94018812E+03
  337   5.00000000E-03   1.06277492E-02   2.74967247E+04   1.63251932E+02   3.94010784E+03
  341   5.00000000E-03   1.10478591E-02   2.85836572E+04   1.24098509E+02   3.94037922E+03
  345   5.00000000E-03   1.12491924E-02   2.91045584E+04   1.03449011E+02   3.94075114E+03
  349   5.00000000E-03   1.13701112E-02   2.94174063E+04   9.24367589E+01   3.94059821E+03
  353   5.00000000E-03   1.17605194E-02   3.04274928E+04   5.21455888E+01   3.94052500E+03
  357   5.00000000E-03   1.21509276E-02   3.14375794E+04   1.42264930E+01   3.94074582E+03

Restarted run (restart after 10 steps), 20 steps:
  321   5.00000000E-03   9.50012809E-03   2.45792784E+04   2.60545154E+02   3.93943715E+03
  325   5.00000000E-03   9.93029522E-03   2.56922315E+04   2.26024128E+02   3.93972472E+03
  329   5.00000000E-03   1.02087679E-02   2.64127121E+04   2.01921705E+02   3.94020690E+03
  333   5.00000000E-03   1.02852992E-02   2.66107184E+04   1.95225458E+02   3.94018301E+03
  337   5.00000000E-03   1.06277492E-02   2.74967247E+04   1.63469569E+02   3.94010255E+03
  341   5.00000000E-03   1.10478591E-02   2.85836572E+04   1.24259106E+02   3.94037416E+03
  345   5.00000000E-03   1.12491924E-02   2.91045584E+04   1.03580570E+02   3.94074187E+03
  349   5.00000000E-03   1.13701112E-02   2.94174063E+04   9.25536050E+01   3.94059328E+03
  353   5.00000000E-03   1.17605194E-02   3.04274928E+04   5.22158129E+01   3.94051886E+03
  357   5.00000000E-03   1.21509276E-02   3.14375794E+04   1.42361019E+01   3.94074075E+03

In the last 2 columns there is a small difference. Can this be made smaller?

(ii) Following on from the above, it is important that I dump double precision files otherwise there is no point in
dumping 3 consecutive files. Is this default or does it need to be specified -- how?

(iii) Suppose I want a sequence of restarts, i.e. in the 2nd and subsequent runs I want both to restart and then to
dump files at the end in preparation for the next run. How can I do this?

(iv) If the last time step is 100, say, and I want to dump at the end, should I set iostep=98? (This will dump 98,99,100 ?)
But then on restart, it will continue from 98 or 100 ?

(v) After 2 runs, how can I concatenate 2 results files, so that their time step numbers and times are continuous?

Thanks
Nadeem


-------------------------
Nadeem A. Malik

YuHsiang Lan

Aug 22, 2020, 5:08:13 PM
to Nek5000
Hi Nadeem,

Some notes:
For (1)
  a. iostep is a Nek global variable; you can set it in the par file via the key "writeInterval" under [general].
      I guess what you want is an independent parameter to control the full-restart dump. In that case, you are free to use iostep_full = uparam(4).
  b. uparam is of real type and irest is of integer type. The line `irest  = uparam(3)` actually assigns `irest  = int(uparam(3))`, which is fine since you test (irest.eq.0) later.
  c. The switch logic should go around the restart part only.
      We can dump files in every run, but we can only restart from the second run onwards. I suggest the following to switch between the 2 sets easily:
      iostep_full = iostep
      call full_restart_save(iostep_full)
      if (irest.eq.1) then
         call blank(fnames,size(fnames)*80)
         fnames(1) ='rs_e3q0.f00001'
         fnames(2) ='rs_e3q0.f00002'
         fnames(3) ='rs_e3q0.f00003'
         call full_restart(fnames,3) ! replace istep=0,1,..
      elseif (irest.eq.2) then
         call blank(fnames,size(fnames)*80)
         fnames(1) ='rs_e3q0.f00004'
         fnames(2) ='rs_e3q0.f00005'
         fnames(3) ='rs_e3q0.f00006'
         call full_restart(fnames,3) ! replace istep=0,1,..
      endif
For (3)
  d. You can find the info from the logfile to see when a file is read or write.
      Search the filename in the logfile
 Reading checkpoint data
       FILE:<my_path_to_the_case>/<file_name>
        0  1.0000E-03 done :: Read checkpoint data


      Together with the "header" of the files:
      head -1 rs_eddy_fullrestart_run10.f00001
      #std 8  8  8  1        256        256  0.1000000000000E-02        10      0      1 XUP         0.0000000E+00 T                      �a�@
      The first "8" indicates the file is under double presicion.
      Others are listed at here:
      This should help you figure out the timing.
          rs_files      file1        file2        file3
          dump at iostep     iostep+1  iostep+2
          read at   istep=0   istep=1    istep=2

  (3) (i), what is the numbers you print?
  (3) (ii), inside fill_restart_save, the p63=1 is set, you should find the files are already under double precision (check the header)
  (3) (iii), see c. above.
  (3) (iv), This is up to you. In my case, I don't need my files dumped at the specific timestep. I usually control the wall time of the job instead of numbers of timesteps.
     Usually, it's hard to predict when the simulation is done, so my iostep is small enought to make sure my last save is close to the end of the simulation but large enough to reduce cost. of dumping files.
     That's also why we need two sets or full_restart files, one of them is for backup to prevent simulation stopping in the middle of dumping files.
    
  (3) (v) I don't quite understand this. You should have correct time as long as you choose the right files to restart. Timesteps, by default, are reset to 0.
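
The header check under d. can be scripted. The sketch below assumes the layout seen in the `head -1` output above: an ASCII header at the start of the file, with the word size in field 2, the time in field 8, and the step number in field 9. The 132-byte header length and the helper name `fld_header_info` are assumptions, not something confirmed in this thread.

```shell
# Hypothetical helper: print word size, time and step from a Nek field-file header.
# usage: fld_header_info <file>
fld_header_info() {
    # Read only the ASCII header (assumed to be 132 bytes); binary data follow it.
    head -c 132 "$1" | awk 'NR == 1 {
        # A word size of 8 bytes means the fields are stored in double precision.
        prec = ($2 == 8) ? "double" : "single"
        printf "wdsize=%s (%s) time=%s istep=%s\n", $2, prec, $8, $9
    }'
}
```

Running it over all three restart files, for instance in a loop over `rs_e3q0.f0000[1-3]`, would show whether they are in double precision and were written at consecutive steps.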


Yu-Hsiang
--


nadeem...@cantab.net

Aug 22, 2020, 5:47:44 PM
to YuHsiang Lan, Nek5000

Thanks.

 

Double precision ok.

 

I have modified the *.usr :

 

       integer e,eg,ex,ey,ez,f,irate1,irate2,irest
       character*80 fnames(3)

       irest  = nint(uparam(3))
       iostep = nint(uparam(4))

       iostep_full = iostep
       if (irest.ge.1) call full_restart_save(iostep_full)
       if (irest.eq.2) then
         call blank(fnames,size(fnames)*80)
         fnames(1) ='rs1.fld'
         fnames(2) ='rs2.fld'
         fnames(3) ='rs3.fld'
         call full_restart(fnames,3) ! replace istep=0,1,..
       end if

 

Furthermore, in *.par I already have:

 

TIMESTEP

writeInterval = 10000

 

is this the same? Then this writes 3-dump files every 10000 steps?

So after 20000 steps does it overwrite the previous 3 or creates new files?

 

About continuous time: I mean that if I dump files # 10, 11, 12, does the restart start calculating from step 11 or 13? It makes a difference since I may lose 2 time steps. So if I want to stop-start in groups of 10 time steps, should I dump with iostep=8?

 

The numbers are simply some quantities (not important here) that I calculate, and they should be identical if the restart is perfect.

To simplify: the first line is from a single straight run of 20 timesteps; the 2nd line is from a stop-restart run of 10 time steps each:

 

357   5.00000000E-03   1.21509276E-02   3.14375794E+04   1.42264930E+01   3.94074582E+03

357   5.00000000E-03   1.21509276E-02   3.14375794E+04   1.42361019E+01   3.94074075E+03

 

The last 2 columns differ a bit; really they should be equal. This suggests that either a restart is never perfect, or I have missed a time step or two in the restart workflow?


YuHsiang Lan

Aug 24, 2020, 3:06:07 PM
to Nek5000
Hi Nadeem,

> Then this writes 3-dump files every 10000 steps?

Yes.

> So after 20000 steps does it overwrite the previous 3 or creates new files?

At the 20000th step, it will start to dump the second set, rs6<case>0.f0000[4-6].
At the 30000th step, it will start to overwrite the first set, rs6<case>0.f0000[1-3].
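
Because the two sets alternate, it is not obvious after a job ends which set is the newer one. A minimal sketch for deciding this is given below; it compares the simulation time stored in the header of the last file of each set. The helper name `newest_restart_set`, the 132-byte header length, and the position of the time in field 8 are assumptions. A set whose last file is missing or older (for example because the job stopped in the middle of a dump) is treated as the older set.

```shell
# Hypothetical helper: report which full-restart set (1-3 or 4-6) is newer.
# usage: newest_restart_set <prefix>     e.g. newest_restart_set rs_e3q0.f
newest_restart_set() {
    prefix=$1
    t1=""
    t2=""
    # Time stamp (header field 8) of the last file of each set, if present.
    if [ -f "${prefix}00003" ]; then
        t1=$(head -c 132 "${prefix}00003" | awk 'NR == 1 { print $8 }')
    fi
    if [ -f "${prefix}00006" ]; then
        t2=$(head -c 132 "${prefix}00006" | awk 'NR == 1 { print $8 }')
    fi
    if [ -z "$t1" ] && [ -z "$t2" ]; then
        echo "newest_restart_set: no restart files for prefix $prefix" >&2
        return 1
    fi
    # Compare as floating-point numbers; a missing set counts as time -1.
    awk -v a="${t1:--1}" -v b="${t2:--1}" 'BEGIN {
        if (b + 0 > a + 0) { print "4 5 6" } else { print "1 2 3" }
    }'
}
```

The printed indices are the file numbers to put into fnames(1), fnames(2), and fnames(3) for the next run.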
 

> The last 2 columns differ a bit...

Right, so my actual question is why the 3rd and 4th columns match but the last two agree to only 6 digits.
I also looked at your user file, but I cannot figure out why fbax and fbay behave differently.
    i,time,zbar(i,1),fbax(i,1),fbay(i,1),fbaz(i,1)

Have you checked whether vx, vy and vz restart accurately (14 - 16 digits)?
Or do you have some calculation or pre-processing that depends on the timestep number?

Yu-Hsiang
--


Nadeem Malik

Aug 24, 2020, 3:36:45 PM
to Nek5000

> Right, so my actual question is, why the 3rd and 4th column are matched but the last two only have 6 digit.
> I also look at your user file, but I cannot figure out why fbax and fbay behaves different. 
>    i,time,zbar(i,1),fbax(i,1),fbay(i,1),fbaz(i,1)

This is a problem which I am trying to sort out. I have modified *.usr so that I am now printing out just the x- and z-
circulations (fbax,fbaz) and the x- and z-plane coordinates (xbar,zbar):

            write(57,3) i,time,xbar(i,1),zbar(i,1),fbax(i,1),fbaz(i,1)

However, x and z should both lie in the range [-pi,pi], but I am getting the following type of output:

    1   5.00000000E-03  -1.53929693E+06  -1.20944950E+02  -2.07214900E+03  -8.09433203E+00
    5   5.00000000E-03  -1.53929693E+06  -1.20944927E+02  -2.07247789E+03  -8.09561677E+00
    9   5.00000000E-03  -1.53929693E+06  -1.20944934E+02  -2.07261202E+03  -8.09614069E+00

As you can see, the xbar values are about -1e06 and the zbar values about -1e02, both very large. I have tried some variations,
but no luck so far. For example, for the x-coord and x-circulation I try (the lines starting with "!" are previous attempts):

!         x-component of circulation in x-planes: Gx = Int_{x-planes} [Vor_x (dS)x]
          eg = lglel(e)     ! Which processor has which elements
          call get_exyz(ex,ey,ez,eg,nelyz,1,nelx)
          do k=1,lx1
          do i=1,ny1*nz1
             fbax(k,ex) = fbax(k,ex)+area(i,1,fx,e)*abs(vort(i,1,k,e,1)) ! x-coord of vorticity
!            xbar(k,ex) = xbar(k,ex)+area(i,1,fx,e)*xm1(i,1,k,e)         ! x-coord on mesh 1
!            xbar(k,ex) = xm1(i,1,k,e)         ! x-coord on mesh 1
             xbar(k,ex) = xm1(1,1,1,e)         ! x-coord on mesh 1
             wghx(k,ex) = wghx(k,ex)+area(i,1,fx,e)
          enddo
          enddo

       call gop(fbax,worx,'+  ',lx1*nelv)
!      call gop(xbar,worx,'+  ',lx1*nelv)
       call gop(wghx,worx,'+  ',lx1*nelv)
       do i=1,lx1*nelx
          fxav(i,1)=fbax(i,1)/wghx(i,1) ! If you want the average
!         xbar(i,1)=xbar(i,1)/wghx(i,1)
       enddo

[Similar for z-]

        write(57,'(A,1pe14.7,1x,i7,1x,i7)') '# time,irate1,irate2 = ',
     $                                          time,irate1,irate2
         do i = 1,lz1*nelz,4
            write(57,3) i,time,xbar(i,1),zbar(i,1),fbax(i,1),fbaz(i,1)
  3        format(i5,1x,5(1pe16.8,1x))

(Note, lx1=lz1=15, and nelx=nelz=24.)

The problem appears to be in "xbar(k,ex) = xm1(1,1,1,e)", by which I mean that xbar contains the
x-coord of the (1,1,1) grid point in element e. This should be in the [-pi,pi] range? (Same for the z-coord.)

> Have you checked if vx, vy and vz restart accurately (14 - 16 digits)?
> Or do you have some calculation or pre-processing depended on the timesteps?

No. I am relying on the circulation calculation to tell me that -- but I am now worried that this may not be correct?

-nadeem


Nadeem Malik

Aug 24, 2020, 3:43:32 PM
to Nek5000
p.s. The last 2 columns should be positive, because I am calculating the absolute value of the circulation:

             fbax(k,ex) = fbax(k,ex)+area(i,1,fx,e)*abs(vort(i,1,k,e,1)) ! x-coord of vorticity

So, there is clearly something wrong, maybe an index exceeding its limit?

-nadeem

Nadeem Malik

Aug 26, 2020, 1:22:44 PM
to Nek5000
Thanks, the restart problem is now solved.