Full_restart crash due to MPI_Win_Create bug


Bastien B. - LHEEA - Centrale Nantes

Oct 21, 2025, 4:03:00 AM
to Nek5000
Dear Nek users,

I want to perform a full restart on a simulation I am trying to launch on a supercomputer.

Before that I wanted to practice the full_restart on a smaller case.
I used the tgv Nek5000 test case: I generated the mesh, uncommented the lines in the *.usr file to enable the restart, and launched the simulation a first time.
No problem on the first execution; the rs_* files were correctly generated.

Then, I set the full_restart option to true in the *.usr file (line 85), and launched the simulation a second time.
Nek5000 exits with an error in the MPI_Win_create call in the mfi subroutine (ic.f file).

Has anyone ever encountered the same problem?

Looking forward to your insights,
Thanks,
Bastien

YuHsiang Lan

Oct 21, 2025, 8:59:35 AM
to Nek5000
Hi Bastien,

Can you post the error message?
This might be associated with the latest RMA issue or a recent bug.

Thanks,
Yu-Hsiang
--

Bastien B. - LHEEA - Centrale Nantes

Oct 21, 2025, 9:23:31 AM
to Nek5000
Hi,

Here is the error message:

*** An error occurred in MPI_Win_create
*** reported by process [1424949250,0]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_WIN: invalid window
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages 

This morning I tried several versions of the ic.f file, and I found that commit 636d0b5 introduced this unexpected behaviour.

Thanks,
Bastien

YuHsiang Lan

Oct 21, 2025, 9:33:19 AM
to Nek5000
Hi Bastien,

636d0b5 is the one that introduces MPI RMA, but that alone doesn't explain why it fails.

Are you using Ubuntu 22.04? On my laptop, when I run with a single processor, I have to manually use
   mpirun --mca osc ucx
or I get a similar issue. This is a known bug and should be resolved in newer MPI versions.

If you are using a recent MPICH, I'm also aware that RMA is not very stable on some HPC machines; there, you can try disabling it with
  MPIR_CVAR_CH4_OFI_ENABLE_RMA=0

Does this also happen with a regular restart?

If you use the latest Nek5000 (updated 5 hrs ago), we have added an alternative method using the crystal router.
To bypass RMA, you can try setting this in usrdat2:
  ifcrrs = .true.

Hope this helps,

Bastien B. - LHEEA - Centrale Nantes

Oct 21, 2025, 9:59:46 AM
to Nek5000
Hi Yu-Hsiang,

I am currently on Debian 11 with Open MPI 4.1.0.
But it also happened on my WSL2-ubuntu on my home computer.

Well, the full_restart worked initially.
Then, on several procs, the simulation froze. But the restart still worked on one proc (still using the nekmpi command).
And finally, the bug won: my computer now refuses to get past the full_restart line even with one proc.
If I comment out this line of code, everything is back to normal.

Ok, I will try this new method,
Thanks for the tips!
Bastien

Daniele Massaro

Oct 21, 2025, 5:14:41 PM
to Nek5000
Hi Bastien,

Just to double-check: have you started to see this MPI failure after the latest stable Debian update?
I've seen something similar (see the error message below) after the latest upgrade, which made GCC-14 the default compiler.
Manually installing GCC-12, exporting it, and recompiling my case solved the issue.

Kind regards,
Daniele

 *** An error occurred in MPI_Comm_size
 *** reported by process [1343356929,0]
 *** on communicator MPI_COMM_WORLD
 *** MPI_ERR_COMM: invalid communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)



Bastien B. - LHEEA - Centrale Nantes

Oct 22, 2025, 4:47:13 AM
to Nek5000
Hi!

@Daniele
The IT department is not eager to update my workstation, so I haven't had a stable Debian update for a while now: I am still on gcc-10...

@Yu-Hsiang
The trick you proposed worked!
For everyone: the ifcrrs flag should already be set in usrdat2 on the run that produces the restart file ;)

However, with the PnPn-2 formulation the pressure field of the restart file looks very odd:
most of the GLL points have a zero pressure value in each element. When the simulation restarts, the pressure is then 0 across the domain with the PnPn-2 formulation (see imageVisit.png attached).

Forcing if_full_pres = .false. in the full_restart_save subroutine did the trick: I get the correct pressure field in the restart file and in the next simulation run (see imageVisitFixed.png attached), with both the PnPn and PnPn-2 formulations.
 
Huge thanks for your help!

Best wishes,
Bastien
imageVisit.png
imageVisitFixed.png

YuHsiang Lan

Oct 22, 2025, 8:55:30 AM
to Nek5000
Hi Bastien,

> Most of the GLL points have a zero pressure value in each elements.
This is by design.

For PnPn-2, the pressure lives on the xm2 grid, which uses lx2 = lx1-2 Gauss-Legendre points in each direction.
It's a different mesh from the xm1 grid (Gauss-Lobatto-Legendre, lx1 points in each direction).

To save the pressure to file, there are two ways in Nek:
1. if_full_pres=F: interpolates the pressure back to the xm1 grid. This way, the user can view it in ParaView / VisIt, because it now matches the grid in the checkpoint file.
2. if_full_pres=T: in each element, the first lx2^ldim of the allocated lx1^ldim points directly store the pressure, and the unused points are left zero.
    My guess is that if_full_pres=T is there to avoid potential interpolation error, especially at low order.
    If you use if_full_pres=T, you are not supposed to visualize the pressure in ParaView.
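The zero-padded storage in option 2 can be sketched outside Nek (Python used purely for illustration; the array names and sizes here are hypothetical stand-ins, not Nek5000 code):

```python
import numpy as np

lx1, lx2, ldim = 8, 6, 3        # PnPn-2: lx2 = lx1 - 2

# Pressure on the xm2 (Gauss-Legendre) grid for one element.
pr_elem = np.random.rand(lx2**ldim)

# if_full_pres=T storage: the first lx2^ldim slots of the lx1^ldim
# buffer hold the pressure; the remaining slots stay zero.
pm1_elem = np.zeros(lx1**ldim)
pm1_elem[:lx2**ldim] = pr_elem

# Most points in the buffer are zero, which is what a visualization
# tool reading the buffer as an lx1-grid field would display.
zero_fraction = 1.0 - lx2**ldim / lx1**ldim
print(f"fraction of zero-padded points: {zero_fraction:.2f}")
```

With lx1=8, lx2=6 in 3D, more than half of each element's stored points are padding zeros, which matches the mostly-zero fields seen in VisIt.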

> When the simulation restart, the pressure is then 0 across the domain with the PnPn-2 formulation.
Now, this sounds suspicious. Are you saying the pr is all 0 after reading a restart with if_full_pres=T?

Thanks,
Yu-Hsiang
--

Bastien B. - LHEEA - Centrale Nantes

Oct 22, 2025, 10:35:26 AM
to Nek5000
Hi Yu-Hsiang,

Not all of them; some GLL points near element boundaries still have non-zero values, but I am not sure whether the value plotted there is correct either.

My best guess is that the map_pm1_to_pr subroutine from the ic.f file does not behave correctly.
From what I understand:
* the copy method in prepost_map stores all the data from pr into pm1, leaving the zeros at the end of the array.
* the map_pm1_to_pr subroutine reads lx1*ly1*lz1 consecutive memory locations per element (because if_full_pres is true).

If that is true, then I think that, in map_pm1_to_pr, the last mesh element will read only zero pressure values, and for all the previous ones the pr variable will discard (lx1-lx2)*(ly1-ly2)*(lz1-lz2) non-zero values.

But I am reaching the limits of my knowledge of the inner workings of Fortran, so I might be making a fool of myself.

I am eager to read your insights on this,

Thanks,
Bastien

YuHsiang Lan

Oct 22, 2025, 7:13:28 PM
to Nek5000
Hi Bastien,

> the map_pm1_to_pr subroutine reads lx1*ly1*lz1 consecutive memory locations per element
No, it does not.

When PnPn-2 is used and if_full_pres=T, in prepost_map:

      common /scrcg/ pm1    (lx1,ly1,lz1,lelv)

         elseif (if_full_pres) then
            call rzero(pm1,ntot1)
            do e=1,nelt
               call copy(pm1(1,1,1,e),pr(1,1,1,e),nxyz2)
            enddo
         else

In map_pm1_to_pr:

      real pm1(lx1*ly1*lz1,lelv)

         do e=1,nelv
            if (if_full_pres) then
               call copy  (pr(1,1,1,e),pm1(1,e),nxyz2)
            else
               call map12 (pr(1,1,1,e),pm1(1,e),e)
            endif
         enddo

It's symmetric. The gap is the same for both copy-in and copy-out.

As long as you are not changing the polynomial order, and you don't mix files between PnPn-2 and PnPn, if_full_pres=T plus double-precision output should preserve all digits of an instantaneous pressure field.

if_full_pres=F will lose a few digits of accuracy (small, though), since there is an additional interpolation error.
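The symmetry of the copy-in / copy-out pair can be checked with a toy round trip (Python used for illustration; this mimics the two Fortran copy loops above, it is not actual Nek5000 code):

```python
import numpy as np

lx1, lx2, ldim, nelv = 8, 6, 3, 4
nxyz1, nxyz2 = lx1**ldim, lx2**ldim

pr = np.random.rand(nxyz2, nelv)        # pressure on the xm2 grid

# Like prepost_map with if_full_pres=T: zero the buffer, then copy
# the first nxyz2 entries of each element (copy-in).
pm1 = np.zeros((nxyz1, nelv))
for e in range(nelv):
    pm1[:nxyz2, e] = pr[:, e]

# Like map_pm1_to_pr with if_full_pres=T: copy the same leading
# block back out of each element (copy-out).
pr_restart = np.empty_like(pr)
for e in range(nelv):
    pr_restart[:, e] = pm1[:nxyz2, e]

# The round trip is bit-exact: no digits are lost.
print(np.array_equal(pr, pr_restart))
```

Because both loops touch exactly the same leading nxyz2 slots of each element, the padding zeros never contaminate the recovered field.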

Hope this helps,
Yu-Hsiang
--


Bastien B. - LHEEA - Centrale Nantes

Oct 23, 2025, 3:09:46 AM
to Nek5000
Hi!

Yes, it helps a lot! Thanks!

However, I still really do not know why the restart pressure in the next run is incorrect... I have tested several cases, all leading to the same observation :/

Thanks again,
Bastien

YuHsiang Lan

Oct 23, 2025, 9:05:37 AM
to Nek5000
Hi Bastien,

How do you tell that your restart pressure is incorrect?

Thanks,
Yu-Hsiang
--

Bastien B. - LHEEA - Centrale Nantes

Oct 23, 2025, 9:56:58 AM
to Nek5000
Hi,
When if_full_pres is true, if I output the pressure after the last iteration of the previous run and compare it with the pressure from the first iteration of the next run, the two pressure fields have nothing in common.
The one from the previous run looks smooth across the elements;
on the other, we lose the pressure patterns, and the values are very different.

The attached file shows the pressure fields at the end of run 1 (t = 0.05, or 50 timesteps) and at the beginning of run 2 for the Taylor-Green test case in PnPn-2.

Hope this helps,
Bastien
compare_if_full_pres.png

YuHsiang Lan

Oct 23, 2025, 10:10:10 AM
to Nek5000
Hi,

As I mentioned before, you cannot visualize the pressure field when the file uses if_full_pres=T.
Visualization tools have no knowledge of the fact that it's on a different grid.

I'd suggest checking some other quantities based on the xm2 grid in userchk. For example: min, max, average, L2 norm, ...
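As a sketch of the idea (Python for illustration; pr_before and pr_after are hypothetical stand-ins for the xm2-grid pressure at the end of run 1 and the start of run 2), scalar diagnostics can be compared because they do not depend on how the grid is laid out:

```python
import numpy as np

def diagnostics(pr):
    """Grid-agnostic scalar checks for comparing a field across a restart."""
    return {
        "min": float(pr.min()),
        "max": float(pr.max()),
        "avg": float(pr.mean()),
        "l2":  float(np.sqrt(np.sum(pr**2))),
    }

rng = np.random.default_rng(0)
pr_before = rng.random(1000)      # field at the last step of run 1
pr_after = pr_before.copy()       # field read back at the start of run 2

d1, d2 = diagnostics(pr_before), diagnostics(pr_after)
for key in d1:
    assert np.isclose(d1[key], d2[key]), key
print("all diagnostics match")
```

If these numbers agree across the restart, the field itself was preserved, even though a visualization tool shows it differently.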

Yu-Hsiang
--

Bastien B. - LHEEA - Centrale Nantes

Oct 23, 2025, 11:15:19 AM
to Nek5000
Hi,

Oh the min/max values in the userchk subroutine are the same between the two runs!
I think I understand now!

If I want to postprocess the first few timesteps of my simulation (corresponding to the restart points), I have to use the result files from the previous run.
In the new run, the if_full_pres flag is turned on while the code reads the restart files, and it stays on during the entire userchk subroutine.
It means that pr is copied from the lx2 grid to the lx1 grid when I output the first 2 or 3 timesteps of the new run.

But for the next iterations, the if_full_pres flag is back to normal in the userchk subroutine, and the pr field is output after being interpolated onto the lx1 grid.
Did I finally get that right?

Thanks a lot Yu-Hsiang!
Bastien

YuHsiang Lan

Oct 24, 2025, 10:27:10 AM
to Nek5000
Hi Bastien,

Because you are using full_restart, if_full_pres is turned on. Otherwise, it's off by default, and the pressure is interpolated onto the lx1 grid when dumping a checkpoint file.

You can manually override it right before an additional outpost, so that you can have some files for restart and others for visualization.

      if_full_pres = .false.
      ifvo = .false.
      ifto = .false.
      ifpo = .true.  ! only dump pressure into the file
      call outpost(vx,vy,vz,pr,t,'ppp')

      ifvo = .true.  ! recover the defaults
      ifto = .true.


Thanks,
Yu-Hsiang
--