dump/restore in multigrid MPI


wmoste...@gmail.com

Jan 23, 2018, 5:42:48 PM
to basilisk-fr
Dear all,

I have been having some issues getting dump/restore to work with multigrid MPI. I am interested in 3D grids in particular, but 2D grids seem to have distinct problems.

Observations:
1. I am able to dump and restore 3D simulations at lower depths, e.g. src/examples/isotropic.c at N = maxlevel = 7, without issue, as long as I run the restored version with the same number of MPI processes as the dumping version.
2. From what I've seen, higher-depth simulations (e.g. N = 9) fail to restore with the error "grid depths do not match". While trying to diagnose this in output.h, I noticed that during restore() header.depth was 5 instead of the expected 8.
3. 2D simulations at lower depths apparently restore without the need for matching MPI process counts.
4. By contrast, 2D simulations at higher depths fail with an MPI process-count error, even though I made sure the process counts match.

I did notice that ../src/examples/isotropic.c remarks that "dump() and restore() do not work with MPI". Given 1 and 3 above, I had assumed this remark was out of date, although 2 and 4 may well be symptoms of a mistake on my part. The compatibility matrix, however, suggests that it should work.

Has anyone encountered this issue? As always, please forgive me if I am missing something obvious.

Thanks once again,
Wouter


Stephane Popinet

Feb 9, 2018, 12:28:46 PM
to basil...@googlegroups.com
Hi Wouter,

> 2. From what I've seen, higher-depth simulations, e.g. N=9 fail to
> restore with the error "grid depths do not match". In an attempt to
> diagnose this in output.h I noticed that during restore(), the value
> header.depth value was 5, instead of 8 (expected).

Can you try to replace line 1074 of src/output.h with:

long cell_size = sizeof(unsigned) + header.len*sizeof(double);

and see if this changes things?

cheers

Stephane

wmoste...@gmail.com

Feb 12, 2018, 11:27:43 AM
to basilisk-fr
Hi Stephane,

Apologies for the delay. Sadly, the change to line 1074 does not seem to fix the error, at least in 3D, and header.depth still has the wrong value. However, I did notice that changing line 1000 of src/output.h to read:
init_grid(1 << depth+1);
bypassed the issue, although I'm deeply skeptical that it's a good solution. On that note, running in serial (without OpenMP, for what it's worth) does not seem to suffer from this issue, although I haven't tested this extensively.

Thanks,
Wouter

Alessio Innocenti

Feb 13, 2018, 5:36:09 AM
to basilisk-fr
Hi Wouter,

What grid depth and number of processors did you use at the start of this last simulation, the one that seems to work with "depth+1"? The problem may be due to an incorrect number of processors. See Non-cubic domains.
Best,

Alessio

wmoste...@gmail.com

Feb 13, 2018, 1:06:14 PM
to basilisk-fr
Hi Alessio,

For this last simulation I used refinement level 8 (that is, a resolution of (2^8)^3) on 27 processors, for a cubic domain. With the 'band-aid', restoring works fine even with a non-cubic number of processes, such as 25, although, as expected, the domain is then only cuboidal. In other words, I do need to run with cubic process counts, but this issue seems to be independent of whether I do so. Only the resolution appeared to have an effect.

Thanks,
Wouter

Stephane Popinet

Feb 13, 2018, 2:51:39 PM
to basil...@googlegroups.com
Hi Wouter,

Could you please write a simple bug report reproducing the problem?

See here:

http://basilisk.fr/sandbox/bugs/README

Note that you will probably need to write two files: one to dump and one
to restore. Try to make them as simple as possible (while still
triggering the problem of course).

If file sizes or runtimes are an issue on the server, just set a low
level of refinement and explain in the comments how to increase it to
trigger the bug.

cheers,

Stephane

wmoste...@gmail.com

Feb 13, 2018, 5:23:48 PM
to basilisk-fr
Hi Stephane,

Certainly, I'll set them up and upload on the bug page soon.

Thanks,
Wouter

wmoste...@gmail.com

Feb 15, 2018, 5:30:09 PM
to basilisk-fr
Hi Stephane,

A related question (forgive me if I missed an easy way to do this): is there a way to compile and run code on the wiki with MPI? This bug, if it isn't simply a problem on my end, tends to show up only with MPI.

Thanks,
Wouter

Stephane Popinet

Feb 26, 2018, 12:46:30 PM
to basil...@googlegroups.com
Hi Wouter,

I have fixed the problem. The patch will be in the next release.

cheers

Stephane

Wouter Mostert

Feb 26, 2018, 12:48:18 PM
to basilisk-fr
Hi Stephane,

That's marvellous, thank you!

Wouter

Brandon Aranda

May 3, 2025, 9:01:53 AM
to basilisk-fr
Hi,

I've encountered a similar issue: the simulation fails to restore with the error "grid depths do not match" when using a large aspect ratio. With an 8x1x1 channel, the simulation restores and runs successfully, but when the ratio is increased to 10x1x1 the error occurs. Could this be related and, if so, is there already a solution?

Thanks,
Brandon Aranda
