Cylinder 3D case freezes on P100

Robert Sawko

Oct 26, 2018, 1:15:30 PM
to pyfrmai...@googlegroups.com

gdb.txt

Freddie Witherden

Oct 26, 2018, 1:27:53 PM
to pyfrmai...@googlegroups.com
Hi Robert,

Looking at the stack trace it appears as if something is hooking
malloc/free (probably MPI or some related library). This is almost
always a bad idea as such code is extremely difficult to get right.
PyFR is particularly sensitive to such hooking because we load MPI and
friends at runtime. Thus, the hooking is done
after a large number of pointers have already been allocated by the
original (un-hooked) malloc. When these pointers are later freed the
hooked free often mistakenly believes they came from the hooked malloc.
Hilarity ensues.

In my experience there is usually a way to prevent such hooking.
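
A quick way to see who is doing the hooking is to check which libraries are
mapped into a hung rank; the PID below is a placeholder:

# libucm/libucs/libucp belong to UCX, whose memory hooks are a common offender:
grep -E 'ucm|ucs|ucp' /proc/<pid-of-hung-rank>/maps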

Regards, Freddie.


Robert Sawko

Oct 27, 2018, 5:20:17 AM
to Freddie Witherden, pyfrmai...@googlegroups.com
Freddie,

Thanks for the quick reply.

Do you think it would help if I forced OpenMPI to use TCP instead of InfiniBand? That
should then avoid those ucm_* functions entirely.
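I was thinking of something along these lines (untested, and assuming the stock
OpenMPI 3.x component names):

# Select the ob1 PML and restrict the byte-transfer layers to TCP plus shared
# memory, so the UCX PML should never be selected:
mpirun --mca pml ob1 --mca btl tcp,self,vader \
    pyfr run -b cuda mesh.pyfrm ../config.ini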

Isn't it surprising, though, that other examples work fine and that this particular
case runs on the login node? Surely the hooking is the same there?

I understand that this is all runtime stuff, but do you think that my perhaps unusual
marriage of Anaconda and Lmod may be causing it? I use Lmod to handle the compiler/MPI
hierarchy, but perhaps putting Anaconda into my gcc/6.4, openmpi/3.1 branch doesn't
make much sense.

Finally, I will also try downgrading OpenMPI, as I am almost sure that only a few
months ago I was running on P100s without putting any thought into it.

Best wishes,
Robert
--
Dr Robert Sawko
Research Staff Member, IBM
Daresbury Laboratory
Keckwick Lane, Warrington
WA4 4AD
United Kingdom
--
Email (IBM): RSa...@uk.ibm.com
Email (STFC): robert...@stfc.ac.uk
Phone (office): +44 (0) 1925 60 3967
Phone (mobile): +44 778 830 8522
Profile page:
http://researcher.watson.ibm.com/researcher/view.php?person=uk-RSawko
--


Freddie Witherden

Oct 29, 2018, 11:20:45 AM
to pyfrmai...@googlegroups.com
Hi Robert,

On 27/10/2018 10:20, Robert Sawko wrote:
> Do you think it would help if I forced OpenMPI to use TCP instead of
> InfiniBand? That should then avoid those ucm_* functions entirely.

Falling back to TCP may help. However, this can also come with
substantial performance implications. My advice would therefore be to
build OpenMPI yourself. This way you can be sure that no libraries are
hooking themselves into application code.
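
For instance, a configure line along these lines (installation paths are
placeholders) keeps UCX, and hence its memory hooks, out of the build entirely:

# Minimal self-build sketch; --without-ucx disables the UCX components and
# --with-cuda enables CUDA-aware support:
./configure --prefix=$HOME/opt/openmpi-3.1.2 \
    --without-ucx --with-cuda=/usr/local/cuda
make -j8 && make install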

Regards, Freddie.

Robert Sawko

Oct 29, 2018, 4:39:33 PM
to Freddie Witherden, pyfrmai...@googlegroups.com
Freddie,

We got the code to work by reverting to OPAL hooks. Your suggestion was
correct, but I fear some more work is needed. The code runs with this command:

mpirun \
    --mca pml_ucx_opal_mem_hooks 1 \
    -report-bindings \
    pyfr run -b cuda mesh.pyfrm ../config.ini
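
I believe the same parameter can also be exported once in the job script via
OpenMPI's usual environment-variable convention, rather than passed on every
command line:

# Equivalent to --mca pml_ucx_opal_mem_hooks 1 on the mpirun line:
export OMPI_MCA_pml_ucx_opal_mem_hooks=1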

For details, please read below. Are you running PyFR on Summit? I am not 100%
sure, but I think this may become relevant for you at some point.

I actually build OpenMPI myself, so in my build the following transport
layers are enabled:

Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
Intel SCIF: no
Intel TrueScale (PSM): no
Mellanox MXM: yes
Open UCX: yes <- This guy seems to be the culprit.
OpenFabrics Libfabric: no
OpenFabrics Verbs: yes
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: yes
Shared memory/XPMEM: no
TCP: yes

When I build OpenMPI I need to point it at the Mellanox libraries; to maximise
performance on InfiniBand, OMPI needs to be able to find them. I validate my
build of OMPI with the Intel MPI Benchmarks.
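
The validation itself is nothing fancy, roughly the following (binary name as
built from the Intel MPI Benchmarks suite):

# Point-to-point latency/bandwidth between two ranks on different nodes:
mpirun -np 2 --map-by node ./IMB-MPI1 PingPong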



Now here's how the train of thought on UCX went:

1) As per your suggestion, memory hooks are an issue here.

2) The topmost frame of the gdb backtrace said:

#0 0x00003fff82994740 in ucm_malloc_mmaped_ptr_remove_if_exists (ptr=0x3eff0dd9bdd0) at malloc/malloc_hook.c:153

3) What is ucm? We go into the OpenMPI source tree and grep for "ucm_":
openmpi-3.1.2$ grep -r ucm_*
ompi/mca/pml/ucx/pml_ucx_component.c:#include <ucm/api/ucm.h>
ompi/mca/pml/ucx/pml_ucx_component.c: ucm_vm_munmap(buf, length);
ompi/mca/pml/ucx/pml_ucx_component.c: ucm_set_external_event(UCM_EVENT_VM_UNMAPPED);

So the UCX component is the only thing that uses it.

4) We run a command:

ompi_info --param pml ucx --level 9

MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.2)
MCA pml ucx: ---------------------------------------------------
MCA pml ucx: parameter "pml_ucx_verbose" (current value: "0", data source: default, level: 9 dev/all, type: int)
Verbose level of the UCX component
MCA pml ucx: parameter "pml_ucx_priority" (current value: "51", data source: default, level: 3 user/all, type: int)
Priority of the UCX component
MCA pml ucx: parameter "pml_ucx_num_disconnect" (current value: "1", data source: default, level: 3 user/all, type: int)
How may disconnects go in parallel
MCA pml ucx: parameter "pml_ucx_opal_mem_hooks" (current value: "false", data source: default, level: 3 user/all, type: bool)
Use OPAL memory hooks, instead of UCX internal memory hooks
Valid values: 0: f|false|disabled|no|n, 1: t|true|enabled|yes|y

We use the last one to suppress the UCX memory hooks. The code seems to work. Elementary?
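
If this turns out to be the long-term workaround, I gather from the OpenMPI
docs that it can also be made persistent in the per-user MCA parameter file
instead of being passed to every mpirun:

# Persist the workaround for all of this user's OpenMPI jobs:
mkdir -p $HOME/.openmpi
echo 'pml_ucx_opal_mem_hooks = 1' >> $HOME/.openmpi/mca-params.conf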


Now I am going to test a few more examples. It's still not clear why this
manifests itself in the 3D cylinder case but not in the 2D examples you provide,
and it baffles me that it works on the login node.

I need to test it with Spectrum MPI too, but DL has an older version of
Spectrum and I think it may take a while to get the new one onto the system.
I also want to test CUDA-aware comms.
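As a first check I will probably just ask the build itself whether CUDA-aware
support was compiled in:

# Prints a line ending in "true" when OpenMPI was built with CUDA support:
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value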

Hope this report helps. I think UCX may be important for you in the future, so
it would be good to test PyFR with it. It's possible that my old builds of
OpenMPI did not include it, which is why I recall everything working smoothly
in the past, but I have this hopeless habit of installing the latest software
whenever I start a new project...

Best wishes,
Robert

Freddie Witherden

Oct 29, 2018, 4:54:55 PM
to pyfrmai...@googlegroups.com
Hi Robert,

On 29/10/2018 20:39, Robert Sawko wrote:
> We got the code to work by reverting to OPAL hooks. Your suggestion was
> correct, but I fear some more work is needed.
>
> Are you running PyFR on Summit? I am not 100% sure, but I think this may
> become relevant for you at some point.

I suspect the reason you only encounter this problem for the larger (3D)
cases is a consequence of how Python manages memory. Small allocations
are handled by a memory pool and thus never result in a malloc/free
operation. It is therefore possible that the issue is only triggered when
running larger cases whose allocations bypass the pool.
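
One way to test that hypothesis, assuming Python 3.6 or later, is to disable
pymalloc so that every allocation goes through the system allocator; if the
hooks are to blame, the small 2D cases should then hang as well:

# Route all Python allocations through the (hooked) system malloc/free;
# the rank count and file names are only examples:
mpirun -n 2 -x PYTHONMALLOC=malloc pyfr run -b cuda mesh.pyfrm ../config.ini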

Either way this is almost certainly a bug in UCX.

In terms of Summit we have run successfully with Spectrum MPI.
Performance and scaling were both very impressive. I do not believe
that any special modifications or MPI parameters were required.

Regards, Freddie.

Robert Sawko

Nov 21, 2018, 5:57:22 PM
to Freddie Witherden, pyfrmai...@googlegroups.com
Freddie and Eduardo,

Just to let you know: I didn't have a lot of time to look into this memory hook
problem until now, but I was at SC and went to the OpenMPI BoF to accost someone
there. I was advised to put it on their issue tracker, so here it is:

https://github.com/open-mpi/ompi/issues/6101

You may be interested in following it. I am sure you both knew that MPI
implementations are moving towards UCX, so it may be relevant for you in the
future. It was all news to me.

Best wishes,
Robert

Robert Sawko

Dec 2, 2018, 11:14:47 AM
to Freddie Witherden, pyfrmai...@googlegroups.com, Daniele Buono
Hi,

This is probably irrelevant by now, but I just want to close the issue.

Updating UCX to 1.4.0 and rebuilding OpenMPI against it solves the freezing
problem. I am not sure whether UCX is part of OFED, but we do have a relatively
old version of it on the cluster, so I will ask the sysadmins to update it
centrally on the system as well.
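
For anyone else hitting this, the rebuild amounts to something like the
following (installation prefixes are placeholders):

# Build UCX 1.4.0 from the release tarball, then rebuild OpenMPI against it:
cd ucx-1.4.0
./configure --prefix=$HOME/opt/ucx-1.4.0 && make -j8 && make install
cd ../openmpi-3.1.2
./configure --prefix=$HOME/opt/openmpi-3.1.2 \
    --with-ucx=$HOME/opt/ucx-1.4.0 --with-cuda=/usr/local/cuda
make -j8 && make install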

I am currently testing across several nodes and it's definitely working. I
still cannot get the CUDA-aware version to work, but I'll send a separate email
about that when I get my head around what's going on.

Best wishes,
Robert