Running UPC++ using InfiniBand


Jérémie Lagravière

Dec 12, 2017, 8:41:25 AM
to UPC++
Dear UPC++ Users,

I am trying to use UPC++ on a supercomputer these days.

Apparently, when I install UPC++, the InfiniBand (ibv) conduit is built automatically during the GASNet configuration/compilation process.
This is fine, I guess.

When I compile my program like this: 
g++ main.cpp tools.cpp mainComputation.cpp fileReader.cpp timeManagement.cpp -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -D_REENTRANT -I/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/gasnet.opt/include -I/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/gasnet.opt/include/ibv-conduit -I/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/upcxx.O3.gasnet1_seq.ibv/include -std=c++11 -O3 --param max-inline-insns-single=35000 --param inline-unit-growth=10000 --param large-function-growth=200000 -Wno-unused -Wno-unused-parameter -Wno-address -O2 -mavx -march=sandybridge -funroll-loops -fomit-frame-pointer -std=c++11 -L/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/upcxx.O3.gasnet1_seq.ibv/lib -lupcxx -lpthread -L/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/gasnet.opt/lib -lgasnet-ibv-seq -libverbs -lpthread -lrt -L/cluster/software/VERSIONS/gcc-5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -lgcc -lrt -lm -I. -Iincludes/ -DUPCXX_SEGMENT_MB=256 -DGASNET_MAX_SEGSIZE=7GB  -lm -lrt -o upcxxProgram/upcxxSpmv

This apparently works and compiles with no errors.


However, when I try to run my program using this command line:
export UPCXX_GASNET_CONDUIT=ibv &&  export UPCXX_SEGMENT_MB=3800 && export GASNET_MAX_SEGSIZE=64000MB && export GASNET_PSHM_NODES=4 && export LD_LIBRARY_PATH=/cluster/software/VERSIONS/gcc-5.2.0/lib64 && upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100

I get this error message:
*** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
/bin/sh: line 1: 17463 Aborted                 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100
make: *** [run] Error 134

How can I solve this problem?
What is the correct way to compile and run a UPC++ program using the InfiniBand network conduit?

(It is very probably a matter of setting some environment variables to the right values.
However, despite having used UPC and now UPC++ for quite some time, I am far from being an expert at configuring/using GASNet.)

Thank you in advance for your help.

Jérémie Lagravière

Dec 12, 2017, 8:55:25 AM
to UPC++
I realize that I can add some info about the InfiniBand setup on the supercomputer I am using.

$ ibv_devices 
    device             node GUID
    ------          ----------------
    mlx4_0          0002c903003d5190

And
$ ibv_devinfo 
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.40.5030
node_guid: 0002:c903:003d:5190
sys_image_guid: 0002:c903:003d:5193
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x1
board_id: MT_1100120019
phys_port_cnt: 1
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 404
port_lid: 741
port_lmc: 0x00
link_layer: InfiniBand



Jérémie Lagravière

Dec 12, 2017, 10:26:16 AM
to UPC++
Re:

I got something running by defining MPICC and MPICXX before installing UPC++.
However, I would prefer to use the InfiniBand conduit directly, not "MPI over InfiniBand".

So the question remains:
How do I use the InfiniBand network conduit with UPC++?

Jérémie Lagravière

Dec 12, 2017, 11:18:59 AM
to UPC++
And in fact, when using the MPI conduit I run into problems as soon as I try to use more than one node.
So, in any case, I think using InfiniBand directly should be better :)

Jérémie Lagravière

Dec 12, 2017, 5:34:43 PM
to UPC++
I also tried using upcxx-run like this:
/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/bin/upcxx-run 16 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100

And got this result:
*** Failed to start processes on c6-26
*** FATAL ERROR: One or more processes died before setup was completed
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
Aborted (core dumped)

This was done with the UPC++ installation that does not use MPI (so the available network conduits are smp and ibv). In this case I set it to ibv.



For info, the compile command was this one:

Dan Bonachea

Dec 12, 2017, 7:38:41 PM
to Jérémie Lagravière, UPC++, Steven Hofmeyr
Jeremie - ibv-conduit is a distributed conduit, so presumably you actually want to run across multiple IBV-connected nodes in your cluster. If you only actually have one node then you should set UPCXX_GASNET_CONDUIT=smp at app compile time to have upcxx-meta use the shared-memory-only smp conduit. This of course assumes your modifications to upcxx-meta have not broken that script somehow.

If you actually have multiple nodes and want to run across an InfiniBand cluster, then you need to make sure you are setup correctly for distributed job spawning.

There are basically the following options here for spawning distributed ibv-conduit jobs (example command lines are sketched after this list):

1. upcxx-run (which internally invokes gasnetrun_ibv) to perform ssh-based spawning.
    - This option requires you to correctly set up password-less SSH authentication from at least your head node to all the compute nodes - this document describes how to do that in the context of BUPC (which also uses GASNet), and the information is analogous for UPC++
    - This option additionally requires passing the host names in env GASNET_SSH_SERVERS="host1 host2 host3..."
    - The gasnetrun_ibv -v option is often useful for troubleshooting site-specific problems that may arise here.

2. mpirun (possibly invoked from upcxx-run) - uses MPI for job spawn ONLY, then IBV for communication
    - This requires UPC++/GASNet was configured/built/installed with MPI support (usually by setting CXX=mpicxx)
    - Also requires that (non-GASNet) MPI programs spawn correctly via mpirun (and whatever MPI-implementation-specific tweaking is required to make that work)
    - It's also best to use TCP-based MPI if possible for this purpose, to prevent the MPI library from consuming IBV resources that won't be used by the app. There is more info on that topic in this document
    - mpirun often has a -v option to provide spawn status for troubleshooting

3. PMI spawning (mentioned only for completeness, probably does not apply in your case)
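
For example, rough sketches of options 1 and 2 (untested here; the host names are placeholders for your actual compute nodes, and the executable and arguments are just taken from your earlier mail):

export GASNET_SSH_SERVERS="c6-25 c6-26"
upcxx-run 16 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100

or, with an MPI-spawning-enabled build:

mpirun -np 16 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100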

Hope this helps..
-D


Paul Hargrove

Dec 12, 2017, 7:50:18 PM
to Dan Bonachea, Jérémie Lagravière, UPC++, Steven Hofmeyr
Jérémie,

I noticed this in your earlier emails:
export LD_LIBRARY_PATH=/cluster/software/VERSIONS/gcc-5.2.0/lib64

I suspect this means that directory is not in your default shared library path, and this may be the cause of your
*** Failed to start processes on c6-26
*** FATAL ERROR: One or more processes died before setup was completed

The UPC++ documentation includes information on dealing with this situation.
In particular, if LD_LIBRARY_PATH is used then it must be set on all nodes (we recommend setting it in ~/.bashrc or ~/.tcshrc for that reason).
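
For example, a line like the following near the top of your ~/.bashrc would cover it (the path is just the one from your compile command; adjust it as needed for your site):

export LD_LIBRARY_PATH=/cluster/software/VERSIONS/gcc-5.2.0/lib64:$LD_LIBRARY_PATH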

-Paul

--
Paul H. Hargrove <PHHar...@lbl.gov>
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

Jérémie Lagravière

Dec 12, 2017, 8:42:22 PM
to UPC++
Yep, this is usually not a problem.
I replicate a safe and identical environment in my job scripts so that everything runs as it should.
Thanks for the advice!

Paul Hargrove

Dec 12, 2017, 8:51:39 PM
to Jérémie Lagravière, UPC++
Jérémie,

I am not sure whether you misunderstood the issue I am describing, or whether I misunderstood your response.
 
Setting LD_LIBRARY_PATH in your job script sets it for the one node parsing the shell script, but not the remote nodes.

Can you try the following (all one line)?

/usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/gasnet.opt/bin/gasnetrun_ibv -n 16 /usr/bin/env LD_LIBRARY_PATH=/cluster/software/VERSIONS/gcc-5.2.0/lib64 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100

-Paul


Jérémie Lagravière

Dec 12, 2017, 8:59:17 PM
to UPC++
Indeed, I am using multiple nodes.
And clearly I am trying to get the best out of the nodes and out of the network.
My work is about performance, so the faster the better.
Right now, I am doing a very basic test on the supercomputer, using only two nodes, nothing fancy...

Preferred solution:
SSH-based spawning: I have never used it so far (even with UPC); I always suspected this approach to be slow. If you tell me that this is the suggested way to use InfiniBand with UPC++, then this is the solution I want.
Now, my question is probably a bit trivial: if I need to define the list of nodes before the program is called through upcxx-run, how do I get this list of nodes (host names, I guess)?

MPI-based spawning: this will be my backup plan.
But because I like to be careful, I have already compiled UPC++ with MPI support.
However, I tried to use it without success:
compiling worked with no problem, but when running I got some errors (I will post here later what I actually got as error messages).
In the meantime, what I do not know is how to use MPI + UPC++.
Is having the correct LD_LIBRARY_PATH and calling upcxx-run enough?


Other question:
What conduit do you use when you test your UPC++ programs on a supercomputer equipped with an InfiniBand network?
In terms of performance, which is faster: SSH-based spawning or MPI-based spawning?


Jérémie Lagravière

Dec 12, 2017, 9:01:10 PM
to UPC++
Oh ok, indeed I misunderstood.

So here is what I got:
$ /usit/abel/u1/jeremie/myRepo/pgm-jlg-upc-svn/trunk/otherThanSpmv/UPCXX/upcxxPackage/upcxxForAbel/installed/gasnet.opt/bin/gasnetrun_ibv -n 16 /usr/bin/env LD_LIBRARY_PATH=/cluster/software/VERSIONS/gcc-5.2.0/lib64 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100
*** Failed to start processes on c4-23
*** FATAL ERROR: One or more processes died before setup was completed
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
Aborted (core dumped)

For info, in this case I am using the UPC++ build that does not have any kind of MPI support, just the smp and ibv conduits.

Paul Hargrove

Dec 12, 2017, 9:27:26 PM
to Jérémie Lagravière, UPC++
Jérémie,

In most cases I have used MPI-based spawning on InfiniBand-connected systems because it requires less setup.
One can usually assume that a well-administered system will support mpirun without much effort. 

Regarding getting the list of hostnames, we have automated that in most batch environments.
The fact that you see "Failed to start processes on c6-26" suggests to me that our ssh-spawner found c6-26 in a list of nodes provided by the batch system, unless you set GASNET_SSH_SERVERS manually.
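
(If you do ever need to build that list by hand on a SLURM-managed cluster - an assumption on my part about your site - one sketch, run from inside a job, would be the following.)

export GASNET_SSH_SERVERS="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | tr '\n' ' ')"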

There is very little performance difference between use of ssh or MPI for spawning, at least in my experience.  
On many systems, mpirun is using ssh too.

-Paul 





Jérémie Lagravière

Dec 12, 2017, 9:35:51 PM
to UPC++
OK then, let's go for MPI.

If I understood what you said in another thread, one should not compose/mix upcxx-run and mpirun.
Would it be possible to get a basic version of a job script you would use?
It does not need to be 100% accurate on the syntax.

In my usual setup, I launch my job scripts with sbatch (not srun).
I can also work on an interactive login (this is what I have been doing so far... it is more convenient).

Thank you for your help!

Jérémie Lagravière

Dec 12, 2017, 9:47:34 PM
to UPC++
I am preparing my setup to use MPI + UPC++ again.
I will soon post what I get from this solution.


Paul Hargrove

Dec 13, 2017, 1:09:10 AM
to Jérémie Lagravière, UPC++
Jérémie,

Just "upcxx-run 4 ./a.out [your arguments]" should be sufficient.

If you are using MPI-based spawning, then "mpirun -np 4 ./a.out [your arguments]" will probably work for ibv-conduit UPC++ jobs as well.

Both cases assume that "mpirun" was in your PATH when you built UPC++, and that it works to launch MPI applications.
However, if you would normally use "srun" to launch MPI applications, let me know.
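
For reference, a minimal sbatch script might look roughly like the following (an untested sketch; the node count, tasks per node and time limit are placeholders to adapt to your site and scheduler):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:30:00
# LD_LIBRARY_PATH is assumed to already be set in ~/.bashrc on all nodes, as discussed earlier in this thread
mpirun -np 32 upcxxProgram/upcxxSpmv ../dataset/D67MPI3Dheart.55 100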

-Paul


Jérémie Lagravière

Dec 13, 2017, 9:45:54 AM
to UPC++
Re,

Following your instructions, I managed to have my program running on two nodes.
Now, I am waiting for the other jobs using more nodes to complete.

What I did:
1. compile UPC++ with MPI support
2. compile my program with the ibv network conduit
3. run my program using the following command line: mpirun -n <ranks> myProgram <args>

Is that correct?

A question just to be sure:
is GASNET_PHYSMEM_MAX per node, or total?

For instance, each node I am using has 64GB of RAM.
If I am using 2 nodes, should I use
GASNET_PHYSMEM_MAX=64GB or GASNET_PHYSMEM_MAX=128GB?


Thanks a lot for your help!


Paul Hargrove

Dec 14, 2017, 12:23:44 AM
to Jérémie Lagravière, UPC++
GASNET_PHYSMEM_MAX is per-node.
You should not exceed about 85% of the true physical memory.
So for 64GB of DRAM, try 54GB.
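
For example, in your job environment (a per-node value, not a total across all nodes):

export GASNET_PHYSMEM_MAX=54GB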

-Paul


Jérémie Lagravière

Dec 14, 2017, 3:27:10 AM
to Paul Hargrove, Xing Cai, UPC++
Dear Paul, 

Thanks for your help.

So this is where I am now:
Some of my jobs run correctly, i.e. my program reads a big file containing some data and then performs some operations on this data.

Some of my jobs crash, almost certainly for memory configuration reasons.


Details:
What I did then was to slightly reduce all the values I am using for the memory configuration, so I did this:
export GASNET_MAX_SEGSIZE=53000MB   # I tried many values here; it does not seem to be the source of the problem. When in doubt I use 3000MB for this variable.
export UPCXX_SEGMENT_MB=2000
export GASNET_PHYSMEM_MAX=54GB
GASNET_PHYSMEM_NOPROBE=1
mpirun -n 32 upcxxProgram/upcxxSpmv ../dataset/D90MPI3Dheart.57 100

And everything runs perfectly... but this is unsatisfying, because I am clearly far from using all of the available physical memory.

UPCXX_SEGMENT_MB seems to be the source of the problems:
when I use a value equal to or lower than 2000MB, everything works fine.
But I am using 16 cores per node.
Each node has 64GB of RAM.
If I can use "only" 85% of this RAM, i.e. 54GB, then each thread/process should be able to use around 3300MB of RAM.

Now, I did my test using an interactive login, with 2 nodes.
So in theory I have 2x64GB of RAM available, or 108GB (2x54GB) in practice.
So being limited to 2000MB of RAM per thread is far from using that 108GB.

For info:
The CPUs on the supercomputer I am using have Hyper-Threading capability, but this is supposed to be disabled on the compute nodes.
So I checked carefully that the node where I am launching mpirun -n 32 runs "only" 16 processes (i.e. the other 16 processes must be on the other node, which I cannot "see" from the interactive login).

My question:
What is your advice for solving this memory configuration issue?


Thank you in advance for your help.

Best Regards,
Jeremie

Dan Bonachea

Dec 14, 2017, 6:01:17 AM
to Jérémie Lagravière, Paul Hargrove, Xing Cai, UPC++
Jeremie -
Sounds like you might have hit a real bug we should track down.

Please file a bug report with complete information.

Please be sure to include the exact crash messages you are seeing, which were not included in your previous mail.

Please also try running in debug mode (compile with UPCXX_CODEMODE=debug) to rule out a large class of potential problems.

Finally, please include `ident` output for your executable (assuming you have the ident tool installed).

Thanks,
-D

Jérémie Lagravière

Dec 14, 2017, 10:56:43 AM
to Dan Bonachea, Paul Hargrove, Xing Cai, UPC++
Ok.

I am preparing a complete bug report.


Jérémie Lagravière

Dec 14, 2017, 11:43:29 AM
to Dan Bonachea, Paul Hargrove, Xing Cai, UPC++
This took me some time, but now the bug report is available at this address:

Best Regards,
Jeremie.