
Arizona State University UPC++ Build Questions


Kirtus Leyba

Sep 2, 2022, 5:21:08 PM
to UPC++
Greetings,

I'm a CS Ph.D. student at ASU working on several projects that use UPC++. I've built UPC++ myself and have also used a module that the IT team here installed on our cluster, but each install (including the module version) has its own set of problems. What I'm looking to do is use UPC++ with its memory kinds feature for multi-node runs where each node owns one or more GPUs.

I'm currently using ASU's Agave supercomputer, where a subset of nodes are connected with InfiniBand, and a subset of those nodes have GPUs.

Previously I did an IB network build of UPC++ without GPUs, and I thought I had gotten it working with GPUs, but with that build I was just running more ranks on a single node, and multi-node runs were actually failing. So I set about rebuilding UPC++, and here is my situation.

Here is my configuration for upc++ that I am trying:

UPC++ configure: $UPCXX_SOURCE/configure --prefix=$UPCXX_INSTALL --enable-cuda --with-cxx=mpicxx

I load the following modules for the compilers to use with upc++:
module load gcc/10.3.0 (the most recent one on the cluster with a corresponding openmpi module)
module load openmpi/4.1.1-gcc-10.3.0
module load cuda/11.6.0

I have a few environment variables set as well:
export UPCXX_NETWORK=ibv
export UPCXX_GASNET_CONDUIT=ibv
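
Putting that together, the full sequence I'm attempting is roughly the following (the directory variables are just placeholders for my local paths):

module load gcc/10.3.0 openmpi/4.1.1-gcc-10.3.0 cuda/11.6.0
export UPCXX_NETWORK=ibv
export UPCXX_GASNET_CONDUIT=ibv
$UPCXX_SOURCE/configure --prefix=$UPCXX_INSTALL --enable-cuda --with-cxx=mpicxx
make all      # build the library
make check    # compile and run the test suite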

Depending on which nodes I use, this build process runs into different issues. For instance, I can get all the way to the "make check" phase, and then a few different things can happen.

First, I often find at run time that the compiled binaries require a GLIBCXX version (from libstdc++) that cannot be found. Here is a sample error message:
"/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found"
The correct version of libstdc++.so.6 does exist under the cluster's package tree (something like /packages/gcc/gcc-10.3.0/ and so on), but the binaries are looking in /lib64/ for some reason. Interestingly, I have never had this issue when building SMP executables. If I explicitly point to the correct library files (using LD_LIBRARY_PATH) or change the makefiles, the compile succeeds. In other cases I get a timeout in the run phase of the tests, with the following message:
"
WARNING: There was an error initializing an OpenFabrics device.                
                                                                               
  Local host:   s76-2                                                          
  Local device: mlx4_0                                                          
--------------------------------------------------------------------------      
--------------------------------------------------------------------------      
WARNING: Open MPI failed to TCP connect to a peer MPI process.  This            
should not happen.                                                              
                                                                               
Your Open MPI job may now hang or fail.                                        
                                                                               
  Local host: s76-3                                                            
  PID:        18468                                                            
  Message:    connect() to 169.254.0.2:1026 failed                              
  Error:      Operation now in progress (115)                                  
--------------------------------------------------------------------------      
[s76-2.agave.rc.asu.edu:13898] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[s76-2.agave.rc.asu.edu:13898] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
--------------------------------------------------------------------------      
Primary job  terminated normally, but 1 process returned                        
a non-zero exit code. Per user-direction, the job has been aborted.            
--------------------------------------------------------------------------      
--------------------------------------------------------------------------      
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:                      
                                                                               
  Process name: [[13867,1],0]                                                  
  Exit code:    124                                               
"

Another strange error occurs when I attempt the make check step on nodes with GPUs; during the compile step I get this error:
error while loading shared libraries: libXNVCtrl.so.0: cannot open shared object file: No such file or directory

There are IB-connected nodes with GPUs on the cluster I am working on, so what I am trying to do should theoretically be possible, but I keep running into hangups.

Any ideas what is going wrong? I would greatly appreciate any and all help!

Regards,
Kirtus Leyba

Paul H. Hargrove

Sep 2, 2022, 6:48:17 PM
to Kirtus Leyba, UPC++
Kirtus,

I can help with at least the first couple issues and have a guess on the last one.

First, independent of your problems, I would recommend configuring using `--with-default-network=ibv` instead of (or in addition to) using the two environment variables you mention.  It means one less thing to go wrong later.

Regarding "/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found"
Your use of LD_LIBRARY_PATH is one option to address this particular problem.
We document that approach and others in docs/local-gcc.md.
I think the first option on that page is the best, since (like `--with-default-network=...` above) it reduces the number of things that need to be "just right" later on.
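
For illustration (not necessarily verbatim from that doc), the rpath-at-configure-time approach looks something like the following, where GXXLIBDIR stands for the lib64 directory of your gcc/10.3.0 installation:

# embed the GCC runtime library path in the binaries at link time,
# rather than relying on LD_LIBRARY_PATH at run time
GXXLIBDIR=/packages/gcc/gcc-10.3.0/lib64   # site-specific; adjust to your cluster
$UPCXX_SOURCE/configure --prefix=$UPCXX_INSTALL --enable-cuda \
    --with-ldflags=-Wl,-rpath=$GXXLIBDIR

The LD_LIBRARY_PATH workaround you already found is the run-time equivalent:

export LD_LIBRARY_PATH=$GXXLIBDIR:$LD_LIBRARY_PATH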

The message from Open MPI is just that: a message from an MPI implementation.
Because you have configured using `--with-cxx=mpicxx` (and not `--disable-mpi-compat`), UPC++ is using MPI for job launch.
This is normally a good default, since most systems have a working installation of MPI before UPC++ is built/installed.
Since that doesn't seem to be the case, there are two options: fix MPI or avoid MPI.
For the latter, please try a build configured using `--disable-mpi-compat` instead of `--with-cxx=mpicxx`.
Depending on the resource manager (such as SLURM or PBS) in use on your cluster, that might "just work".
If it does not, then you probably need to work with your local sysadmin to ensure you can at least launch a simple MPI application with `mpirun`.
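
Concretely, a reconfigure along these lines (keeping your CUDA and default-network settings) would exercise the MPI-free spawners:

$UPCXX_SOURCE/configure --prefix=$UPCXX_INSTALL --enable-cuda \
    --with-default-network=ibv --disable-mpi-compat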

Regarding the libXNVCtrl.so.0 issue: I am afraid I have only a guess.
You say "During the compile step", but that looks to me more likely to be a run time thing.
I suggest you try `make tests` which will *only* compile the tests but not run them.
Then `make run-tests` to run them.
I suspect the error is in the run step.
If so, then it is possibly another case where LD_LIBRARY_PATH needs an addition.
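
In other words:

make tests        # compile the test binaries only
make run-tests    # launch the previously compiled tests

If only the second step trips over libXNVCtrl.so.0, then adding the (site-specific) directory containing that library to LD_LIBRARY_PATH before re-running is worth a try.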


-Paul



--
Paul H. Hargrove <PHHar...@lbl.gov>
Pronouns: he, him, his
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

Kirtus Leyba

Sep 2, 2022, 7:27:52 PM
to UPC++
Thanks for the quick response Paul. This is very helpful info.

I have this new configuration: $UPCXX_SOURCE/configure --prefix=$UPCXX_INSTALL --enable-cuda --with-default-network=ibv --with-ldflags=-Wl,-rpath=$GXXLIBDIR --disable-mpi-compat

and I am now getting exitcode=15 on the tests: compiling succeeds, but each test fails with exitcode=15.

Also, a question I just thought of: are the tests supposed to be run from the login node of a cluster, or on the allocated nodes? I am allocating 2 nodes in the same partition that come with the IB feature, which, as far as I can tell, means they should be part of the InfiniBand network. From that interactive allocation I am running make check. This cluster uses SLURM as its workload manager.
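
For reference, my workflow is roughly the following (the partition and feature names below are placeholders for the ones I actually request on Agave):

salloc -N 2 -p <partition> --constraint=IB   # interactive allocation of 2 IB nodes
cd $UPCXX_BUILD
make check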

Kirtus Leyba

Sep 2, 2022, 7:34:06 PM
to UPC++
XML output of one of the failed tests:

<?xml version="1.0" encoding="UTF-8"?>                                          
 <testsuite errors="1" failures="0" name="run" test-cases="1" tests="1">        
  <testcase name="vis-ibv-run" skip="0" tests="1" time="40.53">                
    <error>                                                                    
Fri Sep  2 16:29:02 MST 2022                                                    
+ /home/kleyba/Software/upcxx/build/bld/upcxx.assert1.optlev0.dbgsym1.gasnet_seq.ibv/bin/upcxx-run -np 4 -network ibv -- env UPCXX_WARN_EMPTY_RMA=0 UPCXX_WARN_EMPTY_RMA=0 timeout --foreground -k 420s 300s ./test-vis-ibv
WARNING: Beginning a potentially slow probe of max pinnable memory...          
real 40.53                                                                      
user 0.27                                                                      
sys 0.06                                                                        
Fri Sep  2 16:29:42 MST 2022                                                    
    </error>                                                                    
  </testcase>                                                                  
 </testsuite>

Paul H. Hargrove

Sep 2, 2022, 7:52:29 PM
to Kirtus Leyba, UPC++
Kirtus,

Regarding "Are the tests supposed to be run from the login node of a cluster or on the allocated nodes?"
That actually depends on how SLURM has been configured at your site.
However, running `upcxx-run` from the login node (I assume within an `salloc`, right?) should normally do the right thing.

Please set `GASNET_PHYSMEM_MAX=2/3` in your environment. 
That should address that warning about a slow probe and might let your test run.
If it does not work, please try smaller values like `0.5` (fractions and real numbers both work here).
If that does work, then I suggest adding `--with-ibv-physmem-max=2/3` (or the lower value) to your configure command line to make that the default.
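
For example:

export GASNET_PHYSMEM_MAX=2/3    # or a smaller value such as 0.5
make run-tests

and, once you find a value that works, baking it in as the default at configure time:

$UPCXX_SOURCE/configure ... --with-ibv-physmem-max=2/3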

If you still have problems, let us know.

-Paul

Kirtus Leyba

Sep 2, 2022, 7:56:54 PM
to UPC++
Ok great, I will try that next time I'm on the cluster. It's having some hiccups right now so I'll report back when I can test.

-Kirtus

Kirtus Leyba

Sep 3, 2022, 3:04:14 PM
to UPC++
Sorry for the duplicate; I posted from the wrong Gmail account and deleted the last message.

Alright, I'm beginning to think that job spawning is doing something fishy on Agave. Here is what happens with that environment variable set:

During the make check step, I get a bunch of Perl warning messages. These are the same messages I get when I log into a node on Agave. So without MPI, is UPC++ doing something like logging into the nodes and running there? Is there a good way to get a grasp on which nodes UPC++ is actually running on? Also, what exactly does the physmem-max variable control? With the current configuration I can run the tests on the InfiniBand nodes, but I'm not quite happy with how it behaves. Is there a way to tell which node a rank is launched on, for instance?

-Kirtus

Kirtus Leyba

Sep 3, 2022, 3:43:14 PM
to UPC++
Update: It seems to be working with the latest configuration. I made a batch job submission for 2 nodes on the IB network with 10 tasks each. I ran a hello world program that prints each rank's id plus its rank within its team. I got this output:
-----------------------------------------------------------------------
Hello from rank 1 with rank within team of 1!
Hello from rank 3 with rank within team of 3!
Hello from rank 9 with rank within team of 9!
Hello from rank 2 with rank within team of 2!
Hello from rank 4 with rank within team of 4!
Hello from rank 5 with rank within team of 5!
Hello from rank 6 with rank within team of 6!
Hello from rank 0 with rank within team of 0!
Hello from rank 7 with rank within team of 7!
Hello from rank 8 with rank within team of 8!
Hello from rank 11 with rank within team of 1!
Hello from rank 13 with rank within team of 3!
Hello from rank 12 with rank within team of 2!
Hello from rank 14 with rank within team of 4!
Hello from rank 15 with rank within team of 5!
Hello from rank 10 with rank within team of 0!
Hello from rank 16 with rank within team of 6!
Hello from rank 17 with rank within team of 7!
Hello from rank 18 with rank within team of 8!
Hello from rank 19 with rank within team of 9!
-------------------------------------------------------------------------
Is the fact that ranks 10-19 are ranks 0-9 on their team enough to indicate that they are running on separate nodes?
-Kirtus

Dan Bonachea

Sep 3, 2022, 4:33:08 PM
to Kirtus Leyba, UPC++
Hi Kirtus -

The GASNET_PHYSMEM_MAX envvar caps the maximum amount of node memory that UPC++/GASNet may register with the IB NIC, which in turn forms a node-wide upper bound on the memory used for the shared heap and internal network buffers. This is just an upper bound; other variables like UPCXX_SHARED_HEAP_SIZE (aka upcxx-run -shared-heap) and some ibv-conduit knobs control how much memory is actually used. See more info here.
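
For example (sizes here are purely illustrative), a run that caps IB registration at 2/3 of node memory while requesting a 512 MB shared heap per process could look like:

env GASNET_PHYSMEM_MAX=2/3 upcxx-run -N 2 -n 20 -shared-heap 512MB ./test-vis-ibv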

You did not post your test program source code, so I won't try to interpret the output.
upcxx::local_team generally corresponds to compute node boundaries, although there are a few very uncommon corner cases where it might be less than the full node.

You can run with upcxx-run -vv ... (double verbose) to get detailed information about high-level spawning, recognized envvars, shared heap sizes, and process layout.

Or you can get more concise output of the last two varieties by setting env UPCXX_VERBOSE=1, e.g.:

$ upcxx-run -N 2 -n 8 env UPCXX_VERBOSE=1  a.out                       
//////////////////////////////////////////////////
upcxx::init():
> CPUs Oversubscribed: no "upcxx::progress() never yields to OS"
> Shared heap statistics:
  max size: 0x8000000 (128 MB)
  min size: 0x8000000 (128 MB)
  P0 base:  0x7f4f1be22000
> Local team statistics:
  local teams = 2
  min rank_n = 4
  max rank_n = 4
  min discontig_rank = None
//////////////////////////////////////////////////
UPCXX: Process 2/8 (local_team: 2/4) on pcp-d-5 (16 processors)
UPCXX: Process 0/8 (local_team: 0/4) on pcp-d-5 (16 processors)
UPCXX: Process 3/8 (local_team: 3/4) on pcp-d-5 (16 processors)
UPCXX: Process 1/8 (local_team: 1/4) on pcp-d-5 (16 processors)
UPCXX: Process 5/8 (local_team: 1/4) on pcp-d-6 (16 processors)
UPCXX: Process 4/8 (local_team: 0/4) on pcp-d-6 (16 processors)
UPCXX: Process 6/8 (local_team: 2/4) on pcp-d-6 (16 processors)
UPCXX: Process 7/8 (local_team: 3/4) on pcp-d-6 (16 processors)
...

upcxx-run -vvv ... (triple verbose) will add lower-level spawning details. If you want more details on how job spawning is working, please share that output.

Hope this helps...
-D


Kirtus Leyba

Sep 3, 2022, 4:56:58 PM
to UPC++
(Sorry, duplicate google thing again, my bad.)

Ah yes, that is very helpful, thanks!

With -vvv there is a lot of output but here are the relevant sections I believe:

------------------------------------------------------------------------------------------------------------------------
Spawning '/home/kleyba/projects/HPC/upcHelloWorld/./hello.out': 20 processes
ENV parameter: GASNET_SPAWN_CONTROL = ssh
ENV parameter: GASNET_SPAWN_ARGS = Mv,4,20,,'/usr/bin/env' 'UPCXX_VERBOSE=1' 'UPCXX_SOURCE=/home/kleyba/Software/upcxx/upcxx-2022.3.0/' 'UPCXX_SHARED_HEAP_SIZE=128 MB' 'UPCXX_INSTALL=/home/kleyba/Software/upcxx/install/' 'UPCXX_BUILD=/home/kleyba/Software/upcxx/build'                                                                                  
ENV parameter: GASNET_SSH_OUT_DEGREE = *not set*                (default)
ENV parameter: GASNET_ENVCMD = /usr/bin/env
ENV parameter: GASNET_SSH_CMD = ssh                             (default)
Configuring for OpenSSH
ENV parameter: GASNET_SSH_OPTIONS = *empty*                     (default)
Constructed ssh command line:
    ssh
    -o
    StrictHostKeyChecking no
    -o
    FallBackToRsh no
    -o
    BatchMode yes
    -o
    ForwardX11 no
    -q
    HOST
    CMD
ENV parameter: GASNET_SSH_KEEPDUP = NO                          (default)
ENV parameter: GASNET_SSH_NODEFILE = *empty*                    (default)
ENV parameter: GASNET_SSH_SERVERS = *not set*                   (default)
ENV parameter: GASNET_NODEFILE = *not set*                      (default)
ENV parameter: PBS_NODEFILE = *not set*                         (default)
ENV parameter: PE_HOSTFILE = *not set*                          (default)
ENV parameter: SSS_HOSTLIST = *not set*                         (default)
ENV parameter: LSB_HOSTS = *not set*                            (default)
ENV parameter: OAR_NODEFILE = *not set*                         (default)
ENV parameter: SLURM_JOB_ID = 17377044
Parsing nodes from command 'scontrol show hostname'
    s76-1
    s76-2
Node count set to available: 2
ENV parameter: GASNET_MASTERIP = *not set*                      (default)
ENV parameter: GASNET_SSH_REMOTE_PATH = *not set*               (default)
[-1] spawning process 0 on s76-1 via ssh
[-1] spawning process 10 on s76-2 via ssh
----------------------------------------------------------------------------------------------------------------------------
and


//////////////////////////////////////////////////
upcxx::init():
> CPUs Oversubscribed: no "upcxx::progress() never yields to OS"
> Shared heap statistics:
  max size: 0x8000000 (128 MB)
  min size: 0x8000000 (128 MB)
  P0 base:  0x2b5ba9bf6000

> Local team statistics:
  local teams = 2
  min rank_n = 10
  max rank_n = 10

  min discontig_rank = None
//////////////////////////////////////////////////
UPCXX: Process  3/20 (local_team: 3/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  5/20 (local_team: 5/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  2/20 (local_team: 2/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  6/20 (local_team: 6/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  9/20 (local_team: 9/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  8/20 (local_team: 8/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  1/20 (local_team: 1/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  4/20 (local_team: 4/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  7/20 (local_team: 7/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process  0/20 (local_team: 0/10) on s76-1.agave.rc.asu.edu (24 processors)
UPCXX: Process 19/20 (local_team: 9/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 14/20 (local_team: 4/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 13/20 (local_team: 3/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 11/20 (local_team: 1/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 10/20 (local_team: 0/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 16/20 (local_team: 6/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 18/20 (local_team: 8/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 15/20 (local_team: 5/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 17/20 (local_team: 7/10) on s76-2.agave.rc.asu.edu (24 processors)
UPCXX: Process 12/20 (local_team: 2/10) on s76-2.agave.rc.asu.edu (24 processors)


So it looks like it spawns the processes with ssh. I guess I am left with many questions about how ssh spawning compares to MPI and how setting the GASNet memory variable will impact my performance, but hopefully I can dig that out of the docs you sent.

-Kirtus

Dan Bonachea

Sep 3, 2022, 5:12:17 PM
to Kirtus Leyba, UPC++
On Sat, Sep 3, 2022 at 4:57 PM Kirtus Leyba <kle...@asu.edu> wrote:
[quoted UPCXX process layout from the previous message]
So it looks like it spawns the processes with ssh. I guess I am left with many questions about how ssh spawning compares to MPI and how setting the GASNet memory variable will impact my performance, but hopefully I can dig that out of the docs you sent.

Looks like ssh spawning is working correctly for you, which is great!

The main downsides of ssh spawning are the need to ensure authentication is working correctly (which it clearly is for you) and that it's not compatible with hybrid UPC++/MPI programs. So unless you plan to mix UPC++ and MPI calls in the same program, sticking with your working ssh-spawning setup is probably good. The only other caveat I can think of is that if it relies on an ssh-agent for node authentication, you might run into trouble with an asynchronous batch job spawning while you are disconnected from your console, but if you plan to do your runs interactively that should not be a problem.

-D

 

Kirtus Leyba

Sep 3, 2022, 5:21:18 PM
to UPC++
Well, I launched this run as a batch job, so it seems like that works anyway.

Alright, thanks a lot for the help, everyone. For the time being it looks like I can operate as if everything is working. Much appreciated.

-Kirtus