I am a Legion user and I am trying to install GASNet on Leonardo, a new HPC system in Italy.
I’ve installed GASNet 2023.3.0 with the following options: --enable-segment-fast --enable-par --disable-seq --disable-parsync --enable-pthreads --disable-auto-conduit-detect --enable-ibv --enable-pshm --enable-mpi-compat --with-ibv-max-hcas=4 --enable-kind-cuda-uva=probe
When I run benchmark tests such as “testlarge” to check that my installation works correctly, I get the following output:
run -p boost_usr_prod --qos=boost_qos_dbg -N 2 --gres=gpu:4 -W 10 --exclusive ./testlarge -m -in 10000 4194304 B
Timer granularity: <= 0.006 us, overhead: ~ 0.008 us
=====> testlarge nprocs=2 config=RELEASE=2023.3.0,SPEC=0.16,CONDUIT=IBV(IBV-2.12/IBV-2.12),THREADMODEL=PAR,SEGMENT=FAST,PTR=64bit,CACHE_LINE_BYTES=64,noalign,pshm,nodebug,notrace,nostats,nodebugmalloc,nosrclines,timers_native,membars_native,atomics_native,atomic32_native,atomic64_native,notiopt compiler=GNU/11.3.0 sys=x86_64-pc-linux-gnu
node 0/2 hostname is: lrdn3095.leonardo.local (supernode=0 pid=98564)
node 1/2 hostname is: lrdn3096.leonardo.local (supernode=1 pid=81609)
node 0/2 Running 10000 iterations of bulk put/get with local addresses inside the segment for sizes: 16...4194304
B: 0 - 16 byte : 10000 iters, throughput 43.759074 MB/sec (PutNBI+DEFER throughput)
B: 0 - 16 byte : 10000 iters, throughput 66.083972 MB/sec (GetNBI throughput)
B: 0 - 32 byte : 10000 iters, throughput 87.442917 MB/sec (PutNBI+DEFER throughput)
B: 0 - 32 byte : 10000 iters, throughput 133.206365 MB/sec (GetNBI throughput)
B: 0 - 64 byte : 10000 iters, throughput 175.287640 MB/sec (PutNBI+DEFER throughput)
B: 0 - 64 byte : 10000 iters, throughput 265.370245 MB/sec (GetNBI throughput)
B: 0 - 128 byte : 10000 iters, throughput 286.415562 MB/sec (PutNBI+DEFER throughput)
B: 0 - 128 byte : 10000 iters, throughput 529.130093 MB/sec (GetNBI throughput)
B: 0 - 256 byte : 10000 iters, throughput 565.926344 MB/sec (PutNBI+DEFER throughput)
B: 0 - 256 byte : 10000 iters, throughput 1064.721435 MB/sec (GetNBI throughput)
B: 0 - 512 byte : 10000 iters, throughput 1107.967438 MB/sec (PutNBI+DEFER throughput)
B: 0 - 512 byte : 10000 iters, throughput 2133.164045 MB/sec (GetNBI throughput)
B: 0 - 1024 byte : 10000 iters, throughput 2125.734654 MB/sec (PutNBI+DEFER throughput)
B: 0 - 1024 byte : 10000 iters, throughput 4242.235013 MB/sec (GetNBI throughput)
B: 0 - 2048 byte : 10000 iters, throughput 4050.445873 MB/sec (PutNBI+DEFER throughput)
B: 0 - 2048 byte : 10000 iters, throughput 8506.641986 MB/sec (GetNBI throughput)
B: 0 - 4096 byte : 10000 iters, throughput 7081.671501 MB/sec (PutNBI+DEFER throughput)
B: 0 - 4096 byte : 10000 iters, throughput 60.838666 MB/sec (GetNBI throughput)
B: 0 - 8192 byte : 10000 iters, throughput 13009.991674 MB/sec (PutNBI+DEFER throughput)
B: 0 - 8192 byte : 10000 iters, throughput 33879.011275 MB/sec (GetNBI throughput)
B: 0 - 16384 byte : 10000 iters, throughput 20462.283918 MB/sec (PutNBI+DEFER throughput)
B: 0 - 16384 byte : 10000 iters, throughput 87.669686 MB/sec (GetNBI throughput)
B: 0 - 32768 byte : 10000 iters, throughput 26130.947404 MB/sec (PutNBI+DEFER throughput)
B: 0 - 32768 byte : 10000 iters, throughput 3091.610605 MB/sec (GetNBI throughput)
B: 0 - 65536 byte : 10000 iters, throughput 34041.394336 MB/sec (PutNBI+DEFER throughput)
B: 0 - 65536 byte : 10000 iters, throughput 2771.028783 MB/sec (GetNBI throughput)
B: 0 - 131072 byte : 10000 iters, throughput 39429.688979 MB/sec (PutNBI+DEFER throughput)
B: 0 - 131072 byte : 10000 iters, throughput 6250.125003 MB/sec (GetNBI throughput)
B: 0 - 262144 byte : 10000 iters, throughput 42084.708100 MB/sec (PutNBI+DEFER throughput)
B: 0 - 262144 byte : 10000 iters, throughput 2720.546547 MB/sec (GetNBI throughput)
B: 0 - 524288 byte : 10000 iters, throughput 45004.095373 MB/sec (PutNBI+DEFER throughput)
B: 0 - 524288 byte : 10000 iters, throughput 2925.322941 MB/sec (GetNBI throughput)
B: 0 - 1048576 byte : 10000 iters, throughput 45655.220903 MB/sec (PutNBI+DEFER throughput)
B: 0 - 1048576 byte : 10000 iters, throughput 2663.579046 MB/sec (GetNBI throughput)
B: 0 - 2097152 byte : 10000 iters, throughput 44549.581457 MB/sec (PutNBI+DEFER throughput)
B: 0 - 2097152 byte : 10000 iters, throughput 2613.585443 MB/sec (GetNBI throughput)
B: 0 - 4194304 byte : 10000 iters, throughput 44293.305842 MB/sec (PutNBI+DEFER throughput)
B: 0 - 4194304 byte : 10000 iters, throughput 3805.424099 MB/sec (GetNBI throughput)
done.
There are some bandwidth outliers (in this run, GetNBI at 4096 bytes and 16384 bytes, at about 61 and 88 MB/sec), and, most notably, repeating the test gives very different results from run to run.
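To compare repeated runs, the dips can be picked out of the testlarge output mechanically. The following is only a minimal sketch: the function names `parse_results` and `flag_dips` and the 0.5 threshold are illustrative assumptions, not part of GASNet, and the regular expression assumes the result-line format shown in the output above.

```python
import re

# Matches testlarge result lines such as:
# "B: 0 - 4096 byte : 10000 iters, throughput 60.838666 MB/sec (GetNBI throughput)"
LINE_RE = re.compile(
    r"-\s*(\d+)\s+byte\s*:\s*\d+\s+iters,\s*throughput\s+([\d.]+)"
    r"\s+MB/sec\s+\((\S+) throughput\)"
)

def parse_results(text):
    """Return {operation: [(size_bytes, mb_per_sec), ...]} in output order."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            size, mbps, op = int(m.group(1)), float(m.group(2)), m.group(3)
            results.setdefault(op, []).append((size, mbps))
    return results

def flag_dips(results, ratio=0.5):
    """Flag entries whose throughput is below `ratio` times the best
    throughput already seen for the same operation at a smaller size.
    The 0.5 default is an arbitrary, hypothetical threshold."""
    flagged = []
    for op, rows in results.items():
        best = 0.0
        for size, mbps in rows:
            if best > 0 and mbps < ratio * best:
                flagged.append((op, size, mbps))
            best = max(best, mbps)
    return flagged
```

Calling `flag_dips(parse_results(text))` on the saved output of each run gives a list of (operation, size, MB/sec) entries, which makes it easier to see whether the same sizes are affected every time.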
Updating to a newer version of GASNet does not seem to make any difference.
When I use this installation of GASNet with Legion in my application, I see very slow communication and occasional deadlocks.
I was wondering if you could suggest any tests or configuration options that would help me investigate what is going on and, hopefully, stabilize the performance of my GASNet installation on Leonardo.