GASNet + libfabric/EFA (Elastic Fabric Adapter from AWS)


Gabriel Tanase

Oct 1, 2020, 5:44:55 PM
to upc-users

Hi all

I can currently build GASNet-1.32.0 with the ofi conduit on libfabric/EFA, but not all tests pass. The configure command and the failing test are shown below.

It seems that something in AWS libfabric is not properly implemented for the switch from medium to large message sizes. I would like to know whether there is an 'easy' way either to fix what is missing inside libfabric or to work around its current limitations.

You can see from the test log that I already increased the maximum medium message size from 8192 to 65536, but I don't think raising the medium size to a very large number is the real solution.

I can run tests and provide logs if somebody is willing to help me debug this.

Thank you,

--Gabriel Tanase


This is how I configure:

 

./configure --prefix=/home/ec2-user/GASNET \
            --enable-ofi \
            --enable-force-ofi \
            --with-ofihome=/home/ec2-user/LIBFABRIC \
            --with-ofi-provider=efa \
            --enable-pthreads --enable-par --enable-segment-fast \
            --with-segment-mmap-max=4GB \
            --disable-seq --disable-parsync --disable-ibv-rcv-thread \
            --disable-aligned-segments --disable-pshm --disable-fca --disable-mxm

 

 

And this is the test that is failing:

 

WARNING: Using OFI provider (efa), which has not been validated to provide
WARNING: acceptable GASNet performance. You should consider using a more
WARNING: hardware-appropriate GASNet conduit. See ofi-conduit/README.
WARNING: Using GASNet's ofi-conduit, which exists for portability convenience.
WARNING: Support was detected for native GASNet conduits: ibv
WARNING: You should *really* use the high-performance native GASNet conduit
WARNING: if communication performance is at all important in this program run.
=====> testcore2 nprocs=2 config=RELEASE=1.32.0,SPEC=1.8,CONDUIT=OFI(OFI-0.5/OFI-0.5),THREADMODEL=PAR,SEGMENT=FAST,PTR=64bit,noalign,nopshm,nodebug,notrace,nostats,nodebugmalloc,nosrclines,timers_native,membars_native,atomics_native,atomic32_native,atomic64_native compiler=GNU/7.2.1 sys=x86_64-unknown-linux-gnu
node 0/2 hostname is: compute-st-r5n24xlarge-1 (supernode=0 pid=77890)
OFI conduit: v0.5 GASNET_ALIGNED_SEGMENTS=0
gasnet_AMMaxArgs():        16
gasnet_AMMaxMedium():      65536
gasnet_AMMaxLongRequest(): 2147483647
gasnet_AMMaxLongReply():   2147483647
Running multi-threaded AM correctness test with 10 iterations, max_payload=1048576, depth=16...
payload = 1
node 1/2 hostname is: compute-st-r5n24xlarge-2 (supernode=1 pid=77197)
payload = 2
payload = 4
payload = 8
payload = 16
payload = 32
payload = 64
payload = 128
payload = 256
payload = 512
payload = 1024
payload = 2048
payload = 4096
payload = 8192
payload = 16384
payload = 32768
payload = 65536
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8880 : actual=60 expected=10 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8881 : actual=61 expected=11 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8882 : actual=62 expected=12 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8883 : actual=63 expected=13 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
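
In case it helps whoever looks at a longer log, here is a minimal Python sketch for condensing the mismatch lines into a short summary. It is not part of GASNet or libfabric. The regular expression only assumes the line format shown in the errors above, and the names `parse_mismatches`, `summarize`, and `SAMPLE` are hypothetical helpers made up for illustration. It reports the raw difference between `actual` and `expected` without interpreting what that difference means inside testcore2.

```python
import re
from collections import Counter

# Matches the "data mismatch" lines in the format shown in the log above.
MISMATCH_RE = re.compile(
    r"ERROR: node (?P<node>\d+)/\d+ TH(?P<thread>\d+) data mismatch "
    r"at sz=(?P<sz>\d+) iter=(?P<iter>\d+) chunk=(?P<chunk>\d+) "
    r"elem=(?P<elem>\d+) : actual=(?P<actual>\d+) expected=(?P<expected>\d+) "
    r"in (?P<kind>.+?) \(at "
)

# Two lines copied from the log above, used only as a small demo input.
SAMPLE = """\
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8880 : actual=60 expected=10 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
ERROR: node 0/2 TH0 data mismatch at sz=65536 iter=0 chunk=1 elem=8881 : actual=61 expected=11 in Long Request (at /home/ec2-user/GASNet-1.32.0/tests/testcore2.c:64)
"""


def parse_mismatches(log_text):
    """Return one dict per mismatch line; numeric fields become ints."""
    records = []
    for match in MISMATCH_RE.finditer(log_text):
        record = {}
        for key, value in match.groupdict().items():
            record[key] = value if key == "kind" else int(value)
        records.append(record)
    return records


def summarize(records):
    """Condense mismatch records into counts, element range, and deltas."""
    if not records:
        return {"count": 0}
    elems = [r["elem"] for r in records]
    return {
        "count": len(records),
        "sizes": sorted({r["sz"] for r in records}),
        "kinds": Counter(r["kind"] for r in records),
        "first_elem": min(elems),
        "last_elem": max(elems),
        # Raw actual - expected difference; no interpretation is attempted.
        "deltas": Counter(r["actual"] - r["expected"] for r in records),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_mismatches(SAMPLE)).items():
        print(f"{key}: {value}")
```

To run it on a full log, replace `SAMPLE` with the contents of the saved test output. A single delta value across a contiguous element range would suggest a different pattern of corruption than scattered deltas, but that reading is only a guess until someone familiar with the conduit confirms it.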
