On behalf of the developers of UPC++ and GASNet-EX, I am pleased to announce the availability of some exciting new capabilities.
Recent work in both libraries enables client software to avoid stalling during communication injection when GASNet-EX detects network backpressure. In GASNet-EX, the existing "immediate-mode" API support has been enhanced to optionally expose backpressure at the Commit stage of Negotiated-Payload Active Messages (NPAM) via new API calls. Additionally, the ibv-conduit implementation of "immediate-mode" AM injection has been refined to detect backpressure more precisely. In UPC++, new experimental immediate-mode RPC APIs can throw an exception when backpressure would otherwise cause a stall at injection.
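To illustrate the general idea of immediate-mode injection, here is a minimal C sketch that models a bounded send queue. It does not use GASNet-EX or UPC++ at all: `mock_endpoint`, `mock_try_inject`, `mock_progress`, and `send_all_nonblocking` are hypothetical names invented for this example, and the "network" is just a counter. The point is only the control flow: an injection attempt that reports backpressure instead of blocking lets the caller do other work in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network endpoint; NOT a GASNet-EX type. */
typedef struct {
    size_t capacity;   /* total send-queue slots */
    size_t in_flight;  /* slots currently occupied */
} mock_endpoint;

/* Immediate-mode injection: never blocks.
 * Returns true on success, false if the send queue is full (backpressure). */
static bool mock_try_inject(mock_endpoint *ep) {
    if (ep->in_flight == ep->capacity) return false;
    ep->in_flight++;
    return true;
}

/* Models the network retiring one in-flight message. */
static void mock_progress(mock_endpoint *ep) {
    if (ep->in_flight > 0) ep->in_flight--;
}

typedef struct {
    size_t sent;             /* messages successfully injected */
    size_t overlapped_work;  /* units of local work done while backpressured */
} send_stats;

/* Inject `count` messages. Whenever injection is refused, perform one unit
 * of local work instead of stalling, then let the mock network drain. */
static send_stats send_all_nonblocking(mock_endpoint *ep, size_t count) {
    send_stats st = { 0, 0 };
    assert(ep->capacity > 0);   /* a zero-capacity queue could never accept a message */
    while (st.sent < count) {
        if (mock_try_inject(ep)) {
            st.sent++;
        } else {
            st.overlapped_work++;  /* useful local computation would go here */
            mock_progress(ep);     /* in this mock, passing time frees one slot */
        }
    }
    return st;
}
```

In a real client the "local work" branch is where computation or servicing of incoming messages would be overlapped with the wait, which is the benefit the technical report cited below sets out to measure.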
For users of GASNet-EX, the new capabilities are available in the "stable" branch[*] in git (or "develop" if you prefer).
For users of UPC++, the new capabilities are available in the "develop" branch in git (which uses GASNet-EX's stable branch by default).
For more information, including performance results on Slingshot-11 and InfiniBand networks, please see our LBNL Technical Report:
Hargrove P, Bonachea D. Investigation into the Performance Benefits of Exposing Network Backpressure in UPC++ and GASNet-EX. Lawrence Berkeley National Laboratory Technical Report LBNL-2001668, May 2025. doi:10.25344/S4088R
-Paul H. Hargrove
[*] Recent updates to the GASNet-EX stable branch include:
- Experimental new `gex_AM_Cancel{Request,Reply}{Medium,Long}()` APIs enable a client to abandon a Prepare (called instead of the matching Commit).
- Experimental new `gex_AM_Commit*_v2()` APIs extend the `Commit` API family with a `commit_flags` argument and an `int` return type. This enables use of `GEX_FLAG_IMMEDIATE` at commit time.
- ibv-conduit:
  - Support for `GEX_FLAG_IMMEDIATE` with AM injection (both FPAM and NPAM) has been improved to detect full send queues and completion queues, conditions which previously could have resulted in stalls rather than immediate return.
  - NEW: Support for network offload of atomic operations. See the ibv-conduit README for more information, such as supported op/type pairs.
- Improved tracing and statistics support for the `gex_AD_*()` APIs
- Fixes for:
  - bug4772 - regression in low-memory error handling with single-process runs
  - bug4773 - (partial fix) improved error handling in segment allocation
  - bug4774 - Add argument-checking to GASNet object query accessors
The complete list of changes since 2024.5.0 appears under the PENDING heading of the ChangeLog.
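The Prepare / Commit / Cancel flow mentioned in the list above can be pictured with a small self-contained C model. This is only a sketch under stated assumptions: it does not call GASNet-EX, every `sketch_*` name is hypothetical, and the real `gex_AM_Commit*_v2()` and `gex_AM_Cancel*()` signatures are not reproduced here. In particular, the sketch assumes that a commit refused due to backpressure leaves the message prepared, so that the client may either retry or cancel; consult the GASNet-EX documentation for the actual semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a prepared message; NOT a GASNet-EX type. */
typedef struct {
    char   buf[64];    /* payload buffer handed out by "prepare" */
    size_t len;        /* payload length recorded at commit */
    bool   prepared;   /* true between prepare and commit/cancel */
} sketch_msg;

/* Hypothetical stand-in for the network; NOT a GASNet-EX type. */
typedef struct {
    size_t free_slots; /* injections the mock network can still accept */
    size_t delivered;  /* messages accepted so far */
} sketch_net;

/* Prepare: give the client a buffer to fill in place. */
static char *sketch_prepare(sketch_msg *m) {
    assert(!m->prepared);
    m->prepared = true;
    m->len = 0;
    return m->buf;
}

/* Commit with immediate semantics: returns 0 on success, nonzero when the
 * mock network is backpressured. ASSUMPTION of this sketch: on a nonzero
 * return the message remains prepared. */
static int sketch_commit_immediate(sketch_net *n, sketch_msg *m, size_t len) {
    assert(m->prepared && len <= sizeof m->buf);
    if (n->free_slots == 0) return 1;
    n->free_slots--;
    n->delivered++;
    m->len = len;
    m->prepared = false;
    return 0;
}

/* Cancel: abandon a prepared message in place of the matching commit. */
static void sketch_cancel(sketch_msg *m) {
    assert(m->prepared);
    m->prepared = false;
}

/* Try to send `payload`, giving up after `max_tries` refused commits.
 * Returns true if the message was committed, false if it was cancelled. */
static bool sketch_send_or_abandon(sketch_net *n, const char *payload, int max_tries) {
    sketch_msg m = { {0}, 0, false };
    size_t len = strlen(payload) + 1;
    assert(len <= sizeof m.buf);
    memcpy(sketch_prepare(&m), payload, len);
    for (int i = 0; i < max_tries; i++) {
        if (sketch_commit_immediate(n, &m, len) == 0) return true;
        /* backpressure: the caller could do other work here before retrying */
    }
    sketch_cancel(&m);  /* give up cleanly rather than stall */
    return false;
}
```

The design point is that the client, not the library, decides what to do under backpressure: retry later, do other work first, or abandon the prepared message entirely.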