UPC++ 2020.11.0 Memory Kinds Prototype


Paul H. Hargrove

Oct 30, 2020, 7:13:10 PM
to UPC++, upcxx-a...@lbl.gov

The Pagoda project at Lawrence Berkeley National Laboratory is proud to announce the prototype release of UPC++ 2020.11.0, now available from upcxx.lbl.gov. This release includes all of the numerous enhancements and usability improvements of the concurrent UPC++ 2020.10.0 release, and adds a new prototype implementation of upcxx::copy().  This implementation leverages support in GASNet-EX 2020.11.0 for the GPUDirect RDMA (GDR) capabilities of modern NVIDIA GPUs and Mellanox InfiniBand networks (such as on the OLCF's Summit).  This enables transfers directly to and from GPU memory (without staging through host memory), resulting in lower latency and higher bandwidth.  Changes are detailed in the ChangeLog.

This prototype release is intended for UPC++ application developers who need to maximize the performance of data transfers involving GPU memory via upcxx::copy() calls on a supported system. The microbenchmark graph above demonstrates the performance improvement of upcxx::copy() measured on OLCF Summit using GPUDirect RDMA (GDR) communication between GPU memory of two nodes over a single rail of the EDR InfiniBand network. With the improved performance come small risks of (1) investment in coding to APIs subject to change and (2) potential instability in the implementation. We believe that both risks are relatively low, but they are still sufficient to warrant releasing this feature separately from our semi-annual stable release, 2020.10.0. This work will appear in a future stable release.

We would like to remind users of OLCF's Summit or NERSC's Cori GPU systems that we maintain public installs of UPC++ at both centers, with usage instructions here.  The 2020.11.0 prototype should be installed on these systems within the next few business days.

Please use the issue tracker to report any problems or provide feedback on this prototype.  Alternatively, if you have private feedback or questions not suited to a public venue, you can email: pag...@lbl.gov.  We welcome all feedback.

-Paul H. Hargrove, on behalf of the Pagoda project at LBNL

Change Highlights relative to 2020.10.0:

  • Relax the restriction that a given CUDA device ID may only be opened once per process using cuda_device.

  • Add a device_allocator::is_active() query, and fix several subtle defects with inactive devices/allocators.

  • Resource exhaustion failures that occur while allocating a device segment now throw upcxx::bad_segment_alloc, a new subclass of std::bad_alloc.

  • Debug-mode global_ptr checking for device pointers has been strengthened when using GDR-accelerated memory kinds.
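
To illustrate what the new exception type means for error handling, here is a minimal sketch in plain C++ that does not use UPC++ at all. Everything in the `sketch` namespace is hypothetical: `sketch::bad_segment_alloc` is a stand-in for upcxx::bad_segment_alloc, and `open_device_segment()` is a pretend allocator invented for this example. The only property taken from the highlights above is that the exception is a subclass of std::bad_alloc, so existing handlers for std::bad_alloc keep working, while new code can catch the more specific type and, for example, retry with a smaller segment.

```cpp
#include <cstddef>
#include <new>

namespace sketch {

// Hypothetical stand-in for upcxx::bad_segment_alloc so that this example
// compiles without UPC++. Only the inheritance from std::bad_alloc is
// taken from the release notes.
struct bad_segment_alloc : std::bad_alloc {
  const char* what() const noexcept override {
    return "sketch::bad_segment_alloc: device segment allocation failed";
  }
};

// Hypothetical allocator: pretends the device can provide at most
// `available` bytes and throws when the request exceeds that.
inline void open_device_segment(std::size_t requested, std::size_t available) {
  if (requested > available) throw bad_segment_alloc{};
}

// Tries `preferred` bytes first, halving the request after each failure.
// Returns the size that succeeded, or 0 if no size >= `minimum` fits.
inline std::size_t open_with_fallback(std::size_t preferred,
                                      std::size_t minimum,
                                      std::size_t available) {
  for (std::size_t size = preferred; size >= minimum && size > 0; size /= 2) {
    try {
      open_device_segment(size, available);
      return size;
    } catch (const bad_segment_alloc&) {
      // Segment-specific failure: fall through and retry with half the size.
    }
  }
  return 0;
}

// A handler written before the subclass existed still catches the failure,
// because bad_segment_alloc derives from std::bad_alloc.
inline bool legacy_handler_catches(std::size_t requested, std::size_t available) {
  try {
    open_device_segment(requested, available);
    return false;
  } catch (const std::bad_alloc&) {
    return true;
  }
}

}  // namespace sketch
```

In this sketch, `sketch::open_with_fallback(1024, 64, 300)` fails at 1024 and 512 bytes and succeeds at 256. In a real UPC++ program the equivalent try/catch would presumably wrap the code that creates the device segment; consult the specification for the exact calls that may throw.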

Requirements changes:

  • The PGI/NVIDIA C++ compiler is not supported in this prototype release, due to a problem with the optimizer. Users are advised to use a supported version of the Intel, GNU or LLVM/Clang C++ compiler instead.

Download filenames and their md5 checksums:

upcxx-spec-2020.11.0-draft.pdf af8b8bd32514c5c3763e7c8d58598a04

upcxx-2020.11.0-memory_kinds_prototype.tar.gz ff875cff6d8331799bf83e1e007397b6
