Batching future-producing calls


Daniel Morgado

Jan 14, 2022, 11:13:17 AM
to UPC++
Hello,

In researching UPC++, I'm curious if message aggregation for amortizing network latency is supported. More specifically, some mechanism for batching multiple future-producing calls for efficient delivery. I couldn't find any references in the source or documentation for the latter, though it may just be a terminology mismatch on my part.

I suspect the answer is "no", but in the event that it may be possible to implement a mechanism in client code, I figured it would be best to ask. I also realize this may be too much at odds with UPC++'s paradigm of interleaving asynchronous operations with compute.

In the event that these types of optimizations are happening in UPC++ or GASNet, are there any general recommendations for when large amounts of fine-grained communication are required? Something along the lines of: issue future-producing calls in rapid succession so that GASNet (depending on the network conduit) can more readily batch the underlying messages.

Thanks!
Daniel

Dan Bonachea

Jan 15, 2022, 11:45:38 PM
to Daniel Morgado, UPC++
Hi Daniel - Thanks for your query.

To answer the question you seem to be asking, UPC++/GASNet do not "transparently" or "implicitly" aggregate fine-grained communication operations that happen to be issued in temporal proximity. Initiating an RMA or RPC destined for an off-node process will inject a network packet to initiate that operation before the initiation call returns.

However, UPC++ does provide an explicit API for RMA aggregation: the "Non-Contiguous One-sided Communication" APIs in chapter 15 of the Programmer's Guide, detailed further in the Specification. These allow the programmer to explicitly aggregate RMA operations destined for the same peer; calls like rput_irregular() and rput_strided() automatically pack the specified discontiguous pieces of source data, pipeline appropriately large packets on the network, and unpack them at the target into the specified destination memory. Because the aggregation is explicit, the "batching" is fully under the UPC++ programmer's control.
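To make the explicit-aggregation style concrete, here is a rough sketch (not official example code; the buffer names, sizes, and offsets are invented for illustration) of a single rput_irregular() call delivering three discontiguous source fragments to one peer:

```cpp
// Sketch only: batching three discontiguous puts to one peer with
// upcxx::rput_irregular. Buffer names/sizes/offsets are illustrative.
#include <upcxx/upcxx.hpp>
#include <cstddef>
#include <utility>
#include <vector>

int main() {
  upcxx::init();

  // Each rank allocates a 200-element landing zone in its shared segment
  // and publishes the pointer so a neighbor can write into it.
  upcxx::global_ptr<double> mine = upcxx::new_array<double>(200);
  upcxx::dist_object<upcxx::global_ptr<double>> window(mine);
  int peer = (upcxx::rank_me() + 1) % upcxx::rank_n();
  upcxx::global_ptr<double> peer_base = window.fetch(peer).wait();

  std::vector<double> a(100, 1.0), b(50, 2.0), c(25, 3.0);  // source pieces

  // One (address, element-count) run per fragment, on each side:
  std::vector<std::pair<const double*, std::size_t>> src = {
      {a.data(), a.size()}, {b.data(), b.size()}, {c.data(), c.size()}};
  std::vector<std::pair<upcxx::global_ptr<double>, std::size_t>> dst = {
      {peer_base, a.size()},
      {peer_base + 100, b.size()},
      {peer_base + 150, c.size()}};

  // One initiation covers all three fragments; UPC++/GASNet choose the
  // packet sizes, and the target side unpacks into the listed regions.
  upcxx::rput_irregular(src.begin(), src.end(),
                        dst.begin(), dst.end()).wait();

  upcxx::barrier();  // ensure the incoming put has landed before teardown
  upcxx::delete_array(mine);
  upcxx::finalize();
}
```

Compile with the upcxx compiler wrapper and launch under upcxx-run; the analogous rget_irregular() covers the aggregated-get direction.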

I should also mention that several groups have implemented explicit communication aggregation libraries layered over UPC++/GASNet primitives. One prominent example of this is the AggrStore library in upcxx-utils. Another is the Berkeley Container Library (BCL).

Finally, as you've observed all communication in UPC++ is asynchronous and we strongly encourage programmers to overlap communication latency with other communication and computation. UPC++ provides future/promise and completion callback synchronization mechanisms that support an aggressively asynchronous style of communication to hide network latency. In particular UPC++ includes features that make it easy to "batch together" the synchronization for multiple operations, and even build entire DAGs of asynchronous communication and computation to execute dynamically as dependencies become satisfied.
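As a rough sketch of that batched-synchronization style (the RPC payloads here are invented for illustration), upcxx::when_all conjoins several futures into one, and .then() attaches a callback that runs once all of them are ready:

```cpp
// Sketch only: conjoining the completion of multiple asynchronous
// operations with upcxx::when_all, then chaining work with .then().
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  int peer = (upcxx::rank_me() + 1) % upcxx::rank_n();

  // Two independent asynchronous operations targeting the same peer:
  upcxx::future<int> f1 = upcxx::rpc(peer, []() { return upcxx::rank_me(); });
  upcxx::future<int> f2 = upcxx::rpc(peer, []() { return 2 * upcxx::rank_me(); });

  // Batch the synchronization: one conjoined future, one callback. The
  // callback could itself launch further operations, building a DAG that
  // executes as dependencies become satisfied.
  upcxx::when_all(f1, f2)
      .then([](int a, int b) {
        std::cout << "both RPCs complete: " << a << " " << b << "\n";
      })
      .wait();  // drive progress until the whole chain finishes

  upcxx::barrier();
  upcxx::finalize();
}
```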

Hope this helps.

-D
