Pros and Cons of Supporting Remote Direct Memory Access (RDMA) Transport in fbthrift


Haiyang Shi

Aug 4, 2021, 5:26:47 PM
to Folly: the Facebook Open-source LibrarY
Hi folks,

This is an fbthrift question.

Because it offers ultra-low latency and high throughput, RDMA is becoming a widely used technique in modern data centers. Amazon Web Services (AWS) introduced an InfiniBand-like network adapter, the Elastic Fabric Adapter (EFA), to accelerate HPC and DL applications running on AWS. Microsoft Azure and Oracle Cloud have adopted commodity RDMA-capable NICs (RNICs) in their clouds to stay competitive. Many companies, including Facebook, use RDMA (e.g., GPUDirect RDMA) to accelerate distributed training. However, few industrial-grade RPC frameworks take advantage of RDMA's performance potential. I am therefore curious about the pros and cons of using RDMA as a fast communication transport in fbthrift. Is it worth enabling in fbthrift? Is anyone already working on it? What challenges would someone face if they wanted to contribute code to fbthrift to support RDMA?

Looking forward to your thoughts.

Thanks,
Haiyang