I use Ray, a unified framework for scaling AI and Python applications, on an HPC cluster. Ray is built on gRPC. The cluster has InfiniBand (IB), which offers low latency and high bandwidth. I tried IPoIB (Internet Protocol over InfiniBand), but that way I cannot make full use of IB's bandwidth.
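A minimal sketch of one way to route Ray's gRPC traffic over IPoIB. It looks up the IPv4 address of the IPoIB interface and passes it to Ray's `--node-ip-address` flag. It assumes a Linux host and that the IPoIB interface is named `ib0`. The helper names `interface_ipv4` and `ray_start_command` are hypothetical and exist only for illustration.

```python
import fcntl
import socket
import struct

# Linux ioctl request number for "get interface IPv4 address"
SIOCGIFADDR = 0x8915


def interface_ipv4(ifname: str) -> str:
    """Return the IPv4 address bound to a network interface (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq: interface name (max 15 chars + NUL) followed by a sockaddr
        ifreq = struct.pack("256s", ifname.encode()[:15])
        result = fcntl.ioctl(s.fileno(), SIOCGIFADDR, ifreq)
    # sockaddr_in begins at offset 16; the IPv4 address occupies bytes 20..23
    return socket.inet_ntoa(result[20:24])


def ray_start_command(ifname: str = "ib0") -> list[str]:
    """Build a `ray start` command that binds the head node to the given interface.

    "ib0" is assumed to be the IPoIB interface; adjust it for your cluster.
    """
    return ["ray", "start", "--head", f"--node-ip-address={interface_ipv4(ifname)}"]
```

This still sends traffic through the kernel TCP/IP stack on top of IB, so it does not provide RDMA. That is the bandwidth limitation described above.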
I want to optimize Ray on an HPC cluster with InfiniBand. RDMA offloads network processing from the CPU, which reduces interrupts and leaves more CPU time for the application, so I think it would give better performance and is a good choice for me. However, gRPC does not currently support RDMA. Why not? What should I do if I want to take advantage of RDMA and the IB network?