Would gRPC like to integrate UCX (High Performance Computing transport)?


Sergey Shalnov

Aug 8, 2016, 7:31:49 AM
to grpc.io

Hi,

I would like to use UCX (http://www.openucx.org/) as the “transport” level (depending on terminology) in gRPC. gRPC looks like a great front end for UCX.

 

UCX is a transport layer that abstracts the differences across various hardware architectures and provides a low-level API that enables the implementation of communication protocols (https://github.com/openucx/ucx/wiki/FAQ). It is usually used in HPC clusters with an MPI environment (OpenMPI, https://www.open-mpi.org/).

 

The UCX high-level interface (UCP) is quite simple (https://github.com/openucx/ucx/wiki/High-Level-design). The sources are at https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h and a “hello world” usage example is at https://github.com/openucx/ucx/blob/master/test/examples/ucp_hello_world.c

 

As I understand it (I also looked at https://groups.google.com/d/topic/grpc-io/6-DyXDp2WiY/discussion ), I have to implement struct grpc_endpoint_vtable ( https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/endpoint.h#L49 ), but that doesn’t work for me.

 

The problem is in UCX wire-up.

During initialization, UCX requires its internal addresses to be exchanged between client and server in order to establish data flow channels. This exchange (a single message of about 300 bytes) has to be done by an external mechanism; in ucp_hello_world I used plain TCP sockets. If I implement UCX as the underlying layer for gRPC (instead of TCP), I can’t use gRPC itself to exchange the simple messages needed to initialize UCX. It looks like I need something like:


1. Initialize the original TCP transport in gRPC.

2. Use it to exchange addresses between all available processes (to create the communication channels).

3. Replace the TCP transport with UCX on the fly.

4. Use the UCX-based transport to send/receive messages with better performance than the TCP stack provides.
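
To make step 2 concrete, here is a minimal sketch of the out-of-band address exchange over an already connected stream socket. It assumes the UCX address is an opaque byte blob (for example the one returned by `ucp_worker_get_address`) and sends it with a 4-byte length prefix. The names `send_address`, `recv_address` and `MAX_ADDR_LEN` are my own placeholders, not part of UCX or gRPC, and no UCX calls are made here.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>  /* send, recv */
#include <sys/types.h>

/* Assumed upper bound for an address blob; the real size depends on UCX. */
#define MAX_ADDR_LEN 4096u

/* Send exactly len bytes; returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the opaque address blob as: 4-byte big-endian length, then payload. */
int send_address(int fd, const void *addr, uint32_t addr_len) {
    uint32_t wire_len = htonl(addr_len);
    if (send_all(fd, &wire_len, sizeof(wire_len)) != 0) return -1;
    return send_all(fd, addr, addr_len);
}

/* Receive a blob sent by send_address. Returns a malloc'ed buffer that the
 * caller must free, or NULL on error; the length is stored in *addr_len_out. */
void *recv_address(int fd, uint32_t *addr_len_out) {
    uint32_t wire_len;
    if (recv_all(fd, &wire_len, sizeof(wire_len)) != 0) return NULL;
    uint32_t len = ntohl(wire_len);
    if (len == 0 || len > MAX_ADDR_LEN) return NULL;
    void *addr = malloc(len);
    if (addr == NULL) return NULL;
    if (recv_all(fd, addr, len) != 0) {
        free(addr);
        return NULL;
    }
    *addr_len_out = len;
    return addr;
}
```

Each side would call `send_address` with its own worker address and `recv_address` to get the peer's, and then pass the received blob to UCX to create the endpoint. How that fits into the gRPC connection setup is exactly the open question.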

 

I think I need some solution other than replacing grpc_endpoint, because the gRPC TCP transport can’t use several endpoints at the same time.
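
One shape such a solution might take is a thin layer that owns the currently active transport and can swap it after the bootstrap. The sketch below is purely illustrative: `transport_ops`, `channel` and the functions around them are hypothetical names and are not gRPC's `grpc_endpoint_vtable` or any UCX API. Both backends are stubs that only count bytes, so the example runs without sockets or UCX.

```c
#include <stddef.h>

/* Hypothetical transport interface; NOT gRPC's grpc_endpoint_vtable. */
typedef struct transport_ops {
    const char *name;
    /* Returns the number of bytes accepted, or -1 on error. */
    long (*send)(void *state, const void *buf, size_t len);
} transport_ops;

/* A channel forwards every send to whichever transport is active. */
typedef struct channel {
    const transport_ops *ops;
    void *state;
} channel;

/* Stub backend: adds len to the size_t counter that state points to. */
static long counting_send(void *state, const void *buf, size_t len) {
    (void)buf;
    *(size_t *)state += len;
    return (long)len;
}

/* Stand-ins for a real TCP backend and a real UCX backend. */
static const transport_ops tcp_ops = { "tcp", counting_send };
static const transport_ops ucx_ops = { "ucx", counting_send };

/* Step 1: start on the bootstrap transport. */
void channel_init(channel *ch, const transport_ops *ops, void *state) {
    ch->ops = ops;
    ch->state = state;
}

/* Steps 2 and 4: send through the active transport. */
long channel_send(channel *ch, const void *buf, size_t len) {
    return ch->ops->send(ch->state, buf, len);
}

/* Step 3: swap the transport. The caller must make sure that no send is
 * in flight and that both peers switch at the same point in the stream. */
void channel_switch(channel *ch, const transport_ops *ops, void *state) {
    ch->ops = ops;
    ch->state = state;
}
```

In this sketch the address exchange would go through `channel_send` while `tcp_ops` is active, and all later traffic would go through the same call after `channel_switch`. Whether gRPC has a suitable place to put such a layer is what I am unsure about.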

I could be wrong in all of these assumptions. My terminology might also be off (for example, the term “transport” may mean something different in gRPC).

 

Could you please recommend a way to implement UCX support in gRPC?

 

Thank you

Sergey
