The remoting library is designed for stateful, asynchronous client/server connections. It provides callbacks for disconnect events on both the client and the acceptor, so failures can be handled gracefully. There is also a read timeout event that can be used to detect a system that is still connected but unresponsive.
I use it in production every day. The systems I work on implement graceful disconnect logic to handle failures. For example, if a system loses its connection to a server, it automatically disables itself until the connection is restored. Jetlang remoting automatically tries to reconnect when a connection is dropped.
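Here is a minimal sketch of that disable-until-restored logic in plain Java. It is not the Jetlang remoting API: ConnectionGate and its method names are hypothetical, and you would call them from whatever connect, disconnect, and read timeout callbacks you register with the library. Treating a read timeout as "disabled" is also an assumption; you may prefer to only log it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Jetlang: tracks whether the
// connection is up and refuses work while it is down.
final class ConnectionGate {
    private final AtomicBoolean enabled = new AtomicBoolean(false);

    // Call from the connect callback.
    void onConnect() {
        enabled.set(true);
    }

    // Call from the disconnect callback.
    void onDisconnect() {
        enabled.set(false);
    }

    // Call from the read timeout callback (assumption: an unresponsive
    // peer is treated the same as a dropped connection).
    void onReadTimeout() {
        enabled.set(false);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    // Runs the work only while the connection is up.
    // Returns false if the work was skipped.
    boolean trySubmit(Runnable work) {
        if (!enabled.get()) {
            return false;
        }
        work.run();
        return true;
    }
}
```

Because the library reconnects on its own, the gate simply opens again the next time the connect callback fires.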
You can have multiple servers and implement connection logic that automatically round-robins to the next server on failure, or just use a TCP load balancer. Do you expect to have failures?
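If you go the round-robin route, the selection logic could look roughly like this. Again, this is only a sketch in plain Java: RoundRobinEndpoints is a hypothetical class, not something Jetlang provides, and how you hand the chosen address to the client when it reconnects is up to your own code.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jetlang: cycles through a fixed
// list of servers, moving to the next one each time a failure is reported.
final class RoundRobinEndpoints {
    private final List<InetSocketAddress> servers;
    private int index = 0;

    RoundRobinEndpoints(List<InetSocketAddress> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    // The server the client should currently be connected to.
    synchronized InetSocketAddress current() {
        return servers.get(index);
    }

    // Call from the disconnect callback: advances to the next server,
    // wrapping around to the first after the last.
    synchronized InetSocketAddress failover() {
        index = (index + 1) % servers.size();
        return servers.get(index);
    }
}
```

With two servers, each reported failure flips to the other one, so a recovered primary gets picked up again after the next failure rather than immediately.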
Mike