options for high-throughput connections between nodes

Dave Cottlehuber

Apr 1, 2022, 6:01:38 AM
to erlang-q...@erlang.org
I'm investigating how to maximise throughput when pushing mostly loosely ordered Erlang terms (via `term_to_binary/1` and friends) between a mesh of Erlang nodes.

A single TCP stream isn't sufficient, so I expect to use either UDP or multiple TCP streams to push up throughput. I don't expect the usual Erlang distribution protocol to be suitable, because of the risk of congestion and head-of-line blocking of distribution-related traffic.
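
As a rough illustration of the multiple-TCP-streams option, here is a minimal sketch in Python (not Erlang) that stripes already-serialised messages round-robin across several streams and restores the order on the receiving side. The 12-byte header layout, the function names, and the use of in-memory buffers instead of sockets are all assumptions made for the example. In Erlang the payloads would be `term_to_binary/1` output, and the `{packet, 4}` socket option can supply the length prefix.

```python
import io
import struct

# Hypothetical frame header: 8-byte sequence number + 4-byte payload length.
HEADER = struct.Struct(">QI")


def frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number and length."""
    return HEADER.pack(seq, len(payload)) + payload


def stripe(payloads, n_streams):
    """Write framed payloads round-robin into n_streams byte buffers.

    The buffers stand in for independent TCP connections.
    """
    streams = [io.BytesIO() for _ in range(n_streams)]
    for seq, payload in enumerate(payloads):
        streams[seq % n_streams].write(frame(seq, payload))
    return [s.getvalue() for s in streams]


def deframe(data: bytes):
    """Yield (seq, payload) pairs from the bytes received on one stream."""
    offset = 0
    while offset < len(data):
        seq, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        yield seq, data[offset:offset + length]
        offset += length


def reassemble(stream_bytes):
    """Merge all streams and sort by sequence number to restore order."""
    messages = [m for data in stream_bytes for m in deframe(data)]
    return [payload for _, payload in sorted(messages)]
```

Because each frame carries its own sequence number, a stall on one stream only delays the messages on that stream. For loosely ordered data the receiver could deliver frames as they arrive and skip the final sort.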

Assume an LFN (long fat network) scenario: up to 3-10GiB/s bandwidth, with typical inter-continental latency as the default. No numbers yet, just "as much as possible / closest to theoretical maximum bandwidth".
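
To put rough numbers on this, the bandwidth-delay product (BDP) gives the amount of data that must be in flight to keep such a link full. The Python sketch below uses assumed values that are not from this thread: a 10 Gbit/s link, 150 ms round-trip time, and an effective TCP window of 4 MiB per stream.

```python
import math


def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bits_per_s * rtt_s / 8


def single_stream_limit(window_bytes: float, rtt_s: float) -> float:
    """Upper bound (bits/s) for one stream limited by its window size."""
    return window_bytes * 8 / rtt_s


def streams_needed(bandwidth_bits_per_s: float, rtt_s: float,
                   window_bytes: float) -> int:
    """Number of window-limited streams needed to cover the BDP."""
    return math.ceil(bdp_bytes(bandwidth_bits_per_s, rtt_s) / window_bytes)


if __name__ == "__main__":
    # All three values are illustrative assumptions, not measurements.
    bandwidth = 10e9          # 10 Gbit/s
    rtt = 0.150               # 150 ms round trip
    window = 4 * 1024 * 1024  # 4 MiB effective window per stream

    print(f"BDP: {bdp_bytes(bandwidth, rtt) / 1e6:.1f} MB")
    print(f"One stream: {single_stream_limit(window, rtt) / 1e6:.0f} Mbit/s")
    print(f"Streams needed: {streams_needed(bandwidth, rtt, window)}")
```

Under these assumptions the BDP is roughly 187.5 MB, a single window-limited stream tops out at around 224 Mbit/s, and about 45 such streams would be needed to fill the link. Larger socket buffers reduce that count proportionally.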

- zeromq: https://github.com/zeromq/chumak | https://github.com/zeromq/erlzmq2 | https://github.com/lukaszsamson/erlzmq
- osiris: https://github.com/rabbitmq/osiris
- nanomsg: https://github.com/basho/enm
- gen_sctp: https://www.erlang.org/doc/man/gen_sctp.html

Does anybody have other suggestions to add to the list?

thanks
Dave

Karl Nilsson

Apr 1, 2022, 6:49:01 AM
to Dave Cottlehuber, Erlang Users' List
Osiris is quite a different type of thing to the other ones in your list in that it will always first write terms to disk and only then replicate them (over TCP). That said it could do a decent replication job if you want a local buffer to decouple the production of terms from the replication part. Osiris does still need a dist erl connection for coordination messages and you'd have to modify the quorum commit semantics to fit your use case (e.g. a "leader" member on the production side and a "replica" member on the other side).

Cheers
Karl
--
Karl Nilsson

Dave Cottlehuber

Apr 1, 2022, 7:10:42 AM
to Karl Nilsson, Erlang Users' List
On Fri, 1 Apr 2022, at 10:48, Karl Nilsson wrote:
> Osiris is quite a different type of thing to the other ones in your
> list in that it will always first write terms to disk and only then
> replicate them (over TCP). That said it could do a decent replication
> job if you want a local buffer to decouple the production of terms from
> the replication part. Osiris does still need a dist erl connection for
> coordination messages and you'd have to modify the quorum commit
> semantics to fit your use case (e.g. a "leader" member on the
> production side and a "replica" member on the other side).
>
> Cheers
> Karl

Thanks Karl,

Osiris is definitely worth considering: there is always the risk of
connection loss and the need to restart from a known checkpoint, so the
buffer could come in handy. Would it be able to make use of multiple TCP
connections?

A+
Dave

Karl Nilsson

Apr 1, 2022, 7:21:56 AM
to Dave Cottlehuber, Erlang Users' List
You can have multiple osiris "clusters" and each one will use its own connection.
--
Karl Nilsson