[erlang-questions] very large networks of erlang nodes?

398 views
Skip to first unread message

Miles Fidelman

unread,
Feb 11, 2012, 5:53:06 PM2/11/12
to erlang-questions Questions
Does anybody have experience running an Erlang environment consisting of
100s, 1000s, or more nodes, spread across a network?

I'm thinking about applications like SETI@home and folding@home, that
distribute processing tasks across huge numbers of PCs, taking advantage
of shared cycles to run long-running, data intensive jobs that can be
parallelized -- and more generally about the BOINC platform
(http://boinc.berkeley.edu).

Seems to me that Erlang would be a great platform for such things. Sort
of wondering if anybody has played with anything along such lines.

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra


_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Michael Truog

unread,
Feb 11, 2012, 6:10:02 PM2/11/12
to Miles Fidelman, erlang-questions Questions
On 02/11/2012 02:53 PM, Miles Fidelman wrote:
> Does anybody have experience running an Erlang environment consisting of 100s, 1000s, or more nodes, spread across a network?
>
> I'm thinking about applications like SETI@home and folding@home, that distribute processing tasks across huge numbers of PCs, taking advantage of shared cycles to run long-running, data intensive jobs that can be parallelized -- and more generally about the BOINC platform (http://boinc.berkeley.edu).
>
> Seems to me that Erlang would be a great platform for such things. Sort of wondering if anybody has played with anything along such lines.
>
Conceptually, Erlang would be a wonderful fit. However, distributed Erlang creates a fully connected network topology which is a limiting factor, that limits a distributed Erlang cluster to 50 - 100 nodes roughly. I do not have proof from trying this on my own, just knowledge of the attempts of others. The net tick time within distributed Erlang expects a response every minute with some permissible variation. You can try changing the net tick time, however, I have seen breakage when using links/monitors between distributed Erlang nodes with a net tick time set longer than 60 seconds (so I assume the net tick time was hard-coded internally, in some places). You can always try doing something strange with hidden nodes trying to limit the scope of a fully-connected network, but it seems like more trouble than it is worth.

I have a project called CloudI (http://cloudi.org) that provides a SOA framework that allows you to connect CloudI distributed Erlang clusters together with ZeroMQ bridges, to surpass the distributed Erlang fully connected network topology limitations. When doing things that way, you would create CloudI services that communicated through message passing using the CloudI API, and the internal routing takes care of service communication (through ZeroMQ or distributed Erlang messaging). Its an open source project (BSD license), so you host it yourself "for greater good".

- Michael

Miles Fidelman

unread,
Feb 12, 2012, 2:26:18 PM2/12/12
to erlang-questions Questions
Steve Strong wrote:
> The largest the we have run at id3as so far is 768 erlang nodes, distributed across 5 data centers. The nodes were structured in a hierarchy and used the hidden flag to ensure that the network did not become fully connected.
>
>
cool! can you say a little more about what you were doing with them?

Adrian Roe

unread,
Feb 13, 2012, 11:08:35 AM2/13/12
to erlang-q...@erlang.org
This was a cloud-based solution id3as built for Thomson Reuters that completely virtualised their online communications business that they run for most FTSE / NYSE companies.  Our experience was that Erlang played nicely when in a star topology and (understandably) got very upset at these levels of scale when in a fully connected environment.  We create / throw away servers in response to demand which can be very spikey. 

The project broke my previous "personal best" for numbers of concurrent servers in a single solution by about 750 ;) and I think we may be one of very few companies to have persuaded Amazon Cloud to return "Out of capacity" errors as opposed to "Exceeded account limits"... 

The responsiveness of AWS is remarkable - without hot spares etc, we can create and commission a server entirely from scratch less than a minute after exceeding a load threshold.

Adrian
Dr Adrian Roe
Director

Garrett Smith

unread,
Feb 14, 2012, 10:49:36 AM2/14/12
to Miles Fidelman, erlang-questions Questions
On Sat, Feb 11, 2012 at 4:53 PM, Miles Fidelman
<mfid...@meetinghouse.net> wrote:
> Does anybody have experience running an Erlang environment consisting of
> 100s, 1000s, or more nodes, spread across a network?
>
> I'm thinking about applications like SETI@home and folding@home, that
> distribute processing tasks across huge numbers of PCs, taking advantage of
> shared cycles to run long-running, data intensive jobs that can be
> parallelized -- and more generally about the BOINC platform
> (http://boinc.berkeley.edu).
>
> Seems to me that Erlang would be a great platform for such things.  Sort of
> wondering if anybody has played with anything along such lines.

As has been said, the fully connected mesh practically limits you to
50 - 150 nodes, at least in my experience. You will also find that
connections flap quite a bit across unreliable networks, wreaking
havoc in your application if you don't design for it.

We (CloudBees) have several hundred Erlang VMs running in
non-distributed mode. Our main backbone for messaging is RabbitMQ, but
we're actively moving away from AMQP toward 0MQ.

The cost for moving outside distributed Erlang is you lose the famed
"location transparency" feature that lets you seamlessly distribute
your application without significant changes to your code. The upside
is you can safely move well beyond the limits of a full connected
mesh.

I haven't tried with the hierarchical / hidden node approach that
Adrian mentioned, but I'm very glad to see a number like 750!

If you're talking SETI like applications, that suggests an unbounded
limit. I'd definitely look at 0MQ -- you'll have enough low level
flexibility in your messaging topology to tackle that hard problem.

Garrett

Richard O'Keefe

unread,
Feb 14, 2012, 6:27:37 PM2/14/12
to Garrett Smith, erlang-questions Questions

On 15/02/2012, at 4:49 AM, Garrett Smith wrote:
> On Sat, Feb 11, 2012 at 4:53 PM, Miles Fidelman
> <mfid...@meetinghouse.net> wrote:
>> Does anybody have experience running an Erlang environment consisting of
>> 100s, 1000s, or more nodes, spread across a network?
> As has been said, the fully connected mesh practically limits you to
> 50 - 150 nodes, at least in my experience. You will also find that
> connections flap quite a bit across unreliable networks, wreaking
> havoc in your application if you don't design for it.

What would happen if you had 1000 nodes in a box with a reliable but
not ultrafast interconnect? I'm not talking about multicore here,
although 16 Tileras in a smallish box doesn't seem unlikely any more,
but say 1000 separate-physical-address-space nodes connected as a
tree or a hypercube or something.

Could distributed Erlang be set up in some hierarchical fashion?

It seems to me that there are three issues:
- number of points of authentication
(network: many; cluster-in-a-box: one)
- number of eavesdropping points
(network: many; cluster-in-a-box: one)
- number of communicating devices
(network: many; cluster-in-a-box: many)
and that just thinking in terms of authentication and eavesdropping,
distributed Erlang makes perfect sense for cluster-in-a-box,
IF it works at that scale.

The Magnus project that Fergus O'Brien was involved with would have
been using Erlang in this way, I believe.

Ulf Wiger

unread,
Feb 16, 2012, 4:49:48 AM2/16/12
to Richard O'Keefe, erlang-questions Questions

It should be noted that there is an ongoing EU-funded research project which aims to address the limits of the current fully-connected approach.


The high-level goal is to figure out how to build clusters of up to 10^5 cores. We envision having to improve both SMP scalability (beyond 100 cores per node) and Distributed Erlang scalability (more than 1000 nodes), and offer powerful and fault-tolerant standard libraries for high productivity. One factor that speaks for Erlang in this realm is that MTTCF (mean time to core failure) for a cluster of that many cores, given current technology, could be less than an hour. This necessitates a solution that can handle partial failure.

It would be great if you could lean on those guys with experiences and ideas. Catch any one of e.g. Robert Virding, Tino Breddin, Kostis, Simon Thompson, Francesco, Olivier Boudeville, or Kenneth Lundin and buy them a beer or two - I'm sure you'll find them receptive. ;-)

Using hidden nodes and spinning your own cluster is one thing we discussed, as well as , but our feeling was that this is way too low-level for a respectable Erlang approach. I imagine some sort of routing paradigm integrated into OTP will be needed, if Distribution Transparency is to be preserved without heroic effort.

Regarding heart-beat timeouts, I think this is often due to port back-pressure. If the VM detects busy_dist_port, it will simply suspend the process trying to send, and when the dist_port queue falls back under a given threshold, it resumes all suspended processes at once. This can easily become disastrous in a situation with very high traffic or slow ports (e.g. in a virtual environment).

Things have improved much in this regard in later releases. For one thing, buffers and thresholds have been increased, are more easily tunable (some hacking required still, I think), and overall throughput on the dist port has been improved. There is still much that can be done, but most of the really embarrassing problems should now be a thing of the past. If not, please share your experiences.

BR,
Ulf W

Miles Fidelman

unread,
Feb 16, 2012, 1:40:44 PM2/16/12
to erlang-questions Questions
Now that is cool!

Ulf Wiger wrote:
>
> It should be noted that there is an ongoing EU-funded research project
> which aims to address the limits of the current fully-connected approach.
>
> http://www.release-project.eu/
> http://www.release-project.eu/documents/RELEASEfactsheetv6.pdf
>
> The high-level goal is to figure out how to build clusters of up to
> 10^5 cores. We envision having to improve both SMP scalability (beyond
> 100 cores per node) and Distributed Erlang scalability (more than 1000
> nodes), and offer powerful and fault-tolerant standard libraries for
> high productivity. One factor that speaks for Erlang in this realm is
> that MTTCF (mean time to core failure) for a cluster of that many
> cores, given current technology, could be less than an hour. This
> necessitates a solution that can handle partial failure.
>
>

--

In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

Richard O'Keefe

unread,
Feb 16, 2012, 5:28:17 PM2/16/12
to Ulf Wiger, erlang-questions Questions

On 16/02/2012, at 10:49 PM, Ulf Wiger wrote:
> It would be great if you could lean on those guys with experiences and ideas. Catch any one of e.g. Robert Virding, Tino Breddin, Kostis, Simon Thompson, Francesco, Olivier Boudeville, or Kenneth Lundin and buy them a beer or two - I'm sure you'll find them receptive. ;-)

If they ever visit Dunedin, it will be an honour and a pleasure to do so.

Reply all
Reply to author
Forward
0 new messages