[erlang-questions] massive distribution

Roberto Ostinelli

Nov 20, 2009, 5:17:20 AM

to Erlang

dear all,

we are using erlang for cloud computing applications, and i would like
to know if it's possible to take advantage of the processing power of,
let's say, 50,000 cores [i.e. from 6,000 to 12,000 machines, therefore
erlang nodes]. i'm interested in knowing if someone has the
information on how many interconnected nodes erlang has been proven
able to support.

sorry if this came along already somewhere, i could not find relevant info.

thank you,

r.

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Peter Sabaini

Nov 23, 2009, 3:57:50 PM

to Roberto Ostinelli, Erlang

On Fri, 2009-11-20 at 11:17 +0100, Roberto Ostinelli wrote:
> dear all,
>
> we are using erlang for cloud computing applications, and i would like
> to know if it's possible to take advantage of the processing power of,
> let's say, 50,000 cores [i.e. from 6,000 to 12,000 machines, therefore
> erlang nodes]. i'm interested in knowing if someone has the
> information on how many interconnected nodes erlang has been proven
> able to support.

If you're talking about the built-in Erlang distribution mechanisms --
FWIW: in a quick test I did a while ago I had trouble keeping stable
connections for more than ~80 connected nodes. I must admit I was a bit
surprised as the "Efficiency Guide" seems to imply that the number
should be much higher (?).

Possibly the number of known (but not connected) nodes can be higher
than this.

HTH,
peter.

Roberto Ostinelli

Dec 1, 2009, 5:32:45 AM

to Peter Sabaini, Erlang

> If you're talking about the built-in Erlang distribution mechanisms --
> FWIW: in a quick test I did a while ago I had trouble keeping stable
> connections for more than ~80 connected nodes. I must admit I was a bit
> surprised as the "Efficiency Guide" seems to imply that the number
> should be much higher (?).
>
> Possibly the number of known (but not connected) nodes can be higher
> than this.
>
> HTH,
> peter.

thank you peter.

80 connected nodes seems a low number if one seriously needs to build
cloud applications.

i guess that one needs to develop one's own mechanism for
interconnecting nodes, something closer to custom 'mesh
networking' [pardon the conceptually wrong extension of this term]?
would that be more appropriate, in your view?

cheers,

Kevin A. Smith

Dec 1, 2009, 9:08:29 AM

to Roberto Ostinelli, Peter Sabaini, Erlang

Fully connected meshes suck for large numbers of nodes. Erlang provides a number of knobs to control how a cluster is stitched together such as "-connect_all false" and "-hidden". Also, tuning the net tick time (see man 3 net_kernel and man 6 kernel) can be helpful in keeping a large cluster running.
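
For reference, the knobs mentioned above might be combined roughly as follows. This is only a sketch: the node name worker1@host1, the config file name sys.config, and the tick time of 120 seconds are illustrative assumptions, not values recommended by anyone in this thread.

```erlang
%% sys.config (sketch): raise the distribution tick time.
%% 120 is an arbitrary example value; the default is 60 seconds.
%% The same value should be used on every node in the cluster.
[{kernel,
  [{net_ticktime, 120}]}].

%% The node could then be started with something like:
%%   erl -name worker1@host1 -connect_all false -hidden -config sys
%% -connect_all false : global does not maintain a fully connected network
%% -hidden            : this node's connections are not published to other nodes
```

With these settings nodes only connect when something asks them to, so the application has to decide which connections it wants.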

--Kevin

Peter Sabaini

Dec 1, 2009, 9:22:06 AM

to Erlang

On Tue, 2009-12-01 at 09:08 -0500, Kevin A. Smith wrote:
> Fully connected meshes suck for large numbers of nodes. Erlang provides a number of
> knobs to control how a cluster is stitched together such as "-connect_all false"
> and "-hidden".

Which would entail keeping track of connected nodes and connection
establishment/teardown, correct?

> Also, tuning the net tick time (see man 3 net_kernel and man 6 kernel) can be helpful
> in keeping a large cluster running.

I fiddled around with those a bit. I don't have the exact values at
hand, but I set net_ticktime to rather large values, something like
300s, without substantial improvements in the number of nodes able to
keep a stable connection.

peter.

Peter Sabaini

Dec 1, 2009, 10:11:21 AM

to Erlang

On Tue, 2009-12-01 at 11:32 +0100, Roberto Ostinelli wrote:
> > If you're talking about the built-in Erlang distribution mechanisms --
> > FWIW: in a quick test I did a while ago I had trouble keeping stable
> > connections for more than ~80 connected nodes. I must admit I was a bit
> > surprised as the "Efficiency Guide" seems to imply that the number
> > should be much higher (?).
> >
> > Possibly the number of known (but not connected) nodes can be higher
> > than this.
> >
> > HTH,
> > peter.
>
> thank you peter.
>
> 80 connected nodes seems a low number if one seriously need to build
> cloud applications.

I guess that depends on the application and what "serious" means :-)

> i guess that one needs to develop his own mechanisms for
> interconnecting nodes in a way more similar to a custom 'mesh
> networking' [pardon the conceptually wrong extension of this term]?
> would that be more appropriate, to your belief?

As Kevin hinted at, one can use Erlang's distribution protocol without
using automatic connection handling. OTOH, if you for instance need more
security than Erlang's distribution mechanism provides, it might be wise
to invest in a custom protocol anyway. FWIW, I imagine the data
serialization/deserialization functions could be reused.

peter.

Kevin A. Smith

Dec 1, 2009, 10:24:48 AM

to Peter Sabaini, Erlang

On Dec 1, 2009, at 10:11 AM, Peter Sabaini wrote:

> On Tue, 2009-12-01 at 11:32 +0100, Roberto Ostinelli wrote:
>>> If you're talking about the built-in Erlang distribution mechanisms --
>>> FWIW: in a quick test I did a while ago I had trouble keeping stable
>>> connections for more than ~80 connected nodes. I must admit I was a bit
>>> surprised as the "Efficiency Guide" seems to imply that the number
>>> should be much higher (?).
>>>
>>> Possibly the number of known (but not connected) nodes can be higher
>>> than this.
>>>
>>> HTH,
>>> peter.
>>
>> thank you peter.
>>
>> 80 connected nodes seems a low number if one seriously need to build
>> cloud applications.
>
> I guess that depends on the application and what "serious" means :-)
>
>
>> i guess that one needs to develop his own mechanisms for
>> interconnecting nodes in a way more similar to a custom 'mesh
>> networking' [pardon the conceptually wrong extension of this term]?
>> would that be more appropriate, to your belief?
>
> As Kevin hinted at, one can use Erlangs' distribution protocol without
> using automatic connection handling. OTOH, if you for instance need more
> security than Erlangs distribution mechanism provide, it might be wise
> to invest in a custom protocol anyway. FWIW, I imagine the data
> serialization/deserialization functions could be reused.
>

I've rolled my own clustering bits over TCP sockets for a couple of clients. term_to_binary, binary_to_term, and the {packet, N} socket option (N = 1, 2, or 4) can significantly ease the work involved. They let you focus on the security bits while Erlang handles reassembly and serialization.
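
As a rough illustration of that combination, here is a minimal sketch that sends one Erlang term over a loopback TCP connection. The module name term_link and its functions are made up for this example, and there is no authentication or encryption here; those "security bits" are exactly what a real implementation would have to add.

```erlang
-module(term_link).
-export([demo/0, send_term/2, recv_term/2]).

%% Binary payloads, passive mode, and a 4-byte length prefix so that
%% every recv returns exactly one complete message.
-define(OPTS, [binary, {packet, 4}, {active, false}]).

%% Serialize any term and send it as one length-prefixed packet.
send_term(Sock, Term) ->
    gen_tcp:send(Sock, term_to_binary(Term)).

%% Receive one packet and decode it. The 'safe' option refuses to
%% create new atoms from untrusted input.
recv_term(Sock, Timeout) ->
    case gen_tcp:recv(Sock, 0, Timeout) of
        {ok, Bin} -> {ok, binary_to_term(Bin, [safe])};
        {error, _} = Err -> Err
    end.

%% Loopback round trip on an OS-assigned port.
demo() ->
    {ok, Listen} = gen_tcp:listen(0, [{reuseaddr, true},
                                      {ip, {127, 0, 0, 1}} | ?OPTS]),
    {ok, Port} = inet:port(Listen),
    Parent = self(),
    spawn_link(fun() ->
        {ok, Server} = gen_tcp:accept(Listen),
        Parent ! {received, recv_term(Server, 5000)},
        gen_tcp:close(Server)
    end),
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port, ?OPTS),
    ok = send_term(Client, {hello, [1, 2, 3]}),
    Result = receive {received, R} -> R after 5000 -> {error, timeout} end,
    gen_tcp:close(Client),
    gen_tcp:close(Listen),
    Result.
```

Calling term_link:demo() should hand back the term that was sent, wrapped in an ok tuple. Because {packet, 4} does the framing, neither side has to deal with partial reads.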

Just my $0.02 worth...

--Kevin

Garrett Smith

Dec 1, 2009, 11:27:17 AM

to Peter Sabaini, Erlang

On Tue, Dec 1, 2009 at 8:22 AM, Peter Sabaini <pe...@sabaini.at> wrote:
> On Tue, 2009-12-01 at 09:08 -0500, Kevin A. Smith wrote:
>> Fully connected meshes suck for large numbers of nodes. Erlang provides a number of
>> knobs to control how a cluster is stitched together such as "-connect_all false"
>> and "-hidden".
>
> Which would entail keeping track of connected nodes and connection
> establishment/teardown, correct?
>
>> Also, tuning the net tick time (see man 3 net_kernel and man 6 kernel) can be helpful
>> in keeping a large cluster running.
>
> I fiddled around with those a bit. I don't have the exact values at
> hand, but I set net_ticktime to rather large values, something like
> 300s, without substantial improvements in the number of nodes able to
> keep a stable connection.

What is happening that makes something an unstable connection?

I have a mesh of several dozen nodes and the connections can drop at
any time given the basic unreliability of network connections. Each
node, however, is responsible for trying to reestablish a connection
to a well known 'hub', which tends to keep the mesh intact even when
some nodes fall off occasionally. (This is a single point of failure,
but the 'hub' could easily be a list, like DNS.)

I've found that setting -connect_all false disables the global process
registry, which makes the setting practically useless. I'm guessing I've
missed something here. What is the approach to keeping the global
registry in sync when -connect_all false is set?

I've also read about, but not explored, a pattern of segmenting a mesh
into smaller groups of nodes. From what I understand -- that each node
tries to connect to each node -- a mesh has n(n-1)/2 connections, so
80 nodes would imply 3000+ connections. For most applications, that's
a lot of unneeded overhead -- not every node is going to need to talk
to every other node.
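
To put numbers on that, here is a small sketch of the arithmetic. The module name mesh and the grouping function are illustrative assumptions; the grouped count ignores whatever links are used to join the groups together.

```erlang
-module(mesh).
-export([links/1, grouped_links/2]).

%% Connections in a full mesh of N nodes: one per unordered pair.
links(N) when is_integer(N), N >= 0 ->
    N * (N - 1) div 2.

%% Connections when the nodes are split into Groups separate full
%% meshes of Size nodes each (links between groups are not counted).
grouped_links(Groups, Size) ->
    Groups * links(Size).
```

For example, mesh:links(80) gives 3160 and mesh:links(120) gives 7140, while 80 nodes arranged as 8 groups of 10 need only mesh:grouped_links(8, 10) = 360 connections inside the groups.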

When networks are small, Erlang's global process registration and
lookup facility is phenomenal. But the out-of-the-box scheme
definitely presents challenges in large networks.

I'm definitely curious to know how others have dealt with this type of problem.

Garrett

Jarrod Roberson

Dec 1, 2009, 3:39:02 PM

to Erlang

the project I am working on (inet_mdns @ github.com) addresses one part of
this solution.
Bonjour / Zeroconf is an auto discovery protocol based on multicast dns.
It is used for example by iTunes to auto-discover and publish music shares.
It is used by SubEthaEdit to auto-discover and publish shared documents.
I have used it very successfully to auto-discover / publish services in a
multi-language platform environment
in production systems. We added Zeroconf to Apache to auto-discover back end
servers, we added it to memcached,
so that it could be auto-discovered, we used it in J2EE applications to
publish and auto-discover dependent services.
We also used it in a Twisted server to auto-discover clustered nodes of
other Twisted servers and publish its services
to clients using the Twisted server.

I am using this project as a kind of Rosetta stone because I have
implemented Zeroconf in Java, Python and worked with C code.
So it is a known problem domain, I am trying to recreate those
implementations in Erlang as a real learning experience.
There is even something called Wide Area Bonjour for service discovery over
the internet proper.

Peter Sabaini

Dec 1, 2009, 4:44:53 PM

to Erlang Users' List

On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
> On Tue, Dec 1, 2009 at 8:22 AM, Peter Sabaini <pe...@sabaini.at> wrote:
> > On Tue, 2009-12-01 at 09:08 -0500, Kevin A. Smith wrote:
> >> Fully connected meshes suck for large numbers of nodes. Erlang provides a number of
> >> knobs to control how a cluster is stitched together such as "-connect_all false"
> >> and "-hidden".
> >
> > Which would entail keeping track of connected nodes and connection
> > establishment/teardown, correct?
> >
> >> Also, tuning the net tick time (see man 3 net_kernel and man 6 kernel) can be helpful
> >> in keeping a large cluster running.
> >
> > I fiddled around with those a bit. I don't have the exact values at
> > hand, but I set net_ticktime to rather large values, something like
> > 300s, without substantial improvements in the number of nodes able to
> > keep a stable connection.
>
> What is happening that makes something an unstable connection?

The behaviour was that nodes seemed to randomly produce error messages,
e.g.:

=ERROR REPORT==== 9-Jul-2009::13:56:07 ===
The global_name_server locker process received an unexpected message:
{{#Ref<0.0.0.1957>,'xy@z'},false}

Or

=ERROR REPORT==== 9-Jul-2009::14:03:33 ===
global: 'foo@bar' failed to connect to 'qux@baz'

In my test I just tried to run "fully-meshed", i.e. every node is
connected to every other node; I ran 50 - 120 nodes distributed across 5
physical machines on a local, otherwise healthy, LAN.

As you say, running "fully-meshed" is a lot of overhead, which might not
be necessary in an actual deployment. On the other hand, the
near-automatic network setup is also very convenient :-)

> I have a mesh of several dozen nodes and the connections can drop at
> any time given the basic unreliability of network connections.

TCP/IP in a local LAN should be way more reliable than that.

peter.

Peter Sabaini

Dec 1, 2009, 4:51:29 PM

to Jarrod Roberson, Erlang

On Tue, 2009-12-01 at 15:39 -0500, Jarrod Roberson wrote:
> the project I am working on (inet_mdns @ github.com) addresses one part of
> this solution.

Cool, thanks!

Garrett Smith

Dec 1, 2009, 5:34:27 PM

to Peter Sabaini, Erlang Users' List

On Tue, Dec 1, 2009 at 3:44 PM, Peter Sabaini <pe...@sabaini.at> wrote:
> On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
>> What is happening that makes something an unstable connection?
>
> The behaviour was that nodes seemed to randomly produced error messages,
> eg.:
>
> =ERROR REPORT==== 9-Jul-2009::13:56:07 ===
> The global_name_server locker process received an unexpected message:
> {{#Ref<0.0.0.1957>,'xy@z'},false}
>
> Or
>
> =ERROR REPORT==== 9-Jul-2009::14:03:33 ===
> global: 'foo@bar' failed to connect to 'qux@baz'

Hmm...not to say the node count isn't part of the problem, but there
are *lots* of reasons this could happen, none of which have anything
to do with Erlang.

> In my test I just tried to run "fully-meshed" ie. every node is
> connected to every other node; I ran 50 - 120 nodes distributed across 5
> physical machines on a local, otherwise healthy, LAN.

How often were you running into the "failed to connect" error above
with ~120 nodes? Can you rpc to the nodes and get them to reconnect?

> As you say, running "fully-meshed" is a lot of overhead, which might not
> be necessary in an actual deployment. On the other hand, the
> near-automatic network setup is also very convenient :-)

True. But it's also a bit annoying to see all of the nodes busily
trying to hook up, when I really don't need or want them to. As I've
mentioned before, I use -connect_all false but then lose the global
process registry. What's the solution? I probably just need to dig
deeper.

>> I have a mesh of several dozen nodes and the connections can drop at
>> any time given the basic unreliability of network connections.
>
> TCP/IP in a local LAN should be way more reliable than that.

But not 100%. Something's going to fall over at some point and you're
going to need to deal with it. I use a state machine that monitors a
node's connection to another and goes into a retry mode when something
drops. This works well.
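
A retry loop of the kind described above might look roughly like the following gen_server. It is a sketch, not the code used by anyone in this thread: the module name hub_link, the 5-second retry interval, and the idea of tracking a single hub node are all assumptions made for illustration.

```erlang
-module(hub_link).
-behaviour(gen_server).
-export([start_link/1, status/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 5000).  %% assumed retry interval

start_link(Hub) -> gen_server:start_link(?MODULE, Hub, []).

%% Returns up or down for the hub this process watches.
status(Pid) -> gen_server:call(Pid, status).

init(Hub) ->
    ok = net_kernel:monitor_nodes(true),  %% ask for nodeup/nodedown messages
    self() ! retry,                       %% first connection attempt
    {ok, {Hub, down}}.

handle_call(status, _From, {_Hub, Status} = State) ->
    {reply, Status, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Retry mode: try to connect, and schedule another attempt on failure.
handle_info(retry, {Hub, down}) ->
    case net_kernel:connect_node(Hub) of
        true ->
            {noreply, {Hub, up}};
        _FalseOrIgnored ->
            erlang:send_after(?RETRY_MS, self(), retry),
            {noreply, {Hub, down}}
    end;
%% The hub dropped: go back into retry mode.
handle_info({nodedown, Hub}, {Hub, _}) ->
    self() ! retry,
    {noreply, {Hub, down}};
handle_info({nodeup, Hub}, {Hub, _}) ->
    {noreply, {Hub, up}};
%% Events for other nodes and stale retry messages are ignored.
handle_info(_Other, State) ->
    {noreply, State}.
```

Each node would run one such process per hub it cares about. Replacing the single hub with a list of candidate hubs would address the single point of failure.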

My concern is in how the myriad distributed features of Erlang (global
process registry, again, being just one example) deal with large
meshes. If the errors you're seeing are revealing a problem with
Erlang in large networks, it'd be interesting to get to the underlying
cause.

I think ultimately it's crazy to expect n(n-1)/2 connections to scale,
so at some point the thing needs to be partitioned.

Jarrod Roberson

Dec 1, 2009, 6:57:40 PM

to Erlang

On Tue, Dec 1, 2009 at 4:06 PM, zabrane Mikael <zabr...@gmail.com> wrote:

> Hi Jarrod !
>
> Is "inet_mdns" ready to use?
> Could you please give me a short example of how to use it in real world?
>
> Regards
> Zabrane
>

it isn't finished by a long shot, but I do have "browsing" or
"subscriptions" working.

you start it like this:

{S, Pid} = inet_mdns:start().

then to be notified about say, iTunes you would subscribe to that "domain"
( _daap._tcp.local )

inet_mdns:subscribed("_daap._tcp.local",Pid).

Then start iTunes, it will broadcast about itself on start up. I don't have
sending of queries working yet.

and then you can "browse" for the results by

Results = inet_mdns:getsubscriptions(Pid).

where Results is just a nested Dict of all the subscribed domains.

Note that the public interface is only about 5 days old and I am going to
change it, especially with critique and feedback from the group.
I am going to build custom Results objects and add parsing of TXT records
into Dicts. There is lots of work to do.

Peter Sabaini

Dec 2, 2009, 5:07:36 AM

to Garrett Smith, Erlang Users' List

On Tue, 2009-12-01 at 16:34 -0600, Garrett Smith wrote:
> On Tue, Dec 1, 2009 at 3:44 PM, Peter Sabaini <pe...@sabaini.at> wrote:
> > On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
> >> What is happening that makes something an unstable connection?
> >
> > The behaviour was that nodes seemed to randomly produced error messages,
> > eg.:
> >
> > =ERROR REPORT==== 9-Jul-2009::13:56:07 ===
> > The global_name_server locker process received an unexpected message:
> > {{#Ref<0.0.0.1957>,'xy@z'},false}
> >
> > Or
> >
> > =ERROR REPORT==== 9-Jul-2009::14:03:33 ===
> > global: 'foo@bar' failed to connect to 'qux@baz'
>
> Hmm...not to say the node count isn't part of the problem, but there
> are *lots* of reasons this could happen, none of which have anything
> to do with Erlang.

My evidence is that these problems appeared (with little load) when I
increased the node count, and disappeared (even under heavy load) when
going below a threshold (seemed stable with 64 nodes in my case). I
didn't investigate this further though as I was more interested in the
behaviour of my application.

> > In my test I just tried to run "fully-meshed" ie. every node is
> > connected to every other node; I ran 50 - 120 nodes distributed across 5
> > physical machines on a local, otherwise healthy, LAN.
>
> How often were you running into the "failed to connect" error above
> with ~120 nodes? Can you rpc to the nodes and get them to reconnect?

I don't have the setup at hand anymore, but as far as I can remember
with 120 nodes the connection errors would occur quite frequently, and
the nodes would then be considered down.

> > As you say, running "fully-meshed" is a lot of overhead, which might not
> > be necessary in an actual deployment. On the other hand, the
> > near-automatic network setup is also very convenient :-)
>
> True. But it's also a bit annoying to see all of the nodes busily
> trying to hook up, when I really don't need or want them to. As I've
> mentioned before, I use -connect_all false but then lose the global
> process registry. What's the solution? I probably just need to dig
> deeper.
>
> >> I have a mesh of several dozen nodes and the connections can drop at
> >> any time given the basic unreliability of network connections.
> >
> > TCP/IP in a local LAN should be way more reliable than that.
>
> But not 100%. Something's going to fall over at some point and you're
> going to need to deal with it. I use a state machine that monitors a
> node's connection to another and goes into a retry mode when something
> drops. This works well.
> My concern is in how the myriad distributed features of Erlang (global
> process registry, again, being just one example) deals with large
> meshes. If the errors you're seeing are revealing a problem with
> Erlang in large networks, it'd be interesting to get to the underlying
> cause.

Right, that's what I was getting at -- there is no such thing as absolute
reliability, but OTOH the connection problems I observed cannot be
explained by TCP/IP unreliability.

Yeah, would be interesting to explore this further. I was pretty
strapped for time when I tried this, but maybe I'll get around to doing
this one of these days.

peter.

Garrett Smith

Dec 2, 2009, 10:56:34 AM

to Peter Sabaini, Erlang Users' List

On Wed, Dec 2, 2009 at 4:07 AM, Peter Sabaini <pe...@sabaini.at> wrote:
> On Tue, 2009-12-01 at 16:34 -0600, Garrett Smith wrote:
>> On Tue, Dec 1, 2009 at 3:44 PM, Peter Sabaini <pe...@sabaini.at> wrote:
>> > On Tue, 2009-12-01 at 10:27 -0600, Garrett Smith wrote:
>> >> What is happening that makes something an unstable connection?
>> >
>> > The behaviour was that nodes seemed to randomly produced error messages,
>> > eg.:
>> >
>> > =ERROR REPORT==== 9-Jul-2009::13:56:07 ===
>> > The global_name_server locker process received an unexpected message:
>> > {{#Ref<0.0.0.1957>,'xy@z'},false}
>> >
>> > Or
>> >
>> > =ERROR REPORT==== 9-Jul-2009::14:03:33 ===
>> > global: 'foo@bar' failed to connect to 'qux@baz'
>>
>> Hmm...not to say the node count isn't part of the problem, but there
>> are *lots* of reasons this could happen, none of which have anything
>> to do with Erlang.
>
> My evidence is that these problems appeared (with little load) when I
> increased the nodecount, and disappeared (even under heavy load) when
> going beyond a threshold (seemed stable with 64 nodes in my case). I
> didn't investigate this further though as I was more interested in the
> behaviour of my application.

This is good input and a bit of a stab (well, poke) in the heart of
Erlang's "distributed" story. 100+ nodes may have historically been a
large cluster, but that's quickly changing. It's not unlike the set of
problems that get kicked up by the new large multicore systems (e.g.
process affinity threads that keep popping up here).

Taking the global process registry alone -- and this seems to be a
cornerstone of distributed Erlang -- you'd need a robust peer-to-peer
replication strategy that survives constant linear growth of the
network. Or maybe this is nonsense and one must concede that Erlang's
out-of-the-box location transparency stops at around 100 nodes over
TCP/IP.

Evans, Matthew

Dec 2, 2009, 11:54:55 AM

to Garrett Smith, Peter Sabaini, Erlang Users' List

I'm entering this thread a bit late.

One thing I have noticed when running on certain processors is that the TCP connection becomes a bottleneck if there is a lot of traffic. There can even be issues under low load, caused by things like nodelay being set or not set (that can be resolved by modifying the "dist" TCP kernel options, for example {dist_nodelay, true}).
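
The option named above is a kernel application parameter, so one way to set it is in the node's config file. A minimal fragment, with the file name sys.config assumed for illustration:

```erlang
%% sys.config (sketch): set the nodelay option on distribution sockets.
%% Whether true or false works better depends on the traffic pattern,
%% so treat this value as something to measure rather than a given.
[{kernel,
  [{dist_nodelay, true}]}].
```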

But with the load issue: One thing that would be nice is having the ability to specify a "pool" of TCP connections between VMs, rather than one. Especially with lots of processes where we could be hitting locks in the kernel. If this is possible today, please let me know how to do that :-)
