[erlang-questions] large Erlang clusters

23 views
Skip to first unread message

Serge Aleynikov

unread,
Aug 11, 2008, 10:36:12 PM8/11/08
to Erlang Users' List
Does any one have experience running somewhere between 200 and 400 nodes
in production? I recall that Erlang distributed layer had a limit of
256 nodes. Is it still the case?

I suppose that partitioning the cluster in several global_groups should
limit the network load and the number of open file descriptors on each
node would be reduced.

Are there any other concerns one should be aware of when working with
such large clusters.

Serge
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

ERLANG

unread,
Aug 12, 2008, 2:32:29 AM8/12/08
to Erlang Users' List
Hi guys,

Is there any free Amazon S3 and SQS implementation in Erlang ?

Thanks
Y.

Tim Fletcher

unread,
Aug 12, 2008, 4:54:00 AM8/12/08
to erlang-q...@erlang.org
> Is there any free Amazon S3 and SQS implementation in Erlang ?

http://code.google.com/p/erlaws/

ERLANG

unread,
Aug 12, 2008, 5:05:41 AM8/12/08
to Tim Fletcher, erlang-q...@erlang.org
Hi Tim,

Thanks for the pointer. Unfortunatly, I'm not interested by a client
interfaces but
by a real implementation of Amazon's SQS and S3 concepts in Erlang.

cheers
Y.

Le 12 août 08 à 10:54, Tim Fletcher a écrit :

Tim Fletcher

unread,
Aug 12, 2008, 6:20:01 AM8/12/08
to erlang-q...@erlang.org
> Thanks for the pointer. Unfortunatly, I'm not interested by a client  
> interfaces but by a real implementation of Amazon's SQS and S3 concepts in Erlang.

Sorry :) AFAIK there is nothing like that yet. Would RabbitMQ do what
you need (instead of SQS), or are you looking for an *exact* clone of
the Amazon API?

ERLANG

unread,
Aug 12, 2008, 6:38:47 AM8/12/08
to Tim Fletcher, Hynek Vychodil, erlang-q...@erlang.org
Hi guys,

RappidMQ will do the job for SQS.
I've to find something like S3 now ;-)

N.B: I don't need exact clones and anything similar will be OK for the
task I've to implement

cheers
Y.

Le 12 août 08 à 12:20, Tim Fletcher a écrit :

Jan Lehnardt

unread,
Aug 12, 2008, 7:16:22 AM8/12/08
to ERLANG, erlang-q...@erlang.org

On Aug 12, 2008, at 12:38 , ERLANG wrote:
> RappidMQ will do the job for SQS.
> I've to find something like S3 now ;-)

How about CouchDB (http://couchdb.org/) as an approximation?

Cheers
Jan
--

ERLANG

unread,
Aug 12, 2008, 7:32:29 AM8/12/08
to Jan Lehnardt, erlang-q...@erlang.org
Hi Jan,

Like CouchDB's introduction pointed, it's not a reliable DB like S3.

What I'm looking after is a free, simple, and reliable (with
replication suport) library to store large
number (thousands to million) of very big files (>1gb per file) on
secondary storage.

Scalaris project is a right candidate but too big in my opinion.
I'm simply looking for something simple.

Anyway, many thanks.

cheers
Y.

Le 12 août 08 à 13:16, Jan Lehnardt a écrit :

Jan Lehnardt

unread,
Aug 12, 2008, 7:38:52 AM8/12/08
to ERLANG, erlang-q...@erlang.org

On Aug 12, 2008, at 13:32 , ERLANG wrote:

> Hi Jan,
>
> Like CouchDB's introduction pointed, it's not a reliable DB like S3.

Reliable is a bendable term. You can certainly make CouchDB reliable
with not
too much effort.

> What I'm looking after is a free, simple, and reliable (with
> replication suport) library to store large
> number (thousands to million) of very big files (>1gb per file) on
> secondary storage.

CouchDB does have replication and it can store large numbers of big
files.

Feel free to send more questions along my way. If it turns out that
CouchDB
doesn't solve your problem, I don't want to push it onto you :)


> Scalaris project is a right candidate but too big in my opinion.
> I'm simply looking for something simple.

The not-yet available permanent storage would rule out Scalaris for
me. I just
don't have that much RAM (just teasing, I love Scalaris :)

Cheers
Jan
--

Robert Raschke

unread,
Aug 12, 2008, 7:50:42 AM8/12/08
to erlang-q...@erlang.org
On Tue, Aug 12, 2008 at 12:32 PM, ERLANG <erl...@gmail.com> wrote:
> What I'm looking after is a free, simple, and reliable (with
> replication suport) library to store large
> number (thousands to million) of very big files (>1gb per file) on
> secondary storage.

Not quite sure if I'm missing the point, but a filesystem on RAIDed
disks would do that, yes?

Or did you want replicated and distributed?

Robby

ERLANG

unread,
Aug 12, 2008, 8:06:07 AM8/12/08
to Robert Raschke, erlang-q...@erlang.org
Hi Robert,

Le 12 août 08 à 13:50, Robert Raschke a écrit :

> On Tue, Aug 12, 2008 at 12:32 PM, ERLANG <erl...@gmail.com> wrote:
>> What I'm looking after is a free, simple, and reliable (with
>> replication suport) library to store large
>> number (thousands to million) of very big files (>1gb per file) on
>> secondary storage.
>
> Not quite sure if I'm missing the point, but a filesystem on RAIDed
> disks would do that, yes?
>
> Or did you want replicated and distributed?
>

I need both if possible.

cheers
Y.

ERLANG

unread,
Aug 12, 2008, 8:07:02 AM8/12/08
to nor...@alum.mit.edu, erlang-q...@erlang.org

Hello Norton,
> Y -
>
> How about Mnesia unlimited?
>
> http://www.wagerlabs.com/blog/2008/06/mnesia-unlimited.html
>

Excellent pointer. Many thanks.

cheers
Y.

> --
> nor...@alum.mit.edu

ERLANG

unread,
Aug 12, 2008, 8:15:45 AM8/12/08
to Jan Lehnardt, erlang-q...@erlang.org

Le 12 août 08 à 13:38, Jan Lehnardt a écrit :

>
> On Aug 12, 2008, at 13:32 , ERLANG wrote:
>
>> Hi Jan,
>>
>> Like CouchDB's introduction pointed, it's not a reliable DB like S3.
>
> Reliable is a bendable term. You can certainly make CouchDB reliable
> with not
> too much effort.
>

By reliable, I mean performant storage engine which offers data
replication, and distribution on
multiple nodes as S3 supports.

Our data are important, and very expensive to re-compute, so we can't
use anything "unreliable"
to save them. Actually, we want to move from S3 to something free and
well designed.


>> What I'm looking after is a free, simple, and reliable (with
>> replication suport) library to store large
>> number (thousands to million) of very big files (>1gb per file) on
>> secondary storage.
>
> CouchDB does have replication and it can store large numbers of big
> files.
>
> Feel free to send more questions along my way. If it turns out that
> CouchDB
> doesn't solve your problem, I don't want to push it onto you :)
>

I'll have a deeper look to CouchDB.

Thanks for all guys.

cheers
Y.

Tim Fletcher

unread,
Aug 12, 2008, 9:07:18 AM8/12/08
to erlang-q...@erlang.org
> What I'm looking after is a free, simple, and reliable (with  
> replication suport) library to store large
> number (thousands to million) of very big files (>1gb per file) on  
> secondary storage.

Hadoop (http://hadoop.apache.org/core/) would be another candidate, if
it doesn't have to be Erlang.

Joseph Wayne Norton

unread,
Aug 12, 2008, 7:39:59 AM8/12/08
to ERLANG, Jan Lehnardt, erlang-q...@erlang.org

ERLANG

unread,
Aug 14, 2008, 7:52:42 AM8/14/08
to Erlang Users' List
Hi guys,

I'd like to share with you some softwares I selected from my actual job.

My problem was to use a robust file system for data storage,
replication and distribution (like Amazon S3),
a messaging system (like Amazon SQS), and computing cloud (like Amazon
EC2). A little Amazon if you want.
All these, with an open source license if possible.

And here is what I choosed :

* GlusterFS (equiv. to S3) : http://www.gluster.org/

A cluster file-system capable of scaling to several peta-bytes. It
aggregates various storage bricks over
Infiniband RDMA or TCP/IP interconnect into one large parallel network
file system. GlusterFS is based on
a stackable user space design without compromising performance.

* Eucalyptus (equiv. to EC2) : http://eucalyptus.cs.ucsb.edu/

Elastic Utility Computing Architecture for Linking Your Programs To
Useful Systems - is an open-source software infrastructure for
implementing "cloud computing" on clusters. The current interface to
EUCALYPTUS is compatible with Amazon's EC2 interface, but the
infrastructure is designed to support multiple client-side interfaces.
EUCALYPTUS is implemented using commonly available Linux tools and
basic Web-service technologies making it easy to install and maintain.

* RabbitMQ (equiv. to SQS) : http://www.rabbitmq.com/
The only exception is here. I need to hide the actual APMQ interface
behind an SQS one.
Not a big deal as RabbitMQ is an awesome piece of code to hack.


Hope this help you build rock solid, open source, free of charge
Erlang applications.

cheers
Y.

Le 12 août 08 à 15:07, Tim Fletcher a écrit :

Viktor Sovietov

unread,
Aug 14, 2008, 8:42:25 PM8/14/08
to erlang-q...@erlang.org

It seems that the thing you need is a good distributed filesystem. There are
solutions able to serve 1GB/s datastream for hundreds TB storage with
installation price ~$1600-2000/TB. Though, you won't be able to use common
hardware, it'll require fast interconnect, at least.

Sincerely,

--Viktor

Hi Jan,

Like CouchDB's introduction pointed, it's not a reliable DB like S3.

What I'm looking after is a free, simple, and reliable (with

replication suport) library to store large
number (thousands to million) of very big files (>1gb per file) on
secondary storage.

Scalaris project is a right candidate but too big in my opinion.


I'm simply looking for something simple.

Anyway, many thanks.

cheers
Y.


--
View this message in context: http://www.nabble.com/large-Erlang-clusters-tp18937196p18990027.html
Sent from the Erlang Questions mailing list archive at Nabble.com.

Jon Singler

unread,
Aug 14, 2008, 10:09:28 PM8/14/08
to erlang-q...@erlang.org
> What I'm looking after is a free, simple, and reliable (with
> replication suport) library to store large
> number (thousands to million) of very big files (>1gb per file) on
> secondary storage.

What you're looking for doesn't exist and can't exist. You're asking
for something "simple" that is also "reliable (with replication
support)" for storing huge numbers of huge files. Any system that is
the latter cannot possibly be the former. Asking for it to be free as
well is really pushing beyond the bounds of reality.

Among the non-free alternatives, I doubt that you're going to be able
to find anything simpler or more reliable than S3, with a management
layer on top.

But best of luck to you in your quest :-)

atomly

unread,
Aug 15, 2008, 7:31:19 PM8/15/08
to erlang-q...@erlang.org
[Jon Singler <jonsi...@gmail.com>]

> What you're looking for doesn't exist and can't exist. You're asking
> for something "simple" that is also "reliable (with replication
> support)" for storing huge numbers of huge files. Any system that is
> the latter cannot possibly be the former. Asking for it to be free as
> well is really pushing beyond the bounds of reality.

I wouldn't say this is true. I mean, you are obviously going to have to
make some tradeoffs, but there is a lot of work being done these days to
solve almost this exact problem. Dynamo does almost exactly this, for
example, and there are several efforts to make open source clones of it.

Having said that, I think S3 is perfect for the job.

--
:: atomly ::

[ ato...@atomly.com : www.atomly.com ...
[ atomiq records : new york city : +1.917.442.9450 ...
[ e-mail atomly-new...@atomly.com for atomly info and updates ...

john s wolter

unread,
Aug 15, 2008, 11:42:22 PM8/15/08
to Jon Singler, john s wolter, erlang-q...@erlang.org
Jon,

"Pushing beyond the bounds of reality" is just what is needed to handle the web services future.  Current ways of scaling data centers and web applications requires a good sized pitch of money.  When, not if, these kind of infrastructures are built, mashing-up a new worldwide company's virtual IT infrastructure will be easy.   Planet sized problems will have the needed technical resources.  

Erlang's features or its successors will be of vital help.  They give us a a glimpse of Things to Come.  Where will it end?  Given these capabilities the answer is "Y" of course go ahead.
--
John S. Wolter President
Wolter Works
Mailto:johns...@wolterworks.com
Desk 1-734-665-1263
Cell: 1-734-904-8433

ERLANG

unread,
Aug 16, 2008, 5:04:23 AM8/16/08
to johns...@wolterworks.com, Jon Singler, Erlang Users' List
Hi guys,

Our decision to leave Amazon grid computing framework isn't only to save money even that saving 
money isn't too bad when you've thousands of Amazon medium/big images running 24/7, exchanging 
huge number of messages per day.

Some of our clients simply don't like SaaS stuffs at all. So, we MUST change for them
because they want full control on their business data from A to Z (most of time due to confidentiality reasons).

Joe Weinman from AT&T wrote a nice post about that:

cheers
Y.

Le 16 août 08 à 05:42, john s wolter a écrit :

Toby Thain

unread,
Aug 16, 2008, 9:45:09 AM8/16/08
to atomly, erlang-q...@erlang.org

On 15-Aug-08, at 8:31 PM, atomly wrote:

> [Jon Singler <jonsi...@gmail.com>]
>> What you're looking for doesn't exist and can't exist. You're asking
>> for something "simple" that is also "reliable (with replication
>> support)" for storing huge numbers of huge files. Any system that is
>> the latter cannot possibly be the former. Asking for it to be free as
>> well is really pushing beyond the bounds of reality.
>
> I wouldn't say this is true. I mean, you are obviously going to
> have to
> make some tradeoffs, but there is a lot of work being done these
> days to
> solve almost this exact problem. Dynamo does almost exactly this,

Dynamo isn't meant to store large blobs (megabytes, gigabytes); its
sweet spot is values of a few KB (iirc - it's a few months since I
read the paper). It certainly is 'simple' and elegant.

--Toby

Viktor Sovietov

unread,
Aug 16, 2008, 4:48:07 PM8/16/08
to erlang-q...@erlang.org

Hi Serge

As far as I know you're only limited with the maximum number of sockets
which are available on your system and with number of atoms which can be
used as node names.
We tested 600 nodes cluster, but I honestly can't recall if there were any
patches to BEAM to increase mentioned parameters.

Sincerely,

--Viktor


Serge Aleynikov-2 wrote:
>
> Does any one have experience running somewhere between 200 and 400 nodes
> in production? I recall that Erlang distributed layer had a limit of
> 256 nodes. Is it still the case?
>
> I suppose that partitioning the cluster in several global_groups should
> limit the network load and the number of open file descriptors on each
> node would be reduced.
>
> Are there any other concerns one should be aware of when working with
> such large clusters.
>
> Serge

> _______________________________________________
> erlang-questions mailing list
> erlang-q...@erlang.org
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
>

--
View this message in context: http://www.nabble.com/large-Erlang-clusters-tp18937196p18990220.html


Sent from the Erlang Questions mailing list archive at Nabble.com.

_______________________________________________

Serge Aleynikov

unread,
Aug 16, 2008, 5:24:12 PM8/16/08
to Viktor Sovietov, erlang-q...@erlang.org
I suppose that the problem with the max number of sockets is solved by
tweaking session limits (ulimit) and using kernel poll (+K true).

As I understand, in a 600 node cluster every node will maintain
connections to the rest 599 nodes, and send periodic pings. So, that
pinging overhead would be something in the order of 10 events per second
per node in this configuration. While the number doesn't seem
intimidating I wonder if that overhead becomes noticeable in large
network configurations and if there are any other guidelines that help
architect such large network clusters to keep background load minimal.

Serge

Viktor Sovietov

unread,
Aug 16, 2008, 5:48:03 PM8/16/08
to erlang-q...@erlang.org

Hi Serge

Well, in that experiment we still had some reserve, because we had used beam
that been patched to bypass TCP. With having really fast interconnect and no
expenses for often kernel calls, we can pay no no attention to background
load, but further growth of cluster would require a meddling to Erlang's
scalability mechanisms.

Sincerely,

--Viktor


Serge Aleynikov-2 wrote:
>
> I suppose that the problem with the max number of sockets is solved by
> tweaking session limits (ulimit) and using kernel poll (+K true).
>
> As I understand, in a 600 node cluster every node will maintain
> connections to the rest 599 nodes, and send periodic pings. So, that
> pinging overhead would be something in the order of 10 events per second
> per node in this configuration. While the number doesn't seem
> intimidating I wonder if that overhead becomes noticeable in large
> network configurations and if there are any other guidelines that help
> architect such large network clusters to keep background load minimal.
>
> Serge
>
>

--
View this message in context: http://www.nabble.com/large-Erlang-clusters-tp18937196p19015411.html


Sent from the Erlang Questions mailing list archive at Nabble.com.

_______________________________________________

Matt Williamson

unread,
Aug 16, 2008, 11:00:50 PM8/16/08
to Serge Aleynikov, Viktor Sovietov, erlang-q...@erlang.org

Serge Aleynikov

unread,
Aug 17, 2008, 8:22:00 AM8/17/08
to Matt Williamson, erlang-q...@erlang.org
Thanks Matt, I am indeed familiar with that net_kernel's setting.

My current reasoning for large clusters is as follows. If all
inter-node communications are quite busy, this net_kernel's pinging
doesn't add any extra overhead for them. However if several busy nodes
are also connected to a large number of quiet nodes, they'll have to
coop with extra background pinging load.

Perhaps that load is not to worry about when the "close to" real-time
response is expected of the "busy" nodes, as in case of an I/O busy node
signaling select/poll calls will likely happen simultaneously on several
file descriptors, with some of them carrying pinging payload. As a
result the total number of system calls involved in processing these
events wouldn't be as many as if the events were spatially separated.
However, since all net_ticktime logic happens in the bytecode, in a busy
node it steals CPU cycles from other time-sensitive code, leading to
increased latencies. This may also be a constraint if other OS
processes on the same host have higher priority then the emulator and
the emulator is not allowed to run with SMP mode on so that it doesn't
compete with CPU resources of other more critical processes.

This background load can be reduced by either bumping up the
net_ticktime setting (this is quite tricky if part of nodes are located
in one subnet and others are in a more distant network - unfortunately
with this feature *all* nodes in the cluster must have the same setting)
or making sure that global/net_kernel don't maintain a full mesh of
interconnected nodes using global_groups. In the past I used the later
approach (together with dist_auto_connect net_kernel's setting) to make
Erlang applications work through firewalls. In case of node
partitioning applications that take advantage of failover and global
registration must be careful not to deal with several global groups as
global names may be moved between them during failover.

I was hoping to hear in this thread experiences of those currently
running large Erlang clusters in production. In my prior Erlang
deployments I haven't had cases of more than 30-40 node clusters (where
total number of OS processes were much larger than that using ei/tcp to
connect to Erlang, but the Erlang applications would not spread over
more than 40 hosts). Since now there is a case for running a 400 node
cluster (an instance of a VM per host) in a much larger local network
(with a growth potential to about 600-700 nodes), I want to make sure
that I won't miss something obvious to those who had similar experiences.

Regards,

Serge

Matt Williamson

unread,
Aug 17, 2008, 12:37:59 PM8/17/08
to Serge Aleynikov, erlang-q...@erlang.org
Yes, I regret that I haven't got that sort of experience to share with you. I'd really like to hear of some as well, as I'm working on an open project that is using distributed Erlang for all it's inter-node communication.

If you want to play it safe, I hear that using a DHT, something like chord <http://en.wikipedia.org/wiki/Chord_(DHT)>, is a proven scalable method. To make a ring, you could still use distributed Erlang if you set connect_all to false and use ets to keep track of neighbors.
Reply all
Reply to author
Forward
0 new messages