[erlang-questions] Twoorl: an open source Twitter clone


Yariv Sadan

May 29, 2008, 1:12:29 AM
to erl...@googlegroups.com, erlang-questions
Hi,

I created an open source Twitter clone in Erlang called Twoorl. I
wrote it on top of ErlyWeb/Yaws. You can see it at http://twoorl.com.
The code is at http://code.google.com/p/twoorl.

I'll appreciate any feedback!

Thanks,
Yariv

Nick Gerakines

May 29, 2008, 1:26:00 AM
to Yariv Sadan, erl...@googlegroups.com, erlang-questions
MySQL? Seriously?

I've heard from multiple sources that database congestion is a major source
of scaling problems for websites. Why take MySQL over a fragmented
mnesia store or a set of hybrid services?

# Nick Gerakines

Yariv Sadan

May 29, 2008, 1:38:19 AM
to Nick Gerakines, erlang-questions
Facebook runs on MySQL -- that's pretty scalable :)

The main reason I didn't go with Mnesia for storing most data is the
dets issues discussed previously on this list (specifically, the long
repair times and the need for fragmentation over 4gb). I use Mnesia
for storing session data though.

Yariv

Rapsey

May 29, 2008, 1:55:02 AM
to erlang-q...@erlang.org
Are the repair times really that much of a problem? It's not like servers or the Erlang runtime crash that often.
I would think the advantages of using Mnesia far outweigh the disadvantages. Mnesia is much easier to scale and work with than MySQL.


Sergej

Anders Nygren

May 29, 2008, 2:09:40 AM
to Rapsey, erlang-q...@erlang.org
2008/5/29 Rapsey <rap...@gmail.com>:

> Are the repair times really that much of a problem? It's not like servers or
> the Erlang runtime crash that often.
> I would think the advantages of using Mnesia far outweigh the
> disadvantages. Mnesia is much easier to scale and work with than MySQL.
>

I don't know who scared Yariv about dets, but it is interesting
to note in The Erlang Efficiency Guide it now says.

"2.6 Myth: Repairing a Dets file is very slow

The repair time is still proportional to the number of records in the
file, but Dets repairs used to be much, much slower in the past. Dets
has been massively rewritten and improved. "

So I don't know if his worries still apply.

/Anders

Yariv Sadan

May 29, 2008, 2:34:43 AM
to Anders Nygren, erlang-q...@erlang.org
>
> I don't know who scared Yariv about dets, but it is interesting
> to note in The Erlang Efficiency Guide it now says.
>
> "2.6 Myth: Repairing a Dets file is very slow
>
> The repair time is still proportional to the number of records in the
> file, but Dets repairs used to be much, much slower in the past. Dets
> has been massively rewritten and improved. "
>
> So I don't know if his worries still apply.
>
> /Anders
>

That's great -- I didn't see that!

What scared me was when Klacke said dets should be rewritten.

Yariv Sadan

May 29, 2008, 2:39:34 AM
to Rapsey, erlang-q...@erlang.org
I've been using MySQL for Vimagi (http://vimagi.com) and haven't had
any problems. ErlyDB makes working with MySQL very easy -- just as
easy as Mnesia (if not easier in some cases) in terms of APIs. The
main reasons I went with MySQL are that I didn't want to deal with
fragmentation and dets repair times (yes, I know it wouldn't really
affect an app such as Twoorl, but I've already gotten into the mindset
of avoiding dets).

I actually wouldn't mind offering Mnesia as an alternate data store if
someone wants to implement it :)

Yariv

2008/5/28 Rapsey <rap...@gmail.com>:

Per Melin

May 29, 2008, 5:20:27 AM
to Yariv Sadan, erlang-questions
2008/5/29 Yariv Sadan <yariv...@gmail.com>:

> Facebook runs on MySQL -- that's pretty scalable :)

By Twitter's own account the database has been their bottleneck all along.

> The main reason I didn't go with Mnesia for storing most data is the
> dets issues discussed previously on this list (specifically, the long
> repair times and the need for fragmentation over 4gb). I use Mnesia
> for storing session data though.

I wouldn't go for Mnesia either. If you're building something that
should possibly scale to the level of Twitter, the 2 GB limit just
doesn't cut it, with or without fragmentation. And in the tests I've
done DETS got very slow long before it hit the size limit. If you use
disc_copies (which doesn't use DETS) instead of disc_only_copies
you're of course instead limited by RAM.

I (like everyone else) have built a (web) chat service that I'll
probably (alpha) release this weekend. The single biggest problem I've
been wrestling with is how to persist the messages in an efficient
manner. Right now I'm using term_to_binary and dumping that in a flat
file (much like disk_log), which works for all my needs, except for
full text search. I'd prefer not to build and store my own reverse
index.
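
A minimal sketch of that flat-file approach, assuming each message is term_to_binary-encoded and appended with a 4-byte length prefix (an illustration only, not Per's actual code; module and function names are made up):

    -module(flat_store).
    -export([append/2, read_all/1]).

    %% Append one message (any Erlang term) to File with a length prefix.
    append(File, Term) ->
        Bin = term_to_binary(Term),
        file:write_file(File, <<(byte_size(Bin)):32, Bin/binary>>, [append]).

    %% Read every stored term back, oldest first.
    read_all(File) ->
        {ok, Data} = file:read_file(File),
        decode(Data, []).

    decode(<<Size:32, Bin:Size/binary, Rest/binary>>, Acc) ->
        decode(Rest, [binary_to_term(Bin) | Acc]);
    decode(<<>>, Acc) ->
        lists:reverse(Acc).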

Gleb Peregud

May 29, 2008, 6:44:16 AM
to erlang-questions
On Thu, May 29, 2008 at 11:20 AM, Per Melin <per....@gmail.com> wrote:
> Right now I'm using term_to_binary and dumping that in a flat
> file (much like disk_log), which works for all my needs, except for
> full text search. I'd prefer not to build and store my own reverse
> index.

I'm interested in an example of the reverse index implementation,
'cause I'll need one in my bachelor's thesis (written in Erlang :) )

P.S. Sorry, Per, for previously sending this mail only to you.
--
Gleb Peregud
http://gleber.pl/

Every minute is to be grasped.
Time waits for nobody.
-- Inscription on a Zen Gong

Ulf Wiger (TN/EAB)

May 29, 2008, 7:02:55 AM
to Gleb Peregud, erlang-questions
Gleb Peregud wrote:

> On Thu, May 29, 2008 at 11:20 AM, Per Melin <per....@gmail.com> wrote:
>> Right now I'm using term_to_binary and dumping that in a flat
>> file (much like disk_log), which works for all my needs, except for
>> full text search. I'd prefer not to build and store my own reverse
>> index.
>
> I'm interested in an example of the reverse index implementation,
> 'cause I'll need one in my bachelor's thesis (written in Erlang :) )

For full-text search, you could peek at the rdbms_wsearch*.erl
modules in

http://jungerl.cvs.sourceforge.net/jungerl/jungerl/lib/rdbms/src/

They have not been integrated into rdbms, so they're mainly
there for future use.

The code implements full text searching using Porter's
stemming algorithm, courtesy Hans Nilsson.

BR,
Ulf W

Joe Armstrong

May 29, 2008, 10:19:40 AM
to Yariv Sadan, erlang-q...@erlang.org
How about using a 2-3 node RAM-replicated Mnesia as a front-end
cache with MySQL as a backend
store?

Add a time to live argument to all data and flush the cache to the
backend in idle time.

Adjust the time to live so as to keep the cache a reasonable size.

Best of both worlds.
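
A minimal sketch of that idea, using a plain ETS table for brevity (a replicated Mnesia ram_copies table would play the same role across 2-3 nodes); the module, table and function names are made up, and FlushFun stands in for whatever writes to MySQL (e.g. via ErlyDB):

    -module(ttl_cache).
    -export([init/0, store/2, fetch/1, flush_expired/1]).

    -define(TABLE, twoorl_cache).
    -define(TTL_MS, 60000).  %% time to live: one minute (tune to taste)

    init() ->
        ets:new(?TABLE, [named_table, public, set]).

    store(Key, Value) ->
        ets:insert(?TABLE, {Key, Value, now_ms()}).

    fetch(Key) ->
        case ets:lookup(?TABLE, Key) of
            [{Key, Value, _Ts}] -> {ok, Value};
            []                  -> miss
        end.

    %% Run from a timer in idle time: push expired entries to the backend
    %% store via FlushFun and drop them from the cache.
    flush_expired(FlushFun) ->
        Limit = now_ms() - ?TTL_MS,
        Expired = ets:select(?TABLE, [{{'$1', '$2', '$3'},
                                       [{'<', '$3', Limit}],
                                       [{{'$1', '$2'}}]}]),
        lists:foreach(fun({K, V}) ->
                              FlushFun(K, V),
                              ets:delete(?TABLE, K)
                      end, Expired),
        length(Expired).

    now_ms() ->
        erlang:system_time(millisecond).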

/Joe

Lev Walkin

May 29, 2008, 10:50:27 AM
to Joe Armstrong, Yariv Sadan, erlang-q...@erlang.org
Joe Armstrong wrote:
> How about using a 2-3 node RAM-replicated Mnesia as a front-end
> cache with MySQL as a backend
> store?
>
> Add a time to live argument to all data and flush the cache to the
> backend in idle time.
>
> Adjust the time to live so as to keep the cache a reasonable size.

This is the hardest part. It is easier to interface with memcached,
or several, at this point. You can even have more than 4g cached
in memory with several memcached and PAE on a single machine.

Tim Fletcher

May 29, 2008, 11:59:57 AM
to erlang-q...@erlang.org
> This is the hardest part. It is easier to interface with memcached,
> or several, at this point. You can even have more than 4g cached
> in memory with several memcached and PAE on a single machine.

If it helps, I have an Erlang Memcache client half written.

Gleb Peregud

May 29, 2008, 12:12:30 PM
to Tim Fletcher, erlang-q...@erlang.org
On Thu, May 29, 2008 at 5:59 PM, Tim Fletcher <two...@gmail.com> wrote:
> If it helps, I have an Erlang Memcache client half written.

There is one ready in the Cacherl project (it supports get, set and delete):
http://code.google.com/p/cacherl/source/browse/trunk/memcached/src/memcached_client.erl

Feel free to use it and expand it (e.g. the replace command ;) )

--
Gleb Peregud
http://gleber.pl/

Every minute is to be grasped.
Time waits for nobody.
-- Inscription on a Zen Gong

Torbjorn Tornkvist

May 29, 2008, 4:13:28 PM
to erlang-q...@erlang.org
Per Melin wrote:
> 2008/5/29 Yariv Sadan <yariv...@gmail.com>:
>> Facebook runs on MySQL -- that's pretty scalable :)
>
> By Twitter's own account the database has been their bottleneck all along.
>
>> The main reason I didn't go with Mnesia for storing most data is the
>> dets issues discussed previously on this list (specifically, the long
>> repair times and the need for fragmentation over 4gb). I use Mnesia
>> for storing session data though.
>
> I wouldn't go for Mnesia either. If you're building something that
> should possibly scale to the level of Twitter, the 2 GB limit just
> doesn't cut it, with or without fragmentation. And in the tests I've
> done DETS got very slow long before it hit the size limit. If you use
> disc_copies (which doesn't use DETS) instead of disc_only_copies
> you're of course instead limited by RAM.
>
> I (like everyone else) have built a (web) chat service that I'll
> probably (alpha) release this weekend. The single biggest problem I've
> been wrestling with is how to persist the messages in an efficient
> manner. Right now I'm using term_to_binary and dumping that in a flat
> file (much like disk_log), which works for all my needs, except for
> full text search. I'd prefer not to build and store my own reverse
> index.

One approach could perhaps be to break out the DB-part of CouchDB and
make it available as a general purpose Erlang library ?

--Tobbe

Tuncer Ayaz

May 30, 2008, 6:33:44 AM
to Yariv Sadan, erlang-q...@erlang.org
On Thu, May 29, 2008 at 8:34 AM, Yariv Sadan <yariv...@gmail.com> wrote:
>>
>> I don't know who scared Yariv about dets, but it is interesting
>> to note in The Erlang Efficiency Guide it now says.
>>
>> "2.6 Myth: Repairing a Dets file is very slow
>>
>> The repair time is still proportional to the number of records in the
>> file, but Dets repairs used to be much, much slower in the past. Dets
>> has been massively rewritten and improved. "
>>
>> So I don't know if his worries still apply.
>>
>> /Anders
>>
>
> That's great -- I didn't see that!
>
> What scared me was when Klacke said dets should be rewritten.

Is dets and/or ets so badly in need of a rewrite that lifting the size
limit is not easily implementable with a comparable
performance profile to what it has now?

Sorry if I failed to find the appropriate thread in the mailing
list archives discussing this. If there is one, it would be kind
to point me there instead.

<snip>

Joe Armstrong

May 30, 2008, 10:18:11 AM
to Yariv Sadan, erl...@googlegroups.com, erlang-questions
Hi Yariv

Well done - now a few questions:

- what is Twitter? and why is it good (if it is good) - can you
explain in a couple of paragraphs what it does? It seemed to be some
kind of new way for interrupting people ...

(this is from one who thinks that e-mail is intrusive and whose
mobile phone is usually
turned off :-)

- the reaction to your work seems to be (oh but it doesn't scale -
so Erlang must be no good -
despite the fact that I believe you expressly said it was a quick
hack and wasn't designed to
scale)

- It might be nice to say "it doesn't scale YET" - and then sit back
and let us help
you make it VERY scalable. I like a good challenge.

Could you try to describe exactly what Twitter is (in as abstract
a way as possible) so that old
fogies like me can wrap our brains around the problem of making it scale -

So what does it do? How many things must it scale to? (give me
numbers here - what are we
talking about - scaling is not some abstract quantity - it's number of
bytes of data delivered to end-users
per gram of CO2 - how many bytes/gm of CO2 are we aiming at?)

(aside) What does scalable mean? - I suppose the answer is some
constant cost per user.

A more literal answer would be "doesn't break when we add more users"
- but surely the cost
versus number of users curve must be more important.

In the context of Twitter, what does the word "scalable" mean? (how
about: we can handle
100,000 users per "box" - each box costs $500 and consumes 25 watts -
this is constant up to
6x10^9 users)

So if anybody who knows about this stuff can chip in with a few
figures it would be helpful
- what are the desired costs for running Twitter for a few million
people - how would you like
this to scale

(/aside)


/Joe Armstrong

Per Melin

May 30, 2008, 11:02:03 AM
to Joe Armstrong, erlang-questions
2008/5/30 Joe Armstrong <erl...@gmail.com>:
> - what is Twitter?

It was launched as a "micro-blogging" service. Each post is limited to
140 characters (like an SMS).

Users can follow what others post with RSS, SMS, instant messaging,
desktop applications (through an API), email or on the Twitter
website.

Some users have tens of thousands of "followers" and some follow
thousands of people.


> - the reaction to your work seems to be (oh but it doesn't scale -
> so Erlang must be no good -
> despite the fact that I believe you expressly said it was a quick
> hack and wasn't designed to
> scale)

I think the reason for this is that Twitter is plagued with daily
outages because they can't handle the load, and the internets have
been filled with speculation on why this is, and how these problems
could be avoided. In these discussions you can often find a commenter
saying that they would've used Erlang and that would've solved all
problems more or less automagically.


> So what does it do? How many things must it scale to? (give me
> numbers here - what are we
> talking about - scaling is not some abstract quantity - it's number of
> bytes of data delivered to end-users
> per gram of CO2 - how many bytes/gm of CO2 are we aiming at?)

In March Twitter had on average 200 000 users per week with 3 million
messages sent per day.

They have raised $20 million in venture capital to a $100 million
valuation. So they can afford to buy hardware.

Patrick Logan

May 30, 2008, 12:23:57 PM
to erlang-q...@erlang.org
"I might be nice to say "it doesn't scale YET" - and then sit back and
let us help you make it VERY scalable. I like a good challenge."

Another angle on this is to use ejabberd and XMPP. The XMPP folks are
interested in "micro-blogging", ejabberd is scalable, and XMPP itself
is federated. So why build an all-new system when the IETF has already
blessed a protocol for "micro-blogging"?

To match more of Twitter's features though, the system has to go
beyond XMPP per se and integrate with the web, with SMS, etc. That
could come along gradually, but I'm not sure why all of this shouldn't
have XMPP at its core.

Demo this on top of ejabberd and yaws as two key, scalable subsystems
within an overall "micro-blogging" architecture, so to speak.

-Patrick

Per Melin

May 30, 2008, 1:00:00 PM
to Patrick Logan, erlang-q...@erlang.org
2008/5/30 Patrick Logan <patric...@gmail.com>:

> Another angle on this is to use ejabberd and XMPP.

Can you send a message with XMPP/ejabberd to a client that is not
online? If not, that will get you nowhere.

As far as I know (which in this case isn't far), Twitter already uses
ejabberd, but only to communicate with Jabber IM clients.

Someone from Twitter gave a presentation in 2007 called "Scaling
Twitter" which had a slide on Erlang as an alternative (to Ruby +
MySQL). It said: "What are you doing? Stabbing my eyes out with a
fork."

I guess they did have a quick look, and nothing more, at Erlang.

Dave Smith

May 30, 2008, 10:51:27 PM
to Per Melin, erlang-questions
On Fri, May 30, 2008 at 9:02 AM, Per Melin <per....@gmail.com> wrote:
> 2008/5/30 Joe Armstrong <erl...@gmail.com>:
>> - what is Twitter?
>
> It was launched as a "micro-blogging" service. Each post is limited to
> 140 characters (like an SMS).
>
> Users can follow what others post with RSS, SMS, instant messaging,
> desktop applications (through an API), email or on the Twitter
> website.
>
> Some users have tens of thousands of "followers" and some follow
> thousands of people.

Twitter is basically a large-scale notification problem -- N people
wanting to know most recent status of M other people, where the rate
of status change is high (multiple changes per minute) and the latency
of notification must be low. Furthermore, not all parties are
necessarily "online" at any given moment, so some sort of
store-and-forward must be present such that changes are not missed.

My understanding is that the reason they have such poor uptime is due
to the fact that they modeled the problem as a web-app instead of a
messaging system. Compounding this poor design choice, they
have achieved success and so are experiencing record growth every
month -- i.e. no time to recover from past success. :)

Certainly Erlang would be a great foundation for Twitter, but
honestly, the problem they have is not one of language choice so much
as correctly identifying the characteristics of the problem. Along
these lines, twoorl repeats this mistake by modeling it as a web-app
instead of a messaging system, but again..just an opinion. :)

D.

Steve

May 31, 2008, 5:27:12 AM
to erlang-q...@erlang.org
"Dave Smith" wrote:

> My understanding is that the reason they have such poor uptime is due
> to the fact that they modeled the problem as a web-app instead of a
> messaging system.

Nice to hear a voice of reason. It perplexes me that people appear
surprised when their LAMP stack fails to scale.

Yariv Sadan

May 31, 2008, 5:39:40 PM
to erl...@googlegroups.com, erlang-questions
Thanks! Comet based scrolling is relatively low priority. There are a
bunch of other feature requests I'm working on. Twoorl users are a
small but pretty excited bunch :)
I'm not planning on keeping the current db-backed timeline rendering.
I realize it wouldn't scale well when users start having too many friends.
To avoid this bottleneck, I'm planning on keeping a per-user data
store (in mnesia or memcache) for the last 20 twoorls (=tweets) from
their friends. New twoorls would be broadcasted (ie copied) to all
(active) followers. It's a space/speed tradeoff. This solution
wouldn't allow paging (unless I increase the cache size) but I think
it's a good tradeoff.

I still have some way to go before I hit these scaling problems, though.

Yariv

On Sat, May 31, 2008 at 8:53 AM, David Pollak
<feeder.of...@gmail.com> wrote:
> Yariv,
> Nice stuff. A question and an observation:
> Q: Are you planning to add a comet-based scrolling timeline?
> O: The creation of a timeline from the backing store on request strikes me
> as one that is prone to performance problems (see http://twitter.com)
> Thanks,
> David


>
> On Wed, May 28, 2008 at 10:12 PM, Yariv Sadan <yariv...@gmail.com> wrote:
>>

>> Hi,
>>
>> I created an open source Twitter clone in Erlang called Twoorl. I
>> wrote it on top of ErlyWeb/Yaws. You can see it at http://twoorl.com.
>> The code is at http://code.google.com/p/twoorl.
>>
>> I'll appreciate any feedback!
>>
>> Thanks,
>> Yariv
>>
>>
>
>
>

> --
> lift, the simply functional web framework http://liftweb.net
> Collaborative Task Management http://much4.us
> Follow me: http://twitter.com/dpp
> Git some: http://github.com/dpp

Yariv Sadan

May 31, 2008, 6:04:56 PM
to Joe Armstrong, erlang-questions
On Fri, May 30, 2008 at 7:18 AM, Joe Armstrong <erl...@gmail.com> wrote:
> Hi Yariv
>
> Well done - now a few questions:
>
> - what is Twitter? and why is it good (if it is good) - can you
> explain in a couple of paragraphs what it does? It seemed to be some
> kind of new way for interrupting people ...
>
> (this is from one who thinks that e-mail is intrusive and whose
> mobile phone is usually
> turned off :-)

Twitter is a micro-blogging service. It allows asynchronous
broadcasting of casual messages/questions/observations/brain-farts to
your friends. It's not intrusive like IM because it's asynchronous.
It's more like a public mailing list where you only see the messages
from people you choose to follow.

At first I didn't get Twitter, but once I started using it I realized
it's kinda fun.

>
> - the reaction to your work seems to be (oh but it doesn't scale -
> so Erlang must be no good -
> despite the fact that I believe you expressly said it was a quick
> hack and wasn't designed to
> scale)

What can ya do? :)

Here's how I understand the main scaling challenge of a Twitter-like service:

You have M users and each user follows an average of N other users. To
render a user's timeline, you need to present the latest 20 tweets
(twoorls in the new terminology :) ) from all his friends. The naive
way to do it is to fetch N*20 messages, sort them, then take the first
20. This is expensive to do when N is big. Another problem is that
because of the frequent updates from typical users, caching this
calculation doesn't give you much because the cache gets frequently
invalidated between page loads.
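
For concreteness, the naive pull-on-read approach looks roughly like this (a sketch, not Twoorl code; GetLatest is a hypothetical accessor, e.g. an ErlyDB query, and a twoorl is assumed to be {UserId, Timestamp, Body}):

    -module(naive_timeline).
    -export([timeline/2]).

    %% FriendIds: the N users someone follows.
    %% GetLatest: fun(UserId, Count) -> [{UserId, Timestamp, Body}]
    timeline(FriendIds, GetLatest) ->
        All = lists:append([GetLatest(Id, 20) || Id <- FriendIds]),
        Newest = lists:sort(fun({_, T1, _}, {_, T2, _}) -> T1 >= T2 end, All),
        lists:sublist(Newest, 20).  %% N*20 fetched, only 20 kept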

The only solution I see is to keep a "latest twoorls" store for each
user. When a friend sends a new twoorl, it (or a pointer to it) is
copied to the twoorl store for all his friends. This trades space for
speed, and it limits the paging you can do to see your twoorl history,
but it's the only way you can scale this kind of service when N is
big.
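
A sketch of that fan-out-on-write idea, using an ETS table as the per-user "latest twoorls" store (Mnesia or memcached would fill the same role; the names are made up and error handling is omitted):

    -module(twoorl_fanout).
    -export([init/0, post/3, timeline/1]).

    -define(TAB, latest_twoorls).
    -define(KEEP, 20).  %% cached twoorls per user

    init() ->
        ets:new(?TAB, [named_table, public, set]).

    %% On write: copy the new twoorl into every follower's cached timeline.
    post(Author, Body, Followers) ->
        Twoorl = {Author, erlang:system_time(millisecond), Body},
        lists:foreach(fun(F) -> prepend(F, Twoorl) end, Followers).

    %% On read: the timeline is already materialized - one lookup, no sort.
    timeline(UserId) ->
        case ets:lookup(?TAB, UserId) of
            [{UserId, Twoorls}] -> Twoorls;
            []                  -> []
        end.

    prepend(UserId, Twoorl) ->
        ets:insert(?TAB, {UserId, lists:sublist([Twoorl | timeline(UserId)], ?KEEP)}).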

YC

May 31, 2008, 9:53:04 PM
to Yariv Sadan, erlang-questions
> You have M users and each user follows an average of N other users. To
> render a user's timeline, you need to present the latest 20 tweets
> (twoorls in the new terminology :) ) from all his friends. The naive
> way to do it is to fetch N*20 messages, sort them, then take the first
> 20. This is expensive to do when N is big. Another problem is that
> because of the frequent updates from typical users, caching this
> calculation doesn't give you much because the cache gets frequently
> invalidated between page loads.
>
> The only solution I see is to keep a "latest twoorls" store for each
> user. When a friend sends a new twoorl, it (or a pointer to it) is
> copied to the twoorl store for all his friends. This trades space for
> speed, and it limits the paging you can do to see your twoorl history,
> but it's the only way you can scale this kind of service when N is
> big.

It seems the solution will look similar to email and mailing lists, which are both quite scalable.

Except in this case every account is also a broadcast mailing list. 

Cheers,
yc

Steve

Jun 1, 2008, 3:23:27 AM
to erlang-q...@erlang.org

On May 31, 5:04 pm, "Yariv Sadan" <yarivsa...@gmail.com> wrote:
> ...but it's the only way you can scale this kind of service when N is
> big.

Hmmm, Yariv, aren't you still thinking about this in the way that Dave
Smith pointed to as the heart of the issue? i.e.
Dave said: "My understanding is that the reason they have such poor
uptime is due to the fact that they modeled the problem as a web-app
instead of a messaging system."

I'm aware that you are likely a good way away from hitting any
scalability problems, but some kind of tiering would seem to be
appropriate if twoorl is to be "twitter done right". Yaws at the front
end, definitely - but rather /RabbitMQ/ at the back end. I believe
that you'd then have the flexibility to distribute/cluster as
necessary to scale to the required level (whatever that may be).

For sure, Twoorl is a great demo of what can be done with Erlang in an
incredibly short time. I'm a relative noob to Erlang, and have learned
a great deal from your blog/code/examples.

Steve

Paul Stanley

Jun 1, 2008, 5:10:16 AM
to erlang-q...@erlang.org
On Sun, Jun 1, 2008 at 8:23 AM, Steve <steven.cha...@gmail.com> wrote:
>
> Hmmm, Yariv, aren't you still thinking about this in the way that Dave
> Smith pointed to as the heart of the issue? i.e.
> Dave said: "My understanding is that the reason they have such poor
> uptime is due to the fact that they modeled the problem as a web-app
> instead of a messaging system."

This seems to be the Twitter developers' own conclusion, too
(http://dev.twitter.com/2008/05/twittering-about-architecture.html):

"Twitter is, fundamentally, a messaging system. Twitter was not architected
as a messaging system, however. For expediency's sake, Twitter was built
with technologies and practices that are more appropriate to a content
management system. Over the last year and a half we've tried to make our
system behave like a messaging system as much as possible, but that's
introduced a great deal of complexity and unpredictability. When we're
in crisis mode, adding more instrumentation to help us navigate the web
of interdependencies in our current architecture is often our primary
recourse. This is, clearly, not optimal."

It's worth googling "twitter architecture". There's a lot of rather feverish
discussion on the various issues, but some of it is illuminating.

--
Paul Stanley

David Mitchell

Jun 1, 2008, 4:41:05 AM
to erlang-q...@erlang.org
This is a REALLY interesting discussion, but at this point it's
becoming obvious that I don't know enough about Twitter...

Are you suggesting that Twoorl should be architected as follows:
- when they register, every user gets assigned their own RabbitMQ
incoming and outgoing queues
- user adds a message via Web/Yaws interface (I know, this could be
SMS or something else later...)
- message goes to that user's RabbitMQ incoming queue
- a backend reads messages from the user's incoming queue, looks up in
e.g. a Mnesia table to see who should be receiving messages from that
user and whether they're connected or not. If "yes" to both, RabbitMQ
then forwards the message to each of those users' outgoing queues
- either the receiving users poll their outgoing queue for the
forwarded message, or a COMET-type Yaws app springs to life and
forwards the message to their browser (again, ignoring SMS)

This seems like a reasonable approach; I'm just curious if that's what
you're suggesting, or whether you've got something else in mind.

Great thread, and thanks Yariv for getting this discussion going with Twoorl

Regards

Dave M.

2008/6/1 Steve <steven.cha...@gmail.com>:

Paul Stanley

Jun 1, 2008, 8:39:19 AM
to erlang-q...@erlang.org
On Sun, Jun 1, 2008 at 9:41 AM, David Mitchell <monc...@gmail.com> wrote:

...
> Are you suggesting that Twoorl should be architected as follows:
> - when they register, every user gets assigned their own RabbitMQ
> incoming and outgoing queues
> - user adds a message via Web/Yaws interface (I know, this could be
> SMS or something else later...)
> - message goes to that user's RabbitMQ incoming queue
> - a backend reads messages from the user's incoming queue, looks up in
> e.g. a Mnesia table to see who should be receiving messages from that
> user and whether they're connected or not. If "yes" to both, RabbitMQ
> then forwards the message to each of those users' outgoing queues
> - either the receiving users poll their outgoing queue for the
> forwarded message, or a COMET-type Yaws app springs to life and
> forwards the message to their browser (again, ignoring SMS)
>
> This seems like a reasonable approach; I'm just curious if that's what
> you're suggesting, or whether you've got something else in mind.

I think this is not quite right. As I understand it, Twitter messages
are retrieved
by someone who is "following" another user when the follower logs on.
They don't have to be connected when the message is sent.

In other words, the server side has to maintain (or construct) an archive of
"received messages" for each user. Like email. But unlike email (which is
normally longish messages to few people), this system assumes short
messages which may well go to lots of people.

There seem to be two options.

The first (which is what Twitter originally used) is to store every outgoing
message once, and construct the archive on the fly. When a user comes
online and asks for messages, a server looks up who that user follows,
finds the messages from each followed person (which may involve checking
who is allowed to see them), arranges them and delivers them. The cost, in
effort and space, of storing a message is small. The cost of retrieving them
is high, and this (it seems) is where Twitter has been hitting problems. It is
difficult to cache efficiently (since most users follow a different set of
people, so queries are often unique) and they have had trouble scaling it.

The alternative is to process outgoing messages at once, delivering
copies to each "follower". It's tempting to assume this is the obviously
right solution, but bear in mind that it means more heavyweight
processing on send, and that it also means a great deal of redundant
storage. Even if each message is only 200 bytes long, including
metadata, if 10 people are following, a message still means more
than 1K of "unnecessary" storage, and that's before you consider the
need for copies to ensure reliability. As I understand it, some users
are followed by thousands of people ... you can see where that is
going. Tempting as it is to say "throw more disks at it", I don't think
that's an altogether elegant answer: after all, even if storage doesn't
cost much it does cost something.

One solution, which Yariv proposed above, is to limit the number of
messages in each queue, flushing old messages. But as I understand
it, there are many users who like to be able to go back more than 20
messages. So that solves the problem at the cost of desired function.

Perhaps (probably?) the key is to find a way of storing very lightweight
pointers to messages, which can appear redundantly in many
queues/archives without too much wastage, but without arbitrary
limits on archive size.

The devil is probably in the detail with that, though. In particular, unless
you make them very lightweight you may still have a painfully wasteful,
bloated storage requirement, and you have to be happy that you have a
blazingly efficient way of retrieving the messages from these pointers.

(I feel this discussion, fascinating as I find it, is not very Erlang-related
(at least not directly). I hope it's worth having though, because these
basic issues are very interesting ... at least I find them so!)

--
Paul Stanley

Steve

Jun 1, 2008, 10:08:00 AM
to erlang-q...@erlang.org
On Jun 1, 3:41 am, "David Mitchell" <monch1...@gmail.com> wrote:
> I'm just curious if that's what
> you're suggesting, or whether you've got something else in mind.

Actually, I haven't fully thought it all through in detail (I'm
supposed to be 100% on my own projects, lol) but I had in mind
something like...

...an MQ backend with maybe a publish and subscribe queue for each
user where the web app instances/sms gateways act as clients to
collect the tweets for presentation/forwarding to the current follower/
user...
...I also was thinking of a peer to peer queue between MQ instances
that would allow for separating the messaging volume out (to allow
unlimited scaling) into messaging "domains" or regions across multiple
instances...
... And that the user accounts (far less volume) could probably be
stored in a single RDBMS *cluster* (think Oracle replication), and
possibly the account data could hold appropriate routing info to the
users' "messaging domain"...

...all very vague but seems to me to be the right direction to go.

Regs,

Joe Armstrong

Jun 1, 2008, 10:10:22 AM
to David Mitchell, erlang-q...@erlang.org
It *is* an interesting discussion - not about Twitter - but about architectures.

There seem to be some implicit assumptions here:

Let's suppose that Twoorl services are accessed through a *single*
name (www.twoorl.org) or something.

Let's assume we have 20 (for the sake of argument) back-end machines
(could be hundreds or thousands though) (I guess scalable means that
we can just add/remove backend machines without breaking
things :-)

The first step must be to associate 20 IP addresses with this single
name - some kind of DNS load balancing should be possible. (how do you
do this?????) (is this DNS round robin????) (does this scale to
thousands of machines???)

The user tries to connect to www.twoorl.org and looks up the address
in DNS; the result is one of these IP addresses.

The user connects to one of these IP addresses and requests data
for "joe" - they are immediately redirected to the machine holding
joe's data.

The simplest way to find a machine would be to use the idea in chord.

For a machine with IP 123.45.12.34 we create a tuple

{"123.45.12.34", Md5("123.45.12.34")} we do this for all machines and
sort my the Md5 sum


so we get a list of twenty machines, say:

[{"122.34.45.67", Md1}, {"223.56.1.23", Md2}, ... ]

Which machine is "joe" stored on? - to find this we compute md5("joe")
and find the
first machine in the list whose Md5 sum is greater than md5("joe").

The initial machine would perform this computation and redirect to the
machine with joe's data.
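
A sketch of that lookup (an illustration only; erlang:md5/1 returns a binary, and binaries compare byte by byte, which is all the ordering we need here):

    -module(ring_lookup).
    -export([node_for/2]).

    %% Hash every machine address, sort by hash, and pick the first machine
    %% whose hash is greater than md5(Key), wrapping to the start if none is.
    node_for(Key, Machines) ->
        Ring = lists:keysort(2, [{M, erlang:md5(M)} || M <- Machines]),
        KeyHash = erlang:md5(Key),
        case [M || {M, H} <- Ring, H > KeyHash] of
            [First | _] -> First;
            []          -> element(1, hd(Ring))
        end.

    %% e.g. node_for("joe", ["122.34.45.67", "223.56.1.23", "123.45.12.34"])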

The initial machine can also check the liveness of this second machine -
and if it is unresponsive
redirect to a machine containing a replica of joe's data (which could
this be? - imagine the
machines arranged in a circle and redirect to the machine nearest to
180 degrees away from the
original machine)

A problem occurs if the original machine is dead (ie the one that
DNS said was the address associated
with www.twoorl.org) - if the back-end machines are in pairs that
monitor each other then I guess
a gratuitous ARP can fix the problem and reassign the IP address
of the failing machine to a
new machine (which must now take over the role of the first machine)

In all of this the fundamental problem seems to be that if the server
fails then we have to do a lot
of messing around to keep the server on a fixed IP address.

It would be a zillion times easier if the *client* did all this
messing around. If the client
knew about N server addresses it could automatically redirect to a
different server if the primary server
failed. If the client knew about N machines and performed the md5(Key)
calculation to find the correct machine then the problem would be very
easy to solve with no messing around in the server.

(( The fact that DNS has multiple addresses makes writing a resolver
really easy; if one DNS server
fails just try another ))

Now if we're talking "standard browsers" then accessing multiple sites
is painful. JavaScript and Flash
plugins etc. are very restrictive in the sites that they can open
sockets to (for good reasons);
I think only the originating site is permitted.

If we could persuade the users to install a small proxy on their
machines then all these problems would
go away - a standard browser could talk to a proxy on localhost and
this could talk to the multiple
back ends.

What appears to be a tricky problem in scaling things if we have to
keep the back-end servers on fixed
addresses seems a lot easier if the clients have some limited intelligence.

The next step is (of course) to allow all the clients to collectively
behave as if they were a server -
I think therefore that the problem is really one about the base level
of a P2P architecture and not
about a twitter architecture per se.

If we did have a simple proxy component that allowed messaging to
multiple sites then this and many other
problems would be easily soluble.

We might imagine a proxy that was programmable:

It presents a menu like this:

{allow, ServiceName, at, Addr1, Addr2, Addr3, .....}

(( example {allow, twitter, at, www.a.b, www.p.q, www.c.d}
- this means that the proxy can open "twitter" sessions to the
three "trusted" machines in the list

then the web browser could access the proxy that could talk to the
trusted machines -
the trusted machine should just redirect to other machines, until
finally the desired machines are found. ))

Would the fact that you have to install a proxy server limit
deployment of a service? - probably.
Also it opens up a new bag of worms (security) - for good reasons
browsers do not allow plugins
to access multiple sites (only the originating sites).

I suppose therefore, that the inner cluster that provides the service
would have a full P2P structure
and that the service would be accessed by DNS round robin with IP
failover to handle the errors.

I suspect that architecture like this are being used in some Erlang
systems (the details might vary)

If anybody would like to post code and go into the details of how to
rig systems for DNS load balancing
(or whatever it's called) and for IP monitoring and fail-over then we
could get to the interesting
part of building the application.

(( the next bit will be to look at the limits of scaling - still
nobody has talked numbers -
how far can we press a two-tier system - with say 20 name servers
in the front-end that *only* do
redirects - this bit should be very fast ))

(( By combining Twitter with IRC we might make the option of
installing a proxy more attractive -
The IRC "server" for a group G should really be the client that
first created the group G -
If G drops out the second machine in the group could become the
server and so on.
Really Twitter is like having one IRC group per person. Many
people can join but only the
owner can write to it. What's the difference (architecturally)?
))

Cheers

/Joe Armstrong

Per Melin

Jun 1, 2008, 10:58:02 AM
to Joe Armstrong, erlang-q...@erlang.org
2008/6/1 Joe Armstrong <erl...@gmail.com>:

> A problem occurs if the original machine is dead - (ie the one that
> DNS said was the address associated
> with www.twoo.erl) - if the back-end machines are in pairs that
> monitor each other then I guess
> a gratuitous arp can fix the problem and reassign the the IP address
> of the failing machine to a
> new machine (which must now take over the role of the first machine)

This is a problem that every web site faces once it outgrows a single
server, and is usually solved with a dedicated off-the-shelf load
balancer (with a "high enough" MTBF) that detects and excludes failing
servers. These are turn-key solutions that work for the rest of the
web, including banks and others with higher demands than something
like Twitter has.

> The user connects to one of theses IP addresses and and requests data
> for "joe" - they are immediately redirected to the machine having
> joe's data.

Here is where Twitter goes wrong, I think. The data, and the
processing, is not distributed. All of it is in a single MySQL master
server, with two slaves for read access. It's easy to see how that
can't scale.

Patrick Logan

Jun 1, 2008, 11:01:30 AM
to erlang-q...@erlang.org
"Yaws at the front end, definitely - but rather /RabbitMQ/ at the back end."

I'm not sure why RabbitMQ would be a better choice than XMPP /
ejabberd for a text messaging system?

In addition to that, XMPP can federate with existing IM systems, and an
XMPP-based "Twoorl" could federate itself easily across multiple
organizations.

How easy is it to federate across organizational boundaries with AMQP, or
RabbitMQ in particular? I mean, assuming Twoorl does get to this
scale, does he want to continue paying for the entire infrastructure?

Dale Harvey

Jun 1, 2008, 11:03:02 AM
to Joe Armstrong, erlang-q...@erlang.org
Flash sockets can open connections to external sites; they require the same
verification as local sites now do (since v9.0.16.0?), which means when
you try to .connect, Flash will set up a temporary connection, send a
<policy-file-request/> and expect an XML snippet that looks something
like http://api.flickr.com/crossdomain.xml.

An IP lookup does sound nicer than a full HTTP proxy. As far as I know you
can't do cross-domain POSTs with JS though; GET works fine with a JSON
return.

2008/6/1 Joe Armstrong <erl...@gmail.com>:

Steve

Jun 1, 2008, 11:27:38 AM
to erlang-q...@erlang.org
On Jun 1, 10:01 am, "Patrick Logan" <patrickdlo...@gmail.com> wrote:
> "Yaws at the front end, definitely - but rather /RabbitMQ/ at the back end."
> I'm not sure why RabbitMQ would be a better choice than XMPP /
> ejabberd for a text messaging system?

Well, I've not really done my "due diligence" on this, but here goes
anyway - the first considerations that come to mind with XMPP/ejabberd
are support for... multiple recipients, offline delivery, and clients as
systems (web servers, SMS gateways) rather than user apps (browsers/phones). I
could probably add to the list.

>
> In addition to that XMPP can federate with existing IM systems, an
> XMPP-based "Twoorl" could federate itself easily across multiple
> organizations.

I can't see how this applies here.. you see, the MQ suggestion was for
an additional back-end tier... I wasn't bothering to address the front
end interface(s) -- maybe I wasn't clear enough on that?

> How easy is it to federate across organizational boundaries AMQP or
> RabbitMQ in particular? I mean assuming if Twoorl does get to this
> scale, does he want to continue paying for the entire infrastructure?

I'm not sure I understand the question here in this context. The
suggestion was assuming that you run the entire service a la Twitter.

/regs

Dale Harvey

Jun 1, 2008, 11:44:53 AM
to Steve, erlang-q...@erlang.org
http://www.process-one.net/en/blogs/article/introducing_the_xmpp_application_server/

is also a pretty topical read; Blaine Cook (ex-Twitter dev) mentioned it:

"Scaling Twitter as a messaging platform is pretty easy. See Mickaël Rémond's post on the subject. Scaling the archival, and massive infrastructure concerns (think billions of authenticated polling requests per month) are not, no matter what platform you're on. Particularly when you need to take complex privacy concerns into account."

2008/6/1 Steve <steven.cha...@gmail.com>:

Per Melin

Jun 1, 2008, 11:48:23 AM
to Paul Stanley, erlang-q...@erlang.org
2008/6/1 Paul Stanley <pstanl...@googlemail.com>:

> The alternative is to process outgoing messages at once, delivering
> copies to each "follower".

Twitter has features that makes it different from any other messaging
service I can think of. Some examples:

Anyone can go to a user's page and see all their posts. So everyone
needs a special follower called "everyone else". Unless the user has
checked the "protect my updates" box in the settings, in which case
only approved followers can view the posts.

And even if *my* posts are public, one of them may be a reply to a
non-public post from someone else, and I believe that in this case
Twitter filters out even the public reply. Unless of course you're
logged in when you're viewing my posts and are also approved to view
the protected post that I was replying to.

I can put @paul in a message to show that it's directed to you. Users
have the option to filter out @-messages that are directed to people
they themselves are not following. If Joe is following me, but not
you, he may or may not be shown a post from me with @paul in it.

So at some point in the chain, probably every time you show a post,
you need to pass through several filters, some of which may require
that you examine the settings of other users (which in a distributed
system may live on another node).

Disclaimer: I'm guilty of some speculation here, since I've only
lurked on Twitter, and don't fully know the ins and outs of the
functionality it offers. That will change now. I'll start posting, at
least for a while. My page: http://twitter.com/pmelin

Dominic Williams

Jun 3, 2008, 6:06:18 PM
to Joe Armstrong, erlang-q...@erlang.org
Hi Joe,

> It would be a zillion times easier if the *client* did
> all this messing around. If the client knew about N
> server address it could automatically redirect to a
> different server if the primary server failed. if the
> client knew about N machines and performed the md5(Key)
> calculation to find the correct machine then the problem
> would be very easy to solve with no messing around in the
> server.

Indeed, and I've used this strategy in the past on systems
in which we were responsible for both client and server.

It has one problem, though, which is if the network is not
completely reliable and can become partitioned - different
clients can end up using different servers, resulting in an
inconsistent state. The way around this - leader election -
works nicely in small systems but surely it doesn't scale to
millions of clients?

> The next step is (ovf course) to allow all the clients to
> collectively behave as if they were as server - I think
> therefore that the problem is really one about the base
> level of a P2P architecture and not about a twitter
> architecture per se.

I agree - but P2P architectures come with their own lot of
problems (firewalls, heterogeneous clients, the legal issues
related to innocent users storing someone else's illegal
data, etc.)

So I think you're right to stress that we probably need to
talk actual numbers, and think about a "sufficiently
scalable" architecture rather than an infinitely scalable one.

Regards,

Dominic Williams
http://dominicwilliams.net

----

Paul Fisher

Jun 3, 2008, 6:16:15 PM
to Dominic Williams, erlang-q...@erlang.org
On Wed, 2008-06-04 at 00:06 +0200, Dominic Williams wrote:
> Hi Joe,
>
> > It would be a zillion times easier if the *client* did
> > all this messing around. If the client knew about N
> > server address it could automatically redirect to a
> > different server if the primary server failed. if the
> > client knew about N machines and performed the md5(Key)
> > calculation to find the correct machine then the problem
> > would be very easy to solve with no messing around in the
> > server.
>
> Indeed, and I've used this strategy in the past on systems
> in which we were responsible for both client and server.
>
> It has one problem, though, which is if the network is not
> completely reliable and can become partitioned - different
> clients can end up using different servers, resulting in an
> inconsistent state. The way around this - leader election -
> works nicely in small systems but surely it doesn't scale to
> millions of clients?

This is why systems that use hashing to distribute load, like Amazon's
Dynamo, also employ a "gossip protocol" to maintain group consistency and
perform failure detection. Even though it is not stated more
specifically in the paper, it is a reasonable guess, based on the
background of their CTO (Werner Vogels), that this employs some form of
virtual synchrony.

The node receiving the call from a client (which may be behind in group
membership and partition knowledge) will set it straight.


--
paul

Steve Arons

Jun 3, 2008, 7:21:37 PM
to erlang-q...@erlang.org
The stream of incoming tweets is bursty. Even if the stream were level, the backend load for store and retrieve would look nonlinear, because it depends on the number of followers for each poster. So the system needs a lot of expensive slack capacity to provide a reasonable quality of service, and this comes on top of a growing user base.

Whether the answer is to hammer a database to retrieve a user's message queue, or to burn storage, I'm wondering if the structure of the social graph might suggest a way of distributing the storage and workload--stuff like the number, size and stability of connected components, and the feasibility of splitting larger components into communities, with the goal of maximizing the probability that a message delivered to a server will be retrieved by another user on that server. But I suppose that could get messy quickly as new cross-server clusters form on the graph. Anyway, I'd be curious to see an analysis of real data on the dynamics of Twitter's message stream and its social graph.

Yariv Sadan

Jun 4, 2008, 12:45:34 AM
to David Mitchell, erlang-q...@erlang.org
I considered using a reliable queuing mechanism such as RabbitMQ or
Amazon SQS but I don't think it would make the architecture inherently
more scalable (more reliable maybe, but not more scalable). I think a
Twitter like solution can be designed to scale using just Yaws, MySQL,
and Mnesia or memcache (and maybe Ejabberd if you add an XMPP
gateway). RabbitMQ or SQS would provide *reliable* asynchronous
background processing, but if you don't need 100% reliability (Twoorl
isn't a banking application after all), you can just spawn Erlang
processes from Yaws to do background tasks after a user posts a
message. Also, using persistent queues doesn't make the need for a
database go away. When you pull a tweet from a queue you have to put
it somewhere so it can be shown on rendered pages, and a database is
the most reasonable place to put it. The main problem in scaling
Twitter/Twoorl is how you architect your database backend --
partitioning, denormalization, replication, load balancing, caching,
etc, will probably make or break your ability to scale.

Yariv

David Mitchell

Jun 4, 2008, 4:12:07 AM
to Yariv Sadan, erlang-q...@erlang.org
Do you need a database at all for the individual messages?

I'm aware that you need to be able to access an archive of a user's
posts, but look at the Twitter use cases for *receiving* messages (and
I REALLY hope I've got these right, otherwise I'll look like a
klutz...):
- user receives all messages that they've "subscribed" to (i.e. from
users that they're following)
- user receives all messages directed to them specifically (i.e. stuff
directed to @me)
- user receives all messages directed to them as part of a group (I
assume this functionality is in Twitter somewhere...)
- user looks at the history of messages sent by a specific user

You could achieve ALL of these by forwarding messages to queues, and
not storing the messages themselves in a central database. For the
4th use case above, you need to be able to retain messages in queues
indefinitely, but there's ways to achieve that without relying on a
single, central database. Yep, you're going to burn through disc
space, but that's unavoidable unless you put in some mechanism for
ageing out old messages.

Essentially your queues *become* the database, and managing a vast
number of queues (and the messages in those queues) is a different
challenge than managing a vast database.

If you assume that you can massively scale out the number of queues
being managed (and IBM does a pretty good job of that with MQ running
on a mainframe, so I assume it's at least feasible to do it with
RabbitMQ or SQS), and you've got the capacity to store a large number
of messages in those queues (a big assumption, but again IBM does it
with MQ), the key 2 items of data you need to manage is (a) mapping of
user IDs to the queues for those users (i.e. queue name, queue server
name), and (b) mapping the many-to-many relationships between users.
That's the bit that needs the rapid response, but you've reduced the
database scalability problem to only that.

Managing the scalability of the actual messages themselves then falls
within the realm of the queue servers & software, and IBM's experience
seems to be that you can go a very long way down the path of just
throwing hardware at that problem before you hit the limits of that
approach.

It's HIGHLY likely I'm missing something here, since I'm still a
novice as far as Twitter's functionality is concerned (and it's late,
and I'm tired...). This thread continues to be a very interesting
discussion of scaling a big bit of infrastructure, and I'm learning a
lot from everyone participating.

Please jump in and tell me where I'm wrong, and please don't think I'm
questioning the decisions you've made. Twoorl is a really impressive
bit of work, and a great demo of what can be achieved relatively
easily with Erlyweb. For me at least, it's really opened my eyes to
what could be achieved using pure Erlang infrastructure.

Regards

David Mitchell

2008/6/4 Yariv Sadan <yariv...@gmail.com>:

Bob Ippolito

Jun 4, 2008, 4:18:15 AM
to Yariv Sadan, erlang-q...@erlang.org
Well once you start denormalizing and partitioning/sharding a database
you really start losing the advantages of having one.

It'd probably be easy enough to have an append-only file(s)-on-disk
strategy per user to represent their queue (something like disk_log
that is easy to read backwards); partitioning that would be very easy.
You could then have a second file or database that lists the incoming
and outgoing connections so the process that represents that user
knows which queues to publish and subscribe to.
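
A sketch of that per-user queue using the stock disk_log module (one log per user; error handling and directory setup are omitted, and the module and file names are made up):

    -module(user_queue).
    -export([open/1, append/2, read_all/1]).

    %% Assumes the "queues" directory already exists.
    open(UserId) ->
        {ok, Log} = disk_log:open([{name, {queue, UserId}},
                                   {file, "queues/" ++ UserId ++ ".LOG"}]),
        Log.

    %% Append one twoorl (any Erlang term) to the user's log.
    append(Log, Twoorl) ->
        ok = disk_log:log(Log, Twoorl).

    %% Read the whole log back, oldest first; reverse for newest-first.
    read_all(Log) ->
        read_all(Log, start, []).

    read_all(Log, Cont, Acc) ->
        case disk_log:chunk(Log, Cont) of
            eof              -> lists:append(lists:reverse(Acc));
            {NewCont, Terms} -> read_all(Log, NewCont, [Terms | Acc])
        end.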

Joe Armstrong

Jun 4, 2008, 5:53:06 AM
to Paul Fisher, erlang-q...@erlang.org
On Wed, Jun 4, 2008 at 12:16 AM, Paul Fisher <pfi...@alertlogic.net> wrote:
> On Wed, 2008-06-04 at 00:06 +0200, Dominic Williams wrote:
>> Hi Joe,
>>
>> > It would be a zillion times easier if the *client* did
>> > all this messing around. If the client knew about N
>> > server address it could automatically redirect to a
>> > different server if the primary server failed. if the
>> > client knew about N machines and performed the md5(Key)
>> > calculation to find the correct machine then the problem
>> > would be very easy to solve with no messing around in the
>> > server.
>>
>> Indeed, and I've used this strategy in the past on systems
>> in which we were responsible for both client and server.
>>
>> It has one problem, though, which is if the network is not
>> completely reliable and can become partitioned - different
>> clients can end up using different servers, resulting in an
>> inconsistent state. The way around this - leader election -
>> works nicely in small systems but surely it doesn't scale to
>> millions of clients?

Right - but now we need to analyse the consequences
of a partitioning. I can imagine two extreme end cases

1) best effort repair - basically ignore consistency - two different
users could get different
subsets of the information - this will only happen when the system is
failing and
will only affect users on the failing machines.

2) strong consistency (all users get the same consistent view)
Expensive and difficult


(then there are many inbetween cases)

For a free service then 1) would be easy to implement and "good enough"

For a fully paid service then 2)

1) should be easy to do without a fully synchronised back-end - use a
simple gossip protocol and the
nodes will "eventually" become consistent.

As long as the users are aware of this it shouldn't be a problem.

/Joe
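
A sketch of the client-side server selection described in the quoted
text above (md5 of the key over a fixed list of N servers); the module
name and the {Host, Port} representation are invented:

-module(pick_server).
-export([pick/2]).

%% Key is any term; Servers is a non-empty list of {Host, Port} tuples.
%% The same Key always maps to the same server while the list is stable.
pick(Key, Servers) when Servers =/= [] ->
    <<N:128>> = erlang:md5(term_to_binary(Key)),
    lists:nth((N rem length(Servers)) + 1, Servers).

With a plain rem, most keys move to a different server whenever the
list changes; consistent hashing would avoid that, but it is orthogonal
to the failover point being made here.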

Steve Davis

unread,
Jun 4, 2008, 8:57:25 AM6/4/08
to erlang-q...@erlang.org
Hi all,

This is indeed an interesting discussion, albeit totally orthogonal to
what I'm *supposed* to be focused on... :)

WRT the use of MQ --- David Mitchell seems to have covered the
issue of persistent queues and thus points out - correctly to my mind
- that there's no need for RDBMS storage of tweet content, but then...

On Jun 3, 11:45 pm, "Yariv Sadan" <yarivsa...@gmail.com> wrote:
> I considered using a reliable queuing mechanism such as RabbitMQ or
> Amazon SQS but I don't think it would make the architecture inherently
> more scalable (more reliable maybe, but not more scalable).

Yariv, did you take a look at RabbitMQ clustering? It's claimed to be
near-linear (though I don't have practical experience of this, it
feels like it should be a valid claim)... http://www.rabbitmq.com/clustering.html
BTW, in case anyone is wondering -- I have no commercial interest or
connection with either lshift or cohesive.

Unlike Joe, I'm not concerned by the Web server (Yaws) scaling -- if
you were to get serious you'd probably shove a hardware load balancer
setup in front of your web servers and that would allow you to scale
httpd instances horizontally pretty much without limit (an example for
those interested in the HWLB would be Cisco's Content Services Switch
- http://www.cisco.com/en/US/products/hw/contnetw/ps792/ ) -- I'm
assuming here that the web server user session becomes decoupled from
the persistent tweet data by moving the latter out into the
(persistent queue) MQ cluster...

Also - I did do a bit of reading around after seeing Patrick Logan's
suggestion for using XMPP/Ejabberd as an alternative to MQ. It does
appear that it also could be a back-end candidate - and indeed Mickael
Remond has written on that issue directly
http://www.process-one.net/en/blogs/article/introducing_the_xmpp_application_server/
--- using XMPP is a less "traditional" approach than using an MQ, but
Mickael himself does present evidence that the idea has legs. So yes,
ejabberd does look like an alternative candidate to RabbitMQ that's
worth examining closely for anyone actually building a "Twitter done
right".

regs
/s

Steve Davis

unread,
Jun 4, 2008, 10:06:27 AM6/4/08
to erlang-q...@erlang.org
I just reread the entire thread and have three short "addenda"...
1) I realized that I simply reposted the link to process-one that Dale
Harvey introduced and set me off on looking at ejabberd - apologies
for that.
2) Joe's very first post in the thread suggests using mnesia as a
front-end cache - for a first step in dealing with scaling - i.e.
evolving your architecture to keep step with demand and so "go
vertical first" - this suggestion seems to me to be exactly right (a
rough sketch of the idea follows below).
3) What has struck me from this discussion is that there are a large
number of options *already available* as good architectural components
for building full-scale systems. Beyond hardware and the inevitable(?)
RDBMS, everything can be done with Erlang/OTP + OTP applications.
Interesting...
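
As a rough illustration of point (2), a RAM-only mnesia table used as a
read-through cache in front of the RDBMS might look like this (mnesia
is assumed to be started already; the record and fetch_from_db/1 are
invented placeholders, not a real API):

-module(msg_cache).
-export([init_cache/0, get_msg/1]).

-record(cached_msg, {id, body}).

init_cache() ->
    mnesia:create_table(cached_msg,
                        [{ram_copies, [node()]},
                         {attributes, record_info(fields, cached_msg)}]).

get_msg(Id) ->
    case mnesia:dirty_read(cached_msg, Id) of
        [#cached_msg{body = Body}] ->
            Body;                                  %% cache hit
        [] ->
            Body = fetch_from_db(Id),              %% cache miss: go to the RDBMS
            mnesia:dirty_write(#cached_msg{id = Id, body = Body}),
            Body
    end.

%% Placeholder for the real RDBMS lookup (e.g. via ErlyDB).
fetch_from_db(Id) ->
    {db_row, Id}.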

Damien Morton

unread,
Jun 4, 2008, 10:16:09 AM6/4/08
to Steve Davis, erlang-q...@erlang.org
I think Bob hit the nail on the head with this.

No database is needed. Everything can pretty much be achieved with files
containing fixed-length records.
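
A quick sketch of the fixed-length-record idea (128-byte, zero-padded
records; the module is invented): record N lives at byte offset N*128,
so both append and random access are a single pwrite/pread.

-module(fixrec).
-export([append/2, nth/2]).

-define(RECSIZE, 128).

%% Append one record, padded with zero bytes up to RECSIZE.
append(File, Bin) when byte_size(Bin) =< ?RECSIZE ->
    Padded = list_to_binary([Bin, lists:duplicate(?RECSIZE - byte_size(Bin), 0)]),
    {ok, Fd} = file:open(File, [read, write, raw, binary]),
    {ok, Eof} = file:position(Fd, eof),
    ok = file:pwrite(Fd, Eof, Padded),
    file:close(Fd).

%% Read the N-th record (0-based).
nth(File, N) ->
    {ok, Fd} = file:open(File, [read, raw, binary]),
    Res = file:pread(Fd, N * ?RECSIZE, ?RECSIZE),
    file:close(Fd),
    Res.

Fixed-size records waste some space for short tweets, but they make
offsets trivial to compute.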

Per Melin

unread,
Jun 4, 2008, 10:34:02 AM6/4/08
to damien...@acm.org, erlang-q...@erlang.org
2008/6/4 Damien Morton <dmo...@bitfurnace.com>:

> I think Bob hit the nail on the head with this.
>
> No database is needed. Everything can pretty much be achieved with files
> containing fixed-length records.

And indeed, Twitter wrote on their blog Saturday:
"Our new architecture will move our reliance to a simple, elegant
filesystem-based approach, rather than a collection of database
[sic]."

Steve Davis

unread,
Jun 4, 2008, 10:56:14 AM6/4/08
to erlang-q...@erlang.org

On Jun 4, 9:16 am, Damien Morton <dmor...@bitfurnace.com> wrote:
> I think Bob hit the nail on the head with this.
>
> No database is needed. Everything can pretty much be achieved with files
> containing fixed-length records.

What you and Bob have said is, of course, true. I suspect that as the
system becomes large scale then the software required to access and
manage those files would eventually end up looking a lot like one
of... an RDBMS, a Message Queue, an IM server, etc.

Steve Davis

unread,
Jun 4, 2008, 11:02:47 AM6/4/08
to erlang-q...@erlang.org
On Jun 4, 9:16 am, Damien Morton <dmor...@bitfurnace.com> wrote:
> No database is needed. Everything can pretty much be achieved with files
> containing fixed-length records.

...but you probably took my comment "and the inevitable(?) RDBMS" as
meaning that the architecture should be database-centric - actually
that's not what it meant -- as I described previously the RDBMS does
have a place for user accounts (not for the tweet storage). The
"inevitability" part was referring to the fact that pretty much any
(not all!) large-scale architecture will "need" an RDBMS "somewhere".

Lev Walkin

unread,
Jun 4, 2008, 11:12:28 AM6/4/08
to Steve Davis, erlang-q...@erlang.org
Steve Davis wrote:
>
> On Jun 4, 9:16 am, Damien Morton <dmor...@bitfurnace.com> wrote:
>> I think Bob hit the nail on the head with this.
>>
>> No database is needed. Everything can pretty much be achieved with files
>> containing fixed-length records.
>
> What you and Bob have said is, of course, true. I suspect that as the
> system becomes large scale then the software required to access and
> manage those files would eventually end up looking a lot like one
> of... an RDBMS, a Message Queue, an IM server, etc.

... yet, with relaxed ACID properties, making it much more scalable.

Bob Ippolito

unread,
Jun 4, 2008, 11:18:20 AM6/4/08
to Per Melin, damien...@acm.org, erlang-q...@erlang.org
On Wed, Jun 4, 2008 at 7:34 AM, Per Melin <per....@gmail.com> wrote:
> 2008/6/4 Damien Morton <dmo...@bitfurnace.com>:
>> I think Bob hit the nail on the head with this.
>>
>> No database is needed. Everything can pretty much be achieved with files
>> containing fixed-length records.
>
> And indeed, Twitter wrote on their blog Saturday:
> "Our new architecture will move our reliance to a simple, elegant
> filesystem-based approach, rather than a collection of database
> [sic]."

I totally forgot that twitter messages are limited to 160 characters.
Yeah, that makes it even easier... with those constraints I doubt I'd
have even used a database in a prototype, let alone production.

-bob

Steve Davis

unread,
Jun 4, 2008, 12:08:48 PM6/4/08
to erlang-q...@erlang.org
On Jun 4, 10:12 am, Lev Walkin <v...@lionet.info> wrote:
> ... yet, with relaxed ACID properties, making it much more scalable.

The level to which I disagree with your conclusion here is quite
profound! I see a large-scale future of lost messages, data
corruption, file replication and synchronization issues, and
*downtime* for the FS approach.

The raw FS solution IMHO will only "work" up to a point... in a sense
the FS suggestion parallels (but is not the same as) MySQL's early and
conscious sacrifice of referential integrity for performance (InnoDB
does not resolve this btw, for those that know the issues with MySQL).
This decision at MySQL has resulted in numerous high-profile scaling
issues for many services that committed to MySQL for their persistence
needs. Twitter isn't a good example of this kind of failure - it
shouldn't have been using the RDBMS at all in the way that it did.

All in all, the fundamental application scope of Twitter simply
*screams* Message Queue at me. I'm not sure why the "experts" that
Twitter have scavenged from IBM and Google haven't come to that
conclusion also. Since Twitter have appeared to commit to a FS
approach, I guess we'll have to see if future history proves me
incorrect :)

/s

Damien Morton

unread,
Jun 4, 2008, 11:16:48 AM6/4/08
to Steve Davis, erlang-q...@erlang.org
Well, the relational database is suitable for the quickly evolving part
of their business, and there will always be a quickly evolving part of
any business, so yes - inevitability is the perfect description.

Now that their business has stabilised and they know what a larger part
of it is supposed to do and what the performance goals they need to
reach are, they can begin to create a more specialised data structure
optimised for their (now known) use case.

Question is - can they create such a specialised datastructure such that
it _can_ evolve, for example, in the case where they wanted to handle
not just 140 byte tweets, but also photos and video (twotos, and twideos)?

On 6/5/2008 1:02 AM, Steve Davis wrote:
> On Jun 4, 9:16 am, Damien Morton <dmor...@bitfurnace.com> wrote:
>
>> No database is needed. Everything can pretty much be achieved with files
>> containing fixed-length records.
>>
>
> ...but you probably took my comment "and the inevitable(?) RDBMS" as
> meaning that the architecture should be database-centric - actually
> that's not what it meant -- as I described previously the RDBMS does
> have a place for user accounts (not for the tweet storage). The
> "inevitability" part was referring to the fact that pretty much any
> (not all!) large-scale architecture will "need" an RDBMS "somewhere"

Steve Davis

unread,
Jun 4, 2008, 12:31:45 PM6/4/08
to erlang-q...@erlang.org

On Jun 4, 10:16 am, Damien Morton <dmor...@bitfurnace.com> wrote:
> Question is - can they create such a specialised datastructure such that
> it _can_ evolve, for example, in the case where they wanted to handle
> not just 140 byte tweets, but also photos and video (twotos, and twideos)?

Actually, I can immediately see how to cope with this if we were to
use the MQ-backend approach. You have a number of regionalized
upload servers (i.e. asset servers) where the media gets stored and
served - probably exposing RESTful locators - and the mediatweet that
goes through the MQ becomes just a URL reference.

Maybe we could even use Joe's Shoutcast server (the Crosswalk Book on
p. 259) to stream out the media...? (OK that was just a trick to keep
this thread relevant to Erlang :)

/s

Damien Morton

unread,
Jun 4, 2008, 12:33:47 PM6/4/08
to Steve Davis, erlang-q...@erlang.org
Allow me to disagree with you here.

The purpose of an RDBMS is to enable easy changes to schemas and
arbitrary queries. It is a thing designed to facilitate rapid changes
in direction. If you look at the scientific community, for example,
where performance is required for a known application, specialised
databases tend to flourish.

Bob Ippolito

unread,
Jun 4, 2008, 12:36:47 PM6/4/08
to damien...@acm.org, Steve Davis, erlang-q...@erlang.org
Two words: tiny URLs :) That's what people are doing anyway. Something
like oEmbed but a little smarter (it needs autodetection, via headers
or a microformat) can handle that part: http://oembed.com/

Per Melin

unread,
Jun 4, 2008, 12:46:48 PM6/4/08
to Steve Davis, erlang-q...@erlang.org
2008/6/4 Steve Davis <steven.cha...@gmail.com>:

> All in all, the fundamental application scope of Twitter simply
> *screams* Message Queue at me.

Doesn't a message queue imply that each message is, at some point,
delivered and then removed from the queue?

In that case: Delivered where?

The two main interfaces to Twitter, their website and their HTTP API,
both just fetch and show the N last messages received. They don't even
have a read/unread flag on the messages.

It seems to me that it's more of a message log than a message queue.

Lev Walkin

unread,
Jun 4, 2008, 12:45:28 PM6/4/08
to Steve Davis, erlang-q...@erlang.org
Steve Davis wrote:
> On Jun 4, 10:12 am, Lev Walkin <v...@lionet.info> wrote:
>> ... yet, with relaxed ACID properties, making it much more scalable.
>
> The level to which I disagree with your conclusion here is quite
> profound! I see a large-scale future of lost messages, data
> corruption, file replication and synchronization issues, and
> *downtime* for the FS approach.
>
> The raw FS solution

Who is talking about a _raw_ FS solution? You are? I am certainly not.

> IMHO will only "work" up to a point... in a sense
> the FS suggestion parallels (but is not the same as) MySQL's early and
> conscious sacrifice of referential integrity for performance (InnoDB
> does not resolve this btw, for those that know the issues with MySQL).
> This decision at MySQL has resulted in numerous high-profile scaling
> issues for many services that committed to MySQL for their persistence
> needs. Twitter isn't a good example of this kind of failure - it
> shouldn't have been using the RDBMS at all in the way that it did.
>
> All in all, the fundamental application scope of Twitter simply
> *screams* Message Queue at me. I'm not sure why the "experts" that
> Twitter have scavenged from IBM and Google haven't come to that
> conclusion also. Since Twitter have appeared to commit to a FS
> approach, I guess we'll have to see if future history proves me
> incorrect :)

Steve, I run a quite high traffic, distributed, file-system
based (as opposed to RDBMS based) service, js-kit.com

http://www.techaddress.com/2008/05/29/js-kit-scores-deal-with-worldnow-adds-19-million-potential-users/

The service internally provides itself a CID guarantee,
with "eventual consistency" in place of the A in ACID, quite like
Amazon SimpleDB:

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

We also use message queues internally to provide reliability and
synchronization.

One of the most important things that allowed us to gain scalability
is our use of file system storage underneath all this CID machinery.

When, as Per Melin quotes, Twitter wrote on their blog Saturday:


"Our new architecture will move our reliance to a simple, elegant
filesystem-based approach, rather than a collection of database
[sic]."

I believe the emphasis was placed on replacement of RDBMS
with something _on top of a raw FS_, not just _with_ a raw FS.

--
Lev Walkin
v...@lionet.info

Paul Fisher

unread,
Jun 4, 2008, 1:12:04 PM6/4/08
to Steve Davis, erlang-q...@erlang.org
On Wed, 2008-06-04 at 09:08 -0700, Steve Davis wrote:
> On Jun 4, 10:12 am, Lev Walkin <v...@lionet.info> wrote:
> > ... yet, with relaxed ACID properties, making it much more scalable.
>
> The level to which I disagree with your conclusion here is quite
> profound! I see a large-scale future of lost messages, data
> corruption, file replication and synchronization issues, and
> *downtime* for the FS approach.
>

I promise that the RDBMS wall exists... we deal with more "records" than
you would ever imagine putting into a relational database. For example,
on one small portion of our infrastructure we still use MySQL databases
with 8-core 32GB machines, and we have to limit each instance to no more
than a few days' worth of data in order to keep it going with reasonable
performance. Imagine having to commission database instances of 100s of
GBs every other day.

Any of the Google papers or recent articles from Stonebraker discuss
this directly in terms of the limits of RDBMS systems.

Finally, I am also here to attest that the filesystem approach scales
well past where the RDBMS falls over. You just need to architect the
system to balance the trade-offs, which naturally exist at extreme
levels of scale.


--
paul

Kevin Scaldeferri

unread,
Jun 4, 2008, 1:23:50 PM6/4/08
to damien...@acm.org, Steve Davis, erlang-q...@erlang.org

On Jun 4, 2008, at 9:33 AM, Damien Morton wrote:

> The purpose of an RDMS is to enable easy changes to schemas and making
> arbitrary queries. It is a thing designed to facilitate rapid
> changes in
> direction.


Could you tell that to my DBAs? ;-)

Seems like it usually takes months to make a schema change around here.


-k

Kevin Scaldeferri

unread,
Jun 4, 2008, 1:22:42 PM6/4/08
to Per Melin, Steve Davis, erlang-q...@erlang.org

On Jun 4, 2008, at 9:46 AM, Per Melin wrote:

> 2008/6/4 Steve Davis <steven.cha...@gmail.com>:
>> All in all, the fundamental application scope of Twitter simply
>> *screams* Message Queue at me.
>
> Doesn't a message queue imply that each message is, at some point,
> delivered and then removed from the queue?
>
> In that case: Delivered where?
>
> The two main interfaces to Twitter, their website and their HTTP API,
> both just fetch and show the N last messages received.

Most people seem to get tweets via IM or SMS, I thought.


-k

Steve Davis

unread,
Jun 4, 2008, 1:29:14 PM6/4/08
to erlang-q...@erlang.org

On Jun 4, 11:45 am, Lev Walkin <v...@lionet.info> wrote:
> Who talks about _raw_ FS solution? You are? I am certainly not.

Lev: Actually, I didn't see anyone mention anything other than a raw
FS solution, hence the "rider", so thanks for expanding on your
initial comment!

> Steve, I run a quite high traffic, distributed, file-system
> based (as opposed to RDBMS based) service, js-kit.com
>

> http://www.techaddress.com/2008/05/29/js-kit-scores-deal-with-worldno...


>
> The service internally provides itself a CID guarantee,
> with "eventual consistency" instead of A, quite like Amazon SimpleDB:
>
> http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
>
> We also use message queues internally to provide reliability and
> sychronization.
>
> One of the most important thing which allowed us to gain scalability
> is our use of file system storage behind all this CID machinery on top
> of it.

At first sight, I don't see the immediate applicability of these
architectures to the "twitter problem", but there's a lot of detail
there so I'll take the time to look more closely and actually fully
read the amazon paper to see what's there. At the very least, it
should be an interesting/informative read.

> When, as Per Melin says, Twitter wrote on their blog Saturday:
> "Our new architecture will move our reliance to a simple, elegant
> filesystem-based approach, rather than a collection of database
> [sic]."
>
> I believe the emphasis was placed on replacement of RDBMS
> with something _on top of a raw FS_, not just _with_ a raw FS.

Whilst elegant, neither your solution nor Amazon's seems "simple". One
can only hope that Twitter have thought it through to the extent you
have for your architecture, and that the "something on top" that they
have in mind does actually address the issues that I raised. For now
and for me, the best candidate for that "something on top of the FS"
remains a Message Queue :)

To Per Melin: I'm thinking subscription queues (pubsub) and messages
having a time-to-live (TTL). Perhaps one subscriber might be a logger
for permanent record.

/s
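
A very rough sketch of that pubsub shape using pg2 process groups (the
module and names are invented; the TTL and the permanent logger are
left out):

-module(tweet_pubsub).
-export([follow/1, publish/2]).

%% The follower process (a web session, an SMS gateway, a logger, ...)
%% joins the group for each user it follows.
follow(User) ->
    ok = pg2:create({tweets, User}),
    pg2:join({tweets, User}, self()).

%% Publishing sends the tweet to every current subscriber.
publish(User, Tweet) ->
    case pg2:get_members({tweets, User}) of
        Pids when is_list(Pids) ->
            [Pid ! {tweet, User, Tweet} || Pid <- Pids],
            ok;
        {error, _} ->
            ok                        %% nobody follows this user yet
    end.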

Scott Lystig Fritchie

unread,
Jun 4, 2008, 1:54:12 PM6/4/08
to Steve Davis, erlang-q...@erlang.org
Steve Davis <steven.cha...@gmail.com> wrote:

sd> All in all, the fundamental application scope of Twitter simply
sd> *screams* Message Queue at me. I'm not sure why the "experts" that
sd> Twitter have scavenged from IBM and Google haven't come to that
sd> conclusion also. Since Twitter have appeared to commit to a FS
sd> approach, I guess we'll have to see if future history proves me
sd> incorrect :)

In the serious, serious business of arm-chair architecting, I'd say that
a file system approach is almost certainly more scalable than their
current implementation.

As for "message queueing", there may be a misunderstanding over how MQ
systems typically work: they have producers *and* consumers, and (more
importantly) consumers actually "consume". Consuming a queue item
usually means also deleting it from the queue. A single Twitter user X
can have thousands of consumers all trying to consume the same messages,
but in a typical MQ system, all but the first consumer would find X's
queue empty.

For one example, see the RabbitMQ FAQ, "Q. How do I archive to RDBMS?".

-Scott

Steve Davis

unread,
Jun 4, 2008, 2:10:17 PM6/4/08
to erlang-q...@erlang.org
Scott Lystig Fritchie <fritc...@snookles.com> wrote:
> In the serious, serious business of arm-chair architecting...

http://en.wikipedia.org/wiki/Publish/subscribe

BTW I find veiled ad hominem attacks somewhat distasteful.

Yariv Sadan

unread,
Jun 4, 2008, 2:16:08 PM6/4/08
to damien...@acm.org, Steve Davis, erlang-q...@erlang.org
You nailed it. Just last night I added a couple of columns to the
'msg' table to be able to store the user's gravatar settings in a
denormalized fashion. If I had committed to a fixed file-based format
for all user archives, making such a change would have been a pretty
big pain.

I see file based storage as premature optimization. You sacrifice
functionality for speed. That's probably why Google built BigTable on
top of GFS.

Yariv

On Wed, Jun 4, 2008 at 8:16 AM, Damien Morton <dmo...@bitfurnace.com> wrote:

David Mitchell

unread,
Jun 4, 2008, 6:03:35 PM6/4/08
to Scott Lystig Fritchie, erlang-q...@erlang.org
2008/6/5 Scott Lystig Fritchie <frit...@snookles.com>:
> Steve Davis <steven.cha...@gmail.com> wrote:

> As for "message queueing", there may be a misunderstanding over how MQ
> systems typically work: they have producers *and* consumers, and (more
> importantly) consumers actually "consume". Consuming a queue item
> usually means also deleting it from the queue. A single Twitter user X
> can have thousands of consumers all trying to consume the same messages,
> but in a typical MQ system, all but the first consumer would find X's
> queue empty.
>
> For one example, see the RabbitMQ FAQ, "Q. How do I archive to RDBMS?".

In case anyone's losing track, I was the one who suggested keeping
tweets in queues essentially forever, and having users retrieve them
from queues without deleting the messages from the queue.

I understand how MQ works in normal environments; what I'm suggesting
is that Twitter (and any clones) aren't "normal" once they start to
scale up to many millions of users.

The reasons I suggested storing messages in queues indefinitely are:
- experience says that queueing systems can scale very large, and that
it appears to be an "easier" problem to solve than scaling a database
very large. I'll accept it if anyone complains about "gross
generalisation"...
- the APIs for storing messages to queues and then retrieving them are
designed to be very fast, and (again referencing IBM's MQ) we know
they can scale to queues holding very large numbers of messages

Storing messages in flat files seems to have a couple of limitations to me:
- if you're going to store 1 message per flat file, you need a
database (or database-like thing) to track those zillions of flat
files. I figure that's going to put you back where you started in
terms of scalability
- assuming you're always appending messages to the end of flat files,
you'd have to assume that most requests will be for the most recent
message i.e. the last message in the file. Do you really want to be
seeking through to the last record of flat files all the time? That
doesn't seem to be a scalable approach
- alternately, if you always add the most recent message to the
*start* of a flat file, you'll constantly be rewriting the entire file
(at least, that's the case in any file system I can think of; there
might be an exception). I suppose you could write your own file
system to optimise that...

Please speak up if you've got any thoughts - I'm treating this like a
bunch of intellectuals throwing ideas around, rather than an argument
about right and wrong, and it seems that everyone else is too at this
stage. Very happy to be convinced I'm wrong, in other words

Regards

David Mitchell

Lev Walkin

unread,
Jun 4, 2008, 6:26:17 PM6/4/08
to David Mitchell, erlang-q...@erlang.org

Better to coalesce the messages for a particular user's consumption
into a single file - better for FS inode utilization, seek times
(latency), and disk and memory fragmentation.

> - assuming you're always appending messages to the end of flat files,
> you'd have to assume that most requests will be for the most recent
> message i.e. the last message in the file. Do you really want to be
> seeking through to the last record of flat files all the time? That
> doesn't seem to be a scalable approach

BTW, seeking to the end is O(1), if not O(0) (jokingly), if the file
entries are self-delimiting and (as an optimization) double-tagged
(message size at the beginning and at the end of the message).

> - alternately, if you always add the most recent message to the
> *start* of a flat file, you'll constantly be rewriting the entire file
> (at least, that's the case in any file system I can think of; there
> might be an exception). I suppose you could write your own file
> system to optimise that...

This is not necessary.
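
A sketch of the double-tagged layout Lev describes: each entry is
written as <<Size:32, Body, Size:32>>, so the newest entry can be found
by reading the trailing size tag just before EOF (module and function
names are invented; the file is assumed non-empty in last/1):

-module(taglog).
-export([append/2, last/1]).

%% Append one binary entry, size-tagged at both ends.
append(File, Bin) ->
    Size = byte_size(Bin),
    {ok, Fd} = file:open(File, [append, raw, binary]),
    ok = file:write(Fd, <<Size:32, Bin/binary, Size:32>>),
    file:close(Fd).

%% Read the most recent entry: read the trailing 4-byte size tag just
%% before EOF, then read the body that sits in front of it.
last(File) ->
    {ok, Fd} = file:open(File, [read, raw, binary]),
    {ok, Eof} = file:position(Fd, eof),
    {ok, <<Size:32>>} = file:pread(Fd, Eof - 4, 4),
    {ok, Body} = file:pread(Fd, Eof - 4 - Size, Size),
    file:close(Fd),
    Body.

Walking further backwards works the same way: each trailing tag tells
you how far to jump to reach the previous entry.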

Joe Armstrong

unread,
Jun 5, 2008, 5:52:18 AM6/5/08
to David Mitchell, erlang-q...@erlang.org

Yes yes yes - I have for a long time thought that non-destructive
persistent queues are the perfect data structure for many
applications. I can't see why REST has GET, PUT, POST and DELETE - it
should have GET and APPEND (only).

Appending things to an input queue *and never deleting them* seems to
me a perfect way to deal with things. If you ever want to delete
things it can only be for pragmatic reasons and should be done years
later in a garbage collection sweep. If you never delete things you
can always go back later and fix errors!

The question of how to store the queue is unrelated to the
abstraction - probably disk replicated with a RAM replica of the last
N entries in the tail of the queue. If the queue entries are fixed
length (or multiples of a fixed length) then life becomes a lot
easier.

Many things can be built using this abstraction. Add fault-tolerance
and location transparency and you have a wonderfully useful mechanism
(i.e. it would be very nice to have GUIDs that identify persistent
queues - how to do this is orthogonal to the storage mechanisms
involved). To start with, queues identified by {Ip,Port,name} would
be useful :-)

Cheers

/Joe Armstrong
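
A toy sketch of such a GET/APPEND-only queue: appends go to a disk_log,
and a bounded in-memory list keeps the last 100 entries for fast reads
(the module and names are invented; replication, GUIDs and fault
tolerance are not shown):

-module(pqueue).
-behaviour(gen_server).

-export([start_link/2, append/2, recent/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(Name, File) ->
    gen_server:start_link({local, Name}, ?MODULE, {Name, File}, []).

append(Name, Item) -> gen_server:cast(Name, {append, Item}).
recent(Name)       -> gen_server:call(Name, recent).

init({Name, File}) ->
    {ok, Log} = disk_log:open([{name, Name}, {file, File}]),
    {ok, {Log, []}}.                 %% state: {DiskLog, LastEntries}

handle_cast({append, Item}, {Log, Last}) ->
    ok = disk_log:log(Log, Item),
    {noreply, {Log, lists:sublist([Item | Last], 100)}}.

handle_call(recent, _From, {_Log, Last} = State) ->
    {reply, Last, State}.            %% newest first, at most 100 entries

handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, {Log, _Last}) -> disk_log:close(Log).
code_change(_OldVsn, State, _Extra) -> {ok, State}.

Usage would be along the lines of pqueue:start_link(my_queue,
"my_queue.LOG"), then pqueue:append(my_queue, Tweet) and
pqueue:recent(my_queue).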

Patrick Logan

unread,
Jun 5, 2008, 1:23:40 PM6/5/08
to erlang-q...@erlang.org
From: Steve Davis <steven.cha...@gmail.com>

>> No database is needed. Everything can pretty much be achieved with
>> files containing fixed-length records.
>
> What you and Bob have said is, of course, true. I suspect that as
> the system becomes large scale then the software required to access
> and manage those files would eventually end up looking a lot like
> one of... an RDBMS, a Message Queue, an IM server, etc.

Hmm. Sounds kind of like a "record" has most of the properties of an
Atom format "Entry" and a series of these is like a "Feed". Wherever
these are stored transiently or archived semi-permanently, they are
small, time-ordered series of information. Not surprising.

From: Steve Davis <steven.cha...@gmail.com>

> The level to which I disagree with your conclusion here is quite
> profound! I see a large-scale future of lost messages, data
> corruption, file replication and synchronization issues, and
> *downtime* for the FS approach.
>

> All in all, the fundamental application scope of Twitter simply

> *screams* Message Queue at me. I'm not sure why the "experts" that

> Twitter have scavenged from IBM and Google haven't come to that

> conclusion also. Since Twitter have appeared to commit to a FS

> approach, I guess we'll have to see if future history proves me

> incorrect :)

This decision could probably be broken down into some general "needs"
(e.g. Twoorl needs time-ordering, categorization by author,
persistence of an entry for some length of time, etc.). Then various
solution alternatives could be mapped onto some or all of those
needs. I think it's too much too soon to say either "use a FS
approach" or "use a queue approach". Plus you'd certainly have to get
to the level of which kind of file system or queue for which needs?
e.g. I wrote before about XMPP as a suitable queue for several of the
needs, and for file systems, ZFS may have some nice attributes for
some of these needs.

From: Damien Morton <dmo...@bitfurnace.com>
> Well, the relational database is suitable for the quickly evolving
> part of their business, and there will always be a quickly evolving
> part of any business, so yes - inevitability is the perfect
> description.

Ironic since I hardly ever see the phrases "relational database" and
"quickly evolving" used together except in an inverse
relationship. 8^D

> Question is - can they create such a specialised datastructure such
> that it _can_ evolve, for example, in the case where they wanted to
> handle not just 140 byte tweets, but also photos and video (twotos,
> and twideos)?

I would probably consider something like an Amazon SimpleDB service as
a potential solution alternative.

-Patrick

Damien Morton

unread,
Jun 5, 2008, 1:33:46 PM6/5/08
to Patrick Logan, erlang-q...@erlang.org
On 6/6/2008 3:23 AM, Patrick Logan wrote:
> From: Damien Morton <dmo...@bitfurnace.com>
>
>> Well, the relational database is suitable for the quickly evolving
>> part of their business, and there will always be a quickly evolving
>> part of any business, so yes - inevitability is the perfect
>> description.
>>
>
> Ironic since I hardly ever see the phrases "relational database" and
> "quickly evolving" used together except in an inverse
> relationship. 8^D
>

That's true - relational databases have ossified. The original Date book
on SQL databases, IIRC, seemed to cast them in the light of being a tool
for executives to quickly and arbitrarily query the company's
information repository. I can't quite remember what the original IBM SQL
database was called, but Query-By-Example seemed to be an essential element.

Still, can you name a database that is better at evolving than a
relational database?

Christian S

unread,
Jun 5, 2008, 3:18:13 PM6/5/08
to damien...@acm.org, Patrick Logan, erlang-q...@erlang.org
> That's true - relational databases have ossified. The original Date book
> on SQL databases, IIRC, seemed to cast them in the light of being a tool
> for executives to quickly and arbitrarily query the company's
> information repository. I cant quite remember what the orginal IBM SQL
> database was called, but Query-By-Example seemed to be an essential element.

My experience with RDBMSs has taught me that it is a really bad idea
to write queries that depend on table names and layouts. That is what
makes it so hard to evolve a db. All the applications using it depend
on the current schema.

So stored procedures are a really good idea.


I assume there is a similar lesson for mnesia: put all your
transaction funs for a given mnesia-using application in a single
module.
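
A small sketch of that last point: keep every mnesia transaction fun
for the application behind one module, so callers never touch table
layouts directly (the record, table and function names are invented,
and a msg table with these attributes is assumed to exist already):

-module(twoorl_db).
-export([add_msg/2, msgs_for/1]).

-record(msg, {id, user, body}).

add_msg(User, Body) ->
    Id = erlang:now(),               %% unique-enough id for a sketch
    {atomic, ok} = mnesia:transaction(
        fun() -> mnesia:write(#msg{id = Id, user = User, body = Body}) end),
    Id.

msgs_for(User) ->
    {atomic, Msgs} = mnesia:transaction(
        fun() -> mnesia:match_object(#msg{user = User, _ = '_'}) end),
    Msgs.

If the table layout changes later, only this module needs updating.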

Richard A. O'Keefe

unread,
Jun 5, 2008, 6:55:46 PM6/5/08
to Joe Armstrong, erlang-q...@erlang.org
On 5 Jun 2008, at 9:52 pm, Joe Armstrong wrote:
> Appending things to a input queue *and never deleting them* seems to
> me a perfect way to deal with things.
> If you every want to delete things it can only be for pragmatic
> reasons and should be done years later
> in a garbage collection sweep. If you never delete things you can
> always go back later an fix errors!

I note that this is one of the reasons why SAP is successful.
Businesses use SAP because they *can* use SAP; their national tax
departments are happy with SAP. Their national tax departments
are happy with SAP because they can *audit* business data held in
SAP. And they can do *that* because (ta-dah!) SAP never deletes
anything! If there is an incorrect data entry, for example, you
can mark it as incorrect and replace it with a corrected record,
but you cannot remove the old record from the system.

I note that you can now buy a
Western Digital WD10EACS 1 Terabyte 7200 RPM disc drive for
USD 253 (which is currently NZD 330), price from pixelUSA.com.
Deleting stuff makes a lot less sense than it used to.

Barry Kelly

unread,
Jun 5, 2008, 10:52:14 PM6/5/08
to erlang-q...@erlang.org
Joe Armstrong wrote:

> Yes yes yes - I have for a long time thought that non-destructive
> persistent queues are the perfect data structure for
> many applications. I can't see why REST has GET, PUT, POST and DELETE
> - It should have GET and APPEND
> (only).

> Appending things to a input queue *and never deleting them* seems to
> me a perfect way to deal with things.

There are privacy and security ramifications to such a design. A service
provider implemented using such a model may hold passwords and customer
data much longer than it needs to.

Furthermore, the semantic operations of PUT and DELETE still need
implementing - customers still want to logically upsert and delete
resources, so a second-level API convention or standard is still
required.

Why not simply implement PUT, POST and DELETE as enqueued operations on
the server side?

-- Barry

--
http://barrkel.blogspot.com/

Joe Armstrong

unread,
Jun 6, 2008, 8:48:16 AM6/6/08
to Barry Kelly, erlang-q...@erlang.org
On Fri, Jun 6, 2008 at 4:52 AM, Barry Kelly <bkel...@gmail.com> wrote:
> Joe Armstrong wrote:
>
>> Yes yes yes - I have for a long time thought that non-destructive
>> persistent queues are the perfect data structure for
>> many applications. I can't see why REST has GET, PUT, POST and DELETE
>> - It should have GET and APPEND
>> (only).
>
>> Appending things to a input queue *and never deleting them* seems to
>> me a perfect way to deal with things.
>
> There are privacy and security ramifications to such a design. A service
> provider implemented using such a model may hold passwords and customer
> data much longer than it needs to.
>

Security has nothing to do with this argument - if a password is sent
over the network it has been sent - nothing can alter that.

A man in the middle might store the message forever, so it would make
no difference if the server stores the data for a millisecond or a
trillion years.

The reason for storing things in an append-only log is to be able to
replay the log later if things go wrong and recover from errors - it
has nothing to do with security. Security and privacy have to do with
the level of encryption that is applied to the items in the log.

/Joe

Damien Morton

unread,
Jun 6, 2008, 9:37:18 AM6/6/08
to Joe Armstrong, erlang-q...@erlang.org
On 6/6/2008 10:48 PM, Joe Armstrong wrote:
>
> Security has nothing to do with this argument - if a password is sent
> over the network it has been
> sent - nothing can alter that.
>
> A man in the middle might store the message forever so it would make
> no difference if the
> server stores the data for a millisecond or a trillion years.
>
> The reason for storing things in an appended log is to be able to
> replay the log later if things go wrong and
> recover from errors - it has nothing to do with security. Security and
> privacy has to do with the level of
> encryption that is applied to the items in the log.
>
> /Joe
>

Yes and no - security against legal attacks depends on the information
being irrecoverably deleted after a certain point.

Joe Armstrong

unread,
Jun 6, 2008, 11:00:35 AM6/6/08
to damien...@acm.org, erlang-q...@erlang.org
On Fri, Jun 6, 2008 at 3:37 PM, Damien Morton <dmo...@bitfurnace.com> wrote:
> On 6/6/2008 10:48 PM, Joe Armstrong wrote:
>>
>> Security has nothing to do with this argument - if a password is sent
>> over the network it has been
>> sent - nothing can alter that.
>>
>> A man in the middle might store the message forever so it would make
>> no difference if the
>> server stores the data for a millisecond or a trillion years.
>>
>> The reason for storing things in an appended log is to be able to
>> replay the log later if things go wrong and
>> recover from errors - it has nothing to do with security. Security and
>> privacy has to do with the level of
>> encryption that is applied to the items in the log.
>>
>> /Joe
>>
>
> Yes and no - security against legal attacks depends on the information being
> deleted (irrecoverable) after a certain point.
>

If the man in the middle took all your data then deleting the data is
irrelevant - anyway, most attacks are illegal.

/Joe

Darren New

unread,
Jun 6, 2008, 12:19:52 PM6/6/08
to erlang-q...@erlang.org
Joe Armstrong wrote:
>> Yes and no - security against legal attacks depends on the
information being
>> deleted (irrecoverable) after a certain point.
>
> If the man in the middle took all your data then deleting the data is
> irrelevant - anyway most attacks
> are illegal

For MITM to work, the man has to be in the middle when the message is
sent. If you save things forever on your server, you present years'
worth of potential value to anyone who breaks into your server.

This is exactly why your credit card has those three or four extra
printed digits on it. Merchants are contractually disallowed from
storing those longer than it takes to process the transaction, so anyone
who breaks into the server afterwards will be unable to use those cards
without the company doing additional checking.

This is why sshd changes its keys every few hours - old streams recorded
by attackers become unbreakable even if they manage to break into the
server.

I believe a "legal" attack was meant to imply a subpoena. Mr Gates, for
example, was quite brutalized by email messages he had written years
earlier that would not have been available to his detractors had he a
policy of purging any emails more than a few months old.

Encrypting the data only helps if the court is unable to compel you to
decrypt the data. And if the system is processing it, it must be
decrypted, so a hacker is going to be able to see it anyway.

--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."

Damien Morton

unread,
Jun 5, 2008, 3:40:11 PM6/5/08
to Christian S, Patrick Logan, erlang-q...@erlang.org
Makes one wonder where the database refactoring tools are :)

Damien Morton

unread,
Jun 5, 2008, 7:30:24 PM6/5/08
to Richard A. O'Keefe, erlang-q...@erlang.org
Umm, I think I would prefer it if my tweets didn't persist too long.

There's no expectation on the part of the users that they will be
retained, so imagine the surprise if one of their users ended up in
court only to be presented with a complete history of all their tweets.

No, in this case deletion is the right policy.

Tony Garnock-Jones

unread,
Jun 10, 2008, 10:50:33 AM6/10/08
to damien...@acm.org, erlang-q...@erlang.org
Damien Morton wrote:
> Makes one wonder where the database refactoring tools are :)

http://www.dabbledb.com/ :-)

Tony
