Mnesia scales?


Toby DiPasquale

Dec 10, 2008, 7:58:19 AM
to think...@googlegroups.com
On Wed, Dec 10, 2008 at 7:51 AM, grant michaels
<grantm...@hotmail.com> wrote:
> 2) since Erlang/Mnesia (or CouchDB) scale out so nicely relatively speaking ...

I hear this sentiment a lot, but I have trouble agreeing with it. It's
nice that Mnesia has built-in replication, but so does MySQL.
As well, Mnesia replicates full tables without any partitioning,
forcing you to do the sharding on your own just like any other regular
RDBMS. I'm not picking on Grant, since I hear this from basically
everyone, but I don't feel like Mnesia is much of a leg up in the
scalability department, esp. given its well-known size restrictions
(though tcerl looks to fix those).
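
A minimal sketch of what "doing the sharding on your own" tends to look like at the application layer. It is Python rather than Erlang, the node names are hypothetical, and nothing here is Mnesia or MySQL API; it only shows the key-to-node routing the application ends up owning:

```python
import hashlib

def shard_for(key, nodes):
    """Pick the node that owns `key` by hashing it (stable across processes)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

NODES = ["db1", "db2", "db3"]  # hypothetical node names

# The same key always routes to the same node.
owner = shard_for("user:42", NODES)
```

With plain modulo routing like this, changing the number of nodes remaps most keys, which is part of why hand-rolled sharding is painful on any database.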

What do others think? Has anyone scaled a stock Mnesia instance to
more than, say, 5 machines? How did it compare to a comparable
MySQL/PostgreSQL/Oracle scaling effort?

--
Toby DiPasquale

grant michaels

Dec 10, 2008, 9:04:14 AM
to think...@googlegroups.com
No offense taken, my Erlang knowledge is more cerebral and less practical thus far ... I have read a lot, but coded very little ... I do remember, though, reading that the Dukes of Erl have addressed the 2GB limit w/ success ...

best personal regards,
 
-[ grantmichaels ]-

> Date: Wed, 10 Dec 2008 07:58:19 -0500
> From: codes...@gmail.com
> To: think...@googlegroups.com
> Subject: Mnesia scales?

Joel Reymont

Dec 10, 2008, 11:12:27 AM
to think...@googlegroups.com, Bob Ippolito, Paul Mineiro

On Dec 10, 2008, at 12:58 PM, Toby DiPasquale wrote:

> What do others think? Has anyone scaled a stock Mnesia instance to
> more than, say, 5 machines? How was it compared to a comparably
> MySQL/PostgreSQL/Oracle scaling effort?


A nice collaborative effort would be to grab a few Amazon EC2
instances for an hour and pit MySQL against Mnesia in a scalability
match.

Me, I don't know enough about MySQL scalability and I distrust Mnesia
due to the network split and rejoin issue [1]. This issue has been
proven to have no solution so Mnesia is unlikely to meet my needs.
Which is to say that I won't be using Mnesia as a backend for a web
site.

[1] http://is.gd/b0ce

This has an interesting implication for me. If I'm forced to use MySQL
or PostgreSQL to back a web site built with Erlang and deal with
replication and scalability issues, then do I want to use Erlang at
all?

My current philosophy is that you can safely build a web site using
more mainstream tools and means like Ruby or Python. You can back this
web site with a regular RDBMS. You can then use Erlang as a black
box on the backend, e.g. a message broker (RabbitMQ), Jabber server
(ejabberd) or a scalable job manager.

This is what Bob Ippolito does at Mochi and I hope he will chime in.
Then there are the brave like the Dukes of Erl but they have a
different set of needs and can afford to discard split Mnesia nodes
together with their content.

--
http://wagerlabs.com

Joel Reymont

Dec 10, 2008, 11:21:49 AM
to think...@googlegroups.com

On Dec 10, 2008, at 2:04 PM, grant michaels wrote:

> No offense taken, my erlang knowledge is more cerebral and less
> practical thus far ... I have read a lot, but coded very little ...
> I do remember, though, reading that the Dukes of Erl has addressed
> the 2GB limit w/ success ...


I wrote (the initial?) implementation for the Dukes [1] and that
implementation is inherently faulty, like any other attempt to add a
new backend to Mnesia.

[1] http://www.wagerlabs.com/blog/2008/06/mnesia-unlimited.html#more

Yes, you do get to go past 2 GB by using Tokyo Cabinet [2] but the
intractable issue is that any operation in the non-Mnesia backend runs
outside of Mnesia transaction management.

[2] http://sourceforge.net/projects/tokyocabinet/

Mnesia assumes that the transaction succeeded by the time the custom
backend code is invoked to store the data in Tokyo Cabinet, S3, etc.
and there's absolutely no way to report errors back to Mnesia at this
point.
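
A toy illustration of that ordering problem, in Python rather than Erlang. All class and method names are hypothetical stand-ins; this is neither Mnesia's nor Tokyo Cabinet's API, just the "commit first, store afterwards" shape described above:

```python
class BackendError(Exception):
    pass

class FlakyBackend:
    """Stand-in for an external store that can fail on write."""
    def __init__(self, fail=False):
        self.fail = fail
        self.data = {}

    def put(self, key, value):
        if self.fail:
            raise BackendError("disk full")
        self.data[key] = value

class ToyTransactionManager:
    """Counts the transaction as committed, then calls the backend."""
    def __init__(self, backend):
        self.backend = backend
        self.committed = {}  # what the transaction manager believes is stored
        self.lost = []       # writes that never reached the backend

    def write(self, key, value):
        self.committed[key] = value           # already considered successful
        try:
            self.backend.put(key, value)      # runs outside transaction management
        except BackendError:
            self.lost.append(key)             # too late to abort; can only note it
        return "ok"                           # the caller is still told "ok"

tm = ToyTransactionManager(FlakyBackend(fail=True))
result = tm.write("k", "v")
```

Here the caller sees a successful write even though the backend holds nothing for the key, which is the inconsistency in question.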

This is a fundamental issue that requires a significant Mnesia
rewrite to fix. Mnesia has to be made aware of the fact that it deals
with table types other than ram_copies, disc_copies and
disc_only_copies. I think that any rewrite of the scale needed here is
unlikely to make it into OTP for fear of compromising Mnesia stability.

I may be wrong but I did what I did since it was relatively
straightforward and fit within my client's budget.

--
http://wagerlabs.com


Matthew Kanwisher

Dec 10, 2008, 11:26:06 AM
to think...@googlegroups.com
I'm curious what kind of application needs to scale MySQL out to 5-6 nodes. You can get to a pretty large scale off a single db and scale it out with in-memory replication like memcache or JBoss JGroups. Is there any way to use Mnesia as an in-memory cache like those other technologies and ultimately write the data to a standard db?

~Mattt

grant michaels

Dec 10, 2008, 12:21:37 PM
to think...@googlegroups.com
joel -

interesting that you would pen this right now, because, during the couple extra weeks between when i signed up for your initial for-profit project and when you first published it, i decided that i would head towards merb and couchdb and leverage their building upon erlang (nanite, vertebra, etc) so that i didn't have to concede mental bandwidth to learning erlang concurrently to my learning ruby ...

that being said, it's still best to understand what is going on w/ any tool you are using, but in this case, i'm going to let the merb and couchdb committers worry about erlang while i write ruby ... i am still actively following erlang, but found myself unable to dedicate myself to making 'normal' webapps in it w/o ruby/python etc, if for no other reason than the prevalence of useful libraries around ruby and the lack of beginner/intermediate level documentation for the erlang projects ... i would love to use webmachine/mochi, but there is precious little to read and learn from, and the guys - while nice, and helpful if you can ask specific questions - just aren't putting time into documentation ... it's hard to learn about something from code if you are learning the language at the same time ...

also, at the end of the day, none of my ideas warrant the dedication to fault-tolerance that is in discussion here ... it's just not that important, and nothing is really mission critical ... in the end i decided to pick the tools that would make the process easiest for a solo developer, tools with communities behind them that i thought would make the projects the most fun ...

the python guys were kind of boring and the rails people were like middleschoolers ... that being said, the community around merb, couchdb, and jruby seemed to be just right ... furthermore, scaling JVM's is anything but trailblazing and is well-documented ...

so, for me, for the time while i'm responsible for creating all facets of my projects, it seems wisest to use SWF clients w/ merb/jruby/couchdb backends ...

if something takes off and ruby can't scale, at least i'll be somewhat knowledgeable re: erlang for following here, and will have been talking to all the right people if and when i need to introduce erlang - but, until then ...


best personal regards,
 
-[ grantmichaels ]-

Jim McCoy

Dec 10, 2008, 1:21:25 PM
to think...@googlegroups.com
On Wed, Dec 10, 2008 at 8:12 AM, Joel Reymont <joe...@gmail.com> wrote:
>[...]

> Me, I don't know enough about MySQL scalability and I distrust Mnesia
> due to the network split and rejoin issue [1]. This issue has been
> proven to have no solution so Mnesia is unlikely to meet my needs.

Actually, it has _no solution_ in the general case so nothing is going
to meet your needs. To get around the CAP paradox you need to relax
one of the assumptions. The real question is which assumption should
be relaxed and I think that this is usually implementation-specific.
The mnesia developers chose consistency and availability, so
partition-tolerance is a goner. If you want to add back
partition-tolerance then you will need to add an additional layer that
sacrifices either consistency or availability in order to gain
partition-tolerance (I am fairly confident, but not certain, that you
can use a system that makes one CAP choice as a component in a system
which provides a different set of CAP choices.)

There are a couple of other interesting Erlang systems out there that
might be worth examining in the context of scaling data storage: Kai
and dynomite might be worth a look. Maybe a better topic for
discussion is what a system that sacrificed consistency or
availability might look like in Erlang and perhaps we could sketch out
a few broad outlines of how one might go about putting this together
using as many existing Erlang bits as possible?

> This has an interesting implication for me. If I'm forced to use MySQL
> or PostgreSQL to back a web site built with Erlang and deal with
> replication and scalability issues , then do I want to use Erlang at
> all?

It sounds like you are assuming MySQL and PostgreSQL offer something
beyond what mnesia offers in terms of general features, which I am
unable to discover. Having not really dug around in the mnesia code
the way you have I can't say one way or the other how easy it is to
tweak or manipulate certain mnesia operations, but its integration
into Erlang itself makes using it as a component of a system that does
what is desired easier than changing the way popular RDBMSs work.

> My current philosophy is that you can safely build a web site using
> more mainstream tools and means like Ruby or Python. You can back this
> web site with a regular RDBMS.

I guess the question to ask here is what do you think would have been
gained over using erlang and mnesia? With the single exception of
large-tables the mainstream RDBMS don't add anything to the equation.

jim

Joel Reymont

Dec 10, 2008, 4:50:22 PM
to think...@googlegroups.com
Jim,

On Dec 10, 2008, at 6:21 PM, Jim McCoy wrote:

> The mnesia developers chose consistency and availability, so
> partition-tolerance is a goner. If you want to add back
> partition-tolerance then you will need to add an additional layer that
> sacrifices either consistency or availability in order to gain
> partition-tolerance

What would sacrificing consistency mean in this case? That you can
merge the data but it won't be quite right?

> There are a couple of other interesting Erlang systems out there that
> might be worth examining in the context of scaling data storage: Kai
> and dynomite might be worth a look.

Do you mean Amazon Dynamo? There's precious little information on it.

> Maybe a better topic for
> discussion is what a system that sacrificed consistency or
> availability might look like in Erlang and perhaps we could sketch out
> a few broad outlines of how one might go about putting this together
> using as many existing Erlang bits as possible?

Do you want to start the sketching?

> It sounds like you are assuming MySQL and PostgreSQL offer something
> beyond what mnesia offers in terms of general features, which I am
> unable to discover.

Let me add another variable into the mix. I have a project that I'm
trying to commercialize, a translator (compiler?) from a Pascal-like
trading language into C#. I wrote it using Allegro Common Lisp (3rd
version, after Haskell and OCaml) which comes with AllegroCache, an
OODBMS. The translator is web-based so I'll need a web site and the
site will need a database and AllegroCache will be it.

What MySQL and PostgreSQL give me is ... SQL! Very fast and proven,
with a well known approach to optimization. A ton of tools I could use
to manipulate the data, e.g. reporting, admin, etc. Rapid development
of front-ends in Rails, Django, etc.

Should I go on?

The drawback of using an RDBMS with Erlang is that ... well, you
lose the replication capabilities built into Erlang. You need to rely
on the replication and clustering capabilities of those database
systems and manage them separately.

There's an element of speed that you lose as well, for nothing is
faster than keeping transient data in a ram_copies Mnesia table, i.e.
in memory. Mnesia is also well integrated into Erlang, e.g. there's no
need to translate results into Erlang records since results _are_
records.

Still, there are plenty of drawbacks to using Mnesia. Rather than
enumerate them, I'll point out a drawback that you may not have
thought of, at least from the OODBMS perspective.

With AllegroCache above I set up my schema as a bunch of objects so
there's no impedance mismatch. I get automatic versioning where code
that uses the old schema gets to see the old data and new code gets
hold of the added attributes. And, here's the kicker, if I want to
store a bunch of "children" into a parent object and then modify them,
I don't have to fetch the data for the whole chain into memory and
then write it back. I grab the parent and follow the pointer chain to
get to the data I need.

This is not how Mnesia works, since it makes use of tuples. Your Mnesia
schema is a bunch of records. Your records are tuples. You want to
store children in the parent, you store them as a list or tuple in the
parent record. You want to modify a child? You copy the whole chain,
potentially megabytes of data, before changing a few bytes and writing
the whole thing back.
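
The shape of that update can be sketched with immutable tuples in Python (the `Parent` and `Child` types and `update_child` are hypothetical, and this is not Mnesia code). Changing one field of one child means building a new child, a new children tuple and a new parent, then writing the parent back:

```python
from typing import NamedTuple, Tuple

class Child(NamedTuple):
    name: str
    balance: int

class Parent(NamedTuple):
    id: int
    children: Tuple[Child, ...]

def update_child(parent, index, **changes):
    """Return a new Parent with one child changed; the old value is untouched."""
    kids = list(parent.children)
    kids[index] = kids[index]._replace(**changes)
    return parent._replace(children=tuple(kids))

old = Parent(1, (Child("a", 10), Child("b", 20)))
new = update_child(old, 1, balance=25)  # this whole value is what gets stored
```

The sketch only shows the rebuild-and-write-back pattern. It does not model the cost of reading the stored record out of the table and writing the whole record back, which is where the megabytes come from.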

Then there's the manual versioning of the schema. If you change your
data model with Mnesia then you need to provide a mechanism to upgrade
your data, manually. You need to write code to convert your old
records to your new ones. None of MySQL, PostgreSQL or AllegroCache
requires it.
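
A sketch of what such a manual upgrade amounts to, in Python with hypothetical record types `UserV1` and `UserV2`. In Mnesia the conversion function would typically be handed to mnesia:transform_table/3; here a plain dict stands in for the table:

```python
from typing import NamedTuple

class UserV1(NamedTuple):
    id: int
    name: str

class UserV2(NamedTuple):
    id: int
    name: str
    email: str  # attribute added in the new schema

def upgrade(record):
    """Convert one old-style record; pass already-upgraded records through."""
    if isinstance(record, UserV2):
        return record
    return UserV2(record.id, record.name, email="")

table = {1: UserV1(1, "joel"), 2: UserV2(2, "jim", "jim@example.com")}
table = {key: upgrade(rec) for key, rec in table.items()}
```

The default chosen for the new attribute (an empty string here) is an assumption; picking it is exactly the kind of decision the upgrade code has to make by hand.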

Of course you wouldn't set up your parent/child relationship this way
with Mnesia but that makes it neither fish nor fowl -- you have the
schema and manual versioning but you don't have SQL and the associated
benefits. What about multi-table joins? You do have QLC but do you
know how to optimize it?

> I guess the question to ask here is what do you think would have been
> gained over using erlang and mnesia? With the single exception of
> large-tables the mainstream RDBMS don't add anything to the equation.


Please let me know if I have answered that question.

Thanks, Joel

--
http://wagerlabs.com

Jim McCoy

Dec 11, 2008, 12:33:35 AM
to think...@googlegroups.com
On Wed, Dec 10, 2008 at 1:50 PM, Joel Reymont <joe...@gmail.com> wrote:
>
>> The mnesia developers chose consistency and availability, so
>> partition-tolerance is a goner. If you want to add back
>> partition-tolerance then you will need to add an additional layer that
>> sacrifices either consistency or availability in order to gain
>> partition-tolerance
> What would sacrificing consistency mean in this case? That you can
> merge the data but it won't be quite right?

Correct, sort of...

Consistency - pretty much the same as the C in ACID.
Availability - the data service is always available, if you can get to
one node you can run a transaction.
Partition-tolerance - the service will continue to operate if part of
it fails (and

A good example of a distributed system that relaxes the consistency
assurance is Amazon's S3 and SimpleDB services. If you make a write
to S3 from point A and later make a read from point B you might not
get back the data that you wrote from A. "Eventual consistency" is
the goal, but you couldn't run a bank on S3 buckets.

If you want distributed transactions then you almost always go for
consistency and availability. The downside to this is that partitions
kill you.

If you relaxed the consistency assurance then if you do a
post-partition merge most of the transactions would be just fine, but
some of them might not be and you need to have a protocol (usually in
the client-side business logic) to resolve the conflicts.
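
A minimal sketch of such a post-partition merge, in Python. It assumes each side still knows the last common state (`base`), which is an assumption of this sketch, and `resolve` is a hypothetical callback standing in for the client-side business logic:

```python
def merge_after_partition(base, side_a, side_b, resolve):
    """Three-way merge of two replicas that diverged from `base`.

    resolve(key, a_value, b_value) is called only for true conflicts,
    i.e. keys that both sides changed to different values.
    A value of None means the key is absent or was deleted.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(base) | set(side_a) | set(side_b)):
        original = base.get(key)
        a = side_a.get(key)
        b = side_b.get(key)
        if a == b:
            value = a            # both sides agree
        elif a == original:
            value = b            # only side B changed it
        elif b == original:
            value = a            # only side A changed it
        else:
            conflicts.append(key)
            value = resolve(key, a, b)
        if value is not None:
            merged[key] = value
    return merged, conflicts

base = {"x": 1, "y": 1, "z": 1}
side_a = {"x": 2, "y": 1, "z": 5}
side_b = {"x": 1, "y": 3, "z": 7}
merged, conflicts = merge_after_partition(
    base, side_a, side_b, resolve=lambda key, a, b: max(a, b))
```

Only `z` was changed on both sides, so it is the only key that reaches the resolver; `x` and `y` merge without any application involvement.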

>> There are a couple of other interesting Erlang systems out there that
>> might be worth examining in the context of scaling data storage: Kai
>> and dynomite might be worth a look.
>
> Do you mean Amazon Dynamo? There's precious little information on it.

Not dynamo, dynomite. This was another simpledb clone similar to Kai
that Cliff Moon developed for Powerset that is now available on
github.

>> Maybe a better topic for
>> discussion is what a system that sacrificed consistency or
>> availability might look like in Erlang and perhaps we could sketch out
>> a few broad outlines of how one might go about putting this together
>> using as many existing Erlang bits as possible?
>
> Do you want to start the sketching?

That's a tall order. I will give it some thought and take a look at
the kai and dynomite stuff this weekend and see if I can start with an
overview of what might be available already.


> What MySQL and PostgreSQL give me is ... SQL! Very fast and proven,
> with a well known approach to optimization. A ton of tools I could use
> to manipulate the data, e.g. reporting, admin, etc. Rapid development
> of front-ends in Rails, Django, etc.

This is a big win, and one that I wrestle with occasionally. There
are a lot of supporting tools out there for the standard SQL model. I
think that this model eventually breaks when things get big, but most
people never need to worry about that part.

It sounds like AllegroCache is pretty ideal for your needs, and while
mnesia has a speed win for ram tables and a low impedance mismatch
this might not be enough to make it the tool for this job. I actually
picked up Erlang because someone pointed me towards mnesia back when I
was looking at distributed databases (at a time when master-master
replication in existing open sql engines was nothing more than a
pipe-dream and master-slave replication sucked as well) and I have
gone from the high of initial discovery through the trough of
disillusionment and am now coming back to thinking it is a specialized
tool that works well in its niche but in danger of being mis-applied
in the Erlang world simply because we don't have many other options...

jim

MS

Dec 11, 2008, 2:00:14 AM
to think...@googlegroups.com


On Thu, Dec 11, 2008 at 6:33 AM, Jim McCoy <jim....@gmail.com> wrote:
[snip]

> It sounds like AllegroCache is pretty ideal for your needs, and while
> mnesia has a speed win for ram tables and a low impedance mismatch
> this might not be enough to make it the tool for this job. I actually
> picked up Erlang because someone pointed me towards mnesia back when I
> was looking at distributed databases (at a time when master-master
> replication in existing open sql engines was nothing more than a
> pipe-dream and master-slave replication sucked as well) and I have
> gone from the high of initial discovery through the trough of
> disillusionment and am now coming back to thinking it is a specialized
> tool that works well in its niche but in danger of being mis-applied
> in the Erlang world simply because we don't have many other options...

Great summary of what is happening in the Erlang world currently, IMHO.

I haven't tested CouchDB, but from what I can read at couchdb.org, it seems like couchdb would somewhat fit the requirements: multi-master-replication, conflict resolution, etc.?

If it doesn't fit the requirements, well then let's hack mnesia or build another database system?!


Martin

Joel Reymont

Dec 11, 2008, 9:09:01 AM
to think...@googlegroups.com

On Dec 11, 2008, at 5:33 AM, Jim McCoy wrote:

> It sounds like AllegroCache is pretty ideal for your needs, and while
> mnesia has a speed win for ram tables and a low impedance mismatch
> this might not be enough to make it the tool for this job.

AllegroCache is missing scalability and replication. Oh, the irony!

I think I need to take a closer look at CouchDB. The only thing I'm
biased against is the concept of external views as command-line tools
accessible via standard output.

--
http://wagerlabs.com

Michele Sciabarra

Dec 11, 2008, 9:19:08 AM
to think...@googlegroups.com
Joel Reymont wrote:
Well, I had the same concern. Anyway, after more careful consideration:
- using an external command is the way Erlang communicates with
external tools
- the result of the indexing is stored in the db, so it should not affect
performance
- JavaScript is far better known than Erlang
- also, because the output is JSON, it makes sense that a view is defined
in JavaScript (after all, JSON IS the JavaScript notation, and JavaScript
IS the scripting language of the web)
- you can also plug in another scripting engine. I see no reason not
to write an escript view, for example
- I think it is possible (of course) to bypass the execution and write
some views in Erlang itself, but I am not convinced that this would be the
best thing to do
- slightly better could be the idea of compiling JavaScript to Erlang...
There is a project on the web (by Roberto Saccon) to do such a thing:
http://code.google.com/p/erlyjs/
Someday it could be a replacement for the externally spawned JavaScript
interpreter, although I am not so sure that this would be a great
improvement in performance.

Justin Sheehy

Dec 13, 2008, 4:00:23 PM
to think...@googlegroups.com
On Dec 10, 4:50 pm, Joel Reymont <joe...@gmail.com> wrote:

> > The mnesia developers chose consistency and availability, so
> > partition-tolerance is a goner. If you want to add back
> > partition-tolerance then you will need to add an additional layer that
> > sacrifices either consistency or availability in order to gain
> > partition-tolerance
>
> What would sacrificing consistency mean in this case? That you can
> merge the data but it won't be quite right?

Not necessarily. One useful way to look at the CAP theorem is that
you can't guarantee consistency, availability, and partition-tolerance
all at the same moment -- but that doesn't have to mean that whichever
one you give up at any moment cannot be reclaimed shortly thereafter.

For instance, a data storage system might choose to be willing to
relax consistency for short periods of time in order to never lose
(write) availability and partition-tolerance. If you require perfect
continuous consistency, you cannot retain the other two. However, all
this means is that it is acceptable within system constraints for data
on two different nodes to diverge. There are many known techniques
which can be used to make such divergences brief, achieving "eventual"
(but usually very close to perfect) consistency.
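
One widely used technique of this kind is the vector clock, which lets a node tell whether two versions of a value are ordered or were written concurrently on different sides of a divergence. A minimal sketch in Python follows; the function names are illustrative and are not taken from any particular library:

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two clocks as equal, ordered, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"  # written independently: needs reconciliation

def merge(a, b):
    """Clock for a value that reconciles versions `a` and `b`."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

start = {}
on_n1 = increment(start, "n1")   # write accepted by node n1
on_n2 = increment(start, "n2")   # independent write accepted by node n2
status = compare(on_n1, on_n2)   # "concurrent"
```

Versions that compare as ordered can be reconciled automatically by keeping the newer one; only the concurrent ones need a conflict-resolution step.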

I have worked on such a system, and the operational benefits that come
from these choices can be huge in the proper environment. If you use
a system (mnesia, mysql, etc) that cannot -- even briefly -- allow for
inconsistency, then you end up spending a lot of money as you grow
trying to pretend that you have both availability and partition
tolerance.

> > Maybe a better topic for
> > discussion is what a system that sacrificed consistency or
> > availability might look like in Erlang and perhaps we could sketch out
> > a few broad outlines of how one might go about putting this together
> > using as many existing Erlang bits as possible?
>
> Do you want to start the sketching?

While the system I mentioned above has not yet been released as source
code, it is in production use. I have posted some small modules of
Erlang code that can be used either as building blocks or as example
code for some of the parts of such a system:

http://code.google.com/p/distributerl/

-Justin
