I hear this sentiment a lot, but I have trouble agreeing with it. The
fact that Mnesia has built-in replication is nice, but so does MySQL.
As well, Mnesia replicates full tables without any partitioning,
forcing you to do the sharding on your own just like any other regular
RDBMS. I'm not picking on Grant, since I hear this from basically
everyone, but I don't feel like Mnesia is much of a leg up in the
scalability department, esp. given its well-known size restrictions
(though tcerl looks to fix those).
What do others think? Has anyone scaled a stock Mnesia instance to
more than, say, 5 machines? How did it compare to a comparable
MySQL/PostgreSQL/Oracle scaling effort?
--
Toby DiPasquale
> What do others think? Has anyone scaled a stock Mnesia instance to
> more than, say, 5 machines? How did it compare to a comparable
> MySQL/PostgreSQL/Oracle scaling effort?
A nice collaborative effort would be to grab a few Amazon EC2
instances for an hour and pit MySQL against Mnesia in a scalability
match.
Me, I don't know enough about MySQL scalability and I distrust Mnesia
due to the network split and rejoin issue [1]. This issue has been
proven to have no solution, so Mnesia is unlikely to meet my needs.
Which is to say that I won't be using Mnesia as a backend for a web
site.
This has an interesting implication for me. If I'm forced to use MySQL
or PostgreSQL to back a web site built with Erlang and deal with
replication and scalability issues, then do I want to use Erlang at
all?
My current philosophy is that you can safely build a web site using
more mainstream tools and means like Ruby or Python. You can back this
web site with a regular RDBMS. You can then use Erlang as a black
box on the backend, e.g. a message broker (RabbitMQ), Jabber server
(ejabberd) or a scalable job manager.
This is what Bob Ippolito does at Mochi and I hope he will chime in.
Then there are the brave like the Dukes of Erl but they have a
different set of needs and can afford to discard split Mnesia nodes
together with their content.
> No offense taken, my erlang knowledge is more cerebral and less
> practical thus far ... I have read a lot, but coded very little ...
> I do remember, though, reading that the Dukes of Erl has addressed
> the 2GB limit w/ success ...
I wrote (the initial?) implementation for the Dukes [1] and that
implementation is inherently faulty, like any other attempt to add a
new backend to Mnesia.
[1] http://www.wagerlabs.com/blog/2008/06/mnesia-unlimited.html#more
Yes, you do get to go past 2GB by using Tokyo Cabinet [2] but the
intractable issue is that any operation in the non-Mnesia backend runs
outside of Mnesia transaction management.
[2] http://sourceforge.net/projects/tokyocabinet/
Mnesia assumes that the transaction succeeded by the time the custom
backend code is invoked to store the data in Tokyo Cabinet, S3, etc.
and there's absolutely no way to report errors back to Mnesia at this
point.
This is a fundamental issue that requires a significant Mnesia
rewrite to fix. Mnesia has to be made aware of the fact that it deals
with table types other than ram_copies, disc_copies and
disc_only_copies. I think that any rewrite on the scale needed here is
unlikely to make it into OTP for fear of compromising Mnesia stability.
I may be wrong but I did what I did since it was relatively
straightforward and fit within my client's budget.
Actually, it has _no solution_ in the general case so nothing is going
to meet your needs. To get around the CAP theorem you need to relax
one of the assumptions. The real question is which assumption should
be relaxed and I think that this is usually implementation-specific.
The mnesia developers chose consistency and availability, so
partition-tolerance is a goner. If you want to add back
partition-tolerance then you will need to add an additional layer that
sacrifices either consistency or availability in order to gain
partition-tolerance (I am fairly confident, but not certain, that you
can use a system that makes one CAP choice as a component in a system
which provides a different set of CAP choices.)
There are a couple of other interesting Erlang systems out there that
might be worth examining in the context of scaling data storage: Kai
and dynomite might be worth a look. Maybe a better topic for
discussion is what a system that sacrificed consistency or
availability might look like in Erlang and perhaps we could sketch out
a few broad outlines of how one might go about putting this together
using as many existing Erlang bits as possible?
> This has an interesting implication for me. If I'm forced to use MySQL
> or PostgreSQL to back a web site built with Erlang and deal with
> replication and scalability issues, then do I want to use Erlang at
> all?
It sounds like you are assuming MySQL and PostgreSQL offer something
beyond what mnesia offers in terms of general features, which I am
unable to discover. Having not really dug around in the mnesia code
the way you have I can't say one way or the other how easy it is to
tweak or manipulate certain mnesia operations, but its integration
into Erlang itself makes it easier to use as a component of a system
that does what is desired than to change the way popular RDBMSs work.
> My current philosophy is that you can safely build a web site using
> more mainstream tools and means like Ruby or Python. You can back this
> web site with a regular RDBMS.
I guess the question to ask here is what do you think would have been
gained over using erlang and mnesia? With the single exception of
large tables, the mainstream RDBMSs don't add anything to the equation.
jim
On Dec 10, 2008, at 6:21 PM, Jim McCoy wrote:
> The mnesia developers chose consistency and availability, so
> partition-tolerance is a goner. If you want to add back
> partition-tolerance then you will need to add an additional layer that
> sacrifices either consistency or availability in order to gain
> partition-tolerance
What would sacrificing consistency mean in this case? That you can
merge the data but it won't be quite right?
> There are a couple of other interesting Erlang systems out there that
> might be worth examining in the context of scaling data storage: Kai
> and dynomite might be worth a look.
Do you mean Amazon Dynamo? There's precious little information on it.
> Maybe a better topic for
> discussion is what a system that sacrificed consistency or
> availability might look like in Erlang and perhaps we could sketch out
> a few broad outlines of how one might go about putting this together
> using as many existing Erlang bits as possible?
Do you want to start the sketching?
> It sounds like you are assuming MySQL and PostgreSQL offer something
> beyond what mnesia offers in terms of general features, which I am
> unable to discover.
Let me add another variable into the mix. I have a project that I'm
trying to commercialize, a translator (compiler?) from a Pascal-like
trading language into C#. I wrote it using Allegro Common Lisp (the
3rd version, after Haskell and OCaml) which comes with AllegroCache, an
OODBMS. The translator is web-based so I'll need a web site and the
site will need a database and AllegroCache will be it.
What MySQL and PostgreSQL give me is ... SQL! Very fast and proven,
with a well known approach to optimization. A ton of tools I could use
to manipulate the data, e.g. reporting, admin, etc. Rapid development
of front-ends in Rails, Django, etc.
Should I go on?
The drawback of using an RDBMS with Erlang is that ... well, you
lose the replication capabilities built into Erlang. You need to rely
on the replication and clustering capabilities of those database
systems and manage them separately.
There's an element of speed that you lose as well, for nothing is
faster than keeping transient data in a ram_copies Mnesia table, i.e.
in memory. Mnesia is also well integrated into Erlang, e.g. there's no
need to translate results into Erlang records since results _are_
records.
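To make the "results _are_ records" point concrete, here's a minimal
sketch. The table and record names are my own invention, and it assumes
Mnesia is already running on the node:

```erlang
%% Hypothetical table: rows in a Mnesia table *are* Erlang records.
-record(user, {id, name, email}).

init() ->
    %% Assumes mnesia is already started on this node.
    mnesia:create_table(user,
                        [{ram_copies, [node()]},
                         {attributes, record_info(fields, user)}]).

lookup(Id) ->
    {atomic, Users} =
        mnesia:transaction(fun() -> mnesia:read({user, Id}) end),
    Users.  %% a list of #user{} records -- no translation layer needed
```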
Still, there are plenty of drawbacks to using Mnesia. Rather than
enumerate them, I'll point out a drawback that you may not have
thought of, at least from the OODBMS perspective.
With AllegroCache above I set up my schema as a bunch of objects so
there's no impedance mismatch. I get automatic versioning where code
that uses the old schema gets to see the old data and new code gets
hold of the added attributes. And, here's the kicker, if I want to
store a bunch of "children" into a parent object and then modify them,
I don't have to fetch the data for the whole chain into memory and
then write it back. I grab the parent and follow the pointer chain to
get to the data I need.
Compare this to how Mnesia works, since it makes use of tuples. Your Mnesia
schema is a bunch of records. Your records are tuples. You want to
store children in the parent, you store them as a list or tuple in the
parent record. You want to modify a child? You copy the whole chain,
potentially megabytes of data, before changing a few bytes and writing
the whole thing back.
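A sketch of that copy-modify-write cycle, with invented record names
(the children live inline in the parent record):

```erlang
-record(child,  {id, name}).
-record(parent, {id, children = []}).   %% children stored inline

rename_child(ParentId, ChildId, NewName) ->
    F = fun() ->
                [P] = mnesia:read({parent, ParentId}),
                %% The entire children list -- potentially megabytes --
                %% is rebuilt to change a few bytes in one element...
                Cs = [case C of
                          #child{id = ChildId} -> C#child{name = NewName};
                          _ -> C
                      end || C <- P#parent.children],
                %% ...and the whole parent record is written back.
                mnesia:write(P#parent{children = Cs})
        end,
    mnesia:transaction(F).
```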
Then there's the manual versioning of the schema. If you change your
data model with Mnesia then you need to provide a mechanism to upgrade
your data, manually. You need to write code to convert your old
records to your new ones. Neither MySQL, PostgreSQL, nor AllegroCache
requires it.
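The manual upgrade usually goes through mnesia:transform_table/3. A
sketch, assuming a hypothetical 'user' table that originally had only
id and name fields:

```erlang
-record(user, {id, name, email}).   %% new schema: 'email' added

upgrade() ->
    Transform = fun({user, Id, Name}) ->            %% old 2-field rows
                        {user, Id, Name, undefined} %% new 3-field rows
                end,
    mnesia:transform_table(user, Transform,
                           record_info(fields, user)).
```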
Of course you wouldn't set up your parent/child relationship this way
with Mnesia but that makes it neither fish nor fowl -- you have the
schema and manual versioning but you don't have SQL and the associated
benefits. What about multi-table joins? You do have QLC but do you
know how to optimize it?
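For reference, a two-table QLC join might look like the sketch below
(tables invented). How it performs depends entirely on table types and
key choice; there is no cost-based optimizer rewriting the query for
you:

```erlang
-include_lib("stdlib/include/qlc.hrl").
-record(author, {id, name}).
-record(book,   {isbn, author_id, title}).

titles_by(AuthorName) ->
    %% Join books to authors on author_id, filtering by name.
    Q = qlc:q([B#book.title
               || A <- mnesia:table(author),
                  B <- mnesia:table(book),
                  A#author.name =:= AuthorName,
                  B#book.author_id =:= A#author.id]),
    {atomic, Titles} = mnesia:transaction(fun() -> qlc:e(Q) end),
    Titles.
```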
> I guess the question to ask here is what do you think would have been
> gained over using erlang and mnesia? With the single exception of
> large tables, the mainstream RDBMSs don't add anything to the equation.
Please let me know if I have answered that question.
Thanks, Joel
Correct, sort of...
Consistency - pretty much the same as the C in ACID.
Availability - the data service is always available; if you can get to
one node you can run a transaction.
Partition-tolerance - the service will continue to operate if part of
it fails (and the remaining nodes can no longer reach it).
A good example of a distributed system that relaxes the consistency
assurance is Amazon's S3 and SimpleDB services. If you make a write
to S3 from point A and later make a read from point B you might not
get back the data that you wrote from A. "Eventual consistency" is
the goal, but you couldn't run a bank on S3 buckets.
If you want distributed transactions then you almost always go for
consistency and availability. The downside to this is that partitions
kill you.
If you relaxed the consistency assurance then if you do a
post-partition merge most of the transactions would be just fine, but
some of them might not be and you need to have a protocol (usually in
the client-side business logic) to resolve the conflicts.
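One deliberately naive conflict-resolution protocol is last-write-wins
by timestamp. A sketch of a post-partition merge (record shape
invented); note it silently drops the losing concurrent write, which
is exactly why systems like Dynamo track vector clocks instead:

```erlang
-record(entry, {key, value, ts}).   %% ts: any comparable write stamp

%% Merge two divergent replicas, keeping the newer copy of each key.
merge(SideA, SideB) ->
    Newer = fun(_K, E1 = #entry{ts = T1}, E2 = #entry{ts = T2}) ->
                    case T1 >= T2 of
                        true  -> E1;
                        false -> E2
                    end
            end,
    ToDict = fun(Es) -> dict:from_list([{E#entry.key, E} || E <- Es]) end,
    [E || {_K, E} <- dict:to_list(
                         dict:merge(Newer, ToDict(SideA), ToDict(SideB)))].
```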
>> There are a couple of other interesting Erlang systems out there that
>> might be worth examining in the context of scaling data storage: Kai
>> and dynomite might be worth a look.
>
> Do you mean Amazon Dynamo? There's precious little information on it.
Not dynamo, dynomite. This is another Dynamo-style store, similar to
Kai, that Cliff Moon developed for Powerset and that is now available
on github.
>> Maybe a better topic for
>> discussion is what a system that sacrificed consistency or
>> availability might look like in Erlang and perhaps we could sketch out
>> a few broad outlines of how one might go about putting this together
>> using as many existing Erlang bits as possible?
>
> Do you want to start the sketching?
That's a tall order. I will give it some thought and take a look at
the kai and dynomite stuff this weekend and see if I can start with an
overview of what might be available already.
> What MySQL and PostgreSQL give me is ... SQL! Very fast and proven,
> with a well known approach to optimization. A ton of tools I could use
> to manipulate the data, e.g. reporting, admin, etc. Rapid development
> of front-ends in Rails, Django, etc.
This is a big win, and one that I wrestle with occasionally. There
are a lot of supporting tools out there for the standard SQL model. I
think that this model eventually breaks when things get big, but most
people never need to worry about that part.
It sounds like AllegroCache is pretty ideal for your needs, and while
mnesia has a speed win for ram tables and a low impedance mismatch
this might not be enough to make it the tool for this job. I actually
picked up Erlang because someone pointed me towards mnesia back when I
was looking at distributed databases (at a time when master-master
replication in existing open sql engines was nothing more than a
pipe-dream and master-slave replication sucked as well) and I have
gone from the high of initial discovery through the trough of
disillusionment and am now coming back to thinking it is a specialized
tool that works well in its niche but is in danger of being mis-applied
in the Erlang world simply because we don't have many other options...
jim
> It sounds like AllegroCache is pretty ideal for your needs, and while
> mnesia has a speed win for ram tables and a low impedance mismatch
> this might not be enough to make it the tool for this job.
AllegroCache is missing scalability and replication. Oh, the irony!
I think I need to take a closer look at CouchDB. The only thing I'm
biased against is the concept of external views as command-line tools
accessible via standard output.
> > The mnesia developers chose consistency and availability, so
> > partition-tolerance is a goner. If you want to add back
> > partition-tolerance then you will need to add an additional layer that
> > sacrifices either consistency or availability in order to gain
> > partition-tolerance
>
> What would sacrificing consistency mean in this case? That you can
> merge the data but it won't be quite right?
Not necessarily. One useful way to look at the CAP theorem is that
you can't guarantee consistency, availability, and partition-tolerance
all at the same moment -- but that doesn't have to mean that whichever
one you give up at any moment cannot be reclaimed shortly thereafter.
For instance, a data storage system might choose to be willing to
relax consistency for short periods of time in order to never lose
(write) availability and partition-tolerance. If you require perfect
continuous consistency, you cannot retain the other two. However, all
this means is that it is acceptable within system constraints for data
on two different nodes to diverge. There are many known techniques
which can be used to make such divergences brief, achieving "eventual"
(but usually very close to perfect) consistency.
I have worked on such a system, and the operational benefits that come
from these choices can be huge in the proper environment. If you use
a system (mnesia, mysql, etc) that cannot -- even briefly -- allow for
inconsistency, then you end up spending a lot of money as you grow
trying to pretend that you have both availability and partition
tolerance.
> > Maybe a better topic for
> > discussion is what a system that sacrificed consistency or
> > availability might look like in Erlang and perhaps we could sketch out
> > a few broad outlines of how one might go about putting this together
> > using as many existing Erlang bits as possible?
>
> Do you want to start the sketching?
While the system I mentioned above has not yet been released as source
code, it is in production use. I have posted some small modules of
Erlang code that can be used either as building blocks or as example
code for some of the parts of such a system:
http://code.google.com/p/distributerl/
-Justin