[erlang-questions] Which distributed key-value storage do you use?

Sergey Samokhin

Aug 11, 2009, 3:24:52 PM
to Erlang-Questions
Hello!

Currently I'm using mnesia to store detailed session information for
each user my system serves.

What worries me a bit is that as soon as my code comes under heavy load
in production and the number of transactions increases, the performance
of mnesia might become an issue. I don't know exactly how heavy the load
will be, but I want my system to be ready for a few thousand
transactions per second (with at least two machines involved in
replication).

I think there should be people who are using something other than
mnesia (e.g. Dynamo, Kai, Tokyo Cabinet etc) in production to store
session data in an efficient (=:= with partitioning) and distributed
manner and can share their impressions.

Which one have you chosen?

--
Sergey Samokhin

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Rapsey

Aug 11, 2009, 4:41:11 PM
to erlang-q...@erlang.org
If key-value is enough, Tokyo Cabinet looks pretty good. If you want a
document store, I would highly recommend MongoDB.


Sergej

Tuncer Ayaz

Aug 11, 2009, 6:27:09 PM
to Sergey Samokhin, Erlang-Questions
On Tue, Aug 11, 2009 at 9:24 PM, Sergey Samokhin<prik...@gmail.com> wrote:
> Hello!
>
> Currently I'm using mnesia to store detailed session information for
> each user my system serves.
>
> What worries me a bit is that as soon as my code comes under heavy load
> in production and the number of transactions increases, the performance
> of mnesia might become an issue. I don't know exactly how heavy the load
> will be, but I want my system to be ready for a few thousand
> transactions per second (with at least two machines involved in
> replication).
>
> I think there should be people who are using something other than
> mnesia (e.g. Dynamo, Kai, Tokyo Cabinet etc) in production to store
> session data in an efficient (=:= with partitioning) and distributed
> manner and can share their impressions.

Also consider http://riak.basho.com.

Robert Raschke

Aug 12, 2009, 4:27:11 AM
to Erlang-Questions
On Tue, Aug 11, 2009 at 8:24 PM, Sergey Samokhin <prik...@gmail.com> wrote:

> Hello!
>
> Currently I'm using mnesia to store detailed session information for
> each user my system serves.
>
> What worries me a bit is that as soon as my code comes under heavy load
> in production and the number of transactions increases, the performance
> of mnesia might become an issue. I don't know exactly how heavy the load
> will be, but I want my system to be ready for a few thousand
> transactions per second (with at least two machines involved in
> replication).
>

Lots of unknown sentiment in that paragraph. Why are you looking for an
alternative technology if you don't even know that your current one is
insufficient?

I would recommend doing some measurements against your system. Creating some
form of stress test for your system will cost you as much as investigating
lots of funky new stuff. And then you will have some actual numbers to
compare and contrast.
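
A stress test of this kind can start very small. The following is only a
rough sketch, not a real benchmark: the module name session_bench, the
session record and the payload are made up for illustration, and it uses
a single-node ram_copies table, so it says nothing about replication or
disc cost. It times N transactional writes with timer:tc/1 and returns
the observed writes per second.

```erlang
-module(session_bench).
-export([run/1]).

%% Hypothetical record, used only for this illustration.
-record(session, {id, data}).

%% Times N transactional writes into a RAM-only table on the local node
%% and returns the observed rate in writes per second (a float).
run(N) when is_integer(N), N > 0 ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(session,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, session)}]),
    Write = fun(I) ->
        {atomic, ok} = mnesia:transaction(
            fun() -> mnesia:write(#session{id = I, data = <<"payload">>}) end)
    end,
    %% timer:tc/1 returns {ElapsedMicroseconds, ResultOfFun}.
    {Micros, ok} = timer:tc(fun() -> lists:foreach(Write, lists:seq(1, N)) end),
    {atomic, ok} = mnesia:delete_table(session),
    N * 1000000 / max(Micros, 1).
```

Calling session_bench:run(10000) from the shell gives one data point; a
realistic test would use the real table layout, the real storage type and
at least two nodes.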

Additionally, mnesia very helpfully tells you if it is starting to get
"overloaded".
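
One way to catch that notification programmatically is to subscribe to
mnesia's system events. This is a minimal sketch: the module name
overload_watch is invented, mnesia is assumed to be running already on
the node, and a real system would probably react with more than a log
line.

```erlang
-module(overload_watch).
-export([start/0]).

%% Spawns a process that subscribes to mnesia system events and logs
%% every overload report. Assumes mnesia is already started.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        {mnesia_system_event, {mnesia_overload, Details}} ->
            error_logger:warning_msg("mnesia overloaded: ~p~n", [Details]),
            loop();
        {mnesia_system_event, _Other} ->
            %% Ignore all other system events in this sketch.
            loop()
    end.
```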

Robby

Sergey Samokhin

Aug 12, 2009, 3:47:51 PM
to Robert Raschke, Erlang-Questions
Hello.

> Lots of unknown sentiment in that paragraph. Why are you looking for an
> alternative technology if you don't even know that your current one is
> insufficient?

Don't get me wrong: I have already done some measurements and found
mnesia *quite efficient* for the near future. What I'm trying to find
out by posting messages here are other possible "routes to escape" if
things go wrong with mnesia (ok, I may be too prudent).

It's always a good idea to take into account more than one way to
solve a problem. The more ways you have to solve a problem, the better
the one you end up with will be.

So, I've decided to ask what other DBs people here are using =) Soon
I'm going to make a list of document-oriented and key-value DBs with
bindings for Erlang and post it in this thread.

Ulf Wiger

Aug 12, 2009, 4:19:09 PM
to Sergey Samokhin, Robert Raschke, Erlang-Questions

If you are using disc_copies, I think that thousands of
writes/sec is attainable.

Personally, I think that disc_only_copies seldom have a
place nowadays, since they are limited to 2 GB, and
64-bit Erlang allows you to go way beyond that limit with
disc_copies and good performance.
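
For readers who have not set up a disc_copies table before, here is a
minimal single-node sketch. The module name session_store, the session
record and the 5000 ms timeout are made up for illustration; it assumes
mnesia is stopped when install/0 is called, because the schema has to be
created on disc before disc_copies tables are allowed.

```erlang
-module(session_store).
-export([install/0, write_session/2, read_session/1]).

%% Hypothetical record, used only for this illustration.
-record(session, {id, data}).

%% Creates an on-disc schema for the local node (mnesia must be stopped),
%% starts mnesia and creates a table that is kept in RAM and logged to disc.
install() ->
    case mnesia:create_schema([node()]) of
        ok -> ok;
        {error, {_Node, {already_exists, _}}} -> ok
    end,
    ok = mnesia:start(),
    case mnesia:create_table(session,
                             [{disc_copies, [node()]},
                              {attributes, record_info(fields, session)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, session}} -> ok
    end,
    ok = mnesia:wait_for_tables([session], 5000).

%% Returns {atomic, ok} on success.
write_session(Id, Data) ->
    mnesia:transaction(
        fun() -> mnesia:write(#session{id = Id, data = Data}) end).

%% Returns {atomic, [Record]} or {atomic, []} if the key is absent.
read_session(Id) ->
    mnesia:transaction(fun() -> mnesia:read(session, Id) end).
```

Replicating to a second machine would mean listing both nodes in the
create_schema and disc_copies lists, which is left out here.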

There are some things to think about when going to 64 bit.
The main rule of thumb is that the initial switch will
nearly double the memory usage, unless you are using
binaries heavily. After that, you can just keep growing.

I once tested disc_copies up to about 13 GB on a 16 GB
machine. Performance was excellent.

http://erlang.org/pipermail/erlang-questions/2005-November/017728.html

BR,
Ulf W

Sergey Samokhin wrote:
> Hello.
>
>> Lots of unknown sentiment in that paragraph. Why are you looking for an
>> alternative technology if you don't even know that your current one is
>> insufficient?
>
> Don't get me wrong: I have already done some measurements and found
> mnesia *quite efficient* for the near future. What I'm trying to find
> out by posting messages here are other possible "routes to escape" if
> things go wrong with mnesia (ok, I may be too prudent).
>
> It's always a good idea to take into account more than one way to
> solve a problem. The more ways you have to solve a problem, the better
> the one you end up with will be.
>
> So, I've decided to ask what other DBs people here are using =) Soon
> I'm going to make a list of document-oriented and key-value DBs with
> bindings for Erlang and post it in this thread.
>


--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com

Dave Smith

Aug 12, 2009, 4:28:37 PM
to Ulf Wiger, Sergey Samokhin, Robert Raschke, Erlang-Questions
On Wed, Aug 12, 2009 at 2:19 PM, Ulf
Wiger<ulf....@erlang-consulting.com> wrote:
>
> Personally, I think that disc_only_copies seldom have a
> place nowadays, since they are limited to 2 GB, and
> 64-bit Erlang allows you to go way beyond that limit with
> disc_copies and good performance.

I apologize if this is a stupid question, but I was under the
impression that using mnesia w/ disc_copies meant that your entire
data set has to fit in RAM? For most large-scale datasets, even with a
lot of RAM, that makes mnesia not really desirable as you'd get a low
density of data-to-node (i.e. I can put 1 TB of disk in a node but
probably can't afford that much RAM).

D.

Kunthar

Aug 12, 2009, 6:22:18 PM
to erlang-q...@erlang.org
People who are new to Erlang mostly use dedicated hosting.
Some well-established corporations can afford ultra-mega machines
with many gigabytes of RAM.
But the naked truth is that a reasonable dedicated server starts at
1 GB of RAM and goes up to 8 GB.
More than that means additional trouble: not only money, but trouble.
You can verify this yourself with a quick check of the hosting
companies around the world. RAM is not cheap enough. CPU is not a
problem, but RAM is a dead end.
Thanks to the Basho guys and all the others who can see the real
world and provide real solutions.
dets is not efficient for some problem domains; for example, if you
need to store 2 GB/sec of data, you're simply dead.
I would really like to see someone sort out this silly Ericsson
open source soap and create a high-traffic open source environment.
And then I would really like to see dets evolve, like Dynamo, to solve
more problems better than the others, even Dynamo.
For now, we have to look at other choices if the bill does not fit our pocket.

Peace
\|/ Kunthar

Ulf Wiger

Aug 12, 2009, 7:30:29 PM
to Dave Smith, Erlang-Questions
Dave Smith wrote:
> On Wed, Aug 12, 2009 at 2:19 PM, Ulf
> Wiger<ulf....@erlang-consulting.com> wrote:
>> Personally, I think that disc_only_copies seldom have a
>> place nowadays, since they are limited to 2 GB, and
>> 64-bit Erlang allows you to go way beyond that limit with
>> disc_copies and good performance.
>
> I apologize if this is a stupid question, but I was under the
> impression that using mnesia w/ disc_copies meant that your entire
> data set has to fit in RAM? For most large-scale datasets, even with a
> lot of RAM, that makes mnesia not really desirable as you'd get a low
> density of data-to-node (i.e. I can put 1 TB of disk in a node but
> probably can't afford that much RAM).

I was just pointing out that given the 2 GB limit of dets,
disc_copies is almost always the better option.

Your entire data set doesn't have to fit in RAM on a single
node. Using table fragmentation, you can scale further.
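
As a rough illustration of what table fragmentation looks like in code
(a sketch only: the module name frag_sessions and the session record are
invented, and it assumes every node in the pool already runs mnesia with
an on-disc schema):

```erlang
-module(frag_sessions).
-export([create/2, write/2, read/1]).

%% Hypothetical record, used only for this illustration.
-record(session, {id, data}).

%% Creates a table split into NFragments fragments spread over Nodes,
%% with one disc_copies replica per fragment. Returns {atomic, ok}.
create(Nodes, NFragments) ->
    mnesia:create_table(session,
        [{attributes, record_info(fields, session)},
         {frag_properties, [{node_pool, Nodes},
                            {n_fragments, NFragments},
                            {n_disc_copies, 1}]}]).

%% All access goes through the mnesia_frag access module, which hashes
%% the key to pick the fragment. mnesia:activity/4 returns the result of
%% the fun directly and exits if the transaction aborts.
write(Id, Data) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(#session{id = Id, data = Data}) end,
                    [], mnesia_frag).

read(Id) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:read(session, Id) end,
                    [], mnesia_frag).
```

Each node then only holds the fragments assigned to it, so the data set
is bounded by the RAM of the whole pool rather than of one machine.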

Still, mnesia was not designed for TB databases, and if that
is what you need, you need to look for other solutions.

OTOH, the ODBC library allows you to connect to such DBMSs.

BR,
Ulf W


--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com


Ulf Wiger

Aug 12, 2009, 7:50:42 PM
to Kunthar, erlang-q...@erlang.org
Kunthar wrote:
>
> Thanks to the Basho guys and all the others who can see the real
> world and provide real solutions.
> dets is not efficient for some problem domains; for example, if you
> need to store 2 GB/sec of data, you're simply dead.
> I would really like to see someone sort out this silly Ericsson
> open source soap and create a high-traffic open source environment.

You are of course free to move to any other solution, or
interface with any other database out there.

Mnesia may not meet your needs, but seeing as you don't
have to pay a dime for it, and Ericsson does not force you
to use it, I fail to see how this should inconvenience you
much.

The basho guys, the Dukes of Erl, et al, do marvellous work,
but focus on the stuff they need, just like Ericsson does.
That's a good strategy if you want to stay in business...

The wonderful thing about Open Source is that they let you share
the fruits of their labour (and expenses!). If it does what you
need, great. Otherwise, you are free to search for some other
community that offers you better components for free, or perhaps
even spend some money on some of the many commercially available
components out there. You can even do all of the above at the
same time.

BR,
Ulf W


--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com


Benjamin Tolputt

Aug 12, 2009, 5:53:29 PM
to Dave Smith, Ulf Wiger, Sergey Samokhin, Robert Raschke, Erlang-Questions
Dave Smith wrote:
> I apologize if this is a stupid question, but I was under the
> impression that using mnesia w/ disc_copies meant that your entire
> data set has to fit in RAM? For most large-scale datasets, even with a
> lot of RAM, that makes mnesia not really desirable as you'd get a low
> density of data-to-node (i.e. I can put 1 TB of disk in a node but
> probably can't afford that much RAM).
>

This was also my impression and is the reason I have avoided Mnesia to
date for anything but small, in-memory distributed databases (such as
keeping web session data and the like). I would love to be told I am
wrong here!

--
Regards,

Benjamin Tolputt
Analyst Programmer


Sergey Samokhin

Aug 17, 2009, 3:40:30 PM
to Robert Raschke, Erlang-Questions
Hello!

> Soon I'm going to make a list of document-oriented and key-value DBs with
> bindings for Erlang and post it in this thread.

Here are the distributed key-value stores with Erlang bindings that
I've found (in no particular order):

1) Dynomite

http://github.com/cliffmoon/dynomite/tree/master

2) There are at least three interfaces for Tokyo Cabinet:

http://code.google.com/p/tcerl/
http://github.com/mccoy/medici/tree/master
http://github.com/mallipeddi/tora/tree/master (the last commit was in
February 2009)

3) Kai

http://sourceforge.net/projects/kai/

4) Ringo

http://github.com/tuulos/ringo/tree/master

It seems that Ringo is no longer under active development. The last
commit was on December 17, 2008.

5) Scalaris

http://code.google.com/p/scalaris/

6) Riak

http://riak.basho.com/

7) MongoDB

http://github.com/eliast/mongo-erlang-driver/tree/master

8) CouchDB

Search results on github:

http://github.com/search?type=Repositories&language=erlang&q=couchdb&repo=&langOverride=&x=0&y=0&start_value=1

9) MotionDB

http://github.com/dilshod/MotionDb/tree/master

Let me know if I missed anything.

Now it's time to test them =)

Zoltan Lajos Kis

Aug 17, 2009, 3:47:29 PM
to Sergey Samokhin, Erlang-Questions

Jim McCoy

Aug 17, 2009, 4:46:42 PM
to Sergey Samokhin, Robert Raschke, Erlang-Questions
[...]

> 2) There are at least three interfaces for Tokyo Cabinet:
>
> http://code.google.com/p/tcerl/
> http://github.com/mccoy/medici/tree/master
> http://github.com/mallipeddi/tora/tree/master (the last commit was in
> February 2009)

Just for reference, tcerl is a linked-in driver interface for Tokyo
Cabinet (the in-process db), while Harish and I provide interfaces to
Tokyo Tyrant (the network-enabled stand-alone version of Tokyo
Cabinet). There are things that a "real" Tokyo Cabinet linked-in
driver can do that Tyrant interfaces will never be able to
accomplish (e.g. iteration over all elements with any assurance
that you will actually hit all of them), and it will always be
able to do them faster. The upside to Tyrant is that you can do
everything over the network.

Jim
