Why do you expect the file activity of virtual memory to be any faster than a
traditional database designed to use files efficiently? I'd expect the
opposite. You should generally assume that the disk head is going to be halfway
across the disk from the data you want and add up the seek time it will take to
get there. On the other hand, using real memory across several machines is very
fast.
--
Les Mikesell
lesmi...@gmail.com
membase is compatible with the memcached protocol, has a 20 MB default
object size limit, and lets you define memory and disk usage across nodes
in different "buckets".
memcacheDB is challenging to deploy for a few reasons, one of which is
that the topology is fixed at deployment time.
- Matt
p.s.: full disclosure: I'm one of the membase guys
Does anyone know how these would compare to 'riak', a distributed
database that can do redundancy with some fault tolerance and knows how
to rebalance the storage across nodes when they are added or removed?
(Other than the different client interface...).
--
Les Mikesell
lesmi...@gmail.com
This is a very detailed question, but...
Without going too much into advocacy (I'd defer you to the membase
list/site), membase does have redundancy, fault tolerance and can
rebalance when nodes are added and removed. The interface to membase is
memcached protocol. It does so by making sure there is an authoritative
place for any given piece of data at any given point in time. That
doesn't mean data's not replicated or persisted, just that there are
rules about the state changes for a given piece of data based on vbucket
hashing and a shared configuration.
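The "authoritative place" idea can be sketched roughly like this (a toy illustration, not membase's actual code or API -- the node names and bucket count are made up; membase really uses 1024 vbuckets): the key hashes to a fixed vbucket, and a shared map names the one node that owns that vbucket.

```python
import zlib

NUM_VBUCKETS = 16  # illustrative; real deployments use many more

# shared configuration: vbucket id -> authoritative node
vbucket_map = {vb: ["nodeA", "nodeB"][vb % 2] for vb in range(NUM_VBUCKETS)}

def vbucket_for(key: bytes) -> int:
    # key -> vbucket is a pure hash; it never changes with topology
    return zlib.crc32(key) % NUM_VBUCKETS

def node_for(key: bytes) -> str:
    # only the vbucket -> node map changes when nodes come and go
    return vbucket_map[vbucket_for(key)]

# every client that shares the map agrees on one location per key
assert node_for(b"user:42") == node_for(b"user:42")
```

The point is that topology changes only rewrite the small vbucket-to-node map; the key-to-vbucket hash stays fixed.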
This was actually inspired by similar concepts that were in memcached's
codebase up through the early 1.2.x releases, though not in use anywhere
that I'm familiar with.
riak is designed more around eventual consistency and tuning of W+R>N,
meaning that it is built to always take writes and to deal with
consistency for reads by doing multiple reads. This is different
than memcached in that memcached expects one and only one location for a
given piece of data with a given topology. If the topology changes
(node failures, additions), things like consistent hashing dictate a new
place, but there aren't multiple places to write to.
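The W+R>N rule mentioned above is just counting (this is a generic quorum illustration, not riak's API): with N replicas, if a write reaches W of them and a read consults R of them, W+R>N forces the read set to overlap at least one replica that saw the latest write.

```python
def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    # worst case: the R replicas you read are chosen to miss the W
    # replicas that were written; overlap is guaranteed iff w + r > n
    return w + r > n

assert read_sees_latest_write(3, 2, 2)       # N=3, W=2, R=2: always overlap
assert not read_sees_latest_write(3, 1, 1)   # W=1, R=1: a read can miss it
```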
Any time you accept concurrent writes in more than one place, you have
to deal with conflict resolution. In some cases this means dealing with
it at the application level.
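A minimal sketch of what application-level conflict resolution can look like (illustrative only, not any store's real API): two replicas accepted concurrent writes to a shopping-cart value, and on read the application merges the conflicting versions with a rule of its choosing, here set union so no added item is ever lost.

```python
def merge_carts(siblings):
    # app-level merge rule: union of all concurrent versions
    merged = set()
    for cart in siblings:
        merged |= cart
    return merged

siblings = [{"book", "pen"}, {"book", "mug"}]   # two concurrent writes
assert merge_carts(siblings) == {"book", "pen", "mug"}
```

The store can detect the conflict, but only the application knows whether union, last-write-wins, or something else is the right merge.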
I don't know it well, but it's my understanding that MemcacheDB is
really just memcached with disk (BDB, IIRC) in place of memory on the
back end. This has been done a few different times and in a few
different ways. Topology changes are the killers here. Consistent
hashing can't really help you deal with changes in this kind of deployment.
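To make the topology problem concrete, here is a toy consistent-hash ring (an illustration, not memcached's actual ketama code): each node owns the arc up to its hash position, so adding a node moves only the keys on the new node's arc. That's fine for a cache, where a moved key is just a miss, but when the "cache" is the disk-backed store of record, every moved key silently points at a node that doesn't have the data.

```python
import bisect
import zlib

def ring(nodes):
    # one point per node, sorted around the hash space
    return sorted((zlib.crc32(n.encode()), n) for n in nodes)

def owner(points, key):
    # first node clockwise from the key's hash (wrapping around)
    h = zlib.crc32(key.encode())
    hashes = [p[0] for p in points]
    i = bisect.bisect(hashes, h) % len(points)
    return points[i][1]

before = ring(["nodeA", "nodeB"])
after = ring(["nodeA", "nodeB", "nodeC"])
keys = [f"key{i}" for i in range(1000)]
moved = sum(owner(before, k) != owner(after, k) for k in keys)
# some keys change owner, but far from all -- and a disk-backed store
# has no way to find the data for the ones that did move
assert 0 < moved < len(keys)
```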
- Matt