I haven't looked too deeply yet but it seems that all of the logic for
handling replication, resolving read inconsistencies and doing
read-repair is in the client... at least by default (there may be some
other option). So a saner protocol would make little difference
if you still have to write all of that code to handle the vector
clocks, speak efficiently to several servers in parallel, etc. It's
not too hard to build all of this stuff, but it's too much of a hurdle
to get started, and there's no way you could do it in something like
PHP without writing another daemon for it to talk to, because you need
to do concurrency well to have a good implementation.
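
To give a rough idea of the kind of code every client ends up carrying, here is a minimal sketch of vector clock comparison and version resolution. It is not Voldemort's actual client logic: the dict-based clock representation and the names `descends` and `resolve` are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> update counter.
Clock = Dict[str, int]


def descends(a: Clock, b: Clock) -> bool:
    """Return True if clock `a` has seen every update that clock `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(versions: List[Tuple[Clock, bytes]]) -> List[Tuple[Clock, bytes]]:
    """Drop every version that is strictly older than another version.

    Whatever survives is either the single newest value or a set of
    concurrent siblings that the application has to reconcile itself.
    """
    survivors: List[Tuple[Clock, bytes]] = []
    for clock, value in versions:
        dominated = any(
            descends(other, clock) and other != clock
            for other, _ in versions
        )
        # Also skip exact duplicates returned by several replicas.
        if not dominated and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors
```

Given replies `{"a": 1}`, `{"a": 2}` and `{"a": 1, "b": 1}`, the first is dropped as stale and the other two come back as siblings. Read-repair would then mean writing the surviving version(s) back to whichever servers returned the stale one.
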
The protocol should definitely change. There seem to be a lot of them
and all of them are pretty dumb serializations, e.g. the "json"
protocol (which is nothing like JSON at all, actually) has an
arbitrary 32 KB limit for string data since it uses a signed 16-bit
integer length prefix(!!). I can't figure out why you would want so
many different serializations beyond one reasonable way to store raw
bytes. I don't think the servers actually have any code that does
anything with the data beyond storage and retrieval so all of the
serialization and schema options don't make any sense to me. If you
had a schema for the way the bytes were formatted, I don't see any
reason why you couldn't just store it under a key of its own and let
the clients sort it out rather than putting all of the limitations in
the server.
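
The 32 KB ceiling falls straight out of the arithmetic: a signed 16-bit integer tops out at 2^15 - 1 = 32767. A small sketch of that kind of framing, where the big-endian byte order and the name `encode_string` are assumptions for illustration rather than the actual wire format:

```python
import struct

# Largest value a signed 16-bit integer can hold: 2**15 - 1 = 32767.
MAX_LEN = 2**15 - 1


def encode_string(data: bytes) -> bytes:
    """Prefix `data` with a signed 16-bit length (byte order is assumed)."""
    if len(data) > MAX_LEN:
        raise ValueError(
            "%d bytes does not fit a signed 16-bit length prefix" % len(data)
        )
    # ">h" is big-endian signed short in the struct module.
    return struct.pack(">h", len(data)) + data
```

Anything of 32768 bytes or more simply cannot be represented, no matter what the storage engine behind it could handle.
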
> 2. Is it correct that when adding or removing servers to an existing
> pool of running servers, will require shutting down the entire pool,
> updating the cluster configuration on all servers, and getting them
> back online? Would it make sense to integrate some Paxos kind of
> configuration synchronization here ? (for example by integrating
> ZooKeeper http://hadoop.apache.org/zookeeper/ into Voldemort)
I have not seen any code in Voldemort that handles adding or removing
servers. There is a little bit of code that will rebalance your
cluster which is something you would need in this kind of event after
you've managed to update the configuration everywhere, but the code is
commented out and not referenced from anywhere.
> Other than that, Voldemort seems pretty neat! Are there any examples
> of the scale of deployment, for example within LinkedIn ? Are any
> other companies other than LinkedIn using Voldemort in a large-scale
> deployment ?
I've been looking at it for Mochi but honestly I will probably choose
another solution. I would be surprised if anyone is using Voldemort in
production for data that isn't transient. I built a proof-of-concept
client in Python to play with, but it was a fair amount of work and it
would take a while longer to polish it up, write some proper tests,
and get the concurrency stuff right (all network stuff is currently
serialized; it only speaks to one server at a time).
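
For what it's worth, the parallel version could look something like the sketch below. This is not the code from my client: `read_from_replicas`, the `fetch` callable and `required_reads` are hypothetical names, and a thread pool is just one of several ways to do it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_from_replicas(fetch, servers, key, required_reads):
    """Ask every replica at once and return once enough have answered.

    `fetch(server, key)` is a caller-supplied (hypothetical) function that
    performs one blocking request and raises on failure.
    """
    results = []
    pool = ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = [pool.submit(fetch, server, key) for server in servers]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception:
                continue  # a dead or slow server should not fail the read
            if len(results) >= required_reads:
                break
    finally:
        # Do not block on stragglers; their replies are simply discarded.
        pool.shutdown(wait=False)
    if len(results) < required_reads:
        raise RuntimeError("only %d of %d required reads succeeded"
                           % (len(results), required_reads))
    return results
```

The replies would then be fed through version resolution before handing a value back to the caller.
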
-bob