The logic for routing, reconciliation, failure detection, etc. does
not make assumptions about whether it is on the client or the server.
You are correct that the current wiring-up of layers is for client
side routing, but there is no limitation that says that this must be
so. There is a very real trade-off of efficiency vs. client
simplicity, so it is good to support both models (this is what dynamo
does as well). I completely agree that non-JVM clients should not try
to do any of this, they should just send serialized requests to
servers for routing. This should not require any handling of
concurrency on the client.
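To make the reconciliation piece concrete: the core of it is vector-clock comparison, and that logic is the same wherever it runs. Here is a minimal sketch of the happened-before/concurrent check; the class and method names are illustrative, not Voldemort's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the version comparison that routing/reconciliation
// relies on. Names are illustrative, not Voldemort's actual classes.
// The same logic can run on either the client or the server.
public class SimpleVectorClock {
    private final Map<String, Integer> counters = new HashMap<String, Integer>();

    public void increment(String nodeId) {
        Integer c = counters.get(nodeId);
        counters.put(nodeId, c == null ? 1 : c + 1);
    }

    private int get(String nodeId) {
        Integer c = counters.get(nodeId);
        return c == null ? 0 : c;
    }

    /** True if no counter in this clock exceeds the other's. */
    public boolean happenedBefore(SimpleVectorClock other) {
        for (Map.Entry<String, Integer> e : counters.entrySet())
            if (e.getValue() > other.get(e.getKey()))
                return false;
        return true;
    }

    /** Neither clock dominates: concurrent writes that need reconciling. */
    public boolean concurrentWith(SimpleVectorClock other) {
        return !happenedBefore(other) && !other.happenedBefore(this);
    }
}
```

Whichever side performs this comparison is the side that has to do failure detection and parallel requests well, which is exactly the efficiency vs. client-simplicity trade-off above.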
If you don't like the json format that is fine, you need not use it. I
agree the name is not perfect, if you have a better idea then maybe we
can change it. I do think the system should maintain data about the
format of the bytes; my experience is that free-form bytes create a lot
more problems than they solve. That said, there is an identity
serialization type that just gives byte arrays with no guidance, if
that is what you are looking for. I agree that arbitrary limitations
on sizes are pretty lame, but that is true of many database types (not
an excuse, just a fact).
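To illustrate the size limitation being discussed, here is a sketch (not Voldemort's actual serializer code) of why a signed 16-bit length prefix caps string data at Short.MAX_VALUE = 32,767 bytes, i.e. the ~32kb limit:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: shows how a signed 16-bit length prefix caps a
// value at Short.MAX_VALUE = 32767 bytes. Not Voldemort's actual code.
public class ShortPrefixDemo {
    public static byte[] encode(String s) throws IOException {
        byte[] bytes = s.getBytes("UTF-8");
        if (bytes.length > Short.MAX_VALUE)
            throw new IllegalArgumentException(
                "value is " + bytes.length + " bytes; max is " + Short.MAX_VALUE);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeShort(bytes.length); // the 2-byte signed prefix is the cap
        out.write(bytes);
        out.flush();
        return buf.toByteArray();
    }
}
```

Moving to a 4-byte prefix is trivial on the wire; the cost is that it is not backward compatible with data already stored in the old format.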
As for whether you should use it in production, I would think hard
about any new storage system that isn't MySQL, Postgres, or Oracle,
etc. There are trade-offs of performance and scalability vs. stability
and track-record. If you think other solutions are better that may be
true. My personal opinion is that storage systems are developed on a
much longer time-horizon than things like web-frameworks, and
absolutely every free entry in the distributed storage space is very
alpha. LinkedIn is doing a number of production things using Voldemort
but we are very careful about risk exposure. The current uses are for
problems that have high-scalability requirements but lower
The front page of the website says "It is still a new system which has
rough edges, bad error messages, and probably plenty of uncaught
bugs", and that is true. The focus for us is moving it up the ladder
of trustworthiness and making it a better system, usable from more
languages, with fewer arbitrary limitations and bugs. If it isn't
where you need it to be now, then that is completely understandable.
On Thu, Feb 12, 2009 at 9:04 AM, Bob Ippolito <b...
> On Thu, Feb 12, 2009 at 12:34 AM, Leon Mergen <l...@solatis.com> wrote:
>> With all the distributed key/value store projects out there nowadays,
>> it's hard to see the forest for the trees, but it looks like Project
>> Voldemort is the most suitable for my needs (plain ol' distributed key/
>> value store, reliable and "unlimited" scalability).
>> I was wondering whether I was correct on these two issues, since I
>> couldn't find it explicitly in the docs:
>> 1. The Client API is currently Java only, and if you're using it from
>> any other language than Java, you will have to roll your own
>> integration solution. And if this is so, my next question is: how much
>> effort do you think it's going to be to integrate something like
>> ProtocolBuffers or Thrift into Voldemort? It seems only natural to me
>> to support such a solution, and as far as I can see it would be
>> somewhat straightforward to implement (although versioning might
>> require a bit of delicacy to support with this, but this is an
>> absolute requirement for me to prevent race conditions). Needless to
>> say I would be willing to look into this myself too.
> I haven't looked too deeply yet but it seems that all of the logic for
> handling replication, resolving read inconsistencies and doing
> read-repair is in the client... at least by default (there may be some
> other option). So a saner protocol alone would be an insignificant
> change if you still have to write all of that code to handle the
> vector clocks, efficiently speak to several servers in parallel, etc. It's
> not too hard to build all of this stuff, but it's too much of a hurdle
> to get started and there's no way you could do it in something like
> PHP without writing another daemon for it to talk to because you need
> to do concurrency well to have a good implementation.
> The protocol should definitely change; there seem to be a lot of them
> and all of them are pretty dumb serializations, e.g. the "json"
> protocol (which is nothing like JSON at all, actually) has an
> arbitrary 32kb limit for string data since it uses a signed 16-bit
> integer length prefix (!!). I can't figure out why you would want so
> many different serializations beyond one reasonable way to store raw
> bytes. I don't think the servers actually have any code that does
> anything with the data beyond storage and retrieval so all of the
> serialization and schema options don't make any sense to me. If you
> had schema for the way the bytes were formatted I don't see any reason
> why you couldn't just store it as a key and let the clients sort it
> out rather than putting all of the limitations in the server.
>> 2. Is it correct that adding or removing servers in an existing
>> pool of running servers will require shutting down the entire pool,
>> updating the cluster configuration on all servers, and getting them
>> back online? Would it make sense to integrate some Paxos kind of
>> configuration synchronization here ? (for example by integrating
>> ZooKeeper http://hadoop.apache.org/zookeeper/ into Voldemort)
> I have not seen any code in Voldemort that handles adding or removing
> servers. There is a little bit of code that will rebalance your
> cluster which is something you would need in this kind of event after
> you've managed to update the configuration everywhere, but the code is
> commented out and not referenced from anywhere.
>> Other than that, Voldemort seems pretty neat! Are there any examples
>> of the scale of deployment, for example within LinkedIn ? Are any
>> other companies other than LinkedIn using Voldemort in a large-scale
>> deployment ?
> I've been looking at it for Mochi but honestly I will probably choose
> another solution. I would be surprised if anyone is using Voldemort in
> production for data that isn't transient. I built a proof-of-concept
> client in Python to play with but it was a fair amount of work and it
> would take a while longer to polish it up, write some proper tests,
> and get the concurrency stuff right (all network stuff is currently
> serialized, it only speaks to one server at a time).