In-Reply-To: <49945D03.2010108@gmail.com>
References: <6a36e7290902120904r49886ads245a4cc7aeb5c1ed@mail.gmail.com> <49945D03.2010108@gmail.com>
Date: Thu, 12 Feb 2009 09:35:56 -0800
Message-ID: <6a36e7290902120935w7084562at74e6dc10aa3bc1e4@mail.gmail.com>
Subject: Re: [project-voldemort] Re: Client API and adding/removing servers
From: Bob Ippolito
To: project-voldemort@googlegroups.com

I haven't yet looked at Dynomite. It's on my list of Erlang
implementations to look closely at along with Ringo and Kai.

On Thu, Feb 12, 2009 at 9:31 AM, Cliff Moon wrote:
>
> Have you looked at Dynomite? I'm the author, and I can tell you that it
> currently supports thrift clients, dynamic adding of nodes, and it keeps
> all of the read repair, replication, and concurrency in the server,
> keeping the client code as simple as possible. I don't want to be that
> guy who shills for his own project in a competing project's mailing
> list, but it really seems like it might fit your requirements better.
>
> Bob Ippolito wrote:
>> On Thu, Feb 12, 2009 at 12:34 AM, Leon Mergen wrote:
>>
>>> Hello,
>>>
>>> With all the distributed key/value store projects out there nowadays,
>>> it's hard to see the forest for the trees, but it looks like Project
>>> Voldemort is the most suitable for my needs (plain ol' distributed
>>> key/value store, reliable and "unlimited" scalability).
>>>
>>> I was wondering whether I was correct on these two issues, since I
>>> couldn't find it explicitly in the docs:
>>>
>>>
>>> 1. The Client API is currently Java only, and if you're using it from
>>> any language other than Java, you will have to roll your own
>>> integration solution. And if this is so, my next question is: how much
>>> effort do you think it's going to be to integrate something like
>>> ProtocolBuffers or Thrift into Voldemort? It seems only natural to me
>>> to support such a solution, and as far as I can see it would be
>>> somewhat straightforward to implement (although versioning might
>>> require a bit of delicacy to support with this, but this is an
>>> absolute requirement for me to prevent race conditions). Needless to
>>> say I would be willing to look into this myself too.
>>>
>>
>> I haven't looked too deeply yet but it seems that all of the logic for
>> handling replication, resolving read inconsistencies and doing
>> read-repair is in the client... at least by default (there may be some
>> other option). So a sanity change in protocol would be insignificant
>> if you still have to write all of that code to handle the vector
>> clocks, efficiently speaking to several servers in parallel, etc. It's
>> not too hard to build all of this stuff, but it's too much of a hurdle
>> to get started and there's no way you could do it in something like
>> PHP without writing another daemon for it to talk to because you need
>> to do concurrency well to have a good implementation.
>>
>> The protocol should definitely change; there seem to be a lot of them
>> and all of them are pretty dumb serializations, e.g. the "json"
>> protocol (which is nothing like JSON at all, actually) has an
>> arbitrary 32kb limit for string data since it uses a signed 16-bit
>> integer length prefix(!!). I can't figure out why you would want so
>> many different serializations beyond one reasonable way to store raw
>> bytes. I don't think the servers actually have any code that does
>> anything with the data beyond storage and retrieval so all of the
>> serialization and schema options don't make any sense to me. If you
>> had a schema for the way the bytes were formatted I don't see any reason
>> why you couldn't just store it as a key and let the clients sort it
>> out rather than putting all of the limitations in the server.
>>
>>
>>> 2. Is it correct that adding or removing servers in an existing
>>> pool of running servers requires shutting down the entire pool,
>>> updating the cluster configuration on all servers, and getting them
>>> back online? Would it make sense to integrate some Paxos kind of
>>> configuration synchronization here? (for example by integrating
>>> ZooKeeper http://hadoop.apache.org/zookeeper/ into Voldemort)
>>>
>>
>> I have not seen any code in Voldemort that handles adding or removing
>> servers. There is a little bit of code that will rebalance your
>> cluster which is something you would need in this kind of event after
>> you've managed to update the configuration everywhere, but the code is
>> commented out and not referenced from anywhere.
>>
>>
>>> Other than that, Voldemort seems pretty neat! Are there any examples
>>> of the scale of deployment, for example within LinkedIn? Are any
>>> companies other than LinkedIn using Voldemort in a large-scale
>>> deployment?
>>>
>>
>> I've been looking at it for Mochi but honestly I will probably choose
>> another solution.
>> I would be surprised if anyone is using Voldemort in
>> production for data that isn't transient. I built a proof of concept
>> client in Python to play with but it was a fair amount of work and it
>> would take a while longer to polish it up, write some proper tests,
>> and get the concurrency stuff right (all network stuff is currently
>> serialized, it only speaks to one server at a time).
>>
>> -bob
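
For readers wondering where the 32kb figure in the quoted message comes from: a signed 16-bit length prefix tops out at 2**15 - 1 = 32767, so a longer string simply cannot be framed. Below is a minimal Python sketch of that arithmetic. It is not Voldemort's actual wire format; the names `encode_prefixed` and `decode_prefixed`, the big-endian byte order, and the UTF-8 encoding are all assumptions made for illustration.

```python
import struct

# Largest value a signed 16-bit integer can hold: 2**15 - 1 = 32767 (~32 KB).
MAX_SIGNED_16 = 2**15 - 1


def encode_prefixed(text: str) -> bytes:
    """Frame a string as <signed 16-bit length><UTF-8 bytes> (hypothetical format)."""
    payload = text.encode("utf-8")
    if len(payload) > MAX_SIGNED_16:
        raise ValueError(
            f"payload is {len(payload)} bytes; "
            f"a signed 16-bit prefix allows at most {MAX_SIGNED_16}"
        )
    # ">h" means big-endian signed short; the byte order is an assumption.
    return struct.pack(">h", len(payload)) + payload


def decode_prefixed(data: bytes) -> str:
    """Inverse of encode_prefixed: read the prefix, then that many bytes."""
    if len(data) < 2:
        raise ValueError("frame too short to contain a length prefix")
    (length,) = struct.unpack(">h", data[:2])
    if length < 0 or len(data) < 2 + length:
        raise ValueError("truncated or malformed frame")
    return data[2:2 + length].decode("utf-8")


if __name__ == "__main__":
    framed = encode_prefixed("hello")
    print(decode_prefixed(framed))
    try:
        encode_prefixed("x" * 40000)
    except ValueError as exc:
        print("rejected:", exc)
```

An unsigned 16-bit prefix would only double the ceiling; storing raw bytes with a 32-bit length, as the message suggests, would avoid the problem for practical value sizes.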