I see a vast performance difference in a published benchmark. can
> Hi Sean,
> By default, Voldemort uses BerkeleyDB Java Edition as the storage
> engine. BerkeleyDB Java Edition actually uses a log-structured B+Tree,
> which is the same design principle as Log Structured Merge Trees used
> by SSTables in BigTable/Cassandra. If you'd like to learn more, I
> suggest reading the BerkeleyDB Java Edition Architecture white paper
> (http://www.oracle.com/go/?&Src=4945225&Act=7) from Oracle/Sleepycat.
> If you'd like to understand log-structured systems in general, Mendel
> Rosenblum's Ph.D. dissertation is a good start:
> http://www.eecs.berkeley.edu/~brewer/cs262/LFS.pdf (the paper is about
> a file system, but in the grand scheme of things file systems and
> databases are remarkably similar).
> In terms of actual benchmarks, here's one:
> http://blog.medallia.com/2010/05/choosing_a_keyvalue_storage_sy.html
> Both Voldemort and Cassandra are also supported by YCSB (the Yahoo!
> Cloud Serving Benchmark). We provide a slightly modified version of
> YCSB with Voldemort as the performance tool:
> https://github.com/voldemort/voldemort/wiki/Performance-Tool
> As far as I recall, writes are slightly faster in Cassandra and reads
> are slightly faster in Voldemort. At least with version 0.6 and
> earlier, I believe, out of the box, the performance impact of log
> compaction is somewhat less visible in Voldemort than in Cassandra (of
> course it entirely depends on your environment and configuration in
> both cases).
> HBase and Hypertable also use LSM trees and, in a normal scenario,
> also have very high write performance. I am not very familiar with
> Hypertable, but HBase is also able to do fast range scans. There are
> some very interesting applications that leverage the write
> performance and data model of HBase, e.g., OpenTSDB.
> The key difference between Dynamo and BigTable is the behaviour in a
> failure scenario: in the case of BigTable, when a node responsible for
> a partition goes down, there is a period when read and write
> availability is lost until another node takes over. Using WAL
> shipping (which, I believe, either is supported or may be supported by
> HBase in the future), it's possible to achieve high availability for
> reads, and there's ongoing work to cut the "transition" period for
> a failed node down to a few seconds (presently, if I am correct, it's
> around 1-2 minutes?). Once this is done, it would mean that upon
> failure, you will see latency spikes as the clients retry writes until
> one succeeds. The advantage of this design is the ability to do more
> atomic operations: e.g., to implement a counter in Voldemort, you have
> to use an "optimistic lock" with a vector clock (see the applyUpdate()
> method in the StoreClient interface), but this can be done atomically
> in HBase.
> Of course, keep in mind that I'm talking about the architecture here;
> the implementation details change.
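
For readers unfamiliar with the "optimistic lock" pattern mentioned above, here is a minimal sketch of a counter built on read, modify, conditional write, and retry. It is not Voldemort's API: the in-memory `VersionedStore`, its `putIfVersion` method, and the `increment` helper are hypothetical names invented for illustration, and a plain `long` version stands in for a vector clock. The real applyUpdate() call works against a remote store and vector clocks, so treat this only as an outline of the idea.

```java
import java.util.concurrent.ConcurrentHashMap;

public class OptimisticCounterSketch {

    /** A value tagged with a version. The long is a stand-in for a vector clock. */
    static final class Versioned {
        final long version;
        final long value;

        Versioned(long version, long value) {
            this.version = version;
            this.value = value;
        }
    }

    /** Hypothetical in-memory store that rejects writes based on a stale read. */
    static final class VersionedStore {
        private final ConcurrentHashMap<String, Versioned> data = new ConcurrentHashMap<>();

        Versioned get(String key) {
            return data.get(key);
        }

        /**
         * Writes newValue only if the entry is still the one the caller read
         * (null means "the key was absent"). Versioned does not override
         * equals(), so replace() compares by object identity.
         */
        boolean putIfVersion(String key, Versioned expected, long newValue) {
            if (expected == null) {
                return data.putIfAbsent(key, new Versioned(1, newValue)) == null;
            }
            return data.replace(key, expected, new Versioned(expected.version + 1, newValue));
        }
    }

    /** Read, add one, write conditionally; on a version conflict, re-read and retry. */
    static long increment(VersionedStore store, String key, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            Versioned current = store.get(key);
            long next = (current == null ? 0 : current.value) + 1;
            if (store.putIfVersion(key, current, next)) {
                return next;
            }
            // Another writer got there first, so our read is stale: loop and retry.
        }
        throw new IllegalStateException("gave up after " + maxTries + " attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        VersionedStore store = new VersionedStore();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    increment(store, "hits", Integer.MAX_VALUE);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println("hits = " + store.get("hits").value);
    }
}
```

The retry loop is the cost of this approach: under contention a client may need several round trips per increment, whereas a store with a single owner per row can apply the increment in one step on the server.
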
> - Alex
> On Sun, Jan 30, 2011 at 10:30 PM, Sean <sean.bigdata...@gmail.com> wrote:
> > People seem to agree that the Bigtable model (HBase/Hypertable)
> > is good for range queries, and the Dynamo model (Cassandra/Voldemort)
> > is good for writes. OK, let's start from this consensus:
> > For write-heavy apps, is there any benchmark comparing Project
> > Voldemort and Cassandra? I suppose the consistency model and DHT
> > routing are probably similar in these two systems. Does the
> > performance have a lot to do with the data node storage (BDB vs.
> > SSTable)?
> > Is there any theoretical or empirical comparison? Or benchmark results?