Voldemort vs Cassandra


tomer

Mar 31, 2010, 11:06:28 AM
to project-voldemort
Hi,
I'm trying to figure out whether I need Voldemort or Cassandra. Can
anyone point me to the benefits of using Voldemort as compared to
Cassandra?
Thanks

Jeff

Apr 5, 2010, 12:46:55 PM
to project-voldemort
FWIW, a very simplistic filter I use: if I know the 'primary key' of
the thing I need, I use Project Voldemort. It's like a distributed hash
table. OTOH, if I have to query for records (meaning I don't know which
record or records I need), then you need something that keeps indices.
And while you can iterate over a hash table/map, it would not be your
first choice of collection type if iteration were important to you.

Simplistic for sure, maybe not even accurate, but it's worked for me.
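A toy illustration of this filter in Java (hypothetical data; plain JDK collections standing in for the stores): a hash-style map answers point lookups when you know the key, while range-style questions need an ordered index such as a `TreeMap`.

```java
import java.util.Map;
import java.util.TreeMap;

public class LookupVsIndex {
    // Point lookup: ideal when the primary key is known -- the Voldemort case.
    static String pointLookup(Map<String, String> table, String key) {
        return table.get(key);
    }

    // Range query: needs an ordered index; a plain hash table cannot answer
    // this without scanning every entry.
    static Map<String, String> rangeQuery(TreeMap<String, String> index,
                                          String fromKey, String toKey) {
        return index.subMap(fromKey, true, toKey, true);
    }

    public static void main(String[] args) {
        TreeMap<String, String> users = new TreeMap<>();
        users.put("user:100", "alice");
        users.put("user:150", "bob");
        users.put("user:200", "carol");
        System.out.println(pointLookup(users, "user:150"));            // bob
        System.out.println(rangeQuery(users, "user:100", "user:160")); // {user:100=alice, user:150=bob}
    }
}
```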

Jacques

Apr 12, 2010, 4:21:25 PM
to project-voldemort
I've been trying to figure this out as well.

To add to the above analysis, I'd say that, based on my understanding
of Voldemort and Cassandra, they are very directly competitive. If
you take a look at this analysis on page 11, you'll see that current
stable versions of Cassandra perform very poorly on range
scans: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

Of course, the Cassandra team has plans to improve this, which doesn't
seem to be the case for Voldemort. But you should also note that, as
outlined here, the scan functionality causes pain with regard to data
distribution and hot spots:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
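A minimal sketch of the hot-spot problem that post describes (the two placement functions below are illustrative, not Cassandra's actual partitioners): under order-preserving placement, a burst of sequential keys lands on a single node, while hash placement scatters the same burst evenly.

```java
public class HotSpotDemo {
    static final int NODES = 4;

    // Order-preserving: keys are placed by sort position, so a burst of
    // sequential keys (timestamps, auto-increment ids) hits one node.
    static int orderPreservingNode(long key, long minKey, long maxKey) {
        long span = (maxKey - minKey + 1) / NODES;
        return (int) Math.min(NODES - 1, (key - minKey) / span);
    }

    // Hash placement: the same burst is spread across all nodes, at the
    // cost of losing key order (and thus cheap range scans).
    static int hashNode(long key) {
        return (int) Math.floorMod((long) Long.hashCode(key) * 0x9E3779B9L, (long) NODES);
    }

    public static void main(String[] args) {
        int[] opp = new int[NODES], hash = new int[NODES];
        for (long key = 1000; key < 1100; key++) { // 100 sequential inserts
            opp[orderPreservingNode(key, 0, 1_000_000)]++;
            hash[hashNode(key)]++;
        }
        System.out.println(java.util.Arrays.toString(opp));  // [100, 0, 0, 0]
        System.out.println(java.util.Arrays.toString(hash)); // [25, 25, 25, 25]
    }
}
```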

Anyway, I wonder if using a custom bit of code with AdminClient and
Voldemort would not show similar scan performance to where Cassandra
is at right now. (I know the AdminClient isn't designed for this.)
Thoughts anybody?

Real-world differentiators I see between Voldemort & Cassandra:
- Cassandra's ability to have the consistency level determined per
get and put call (as opposed to Voldemort, where the levels are fixed
in the store configuration)
- Cassandra's ability to subdivide existing partitions based on usage,
whereas Voldemort requires predetermined partitions, to the best of my
understanding
- Cassandra is easier to use with non-Java clients (Thrift API and
multiple high-level clients)
- Voldemort has the ability to do versioned puts
- Voldemort has a more mature Java client (Hector seems to be the #1
Java client for Cassandra, but it is on a separate dev cycle and
doesn't necessarily expose all functionality)
- Voldemort has an easier data model to understand, since it is a
simple key-value store.
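To illustrate the first differentiator, here's a hedged sketch (the names below are made up, not either project's real API) contrasting per-request consistency, Cassandra-style, with consistency fixed at store-definition time, Voldemort-style:

```java
public class ConsistencyDemo {
    enum Level { ONE, QUORUM, ALL }

    // Cassandra-style: the caller chooses a consistency level per operation,
    // trading latency for safety call by call.
    static int requiredAcks(Level perRequest, int replicas) {
        switch (perRequest) {
            case ONE:    return 1;
            case QUORUM: return replicas / 2 + 1;
            default:     return replicas; // ALL
        }
    }

    // Voldemort-style: required-reads/required-writes are fixed in the store
    // definition (stores.xml) and apply to every get/put against that store.
    static final int STORE_REPLICATION = 3;
    static final int STORE_REQUIRED_WRITES = 2; // set once, at config time

    public static void main(String[] args) {
        System.out.println(requiredAcks(Level.QUORUM, STORE_REPLICATION)); // 2
        System.out.println(STORE_REQUIRED_WRITES); // 2, for every put
    }
}
```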


Six months ago, Voldemort seemed a more stable and feature-complete
product than Cassandra (even without re-balancing completed). That
said, with the recent influx of additional big companies using
Cassandra, it seems like some momentum is building in that community.

My two cents right now:
--If you want big data and fast scans, it seems like you have to go
with HBase (which doesn't (easily) offer high availability and
requires managing both Hadoop and ZooKeeper in addition to HBase).
--If you need availability, Voldemort & Cassandra both seem
competitive. Voldemort seems more stable, but Cassandra may have more
ongoing developer support and more companies backing it.
--If you need availability & fast scans, go with Cassandra and hope
for the best.

Of course this is all very high level and just my humble opinion based
on my partial evaluation of each of the technologies.

Good luck,
Jacques

Tatu Saloranta

Apr 12, 2010, 6:37:39 PM
to project-...@googlegroups.com
On Mon, Apr 12, 2010 at 1:21 PM, Jacques <whs...@gmail.com> wrote:
> I've been trying to figure this out as well.
>
> To add to the above analysis, I'd say that, based on my understanding
> of Voldemort and Cassandra, they are very directly competitive.  If
> you take a look at this analysis on page 11, you'll see that current
> stable versions of Cassandra perform very poorly in regards to range
> scans. http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

But isn't the ability to actually do range queries (with ordered keys)
one of the features that Voldemort does not have, by design?
At least I don't see a way to do efficient range queries with
hash-backed storage. I am not talking about the ability to traverse
all key/value pairs in the system, but actual range queries for a
subset of keys.

-+ Tatu +-

Jacques

Apr 12, 2010, 6:58:19 PM
to project-voldemort
Yes. Voldemort was never designed for range queries. I was more
pointing out that if you want to do range queries, Cassandra doesn't
sound like a great option either.


Alex Feinberg

Apr 12, 2010, 7:00:15 PM
to project-...@googlegroups.com
Cassandra is a great project: Avinash (Cassandra's original author at
Facebook) had himself worked on the original Dynamo implementation and
wanted Cassandra to be a "Dynamo 2.0" (hence the influx of many new
and interesting ideas). "NoSQL" isn't a zero-sum game or a race.
Engineering roadmaps driven by competitive feature matrices are a
problem in the commercial software world; the great thing about open
source is not having that pressure.

The big problem is that range queries can already involve random disk
seeks (especially with log-structured storage engines, which is the
case both for SSTables and for BerkeleyDB Java Edition). If you're
also doing quorums on top of that, that's another multiplier. That's
not to say there's no use for range scans with Cassandra: extracting
data for processing in Hadoop MapReduce is a great use case and is
possible with the latest versions of Cassandra; there are also other
use cases (situations where quorums aren't required, where range
queries are rare, or where key order frequently matches disk order,
resulting in fewer disk seeks even with log-structured storage).

However, I (and I'm speaking for myself, not for the Project Voldemort
community or LinkedIn) wouldn't build an application that *relied* on
frequently performing Cassandra range queries, just yet. With many
sorts of data you can effectively "cheat" to avoid using range
queries, and some Cassandra users end up doing just that.

For example, time series data is a very frequent use case for range
queries. However, time series data also has the property of known
granularity (seconds, minutes, hours) and of moving in one direction,
so one approach could be to associate all data for a specific time
period with a specific key and use multi-gets (e.g., Voldemort's
getAll()) to retrieve the time periods containing the desired range.
Cassandra's support for column families and super-columns (as well as
Voldemort's experimental support for views) can allow for atomically
appending events to a time period and retrieving only specific
portions of that period. Compression and serialization can allow for
very efficient storage of such lists.
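The bucketing idea above can be sketched as follows (the key format and one-hour bucket size are my own illustrative choices, not a prescribed scheme): a time-range query becomes a multi-get over a computable list of bucket keys instead of a range scan.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeSeriesBuckets {
    static final long BUCKET_MILLIS = 3_600_000; // one-hour buckets

    // Map an event timestamp to the key of the bucket holding it.
    static String bucketKey(String series, long timestampMillis) {
        return series + ":" + (timestampMillis / BUCKET_MILLIS);
    }

    // Because granularity is known and time moves in one direction, the
    // bucket keys covering a range are enumerable up front -- ready to
    // hand to a multi-get such as Voldemort's getAll().
    static List<String> keysForRange(String series, long fromMillis, long toMillis) {
        List<String> keys = new ArrayList<>();
        for (long b = fromMillis / BUCKET_MILLIS; b <= toMillis / BUCKET_MILLIS; b++) {
            keys.add(series + ":" + b);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Three hours of data -> three bucket keys, no range scan needed.
        System.out.println(keysForRange("cpu", 0, 2 * BUCKET_MILLIS)); // [cpu:0, cpu:1, cpu:2]
    }
}
```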

As Jacques mentioned, the way partitioning is done in Voldemort is
also different from Cassandra: Cassandra uses tokens, while Voldemort
uses partitions (which can be manually assigned to specific nodes).
That does mean that if Voldemort were to implement range queries
(entirely possible, if it proves to be the best route for a specific
use case), they would be implemented and would behave differently (for
better or for worse) than in Cassandra. It also means that Voldemort
has the flexibility of using separate on-disk structures to represent
each partition (e.g., separate BDB environments -- something which is
now possible with BDB JE 4.0).
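A minimal sketch of that partition indirection (the mapping table below is illustrative; real assignments live in Voldemort's cluster.xml): keys hash to a fixed partition, and partitions, not keys, are assigned to nodes, so rebalancing only edits the mapping.

```java
public class PartitionRouting {
    // A fixed number of partitions, each assigned to a node. Rebalancing
    // moves whole partitions between nodes; the key->partition function
    // never changes.
    static final int[] PARTITION_TO_NODE = {0, 1, 2, 0, 1, 2, 0, 1}; // 8 partitions, 3 nodes

    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITION_TO_NODE.length);
    }

    static int nodeFor(String key) {
        return PARTITION_TO_NODE[partitionFor(key)];
    }

    public static void main(String[] args) {
        String key = "user:42";
        System.out.println("partition=" + partitionFor(key) + " node=" + nodeFor(key));
    }
}
```

This indirection is also what makes per-partition on-disk structures (e.g., one BDB environment per partition) straightforward to reason about.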

There are other important differences between the projects that should
help users make a choice which is right for them:

- Vector clocks in Voldemort. This is going to be officially included
in Cassandra soon; AFAIK the ability to do a versioned put will be
available as well.

- Pluggable Storage Engines in Voldemort: some storage engines may be
better for one thing but not another (e.g., storage of larger values,
fast reads with a traditional B+Tree vs. fast writes with a
log-structured B+Tree). We also have a high performance read-only
storage engine, optimized for constructing stores in Hadoop and doing
atomic fetch/swap operations.

- Other features derived from layering (e.g., view support)

- Support for more languages in Cassandra (through Thrift)

- Support for rack/datacenter-aware routing strategies in Cassandra.
That's something that's very possible to do in Voldemort and is very
high on the list of features to add, given that it's a priority for
several large users of Voldemort, but for now users have to rely on
manually placing the partitions/nodes at the right points in the hash
ring.

- Support for Merkle trees in Cassandra (something that we'd like to
add to Voldemort, if we can implement it in a reliable and
well-performing way)
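As a rough illustration of why Merkle trees help anti-entropy (a generic sketch, not Cassandra's implementation): two replicas compare a single root hash over their key ranges, and only when the roots differ do they descend to locate divergent keys, so in-sync replicas exchange one hash instead of all their data.

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MerkleSketch {
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Root hash over a non-empty list of leaf values: hash each leaf, then
    // pairwise-hash upward until one digest remains.
    static byte[] rootHash(List<byte[]> leaves) {
        List<byte[]> level = new ArrayList<>();
        for (byte[] leaf : leaves) level.add(sha1(leaf));
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // Odd count: pair the last node with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                byte[] both = Arrays.copyOf(left, left.length + right.length);
                System.arraycopy(right, 0, both, left.length, right.length);
                next.add(sha1(both));
            }
            level = next;
        }
        return level.get(0);
    }
}
```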

Known companies are also using Voldemort: Nokia (whose contributions I
am presently working on merging into Voldemort), Jive Software,
eHarmony, Gilt Groupe and others. There are also some big names which
I am not sure I can mention without their explicit permission; there
are also some fairly high-traction companies that I've seen mention
Voldemort in job advertisements, though I'm not sure what their
current use of Voldemort is (perhaps it's time for a survey?).

Bottom line is that both Cassandra and Voldemort (as well as HBase and
several other storage systems) have multiple companies that are
invested in their stability and performance.

Thanks,
- Alex


Alex Feinberg

Apr 12, 2010, 7:02:16 PM
to project-...@googlegroups.com
Some of the storage engines do support range queries (BerkeleyDB JE
and MySQL, for example). As with the entries() method in the
StorageEngine interface, a "get_range()" method could be provided that
would throw an UnsupportedOperationException if used with a storage
engine that doesn't support range queries.
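A hedged sketch of that idea (the interface and method names below are illustrative, not Voldemort's actual StorageEngine API): ordered engines override the optional range method, while hash-based engines keep the failing default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class OptionalRangeSupport {
    interface Engine {
        String get(String key);
        void put(String key, String value);
        // Optional capability: engines backed by ordered structures
        // (B+Tree, MySQL index) override this; hash-based engines keep
        // the default and fail fast.
        default SortedMap<String, String> getRange(String from, String to) {
            throw new UnsupportedOperationException("engine does not support range queries");
        }
    }

    static class TreeEngine implements Engine {
        private final TreeMap<String, String> data = new TreeMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
        public SortedMap<String, String> getRange(String from, String to) {
            return data.subMap(from, to);
        }
    }

    static class HashEngine implements Engine {
        private final Map<String, String> data = new HashMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
        // No getRange override: callers get UnsupportedOperationException.
    }
}
```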

Thanks,
- Alex


Tatu Saloranta

Apr 12, 2010, 8:22:38 PM
to project-...@googlegroups.com
On Mon, Apr 12, 2010 at 4:02 PM, Alex Feinberg <fein...@gmail.com> wrote:
> Some of the storage engine do support range queries (BerkeleyDB-Je and
> MySQL, for example). Like with the entries() method in the
> StorageEngine interface, a "get_range()" method could be provided that
> would throw an UnsupportedOperationException if used by a storage
> engine which doesn't support range queries.

True. Personally, range queries are 50% of the reason why Cassandra is
a serious contender -- it is a very useful feature for many use cases.
The other 50% has to do with append-only style disk I/O: it can be
both a benefit (for smallish data sets) and a drawback (where I/O
can't be avoided, it can become excessive).

Which is to say that I would be interested in range queries on
Voldemort, too, if they become a possibility. Even if this requires
joining data from multiple nodes (which in many cases can be avoided
for Cassandra, or at least minimized).
And of course, since these storage engines also generally support
multiple indexes, it might even be possible to support the use of
multiple or aggregate indexes.

-+ Tatu +-

Alex Feinberg

Apr 12, 2010, 9:20:00 PM
to project-...@googlegroups.com
On Mon, Apr 12, 2010 at 5:22 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
> True. Personally range queries are 50% of reason why Cassandra is a
> serious contender -- it is very useful feature for many use cases. The
> other one has to do with append-only style disk I/O: it can be both
> benefit (for smallish data sets) and drawback (where I/O can't be
> avoided, it can become excessive).

BerkeleyDB Java Edition is actually log-structured as well: the
architecture is completely different from traditional BerkeleyDB.
Writes are only sequential. A cleaner thread is used to perform
"garbage collection" (reading a log file, appending the "live" entries
back into the log and then reclaiming the space used) on eligible
segments with low utilization. You can read more about the BerkeleyDB
JE architecture, but unfortunately the paper requires an Oracle login:

http://www.oracle.com/database/docs/bdb-je-architecture-whitepaper.pdf

Nonetheless, other storage engines are available as plugins, for cases
where log-structured storage isn't a good fit.
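A toy model of the append-only-plus-cleaner design described above (in-memory lists standing in for log files; BDB JE's real cleaner works on on-disk log segments): writes only ever append, an index tracks each key's latest position, and the cleaner copies live records forward to reclaim space held by stale versions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogStructuredSketch {
    // Append-only log of (key, value) records; the index maps each key to
    // its latest log position. Overwrites append -- they never seek.
    private final List<String[]> log = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    void put(String key, String value) {
        log.add(new String[] {key, value});
        index.put(key, log.size() - 1);
    }

    String get(String key) {
        Integer pos = index.get(key);
        return pos == null ? null : log.get(pos)[1];
    }

    // "Cleaner": copy only live records (those the index still points at)
    // into a fresh log, discarding stale versions; returns records reclaimed.
    int clean() {
        List<String[]> fresh = new ArrayList<>();
        for (Map.Entry<String, Integer> e : index.entrySet()) {
            fresh.add(log.get(e.getValue()));
        }
        int reclaimed = log.size() - fresh.size();
        log.clear();
        log.addAll(fresh);
        index.clear();
        for (int i = 0; i < log.size(); i++) index.put(log.get(i)[0], i);
        return reclaimed;
    }

    int logSize() { return log.size(); }
}
```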

> Which is to say that I would be interested in range queries on
> Voldemort, too, if it becomes a possibility. Even if this requires
> joining data from multiple nodes (which in many cases can be avoided
> for C; or at least minimized).
> And of course since these storage engines also generally support
> multiple indexes, it might even be possible to support use of multiple
> or aggregate indexes.

I don't see the use case for a range query implementation in
Voldemort that required going to *every single node*, or using a
method (common in the DHT literature) that would require multiple hops
(e.g., PHTs). The performance of such a setup would simply not be
acceptable, especially in conjunction with the multiple disk seeks
involved in each range scan (especially on a log-structured file
system) and quorum reads. A range query implementation (were one to be
made) would have to be based on an order-preserving routing strategy,
similar to Cassandra's OPP.
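A minimal sketch of order-preserving routing (the split points below are made up): because each node owns a contiguous key range, a range query maps to a small, contiguous subset of nodes rather than all of them.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderPreservingRouting {
    // 4 nodes owning contiguous ranges: [..g) [g..n) [n..t) [t..]
    static final String[] SPLIT_POINTS = {"g", "n", "t"};

    static int nodeFor(String key) {
        for (int i = 0; i < SPLIT_POINTS.length; i++) {
            if (key.compareTo(SPLIT_POINTS[i]) < 0) return i;
        }
        return SPLIT_POINTS.length;
    }

    // A range query only touches the nodes whose ranges overlap [from, to].
    static List<Integer> nodesForRange(String from, String to) {
        List<Integer> nodes = new ArrayList<>();
        for (int n = nodeFor(from); n <= nodeFor(to); n++) nodes.add(n);
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(nodesForRange("h", "m")); // [1] -- a single node
        System.out.println(nodesForRange("e", "p")); // [0, 1, 2] -- not all four
    }
}
```

The trade-off, as noted earlier in the thread, is that this same ordering is what creates hot spots for skewed key distributions.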

Thanks,
- Alex

Tatu Saloranta

Apr 13, 2010, 1:19:07 AM
to project-...@googlegroups.com
On Mon, Apr 12, 2010 at 6:20 PM, Alex Feinberg <fein...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 5:22 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
>> True. Personally range queries are 50% of reason why Cassandra is a
>> serious contender -- it is very useful feature for many use cases. The
>> other one has to do with append-only style disk I/O: it can be both
>> benefit (for smallish data sets) and drawback (where I/O can't be
>> avoided, it can become excessive).
>
> BerkeleyDB Java Edition is actually log-structured as well: the
> architecture is completely different from traditional BerkeleyDB.

Yes, I am aware of this.
However, it is somewhat different from the way Cassandra does it, if I
understand correctly: mostly because BDB-JE has no opportunity to
reorder entries (it uses the journal "as-is"), at least not to the
degree Cassandra does.

...


> I don't see the use case for a range query implementation for
> Voldemort that required going to *every single node*, or using a
> method (common in the DHT literature) that would require multiple hops
> (e.g., PHTs). The performance of such a setup would simply not be
> acceptable, especially in conjunction with multiple disk seeks
> involved for each range scan (especially on a log structured file
> system) and quorum reads. A range query implementation (were one to be
> made) would have to be based on an order preserving routing strategy,
> similarly to Cassandra's OPP.

That would be optimal, but I am not completely sure that, even if one
did have to go through every node, it would be useless for all use
cases. This is the way sharding is commonly done with search
aggregation. Assuming large-ish result sets, the overhead might not be
unreasonable, especially considering that lookups can be executed
concurrently.
Especially so if it can be deduced up front which subset of nodes
needs to be queried (although I guess that can only be done with OPP).
Finally, locality within individual nodes might offset part of the
overhead of inter-node communication, at least with slow disks (the
network being faster than hard drives -- maybe not true with SSDs?).
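The scatter-gather approach described here can be sketched as follows (TreeMaps stand in for each node's local ordered index; this is an illustration of the search-aggregation pattern, not Voldemort code): every node is queried concurrently and the partial results are merged, so latency is bounded by the slowest node rather than the sum of all nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherRange {
    // Fan the range query out to every node's local ordered index, then
    // merge the partial results into one sorted map.
    static TreeMap<String, String> rangeQuery(List<TreeMap<String, String>> nodes,
                                              String from, String to) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (TreeMap<String, String> node : nodes) {
                Callable<Map<String, String>> task = () -> node.subMap(from, to);
                futures.add(pool.submit(task));
            }
            TreeMap<String, String> merged = new TreeMap<>();
            for (Future<Map<String, String>> f : futures) {
                try {
                    merged.putAll(f.get()); // wait for each node's slice
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```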

I am not saying this is optimal, of course, just that different
compromises are acceptable for different use cases.

-+ Tatu +-
