Hey all,
Unfortunately I won't be able to make it to the next meeting. I've
got some financial troubles to deal with and sadly that takes priority
=(
However, I did thoroughly enjoy meeting many of you at the last meetup
and would like to contribute in any way that I can.
So, in the hopes of being helpful, here are some comments:
* I would group Dynamo and Cassandra as "eventually consistent
datastores" as opposed to "systems which use consistent hashing".
Cassandra thankfully supports partitioning schemes other than
consistent hashing...
* I would recommend adding Riak to the set. It's the best open source
implementation of Dynamo that I've seen:
* The novel contributions that Dynamo made to distributed systems
research were two-fold: (a) the specific manner in which
app-level-involved conflict resolution is handled and (b) tunable
parameters to control the desired levels of performance, availability
and durability. Everything else, e.g. vector clocks, had already been
worked to death before.
* The last time I dived into Cassandra they hadn't implemented vector
clocks yet — so you'd have lots of opportunity for data loss if your
machine clocks were to be out of sync. Fun! This is planned to be
fixed for 0.7 afaik — see issue 580 for more info.
* Cassandra's data model is nice, insofar as it effectively
follows something similar to BigTable's structure if you use the order
preserving partitioner. The biggest issue is their convoluted
terminology.
* Riak is relatively easy to administer, whereas it's something of a
dark art to administer Cassandra clusters. I don't know if they've yet
fixed the requirement to restart the entire cluster every time you
want to change the data model, i.e. modify Column Families and
Keyspaces.
* As you know I'm a big fan of secondary indexes. As with most NoSQL
datastores, you have to create and manage these yourself here. But
thanks to its support for range queries, Cassandra fares much better
at this than Riak does with its MapReduce functionality.
* Eventually consistent datastores don't provide native support for
transactions. If this is important for your applications, you can use
an external means of synchronising your changes. Cages is a Java
library which provides support for coarse locks on top of ZooKeeper:
http://code.google.com/p/cages/
* Personally, I'm not a big fan of ZooKeeper, so would strongly
recommend building something on top of Keyspace instead for this
purpose:
* Finally, the biggest impact of using eventually consistent
datastores is the massive change you have to make in how you design
your applications. They now have to deal with conflict resolution and
new sets of edge cases which require careful attention to detail.
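To make the vector clock point above a bit more concrete, here is a
rough Python sketch of the difference between resolving writes by
machine timestamp and comparing vector clocks. Everything in it (the
function names, the node names, the timestamps) is made up for
illustration. It is not how Cassandra or Riak implement this
internally, just the general idea under simplified assumptions.

```python
from typing import Dict, Tuple

# A vector clock maps a node name to the number of updates
# that node has made to the value (hypothetical representation).
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify how clock `a` relates to clock `b`."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # a real conflict: the app has to resolve it
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def last_write_wins(a: Tuple[float, str], b: Tuple[float, str]) -> Tuple[float, str]:
    """Pick the (timestamp, value) pair with the larger machine timestamp."""
    return a if a[0] >= b[0] else b


# Two clients update the same key via different nodes, starting
# from the same base version.
base = increment({}, "node_a")
write_x = increment(base, "node_a")
write_y = increment(base, "node_b")

# Vector clocks expose the conflict instead of hiding it.
relation = compare(write_x, write_y)

# Timestamp resolution with skewed clocks: assume write "y" happened
# later in real time, but node_b's clock runs a few seconds slow, so
# it carries the smaller timestamp and is silently discarded.
winner = last_write_wins((1000.0, "x"), (996.0, "y"))

print(relation)
print(winner)
```

With timestamps alone the later write simply vanishes, whereas the
vector clock comparison reports the two writes as concurrent and
leaves the decision to the application.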
Anyways, I hope the above proves useful in some way.
Hope you're all having a great day!
--
love, tav
plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
http://tav.espians.com | http://twitter.com/tav | skype:tavespian
* http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html