Has anyone attempted to build an Order Preserving Routing Strategy
using Voldemort, comparable to that provided by Cassandra?
On the surface, this sounds like a fairly trivial operation to do
(implementing a new routing strategy). But digging deeper, I think
the structure of the cluster file (to support range partitions rather
than numeric) and possibly other structures would need to change.
Has anyone attempted to write an order preserving routing strategy in
Voldemort or thought about what would need changed to implement such a
routing strategy?
On Tue, Nov 23, 2010 at 6:28 AM, Mark Rambacher <mramb...@gmail.com> wrote: > Has anyone attempted to build an Order Preserving Routing Strategy > using Voldemort, comparable to that provided by Cassandra?
> On the surface, this sounds like a fairly trivial operation to do > (implementing a new routing strategy). But digging deeper, I think > the structure of the cluster file (to support range partitions rather > than numeric) and possibly other structures would need to change.
> Has anyone attempted to write an order preserving routing strategy in > Voldemort or thought about what would need changed to implement such a > routing strategy?
I haven't heard of such efforts, but my first question is about utility: would the use case be that of range queries? That seems to be the obvious one to me, but maybe there are others.
Range queries are one use case and a definite need for us.
Another would be a means of storing "like" data together. For
example, we could create "list keys" where we store a bunch of related
keys as individual keys with the same prefix. I could create an
aggregate key list by having a common prefix (and unique suffix) for a
group of keys. For example, I could prefix the set of contact keys
for a user with some prefix and retrieve all of the contacts by
looking for that prefix. This is comparable to the "range query" case
but slightly different, in that the end goal is to get a single set of
information (all my contacts) rather than a general-purpose search.
The "storing like data together" could be taken one step further to
implement something comparable to a column-store, where a search of
"key" would give me all the data associated with a key, where "key-
column" would look up a specific property. Again, a more specialized
example of a range query.
There are potentially other use cases that could be dreamed of, but
the question is if this would be generally useful and, if so, how one
would go about implementing it in the existing Voldemort
infrastructure.
/ Mark
On Nov 23, 1:03 pm, Tatu Saloranta <tsalora...@gmail.com> wrote:
> On Tue, Nov 23, 2010 at 6:28 AM, Mark Rambacher <mramb...@gmail.com> wrote:
> > Has anyone attempted to build an Order Preserving Routing Strategy
> > using Voldemort, comparable to that provided by Cassandra?
> > On the surface, this sounds like a fairly trivial operation to do
> > (implementing a new routing strategy). But digging deeper, I think
> > the structure of the cluster file (to support range partitions rather
> > than numeric) and possibly other structures would need to change.
> > Has anyone attempted to write an order preserving routing strategy in
> > Voldemort or thought about what would need changed to implement such a
> > routing strategy?
> I haven't heard of such efforts, but my first question is about
> utility: would the use case be that of range queries?
> That seems to be the obvious one to me, but maybe there are others.
On Tue, Nov 23, 2010 at 2:21 PM, Mark Rambacher <mramb...@gmail.com> wrote: > Range queries are one use case and a definite need for us.
> Another would be a means of storing "like" data together. For > example, we could create "list keys" where we store a bunch of related > keys as individual keys with the same prefix. I could create an > aggregate key list by having a common prefix (and unique suffix) for a > group of keys. For example, I could prefix the set of contact keys > for a user with some prefix and retrieve all of the contacts by > looking for that prefix. This is comparable to the "range query" case > but slightly different, in that the end goal is to get a single set of > information (all my contacts) rather than a general-purpose search.
> The "storing like data together" could be taken one step further to > implement something comparable to a column-store, where a search of > "key" would give me all the data associated with a key, where "key- > column" would look up a specific property. Again, a more specialized > example of a range query.
Yes, these make sense to me, and could also be an alternative to other kinds of modeling that is being discussed on other threads. Benefit of this approach is that it would be possibly to prevent loading of extra data, instead of reading and discarding it on server side.
Ability to do range queries seems useful, but even Cassandra seems/seemed to be moving away from this direction, since it is rather difficult to handle even load balancing with order-preserving key hash. It is tricky balance; usefulness of ordering versus difficulties in making it scale, especially when adding/removing hosts.
However I have been thinking that one way around this would be using combo-keys; where part of key is used for partitioning data, and other parts as "local" key. This would work for many (but not all) use cases: for example, where you have natural multi-level hierarchy (like S3 has account/bucket/key), and distribution of data is good enough even if only using part of key. In such cases one could even just hash part of the key, locate host(s) that have the data, and then access ranges from individual host.