Consistent Hashing Algorithm

Andy McCurdy

unread,
Apr 20, 2010, 5:31:02 PM4/20/10
to Redis Clients Development
What consistent hashing algorithm is everyone using? We should really
all use the same out-of-the-box default implementation for easy
interoperability between clients. No reason to reinvent the terrible
problems everyone ran into 5 years ago with memcached clients using
different algorithms between clients.

I would generally prefer libketama since it's well understood and
implementations are probably already available in most languages. But
I'll go with whatever for consistency's sake.

-andy


--
Subscription settings: http://groups.google.com/group/redis-clients-dev/subscribe?hl=it

Andy McCurdy

unread,
Apr 20, 2010, 6:03:22 PM4/20/10
to Redis Clients Development
Also, several libraries support special syntax in the keys to influence which shard a key ends up on. I believe both redis-rb and Predis currently allow a key like {hash-me}:456:789, where only the "hash-me" string (without the { and }) is used to determine the node the key is stored on. This can be very convenient for clustering together keys that have specific meaning in your dataset.
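A minimal Python sketch of that hash-tag extraction (the function name and regex are illustrative, not taken from redis-rb or Predis, which may handle edge cases differently):

```python
import re

def hash_tag(key):
    """Return the substring a client would hash: the text inside the
    first {...} pair if one exists, otherwise the whole key.
    Illustrative only; real clients may differ on edge cases."""
    match = re.search(r"\{(.+?)\}", key)
    return match.group(1) if match else key

print(hash_tag("{hash-me}:456:789"))  # -> hash-me
print(hash_tag("plain-key"))          # -> plain-key
```

Keys sharing the same tag then hash identically, so they land on the same node regardless of the rest of the key.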

I like this and will be adding it to the Python client later today.

As Ezra suggested at the SF meetup, we should really create a client lib manifesto that defines how a good client should behave.

clayton collie

unread,
Apr 20, 2010, 6:08:40 PM4/20/10
to redis-cl...@googlegroups.com
In my client, hashing algos are configurable, though the default is ketama. I also implemented "hash-tags" (do these have a formal name?)

Michel Martens

unread,
Apr 20, 2010, 7:00:22 PM4/20/10
to redis-cl...@googlegroups.com
On Tue, Apr 20, 2010 at 6:31 PM, Andy McCurdy <sed...@gmail.com> wrote:
> What consistent hashing algorithm is everyone using? We should really
> all use the same out-of-the-box default implementation for easy
> interoperability between clients. No reason to reinvent the terrible
> problems everyone ran into 5 years ago with memcached clients using
> different algorithms between clients.

I have some concerns with the current implementation in redis-rb, and
with the use of a hashring in general. It makes a lot of sense for
Memcached, and it makes sense for Redis if you want to use it to
replace Memcached, but for everything else it may not be a good
approach. It is designed to minimize the loss of keys when nodes
change, but in most scenarios it's not acceptable to lose a single key.

I talked to Salvatore about this and he has another structure in mind
for the redis-cluster project. I think agreeing on the algorithm is a
good thing in itself, but it would be great to discuss how we can
represent a distributed Redis without using a hashring.

Andy McCurdy

unread,
Apr 20, 2010, 9:01:10 PM4/20/10
to redis-cl...@googlegroups.com
On Tue, Apr 20, 2010 at 4:00 PM, Michel Martens <sov...@gmail.com> wrote:

> I have some concerns with the current implementation in redis-rb, and
> with the use of a hashring in general. It makes a lot of sense for
> Memcached, and it makes sense for Redis if you want to use it to
> replace Memcached, but for everything else it may not be a good
> approach. It is designed to minimize the loss of keys when nodes
> change, but in most scenarios it's not acceptable to lose a single key.
>
> I talked to Salvatore about this and he has another structure in mind
> for the redis-cluster project. I think agreeing on the algorithm is a
> good thing in itself, but it would be great to discuss how we can
> represent a distributed Redis without using a hashring.


I have similar concerns, but right now we don't have the infrastructure to coordinate instances and manage a cluster. In the interim, consistent hashing gives us an option to go beyond the limits of a single CPU, which many larger applications need.

What we can do, however, is provide some easy-to-use solutions to solve the common case, which is adding a new node to the cluster. Conveniently, Redis provides some hooks that make this far easier than other key-value db's. When adding a new node, we have a couple options:

- If the app can tolerate a slight downtime, find the position of the new node in the hash ring, then find the two adjacent nodes. Use the KEYS command on both of the adjacent nodes, find all the keys that should be migrated, and migrate them.

- For less (or perhaps no) downtime, the application can be a bit smarter. It can create two clients: one with a hash ring representing the cluster prior to the new node(s), and another representing the cluster after. All requests first attempt the new client instance, falling back to the old client if the key wasn't found. In the background, run the migration described above to move keys around.
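The second option above can be sketched as a thin wrapper. This is an illustrative Python sketch, not code from any of the clients in this thread; `MigratingClient` and `FakeClient` are hypothetical names, and a real Redis client would expose far more than `get`:

```python
class MigratingClient:
    """During a resharding, try the ring that includes the new node(s)
    first, then fall back to the old topology. Illustrative sketch:
    the wrapped clients stand in for any Redis client with a `get`."""

    def __init__(self, new_client, old_client):
        self.new = new_client
        self.old = old_client

    def get(self, key):
        value = self.new.get(key)
        if value is None:
            # Key not migrated yet: serve it from the old topology.
            value = self.old.get(key)
        return value


class FakeClient:
    """Dict-backed stand-in for a real Redis client, for demonstration."""
    def __init__(self, data):
        self.data = data

    def get(self, key):
        return self.data.get(key)


old = FakeClient({"user:1": "alice"})  # key not yet migrated
new = FakeClient({"user:2": "bob"})    # key already on the new layout
client = MigratingClient(new, old)
print(client.get("user:1"))  # -> alice (falls back to the old ring)
print(client.get("user:2"))  # -> bob
```

Once the background migration finishes, the application drops the old client and talks to the new ring directly.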

These approaches will work better with lots of smaller Redis instances, so that there are fewer keys to scan through during the migration process.

Definitely interested in how others are handling this in production.



Daniele Alessandri

unread,
Apr 21, 2010, 5:17:51 AM4/21/10
to redis-cl...@googlegroups.com
On Tue, Apr 20, 2010 at 23:31, Andy McCurdy <sed...@gmail.com> wrote:

> What consistent hashing algorithm is everyone using? We should really
> all use the same out-of-the-box default implementation for easy
> interoperability between clients. No reason to reinvent the terrible
> problems everyone ran into 5 years ago with memcached clients using
> different algorithms between clients.

With Predis I'm using an algorithm similar to the one that can be
found in redis-rb but with fewer replicas (the value itself is
customizable, so it's actually easy to adjust the distribution to be
compatible with the one produced by redis-rb). Even switching to a
different algorithm wouldn't be a problem for me as the current
development version of Predis supports user-customizable key
distribution algorithms, and it has an optional pure-PHP implementation
of libketama that uses MD5 for the hashing step.
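The ingredients Daniele describes (a ketama-style ring, MD5 hashing, a configurable replica count) fit in a few lines. This is an illustrative Python sketch, not byte-compatible with libketama or any client in this thread; the class name and the replica default of 160 are my own assumptions:

```python
import bisect
import hashlib

class HashRing:
    """Ketama-style consistent hash ring: each node is hashed onto the
    ring at `replicas` points; a key maps to the first node clockwise
    from the key's own hash. Illustrative sketch only."""

    def __init__(self, nodes, replicas=160):
        self.replicas = replicas
        self.ring = {}            # ring point -> node name
        self.sorted_points = []   # ring points, kept sorted
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        # First 4 bytes of the MD5 digest as an unsigned int.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash("%s:%d" % (node, i))
            self.ring[point] = node
            bisect.insort(self.sorted_points, point)

    def get_node(self, key):
        point = self._hash(key)
        # First ring point clockwise from the key, wrapping around.
        idx = bisect.bisect(self.sorted_points, point) % len(self.sorted_points)
        return self.ring[self.sorted_points[idx]]
```

The replica count is the knob Daniele mentions: more replica points per node smooths the key distribution, and matching another client's count (and hash) is what makes two implementations place keys identically.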

--
Daniele Alessandri
http://www.clorophilla.net/
http://twitter.com/JoL1hAHN

Alejandro Crosa

unread,
Apr 21, 2010, 3:13:02 PM4/21/10
to redis-cl...@googlegroups.com
Same with the Scala Redis library: it uses the redis-rb algorithm, but it's easy to replace with another. Doing hashing on the server side and rebalancing is not at all trivial to implement; quite the opposite, actually.

Alejandro
