About libketama


Pau Freixes

Jun 23, 2009, 2:19:46 PM
to memc...@googlegroups.com
Hi guys, is this the correct group to talk about some implementation details and features of libketama? If not, stop here ;)

Today I read the source code of libketama [1] to understand how it works and how consistent hashing works, and of course I found some "hot" characteristics that are probably errors. Well, maybe I'm mistaken ;)

The algorithm follows these steps:

- Create a vector of integer points with size 160 * num_servers (this hardcoded value of 160 was presumably found to work well heuristically).

- Every server has a weight based on its memory relative to the total memory.
  For example, with three servers of 1, 1 and 2 GB, the first server has a weight of 0.25, the second also 0.25, and the third 0.5.

- The algorithm iterates over every server and uses the weight to calculate the maximum number of points for that server. The number of points depends on the weight: the more weight, the more points a server gets. For each iteration it creates "random" integer points from md5(str_ip + "-" + num_iteration).


At the end of the algorithm you have a vector of n integers, each associated with a server; servers with more weight have more points. In terms of the original explanation in [2], the circle will have a lot of points, with each server repeated many times. A rough sketch of the construction follows below.
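
To make the idea concrete, here is a rough Python sketch of how I understand the weighted continuum is built. It is only my approximation of the algorithm, not the real C code of libketama, and the names (build_continuum, locate) are mine:

import hashlib

def build_continuum(servers):
    # servers: dict mapping "ip:port" -> memory (or any weight unit).
    # Returns a sorted list of (point, server) pairs, roughly
    # 160 * len(servers) points in total.
    total_memory = sum(servers.values())
    num_servers = len(servers)
    continuum = []
    for server, memory in servers.items():
        pct = memory / total_memory          # this server's share of all memory
        ks = int(pct * 40 * num_servers)     # number of md5 hashes for this server
        for i in range(ks):
            digest = hashlib.md5(f"{server}-{i}".encode()).digest()
            for h in range(4):               # 4 points per 16-byte digest
                point = int.from_bytes(digest[h * 4:h * 4 + 4], "little")
                continuum.append((point, server))
    continuum.sort()
    return continuum

def locate(continuum, key):
    # A key maps to the first point at or after its own hash, wrapping around.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
    for point, server in continuum:
        if point >= h:
            return server
    return continuum[0][1]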

OK, but what happens when one server goes down, for example the second server?

The algorithm recalculates the whole continuum again, with all the servers except the one that is gone. Now we have two servers, one with 1 GB and one with 2 GB; the other server is down.
The weights of the remaining servers change to 0.33 and 0.66, and the number of points assigned to each server changes as well!

This change can be problematic because the new continuum now has more points for server A and for server B, and keys that before this change were assigned to server A may now be assigned to server B, and vice versa!

Without weights the algorithm works fine.
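
With the sketch above the redistribution is easy to see in numbers (again only an approximation; the exact counts of the real library may differ):

from collections import Counter

servers = {"10.0.0.1:11211": 1024,
           "10.0.0.2:11211": 1024,
           "10.0.0.3:11211": 2048}
before = Counter(s for _, s in build_continuum(servers))
# roughly 120 / 120 / 240 points (weights 0.25 / 0.25 / 0.5)

del servers["10.0.0.2:11211"]                # the second server goes down
after = Counter(s for _, s in build_continuum(servers))
# roughly 104 / 212 points (weights 0.33 / 0.66): the point counts of the
# two surviving servers changed, so keys move between them as well.
print(before)
print(after)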


[1] svn://svn.audioscrobbler.net/misc/ketama/
[2] http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
--
--pau

Robey Pointer

Jun 23, 2009, 8:18:31 PM
to memc...@googlegroups.com
On 23 Jun 2009, at 11:19 AM, Pau Freixes wrote:

> Hi guys, is this the correct group to talk about some implementation
> details and features of libketama? If not, stop here ;)
>
> Today I read the source code of libketama [1] to understand how it
> works and how consistent hashing works, and of course I found some
> "hot" characteristics that are probably errors. Well, maybe I'm
> mistaken ;)

I think you're right. Because points-per-server is proportional to
(server weight / number of servers), when a server is added or
removed, the other servers may receive more or fewer points. Probably
ketama should have used points-per-server = server-weight * K.
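
Something like this (a rough Python sketch of the idea, not real client code; K and build_fixed_continuum are names I just made up):

import hashlib

K = 160   # points per unit of weight; any reasonably large constant works

def build_fixed_continuum(servers):
    # servers: dict mapping "ip:port" -> integer weight.
    # Every server always contributes weight * K points, independent of
    # which other servers exist, so adding or removing a server never
    # moves the points (and therefore the keys) of the others.
    continuum = []
    for server, weight in servers.items():
        for i in range(weight * K):
            digest = hashlib.md5(f"{server}-{i}".encode()).digest()
            continuum.append((int.from_bytes(digest[:4], "little"), server))
    continuum.sort()
    return continuum

With that scheme, when a server disappears only its own points vanish, so only the keys that lived on it get remapped.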

robey

kroki

Jun 24, 2009, 4:00:48 AM
to memcached
On Jun 24, 4:18 am, Robey Pointer <ro...@twitter.com> wrote:
> I think you're right. Because points-per-server is proportional to  
> (server weight / number of servers), when a server is added or  
> removed, the other servers may receive more or fewer points. Probably  
> ketama should have used points-per-server = server-weight * K.

IIRC the original libketama implementation uses a _fixed total_ number of
points derived from the size of the shared memory region they reside in,
hence the change per server on server add/remove.

Though many memcached clients implement consistent hashing, most
provide their own implementation of the algorithm rather than link with
libketama directly. For instance, in the Perl client Cache::Memcached::Fast
we use a fixed points-per-server = server-weight * K; likely other
clients do the same.

tomash

Pau Freixes

Jun 24, 2009, 6:56:05 PM
to memc...@googlegroups.com

On Jun 24, 4:18 am, Robey Pointer <ro...@twitter.com> wrote:
> I think you're right. Because points-per-server is proportional to  
> (server weight / number of servers), when a server is added or  
> removed, the other servers may receive more or fewer points. Probably  
> ketama should have used points-per-server = server-weight * K.

> IIRC the original libketama implementation uses a _fixed total_ number of
> points derived from the size of the shared memory region they reside in,
> hence the change per server on server add/remove.

I read the source code and I think it can have problems related to the proportional value of the weight: the weight changes if you add or remove servers, or change their memory values.
 

> Though many memcached clients implement consistent hashing, most
> provide their own implementation of the algorithm rather than link with
> libketama directly. For instance, in the Perl client Cache::Memcached::Fast
> we use a fixed points-per-server = server-weight * K; likely other
> clients do the same.

Is K a constant? And does the server weight remain variable?


--
--pau

kroki

Jun 25, 2009, 2:33:28 AM
to memcached
On Jun 25, 02:56, Pau Freixes <pfrei...@gmail.com> wrote:
> I read the source code and I think it can have problems related to the
> proportional value of the weight: the weight changes if you add or remove
> servers, or change their memory values.

I agree, that's why we use points-per-server = server-weight * K, which
doesn't change when servers are added or removed.

> Is K a constant? And does the server weight remain variable?

In Cache::Memcached::Fast both K and the server weight may be specified by
the user (via the "ketama_points" and "weight" parameters), and they do not
change afterwards.

tomash

Pau Freixes

Jun 25, 2009, 2:34:57 PM
to memc...@googlegroups.com
Sorry if I'm being a bit tiresome ;)

But I read the source code of the new libmemcached library, and it has the same problem with the "weighted" feature, called MEMCACHED_BEHAVIOR_KETAMA_WEIGHTED.
--
--pau

Xaxo

Jun 26, 2009, 10:00:58 AM
to memcached
On Jun 23, 8:19 pm, Pau Freixes <pfrei...@gmail.com> wrote:
> This change can be problematic because the new continuum now has more
> points for server A and for server B, and keys that before this change
> were assigned to server A may now be assigned to server B, and vice versa!
>
> Without weights the algorithm works fine.

You are right that the algorithm is somewhat broken :) but you are
missing a point here that makes things even worse:
1) Imagine you have cache servers A, B and C
2) C goes down
3) keys are remapped to A and B
4) C comes up
and here comes the mess :)
* since you have assigned some keys (k1, k2) from C to A and B, A and
B might now have the new versions of k1 and k2. Imagine the cache on C
is still there (it was probably a network or firewall issue); now C is
serving the old keys k1 and k2 ;)
* I have not read the source you are referring to, but if it does
stuff like you describe, namely remapping keys from A to B and vice versa,
the mess is bigger. When C went down, you remapped k3 from A to
B, so B might now have a newer version of k3. When C comes up, you
remap k3 back to A, so A will still serve the old version of k3.

Messy messy, most of the clients are broken in this way: they remap
keys upon server down and don't care that you might serve old stuff
after server up + remap again.
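
Here is a toy Python simulation of that timeline, just to show where the stale read comes from (the helper names are mine and it assumes a fixed weight * K continuum, not any particular client):

import hashlib
from bisect import bisect_left

K = 160

def build_ring(weights):
    ring = []
    for server, weight in weights.items():
        for i in range(weight * K):
            digest = hashlib.md5(f"{server}-{i}".encode()).digest()
            ring.append((int.from_bytes(digest[:4], "little"), server))
    ring.sort()
    return ring

def owner(ring, key):
    # First point at or after the key's hash, wrapping around the circle.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
    i = bisect_left(ring, (h, ""))
    return ring[i % len(ring)][1]

stores = {"A": {}, "B": {}, "C": {}}          # each server's private memory

ring = build_ring({"A": 1, "B": 1, "C": 1})
key = next(k for k in (f"k{i}" for i in range(10000)) if owner(ring, k) == "C")
stores["C"][key] = "old value"                # 1) the key lives on C

ring = build_ring({"A": 1, "B": 1})           # 2) C looks dead (network issue)
stores[owner(ring, key)][key] = "new value"   # 3) the key is remapped to A or B

ring = build_ring({"A": 1, "B": 1, "C": 1})   # 4) C comes back
print(stores[owner(ring, key)][key])          # prints "old value": stale data from C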

If you need consistency, don't use memcached at all, or fix the problem
yourself ;) Basically memcached stores each key/value pair on a SINGLE
server; it doesn't care about the world outside, and this is a big
problem, since it does not provide consistency across several instances.

Momchil

Pau Freixes

Jun 26, 2009, 11:11:24 AM
to memc...@googlegroups.com
> You are right that the algorithm is somewhat broken :) but you are
> missing a point here that makes things even worse:
>  1) Imagine you have cache servers A, B and C
>  2) C goes down
>  3) keys are remapped to A and B
>  4) C comes up
> and here comes the mess :)
>  * since you have assigned some keys (k1, k2) from C to A and B, A and
> B might now have the new versions of k1 and k2. Imagine the cache on C
> is still there (it was probably a network or firewall issue); now C is
> serving the old keys k1 and k2 ;)

Yes, I know, I have talked about this on this list before. With network problems this situation can happen often. Some intermediate layer to resolve cache consistency could help us, something like mcproxy?


> * I have not read the source you are referring to, but if it does
> stuff like you describe, namely remapping keys from A to B and vice versa,
> the mess is bigger. When C went down, you remapped k3 from A to
> B, so B might now have a newer version of k3. When C comes up, you
> remap k3 back to A, so A will still serve the old version of k3.
>
> Messy messy, most of the clients are broken in this way: they remap
> keys upon server down and don't care that you might serve old stuff
> after server up + remap again.


Last night I spent some time working with downed_server_test.sh, a shell script that shows how many keys are remapped when one server goes down and comes back up (be careful, it has some errors ;)). Another conclusion: the best, although not perfect, architecture with consistent hashing and memcached is to use several host names plus one "neutral" spare host for backup, and to use monit and DNS for automatic failover when a host goes down; returning to the status quo is then done by hand so the cache memory can be cleared.

Like this:

Hosts in your ketama or other consistent hashing library:

server1.mycompany.com:1121 200
server2.mycompany.com:1121 200
server3.mycompany.com:1121 200

server1, server2, and server3 map to 10.0.0.1, 10.0.0.2 and 10.0.0.3, plus one spare server at 10.0.0.4.

When any server goes down, say 10.0.0.3, monit automatically changes the DNS so that server3.mycompany.com points to 10.0.0.4. This is fundamental to avoid the rehashing that a different name would cause!!!

To come back to the status quo you first need to restart the memcached server at 10.0.0.3 to clear its cache, and then change the DNS back by hand ;)


Another solution would be some tool that avoids cache inconsistency, but I don't know of any implemented solution yet.

--
--pau

Henrik Schröder

Jun 26, 2009, 11:29:43 AM
to memc...@googlegroups.com
Well, the problem of inconsistencies with automatic recovery from failover isn't a problem with memcached itself, it's client-dependent, and it's a trade-off between high availability and absolute consistency. What's best always depends on your application. In the best of all worlds, all clients should offer the user the choice of how to handle failover. Mine doesn't, so I guess I'll have to spend some vacation time on fixing that. :-)


/Henrik

Pau Freixes

Jun 26, 2009, 1:23:56 PM
to memc...@googlegroups.com
Sorry for the SPAM;

I wrote something about this

http://www.milnou.net/~pfreixes/blog/memcached/consistent_hashing_memcached_all_you_should_know.html


Good weekend

--
--pau