High load on web servers when consistent hashing is enabled in PHP's Memcache module


Pavel Aleksandrov

Feb 27, 2009, 9:50:47 AM
to memcached
Hello, I work for a big web site. To give a sense of the load we are
handling: around 9,000 hits/s on our MySQL replication trees and
500,000 unique visitors each day. We run MySQL, Apache 2, Gentoo, and
PHP 4 with the PECL Memcache module. We had been using a single 12G
memcached instance to speed things up (we've reached the point where we
can't rely solely on our DB). A single instance is not what memcached
is meant for, so we decided to scale things up a bit and added 12 more
instances, 2G each (32-bit servers, 4 instances per server, 3 servers).
Then we switched from the "standard" (naive) method of hash
distribution to the "consistent" method.

What happened was that the load on our web nodes (we have 3 of them)
went up to about 3 times the usual. I'm guessing it's the new hash
distribution method that's doing this. Am I missing something, or is
this method always so CPU-intensive? Do we have another choice, or
should we invest in more web nodes to distribute the new load, if we
decide to stick with the consistent hashing algorithm?

Pavel Aleksandrov

Feb 27, 2009, 9:52:58 AM
to memcached
I forgot to mention - we're using persistent connections. Without them
the impact was even uglier...

Brian Moon

Feb 27, 2009, 9:53:33 AM
to memc...@googlegroups.com

Are you really using PHP 4? Not related, just shocked.

Did you make all these changes at once?

--

Brian.

Pavel Aleksandrov

Feb 27, 2009, 10:00:33 AM
to memcached
Never mind the PHP, it's a topic I don't want to discuss :)

About the changes: the only change that had this impact was switching
the hash distribution method. We are currently using the new memcache
instances, but with the standard (naive) method, and there are no
negative effects on the load of the web nodes. The moment we switch to
the consistent method, the load jumps.

Boris Partensky

Feb 27, 2009, 10:07:31 AM
to memc...@googlegroups.com
Also not related, just curious: why did you feel the need to switch to consistent hashing? Did the memcached instances go down/get restarted a lot, or did you just feel it was the right thing to do since you anticipate adding more instances?

Boris

2009/2/27 Pavel Aleksandrov <paff...@gmail.com>



--
--Boris

Brian Moon

Feb 27, 2009, 10:08:49 AM
to memc...@googlegroups.com
On 2/27/09 9:00 AM, Pavel Aleksandrov wrote:
> Never mind the PHP, it's a topic I don't want to discuss :)
>
> About the changes: the only change that had this impact was switching
> the hash distribution method. We are currently using the new memcache
> instances, but with the standard (naive) method, and there are no
> negative effects on the load of the web nodes. The moment we switch to
> the consistent method, the load jumps.

Well, I don't use consistent hashing; I guess I am naive. However, I
also have not heard of this problem before.

--

Brian.

Pavel Aleksandrov

Feb 27, 2009, 10:14:56 AM
to memcached
Well, after some thought on the problem, I'm leaning toward sticking
with the standard method; we don't expect a lot of movement of
instances. I just didn't expect such an overhead with the consistent
method, so I wondered whether something was wrong.

On Feb 27, 17:07, Boris Partensky <boris.parten...@gmail.com> wrote:
> Also not related, just curious: why did you feel the need to switch to
> consistent hashing? Did the memcached instances go down/get restarted a
> lot, or did you just feel it was the right thing to do since you
> anticipate adding more instances?
>
> Boris
>
> 2009/2/27 Pavel Aleksandrov <pafffk...@gmail.com>

Pavel Aleksandrov

Feb 27, 2009, 10:19:25 AM
to memcached
I used "naive" for the standard method because it's described that way
in many places that discuss these algorithms. As I said in the previous
message, we don't expect the set of instances to change much, so using
the standard hashing may be OK for what we need. My question was about
the overhead: apparently the module recalculates where everything
should go each time, and this involves a lot of hashing for each
server, which translates into CPU load on the web nodes.

Marc Bollinger

Feb 27, 2009, 11:05:39 AM
to memc...@googlegroups.com
We definitely saw this exact same behavior with the PHP PECL module
using consistent hashing. Weirdly enough, I was digging around just
before responding, and version 3.0.4 was released a few days ago with
the changelog entry "Improved performance of consistent hash strategy."
You might check that out if you're using a version more than a week
old, but it sounds like you've decided against consistent hashing. We
initially thought it might be a good idea because we run our memcached
instances on EC2, but realized that even there the servers are static
enough that the additional overhead on each read isn't worth it.

- Marc

2009/2/27 Pavel Aleksandrov <paff...@gmail.com>

Henrik Schröder

Feb 27, 2009, 11:10:19 AM
to memc...@googlegroups.com
That is really weird. The only difference between naive and consistent server selection is that for the consistent one you pre-calculate, at startup, an array of integers holding 100 values per server; the actual server selection is then a binary search into this array with your hashed key, and that's a really trivial operation.
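
To illustrate, here is a minimal sketch in C of the two paths (hypothetical names, not the module's actual code); note that both hash the key exactly once:

#include <stdint.h>

/* Naive selection: one hash, one modulo. */
static int select_naive(uint32_t key_hash, int num_servers)
{
    return (int)(key_hash % (uint32_t)num_servers);
}

/* Consistent selection: one hash, then a binary search for the first
   continuum point >= the hash, wrapping past the end to the first point. */
static int select_consistent(uint32_t key_hash, const uint32_t *points,
                             const int *owners, int num_points)
{
    int lo = 0, hi = num_points - 1;

    if (key_hash > points[hi]) {
        return owners[0]; /* wrap-around */
    }
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (points[mid] < key_hash) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return owners[lo];
}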

However, I seem to remember that the PHP client uses an external C library, libketama or something, for the consistent server selection. That might cause a big overhead in your case compared to the naive selection, which is probably implemented straight in PHP. I know that among the Perl clients there's one in pure Perl and one that uses libketama; maybe there's something similar for PHP?


/Henrik

me from

Feb 27, 2009, 11:12:47 AM
to memc...@googlegroups.com
nginx/lighttpd + PHP 5.3RC2 will be 60-90% faster, and with APC more like 300-500%; even the crypto algorithms run faster. What do you expect when PHP 4 isn't officially supported anymore? Used this way, memcached will have a bottleneck in transmission; without Facebook's patch you won't be able to handle it.

PS: Linux is not a wise choice; use FreeBSD instead. Just an opinion, not advice.

2009/2/27 Pavel Aleksandrov <paff...@gmail.com>

me from

Feb 27, 2009, 11:16:31 AM
to memc...@googlegroups.com
It doesn't, IMHO, but I may be wrong. libketama is an old library from Last.fm. The PECL extension implements its own, which might be based on libketama.

Pavel Aleksandrov

Feb 27, 2009, 11:21:40 AM
to memcached
Yes, I knew about the new version, which claims higher performance for
the consistent method, but it's still in beta, so we don't want to
experiment on production servers.

On Feb 27, 18:05, Marc Bollinger <mbollin...@gmail.com> wrote:
> We definitely saw this exact same behavior with the PHP PECL module
> using consistent hashing. Weirdly enough, I was digging around just
> before responding, and version 3.0.4 was released a few days ago with
> the changelog entry "Improved performance of consistent hash strategy."
> You might check that out if you're using a version more than a week
> old, but it sounds like you've decided against consistent hashing. We
> initially thought it might be a good idea because we run our memcached
> instances on EC2, but realized that even there the servers are static
> enough that the additional overhead on each read isn't worth it.
>
> - Marc
>
> 2009/2/27 Pavel Aleksandrov <pafffk...@gmail.com>

Pavel Aleksandrov

Feb 27, 2009, 11:25:06 AM
to memcached
I am talking exactly about this pre-calculation of the 100 values per
server: it is done by every Apache instance, and if you multiply that
by our hit rate (around 18,000,000 impressions per day), you get the
picture. As far as I know, the PHP module does not use Last.fm's
implementation of the consistent hashing algorithm.
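
To put rough numbers on it: assuming the module's defaults of 160 points per server and 1024 precomputed buckets, 13 instances mean 13 * 160 = 2080 hash computations plus a sort and a 1024-bucket fill whenever the continuum is rebuilt; if that really happens per request, then at 18,000,000 impressions per day (about 208 requests/s on average) it is a constant CPU tax on every web node.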

On Feb 27, 18:10, Henrik Schröder <skro...@gmail.com> wrote:
> That is really weird. The only difference between naive and consistent
> server selection is that for the consistent one you pre-calculate, at
> startup, an array of integers holding 100 values per server; the actual
> server selection is then a binary search into this array with your
> hashed key, and that's a really trivial operation.
>
> However, I seem to remember that the PHP client uses an external C
> library, libketama or something, for the consistent server selection.
> That might cause a big overhead in your case compared to the naive
> selection, which is probably implemented straight in PHP. I know that
> among the Perl clients there's one in pure Perl and one that uses
> libketama; maybe there's something similar for PHP?
>
> /Henrik
>

Mikael Johansson

Feb 27, 2009, 12:48:37 PM
to memc...@googlegroups.com
Hi,

The 2.2.x branch had the same performance improvement, but it wasn't
released until a few minutes ago :) Try the 2.2.5 release; it should
give you better performance when using the consistent hashing strategy.

Available at

http://pecl.php.net/package/memcache

//Mikael


Xaxo

Feb 28, 2009, 11:39:27 AM
to memcached


On Feb 27, 6:48 pm, Mikael Johansson <mik...@synd.info> wrote:
> Hi,
>
> The 2.2.x branch had the same performance improvement, but it wasn't
> released until a few minutes ago :) Try the 2.2.5 release; it should
> give you better performance when using the consistent hashing strategy.

The performance improvement people are talking about is:

http://cvs.php.net/viewvc.cgi/pecl/memcache/memcache_consistent_hash.c?r1=1.6&r2=1.7

It might bring you some improvement in the algorithm that selects a
server for a particular key. Nevertheless, the implementation is slow
and clumsy: it creates <weight> * 160 hashes per server and then takes
1024 of them. One of these 1024 is chosen by a simple modulo,
<your hash> % 1024, which points back to a server from your pool.

In the case of the standard algorithm, you create an array in which
every server contributes <weight> elements. You decide on a server
again by a simple modulo, <your key> % <array size>.

As you can see, the consistent algorithm has a large performance
impact due to:
- calculating <weight> * 160 hashes for each server
- choosing 1024 buckets out of all those hashes

Another negative effect, which you didn't see, appears when you have a
small number of servers (3-4) and small weights (say 1): the load on
the servers is unequally distributed, because you have to fill those
1024 buckets with, say, 640 elements. The way the filling algorithm
works, some elements take several slots in the 1024-element array, so
upon server selection via <key> % 1024, some servers have a higher
probability of being selected.
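
For example, 4 servers with weight 1 give 4 * 160 = 640 points mapped onto 1024 buckets, so each point covers 1.6 buckets on average; the rounding necessarily gives some servers more buckets, and therefore more traffic, than others.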

Having said this, if you need performance improvements:
- try the patch mentioned above
- try some other PHP client
- look at how others do server selection and implement it in an
existing PHP client
- design your own server selection algorithm and implement it
- stick with the standard, straightforward algorithm

* The constants mentioned above are to be found in
http://cvs.php.net/viewvc.cgi/pecl/memcache/php_memcache.h?revision=1.39&view=markup
namely:

#define MMC_CONSISTENT_POINTS 160   /* points per server */
#define MMC_CONSISTENT_BUCKETS 1024 /* number of precomputed buckets, should be power of 2 */

Mikael Johansson

Feb 28, 2009, 12:51:20 PM
to memc...@googlegroups.com
Hi,

I believe the consistent hashing implementation in pecl/memcache is
simply a C port of the original one from Set-ConsistentHash, available from

http://search.cpan.org/~bradfitz/Set-ConsistentHash/

What algorithms are others using? From what I can pick up from the
libmemcached sources, it works much the same but without the
precomputed 1024-element array. The 100 (or 160 in the case of ketama)
points per server are still computed and sorted, so consistent hashing
will always be slower than the simple modulo operation of the standard
algorithm.

It does, however, make sense to skip the precomputed array and instead
search the continuum for a matching server on each request. One would
have to make more than 1024 requests for the array to pay off,
something an average webpage rendering is unlikely to reach.
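
(Roughly: populating the array costs 1024 binary searches up front, while skipping it costs one binary search per get, so the precomputed array only pays off once a process serves more than about 1024 gets. A long-lived process gets there easily; a single page render does not.)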

//Mikael


Henrik Schröder

Feb 28, 2009, 7:32:52 PM
to memc...@googlegroups.com
Hi Mikael,

In BeITMemcached, I'm using the same algorithm as the one described in this libketama document: http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients

My implementation is here: http://code.google.com/p/beitmemcached/source/browse/trunk/ClientLibrary/ServerPool.cs

Also, I'm very curious why you would do the consistent server selection in an external library. It's really not a complicated algorithm; as long as you can do a binary search in an array of integers and find the nearest match, you're essentially done.

And consistent server selection really shouldn't be noticeably slower than naive server selection: in the former you do a binary search into a small array, in the latter a modulo operation, and that's the only difference between them; you still hash your key only once.

The idea of pre-calculating your server continuum is that you only do it once per webserver or client application. If you have to do it once per lookup or once per webpage, you're screwed, since server selection becomes several thousand times slower, and that's probably very noticeable. On the other hand, if you don't have some sort of shared process for your web application, you can't do connection pooling either, which probably has an even greater performance impact than doing thousands of unnecessary hashes...


/Henrik

Pavel Aleksandrov

Mar 1, 2009, 6:48:27 AM
to memcached
Well, how exactly do you avoid doing the pre-calculation in each
apache/php instance (in my case)? That's where all the load comes from.
There are many implementations, but if you calculate several hundred or
more hashes each time a user opens a page, you get the idea...

Otherwise, a very interesting talk, guys. I hope something comes out of
it that is useful not only for me, but for other people having the same
problem!

Mikael Johansson

Mar 1, 2009, 11:32:58 AM
to memc...@googlegroups.com
Hi,

The server selection isn't done in an external library; it's just that
the algorithm used is the same as the one in Set-ConsistentHash, though
it uses crc32 or fnv1a instead of sha1 (versus md5 for ketama).

The big difference is that pecl/memcache and Set-ConsistentHash, after
building and sorting the continuum, precompute an array of 1024
buckets, which can then be used by hashing the key and taking a modulo
into the array. It's computing this array that takes most of the time.
What can be done is, as you say, to cache it in shared memory alongside
the persistent connection sockets. I'll have a look at this.

Consistent hashing takes O(log(n*w*m)) to find a server, not counting
the hash function itself, where n is the number of servers, w is the
average weight of a server (usually 1), and m is the number of points
per server (usually 100 or 160). Standard modulo hashing is O(1), so
it's inherently faster.
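
For the numbers in this thread, n = 13, w = 1, m = 160 gives 2080 points, so a lookup is about log2(2080), roughly 11 comparisons, plus one hash; that is cheap per get. It's rebuilding the continuum and the 1024-bucket array per process (or worse, per request) that dominates the cost.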

//Mikael


Xaxo

Mar 2, 2009, 6:37:17 AM
to memcached
On Mar 1, 5:32 pm, Mikael Johansson <mik...@synd.info> wrote:
> The big difference is that pecl/memcache and Set-ConsistentHash, after
> building and sorting the continuum, precompute an array of 1024
> buckets, which can then be used by hashing the key and taking a modulo
> into the array. It's computing this array that takes most of the time.
> What can be done is, as you say, to cache it in shared memory alongside
> the persistent connection sockets. I'll have a look at this.

Caching it is the right way to go with this algorithm! You end up doing
just the modulo operation, as in the standard algorithm, so it will
again be O(1).

Nevertheless, what the OP can do at the moment is play with the
constants :)

Henrik Schröder

Mar 2, 2009, 7:03:56 AM
to memc...@googlegroups.com
Well, you could store the pre-calculated results in a shared cache... oh wait...

No, it really shouldn't be a problem to do those calculations once per webserver. In an average real-world case you'd end up needing a few thousand hashes, and if you have a website with lots of incoming requests that each generate a lot of memcache gets, you'll be looking at hundreds, maybe thousands of hashes per second. It doesn't seem right that you should have a problem initializing your memcached client with a few thousand hashes once, but no problem sustaining a throughput of maybe a thousand hashes per second?

(Assuming this is, of course, what the PHP memcached client does; I've never used it myself. :-) )


/Henrik

Henrik Schröder

Mar 2, 2009, 7:25:43 AM
to memc...@googlegroups.com
Hi,

How do you calculate this fixed-length array? It could be interesting to implement, but I couldn't find anything in the links provided so far in this discussion. I guess you could do a binary search for each of 1024 evenly spaced values; that should give you the properties you want. Or does it do something even more clever?

About the complexity: yes, a single modulo operation is faster than a binary search, but if you look at the whole thing, network I/O is surely much, much slower, so the difference in server selection algorithms becomes insignificant. It's clever though, I like that! :-)


/Henrik

Xaxo

Mar 2, 2009, 12:06:37 PM
to memcached


On Mar 2, 1:25 pm, Henrik Schröder <skro...@gmail.com> wrote:
> Hi,
>
> How do you calculate this fixed-length array? It could be interesting
> to implement, but I couldn't find anything in the links provided so far
> in this discussion. I guess you could do a binary search for each of
> 1024 evenly spaced values; that should give you the properties you
> want. Or does it do something even more clever?

http://cvs.php.net/viewvc.cgi/pecl/memcache/memcache_consistent_hash.c?revision=1.7&view=markup

static mmc_t *mmc_consistent_find(mmc_consistent_state_t *state, unsigned int point) /* {{{ */
{
    int lo = 0, hi = state->num_points - 1, mid;

    while (1) {
        /* point is outside interval or lo >= hi, wrap-around */
        if (point <= state->points[lo].point || point > state->points[hi].point) {
            return state->points[lo].server;
        }

        /* test middle point */
        mid = lo + (hi - lo) / 2;
        MMC_DEBUG(("mmc_consistent_find: lo %d, hi %d, mid %d, point %u, midpoint %u",
            lo, hi, mid, point, state->points[mid].point));

        /* perfect match */
        if (point <= state->points[mid].point && point > (mid ? state->points[mid-1].point : 0)) {
            return state->points[mid].server;
        }

        /* too low, go up */
        if (state->points[mid].point < point) {
            lo = mid + 1;
        }
        else {
            hi = mid - 1;
        }
    }
}
/* }}} */

static void mmc_consistent_populate_buckets(mmc_consistent_state_t *state) /* {{{ */
{
    unsigned int i, step = 0xffffffff / MMC_CONSISTENT_BUCKETS;

    qsort((void *)state->points, state->num_points, sizeof(mmc_consistent_point_t), mmc_consistent_compare);
    for (i = 0; i < MMC_CONSISTENT_BUCKETS; i++) {
        state->buckets[i] = mmc_consistent_find(state, step * i);
    }

    state->buckets_populated = 1;
}
/* }}} */

> About the complexity: yes, a single modulo operation is faster than a
> binary search, but if you look at the whole thing, network I/O is
> surely much, much slower, so the difference in server selection
> algorithms becomes insignificant. It's clever though, I like that! :-)

I hope you see that "while (1)" loop and realize that this is not
insignificant: when the 1024 buckets are populated, the search restarts
from index 0 for every one of the 1024 probe points, which is a lot of
unneeded cycles and thus unneeded CPU usage :) It would be better if
that 'lo' variable were preserved between calls; the probe points only
increase, so you don't really need to start from 0 every time.
Moreover, a large number of hashes are computed every time, which adds
to the total CPU usage. And network I/O is irrelevant here, since the
OP's problem is CPU usage, not latency.
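
Something along these lines, a sketch against the same structures as the quoted source (not a tested patch):

static void mmc_consistent_populate_buckets_linear(mmc_consistent_state_t *state)
{
    unsigned int i, step = 0xffffffff / MMC_CONSISTENT_BUCKETS;
    int lo = 0;

    qsort((void *)state->points, state->num_points, sizeof(mmc_consistent_point_t), mmc_consistent_compare);

    for (i = 0; i < MMC_CONSISTENT_BUCKETS; i++) {
        unsigned int point = step * i;

        /* the probe points only increase, so resume the scan from the
           previous position instead of searching from scratch */
        while (lo < state->num_points - 1 && state->points[lo].point < point) {
            lo++;
        }

        /* past the last point: wrap around to the first server */
        state->buckets[i] = (state->points[lo].point < point)
            ? state->points[0].server
            : state->points[lo].server;
    }

    state->buckets_populated = 1;
}

This turns the whole fill into one linear pass over the sorted points instead of 1024 independent searches.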

Dean Harding

Mar 2, 2009, 5:00:44 PM
to memc...@googlegroups.com
> Well, you could store the pre-calculated results in a shared cache...
> oh wait...
>
> No, it really shouldn't be a problem to do those calculations once per
> webserver. In an average real-world case you'd end up needing a few
> thousand hashes, and if you have a website with lots of incoming
> requests that each generate a lot of memcache gets, you'll be looking
> at hundreds, maybe thousands of hashes per second. It doesn't seem
> right that you should have a problem initializing your memcached
> client with a few thousand hashes once, but no problem sustaining a
> throughput of maybe a thousand hashes per second?
>
> (Assuming this is, of course, what the PHP memcached client does; I've
> never used it myself. :-) )

That's what I would have thought, but from what has been said I'm guessing that in this case a new PHP process is started for each request, which means you need to initialize the client once per request. I'm talking about this quote specifically:

[Pavel Aleksandrov]


> if you calculate several hundred or more hashes each
> time a user opens a page, you get the idea...

Personally, I think the solution would be to set MaxRequestsPerChild much higher in order to reuse the same PHP process for many requests, but maybe that's not an option? Or maybe I misunderstood that quote...

Dean.

Pavel Aleksandrov

Mar 4, 2009, 8:02:43 AM
to memcached
We are already using a high MaxRequestsPerChild; I talked with our
administrator. The problem could be solved by using fewer Apache
processes with threads, which would then share the persistent
connections, but we use eAccelerator in combination with PHP 4, and it
doesn't play well with threads. We will try migrating to PHP 5, which
should solve this problem, and then try threaded Apache processes. I
will write when we have some results.

Dustin

Mar 4, 2009, 3:12:57 PM
to memcached

On Mar 2, 4:03 am, Henrik Schröder <skro...@gmail.com> wrote:
> Well, you could store the pre-calculated results in a shared cache...
> oh wait...

The original published C ketama code did that -- you would calculate
your list once and it'd store it in a shm segment.
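
Something like this, a rough sketch of the idea (the struct, key, and error handling here are made up; the real ketama code differs in detail):

#include <stdint.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

typedef struct { uint32_t point; int server; } cont_point; /* hypothetical */

/* Attach (creating if needed) a shared segment holding the continuum, so
   only the first process on the box has to compute and sort the points. */
static cont_point *continuum_attach(key_t key, size_t npoints, int *must_fill)
{
    size_t size = npoints * sizeof(cont_point);
    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

    if (id >= 0) {
        *must_fill = 1;               /* we created it: caller populates it */
    } else {
        id = shmget(key, size, 0600); /* it already exists: just attach */
        *must_fill = 0;
    }
    if (id < 0) {
        return NULL;
    }

    void *p = shmat(id, NULL, 0);
    return (p == (void *)-1) ? NULL : (cont_point *)p;
}

A real implementation would also need a "populated" flag or a lock so that a second process doesn't read a half-filled segment.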