Django & memcache hashing


Ludvig Ericson

Oct 8, 2008, 4:37:08 AM
to django-d...@googlegroups.com
Hello,
I had issues with Django opting to use cmemcache and secondarily
python-memcached.

This behavior is extremely dangerous: in case you aren't aware, cmemcache
and python-memcached disagree on how to choose a memcached server from a
list. Now set up four or five machines with slightly varying needs: disaster.

The only two possible solutions:
1. ONLY use either cmemcache or python-memcached.
2. Use either cmemcache, or python-memcached with my cmemcache_hash
module which makes python-memcached hash like cmemcache does in
choosing server.
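
A minimal sketch of why the two libraries can disagree, assuming (as described later in this thread) that python-memcached takes the plain CRC32 of the key while cmemcache uses a shifted and masked variant of it. The server list, key names, and function names are made up for illustration; neither library's actual code is reproduced here.

```python
import zlib

# Hypothetical server list, identical on every machine.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211",
           "10.0.0.3:11211", "10.0.0.4:11211"]


def plain_crc32_hash(key: bytes) -> int:
    """Plain CRC32 of the key (the python-memcached behaviour assumed here)."""
    return zlib.crc32(key) & 0xFFFFFFFF


def cmemcache_style_hash(key: bytes) -> int:
    """Assumed cmemcache-style variant: bits 16-30 of the CRC32, never zero."""
    return (((zlib.crc32(key) & 0xFFFFFFFF) >> 16) & 0x7FFF) or 1


def pick_server(hash_func, key: bytes) -> str:
    """Map a key to a server by taking the hash modulo the server count."""
    return SERVERS[hash_func(key) % len(SERVERS)]


def disagreements(keys):
    """Return the keys that the two hash functions send to different servers."""
    return [k for k in keys
            if pick_server(plain_crc32_hash, k)
            != pick_server(cmemcache_style_hash, k)]


if __name__ == "__main__":
    keys = [b"session:%d" % i for i in range(20)]
    for key in disagreements(keys):
        print(key, pick_server(plain_crc32_hash, key),
              pick_server(cmemcache_style_hash, key))
```

Any key listed by `disagreements` would be written to one server by a machine using the first hash and looked up on another server by a machine using the second, which would look like a sporadic cache miss.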

And before I get the "but it works for 90% of us" - I'm sorry, but
you'll have to choose. This took _ages_ to track down, and caused
personal loss as well as execution of a number of kittens.

Ludvig Ericson
ludvig....@gmail.com

Malcolm Tredinnick

Oct 8, 2008, 4:54:46 AM
to django-d...@googlegroups.com

On Wed, 2008-10-08 at 10:37 +0200, Ludvig Ericson wrote:
> Hello,
> I had issues with Django opting to use cmemcache and secondarily
> python-memcached.
>
> This behavior is extremely dangerous-- lest you be aware, cmemcache
> and python-memcached disagree on how to choose memcached in a list.
> Now set up four-five machines with slightly varying needs, disaster.
>
> The only two possible solutions:
> 1. ONLY use either cmemcache or python-memcached.
> 2. Use either cmemcache, or python-memcached with my cmemcache_hash
> module which makes python-memcached hash like cmemcache does in
> choosing server.

Could you create a documentation patch, or at least open a ticket about
this, please, so that somebody else creates a documentation patch? This
is definitely worth recording in the docs. It's ultimately a
configuration issue with the people setting up the four or five machines
(although see below for how we can make it controllable without harming
the status quo).

Have you filed your patch for python-memcached with the upstream
maintainer yet? He's usually fairly responsive to bug reports. I haven't
had any personal dealings with the cmemcached guys, but they might be
interested in some sort of common agreement on hashing algorithms as
well.

Django should continue to try to do both imports by default, since even
if we introduced a setting, there's still no guarantee that the setting
would be the same on all the different installations and there's no way
to ensure that it is automatically (it's a configuration issue). I'm
very much in favour of "should work out of the box regardless of which
memcache wrapper you have installed" for the time being, at least.


However, a setting to force it to one or the other (and if no setting is
present we do the current fallback) would be a good solution for the
multi-machine installation case. Then somebody scaling to multiple
machines could use the same settings file with confidence that things
would fail with prejudice if they configured things correctly but left
off installing cmemcached.
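
A rough sketch of what such a forced choice with fallback could look like. The `forced` argument stands in for a setting whose name is not decided anywhere in this thread, and `load_memcache_library` is a hypothetical helper, not existing Django code. The `importer` parameter exists only so the logic can be exercised without either library installed.

```python
import importlib

# Module names of the two wrappers: cmemcache, and python-memcached
# (which is imported as "memcache").
KNOWN_LIBRARIES = ("cmemcache", "memcache")


def load_memcache_library(forced=None, importer=importlib.import_module):
    """Return a memcached client module.

    forced   -- None for the current try-both behaviour, or one of
                KNOWN_LIBRARIES to insist on exactly that library.
    importer -- callable used to import by name (injectable for tests).
    """
    if forced is not None:
        if forced not in KNOWN_LIBRARIES:
            raise ValueError("unknown memcached library: %r" % (forced,))
        # No fallback here: a missing library raises ImportError at once.
        return importer(forced)

    # Default: prefer cmemcache, then fall back to python-memcached.
    for name in KNOWN_LIBRARIES:
        try:
            return importer(name)
        except ImportError:
            continue
    raise ImportError("neither cmemcache nor python-memcached is installed")
```

With a forced value, a machine missing the chosen library fails loudly at startup instead of silently hashing keys differently from its peers.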

Best of all (and in parallel), though, would be the idea where the
different memcached modules used the same hashing algorithm.

[...]


> This took _ages_ to track down, and caused
> personal loss as well as execution of a number of kittens.

Your violence towards animals is noted with disappointment. However,
when I rule the world there may be a place for you in my organisation.

Regards,
Malcolm

Ludvig Ericson

Oct 8, 2008, 10:56:58 AM
to django-d...@googlegroups.com
On Oct 8, 2008, at 10:54, Malcolm Tredinnick wrote:

> Could you create a documentation patch, or at least open a ticket about
> this, please, so that somebody else creates a documentation patch? This
> is definitely worth recording in the docs. It's ultimately a
> configuration issue with the people setting up the four or five machines
> (although see below for how we can make it controllable without harming
> the status quo).

Opened ticket #9324 now.
http://code.djangoproject.com/ticket/9324

And yes, it's a configuration issue, but to err is human. :-)

> Have you filed your patch for python-memcached with the upstream
> maintainer yet? He's usually fairly responsive to bug reports. I haven't
> had any personal dealings with the cmemcached guys, but they might be
> interested in some sort of common agreement on hashing algorithms as
> well.

I haven't, but I haven't tried either -- if I got my say, I'd go for
cmemcache's, because pgmemcache does employ the same algorithm, and so
does the C library libmemcached (which python-libmemcached uses, as
well as my own pylibmc) to my knowledge.

I'll fire off an e-mail later, should be a quick change for him, as he
could just rip out the hash function from my cmemcache_hash.

> Django should continue to try to do both imports by default, since
> even

[...]


> However, a setting to force it to one or the other (and if no
> setting is

[...]


> Best of all (and in parallel), though, would be the idea where the
> different memcached modules used the same hashing algorithm.

I'm convinced and yes, this would remedy the issue to a great extent -
though it really is just moving the issue from having libraries
installed or not to having settings set. But a setting with some
googlable (new adj) documentation would be "friggin' ace, mate".

I suggest the following to be added to the docs (or something like it):

If you're experiencing problems where a user is logged out
sporadically, you should double-check that all your Django setups are
using the same memcached library.

You can tell Django to only use one library through the
`KITTENS_AND_BEER` setting.

> Your violence towards animals is noted with disappointment. However,
> when I rule the world there may be a place for you in my organisation.

Be sure to let me know.

Med vänliga hälsningar,
Ludvig Ericson

taleinat

Nov 19, 2008, 6:26:18 AM
to Django developers
On Oct 8, 10:54 am, Malcolm Tredinnick wrote:
> On Wed, 2008-10-08 at 10:37 +0200, Ludvig Ericson wrote:
> > Hello,
> > I had issues with Django opting to use cmemcache and secondarily  
> > python-memcached.
>
> > This behavior is extremely dangerous-- lest you be aware, cmemcache  
> > and python-memcached disagree on how to choose memcached in a list.  
> > Now set up four-five machines with slightly varying needs, disaster.
>
> > The only two possible solutions:
> > 1. ONLY use either cmemcache or python-memcached.
> > 2. Use either cmemcache, or python-memcached with my cmemcache_hash  
> > module which makes python-memcached hash like cmemcache does in
> > choosing server.
>
> Could you create a documentation patch, or at least open a ticket about
> this, please, so that somebody else creates a documentation patch? This
> is definitely worth recording in the docs. It's ultimately a
> configuration issue with the people setting up the four or five machines
> (although see below for how we can make it controllable without harming
> the status quo).
>
> Have you filed your patch for python-memcached with the upstream
> maintainer yet? He's usually fairly responsive to bug reports. I haven't
> had any personal dealings with the cmemcached guys, but they might be
> interested in some sort of common agreement on hashing algorithms as
> well.
>
> Django should continue to try to do both imports by default, since even
> if we introduced a setting, there's still no guarantee that the setting
> would be the same on all the different installations and there's no way
> to ensure that it is automatically (it's a configuration issue). I'm
> very much in favour of "should work out of the box regardless of which
> memcache wrapper you have installed" for the time being, at least.
>
> However, a setting to force it to one or the other (and if no setting is
> present we do the current fallback) would be a good solution for the
> multi-machine installation case. Then somebody scaling to multiple
> machines could use the same settings file with confidence that things
> would fail with prejudice if they configured things correctly but left
> off installing cmemcached.
>
> Best of all (and in parallel), though, would be the idea where the
> different memcached modules used the same hashing algorithm.

What about having Django's memcached cache backend implement its own
hashing algorithm? This could be the "standard" one used in
libmemcache and cmemcache, or perhaps a consistent hashing algorithm
[1] such as libketama[2].
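
For readers unfamiliar with consistent hashing, here is a small sketch of the general idea. It is not libketama's actual point layout or hash; the class name, the 100 points per server, and the use of the first 32 bits of an MD5 digest are all choices made for this illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each server owns many points on a circle."""

    def __init__(self, servers, points_per_server=100):
        if not servers:
            raise ValueError("at least one server is required")
        ring = []
        for server in servers:
            for i in range(points_per_server):
                ring.append((self._hash("%s-%d" % (server, i)), server))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(text):
        # First 32 bits of the MD5 digest, as an integer.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)

    def get_server(self, key):
        # Walk clockwise to the first point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["a:11211", "b:11211", "c:11211"])
    print(ring.get_server("session:42"))
```

The point of the design is that removing a server only moves the keys that lived on that server; with a plain `hash % server_count` scheme, most keys would move.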

- Tal


[1] http://www.socialtext.net/memcached/index.cgi?faq#what_is_a_consistent_hashing_client

[2] http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients

Ludvig Ericson

Nov 19, 2008, 3:37:54 PM
to django-d...@googlegroups.com
On Nov 19, 2008, at 12:26, taleinat wrote:
> What about having Django's memcached cache backend implement its own
> hashing algorithm? This could be the "standard" one used in
> libmemcache and cmemcache, or perhaps a consistent hashing algorithm
> [1] such as libketama[2].


As I said in the ticket[1], using python-memcached with
cmemcache_hash[2] would probably be the most beneficial way to go.

So,
1. Try cmemcache
2. Try python-memcached
2.1. Try cmemcache_hash

Also, as said earlier, we should definitely have a setting for
enforcing one or the other.

[1]: http://code.djangoproject.com/ticket/9324
[2]: http://pypi.python.org/pypi/cmemcache_hash

(Apologies if the formatting got weird, but Mail.app is being real lame.)
- Ludvig

Malcolm Tredinnick

Nov 19, 2008, 6:11:11 PM
to django-d...@googlegroups.com

On Wed, 2008-11-19 at 03:26 -0800, taleinat wrote:
[...]

> What about having Django's memcached cache backend implement its own
> hashing algorithm? This could be the "standard" one used in
> libmemcache and cmemcache, or perhaps a consistent hashing algorithm
> [1] such as libketama[2].

That's possible. The hash function in the pure-Python memcached wrapper
is replaceable (it's an attribute), so I was looking at replacing it
with the version from cmemcached. Using a third hashing algorithm would
be a bit silly, since then there's no compatibility with anything else
and it's just extra effort for no huge gain.
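
A sketch of what replacing that attribute might look like. It assumes the attribute is the module-level `serverHashFunction` in python-memcached's `memcache` module, and that the cmemcache-compatible hash is the CRC32 shifted right by 16 bits and masked to 15 bits; both are assumptions to verify against the installed versions. `use_cmemcache_hashing` is a hypothetical helper name.

```python
import types
import zlib


def cmemcache_style_hash(key):
    """Assumed cmemcache-compatible hash: bits 16-30 of the CRC32, never 0."""
    if isinstance(key, str):
        key = key.encode("utf-8")
    return (((zlib.crc32(key) & 0xFFFFFFFF) >> 16) & 0x7FFF) or 1


def use_cmemcache_hashing(memcache_module):
    """Swap the hash function the module uses to select a server."""
    memcache_module.serverHashFunction = cmemcache_style_hash
    return memcache_module


try:
    import memcache  # python-memcached, if it is installed
except ImportError:
    # Stand-in object so the sketch still runs without the library.
    memcache = types.SimpleNamespace(serverHashFunction=zlib.crc32)

use_cmemcache_hashing(memcache)
```

The swap has to happen on every machine, and before any client is used, or the mismatch simply moves elsewhere.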

Regards,
Malcolm

Ludvig Ericson

Nov 19, 2008, 6:46:12 PM
to django-d...@googlegroups.com
On Nov 20, 2008, at 00:11, Malcolm Tredinnick wrote:
> That's possible. The hash function in the pure-Python memcached wrapper
> is replaceable (it's an attribute), so I was looking at replacing it
> with the version from cmemcached. Using a third hashing algorithm would
> be a bit silly, since then there's no compatibility with anything else
> and it's just extra effort for no huge gain.

And that's _exactly_ what my cmemcache_hash package does! :-)

http://pypi.python.org/pypi/cmemcache_hash

I don't know if I attached a license, but it's BSD - so feel free to rip
it.

- Ludvig

Malcolm Tredinnick

Nov 19, 2008, 6:55:38 PM
to django-d...@googlegroups.com

Okay. If we go this path, it's something to include in Django, rather
than recommending yet another caching package. We either make it a
configuration option to force python-memcache or cmemcache or we just
"Do The Right Thing", with the latter being preferable.

I hadn't realised from your earlier description that you had a
non-intrusive change that we could drop into Django. My
misunderstanding. Glad we're on the same page now.

Regards,
Malcolm


Ludvig Ericson

Nov 19, 2008, 6:59:31 PM
to django-d...@googlegroups.com
On Nov 20, 2008, at 00:55, Malcolm Tredinnick wrote:
> Okay. If we go this path, it's something to include in Django, rather
> than recommending yet another caching package. We either make it a
> configuration option to force python-memcache or cmemcache or we just
> "Do The Right Thing", with the latter being preferable.

Absolutely -- that'd be my most favored solution as at that point it
wouldn't matter if you ran cmemcache or python-memcached (other than
that cmemcache might kill the process it runs in on protocol error.)

So, do you want me to make a patch or could you do it? I feel I'm not
entirely sure exactly where it should reside and so forth. But I could
take the time and find out if you're too busy.

> I hadn't realised from your earlier description that you had a
> non-intrusive change that we could drop into Django. My
> misunderstanding. Glad we're on the same page now.


I'm not a native speaker, so I might not be expressing myself in the
best English you've ever seen.

- Ludvig

Ivan Sagalaev

Nov 19, 2008, 11:20:57 PM
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> Okay. If we go this path, it's something to include in Django, rather
> than recommending yet another caching package. We either make it a
> configuration option to force python-memcache or cmemcache or we just
> "Do The Right Thing", with the latter being preferable.

What concerns me is that this will break the usage of memcached without
Django's cache API. I had the need a couple of times to do plain
instantiation of memcache.Client and work with it. If it won't see the
cache the same way as Django does it would be that very issue, hard to
debug, that started this thread.

Malcolm Tredinnick

Nov 20, 2008, 2:58:27 AM
to django-d...@googlegroups.com

This is indeed a concern. I was intending to put in a module that you
can import to get the same behaviour as Django. So instead of

import memcache

you write

from django.core.cache import memcached

I'm not 100% certain, though, that this is the way to go. I'm letting it
bounce around for a few days. Both options have their drawbacks and it's
kind of a matter of weighing up which inconvenience is more likely to
occur, given that they're both relatively uncommon (after all, if you're
accessing Django objects via direct usage, you need to be using Django's
get_cache_key() and the like anyway).

Regards,
Malcolm


Ludvig Ericson

Nov 20, 2008, 4:01:06 AM
to django-d...@googlegroups.com
On Nov 20, 2008, at 05:20, Ivan Sagalaev wrote:
> What concerns me is that this will break the usage of memcached without
> Django's cache API. I had the need a couple of times to do plain
> instantiation of memcache.Client and work with it. If it won't see the
> cache the same way as Django does it would be that very issue, hard to
> debug, that started this thread.

True, but that's because python-memcached for some reason still uses its
own hashing algorithm (pure CRC32) while other libraries are more or less
unified in their hashing algorithm. (Wouldn't know about libmemcached.)

*ugh* Why can you never eat the pie and have it. :(

- Ludvig

Eric Holscher

Nov 20, 2008, 8:26:56 AM
to django-d...@googlegroups.com
Just wanted to say that we ran into this exact issue at work the other day as well. We had the C and Python versions of memcache running, and it was hashing things differently (to different servers or something as I understand it). This caused us a good couple hours of confusion. We eventually figured it out and made sure that each of our boxes had the same version of memcache.
--
Eric Holscher
Web Developer at The World Company in Lawrence, Ks
http://www.ericholscher.com
er...@ericholscher.com

Johan Bergström

Nov 21, 2008, 7:10:58 AM
to Django developers


On Nov 20, 8:58 am, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
I'm not sure this is the way to go. Personally, I use memcached from lots of
distributed applications where (only) one of them is backed by Django. It
would be a bit inconvenient to import Django into my other applications in
order to make sure that I consistently use the same hashing algorithm.

Hashing should be up to the library itself. Modern memcached libraries
also give you alternatives for consistent distribution (ketama and others),
which makes python-memcached look a bit old.

For reasons raised in this thread (as well as being linked to the
crash-prone libmemcache library), I don't think that cmemcache belongs as
an alternative in Django. Either offer an alternative (pylibmc comes to
mind, albeit at a very young age) for performance or remove it.

>
> Regards,
> Malcolm

Thanks,
Johan