Design design needed: #14093 - Unable to create a new session key on higher traffic

Tim Graham

unread,

Feb 8, 2011, 9:48:09 AM2/8/11

to Django developers

Hi,

I wanted to raise this ticket for discussion as to what the best
solution might be. As suggested in the ticket, I think adding a
setting to control the number of cache attempts instead of hard-coding
"10000" might be a reasonable approach. I'm not sure if that solution
would be considered a "feature addition" (adding a new setting) and
thus not appropriate for this stage of the release cycle.

Has anyone else experienced the issue and devised a workaround? It's
a minor issue since the side-effect seems to be that the login
silently fails and a user must retry.

Thanks,
Tim

Ryan McIntosh

unread,

Feb 8, 2011, 11:08:13 AM2/8/11

to django-d...@googlegroups.com

Hi,

This could be done as an optional configuration with a hardcoded default of 10000. Since no configuration change is necessary to retain original behavior, than would this still qualify as a feature change?

I think I will weigh in on one aspect of this as I have run into this problem before. The side effect is as you describe until traffic increases further. At that point, users may not be able to login successfully until traffic subsides. "Minor issue" could be relative.

peace,

Ryan McIntosh
Software Architect
PeaceWorks Computer Consulting
ph: (204) 480-0314
cell: (204) 770-3682
ry...@peaceworks.ca

Hi,

Thanks,
Tim

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Russell Keith-Magee

unread,

Feb 10, 2011, 1:56:31 AM2/10/11

to django-d...@googlegroups.com

On Wed, Feb 9, 2011 at 12:08 AM, Ryan McIntosh <ry...@peaceworks.ca> wrote:
> Hi,
>
> This could be done as an optional configuration with a hardcoded default of 10000. Since no configuration change is necessary to retain original behavior, than would this still qualify as a feature change?
>
> I think I will weigh in on one aspect of this as I have run into this problem before. The side effect is as you describe until traffic increases further. At that point, users may not be able to login successfully until traffic subsides. "Minor issue" could be relative.

If a configuration item is necessary to avoid a bug, my feeling is
that it could still come under the category of bugfix, rather than
feature addition.

However, my concern here is that for any value of N, there will be
some level of traffic that will render that N insufficient. I'm not
fundamentally convinced that allowing N to be configurable will
actually fix the problem. I'd be much more interested in seeing a
genuine attempt to fix the problem, rather than just paper over it a
little more.

Of course, I haven't given any thought to what that fix would look like...

Yours,
Russ Magee %-)

Tom Evans

unread,

Feb 10, 2011, 6:36:15 AM2/10/11

to django-d...@googlegroups.com

On Thu, Feb 10, 2011 at 6:56 AM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> However, my concern here is that for any value of N, there will be
> some level of traffic that will render that N insufficient. I'm not
> fundamentally convinced that allowing N to be configurable will
> actually fix the problem. I'd be much more interested in seeing a
> genuine attempt to fix the problem, rather than just paper over it a
> little more.
>
> Of course, I haven't given any thought to what that fix would look like...
>
> Yours,
> Russ Magee %-)
>

Hi Russ

The main problem is that collisions in md5 are inevitable given enough
users. Would it be possible to consider changing the session key
generation for a uuid based solution? UUID4 in its standard format is
36 characters, and* if you were to generate 100 billion keys a second
for the next 100 years, the chance of creating one duplicate in all of
those would be ~50%.

uuid is not part of python 2.4, so we would have to add an
implementation to django.utils.hashcompat.

If this sounds amenable, I can code up a patch for the ticket.

Cheers

Tom

* I must admit, according to Wikipedia, I've not checked the maths!

Tom Evans

unread,

Feb 10, 2011, 7:19:54 AM2/10/11

to django-d...@googlegroups.com

On Thu, Feb 10, 2011 at 11:36 AM, Tom Evans <teva...@googlemail.com> wrote:
> If this sounds amenable, I can code up a patch for the ticket.
>

Indeed, I was piqued, so did it anyway. Running with it now.

Cheers

Tom

Russell Keith-Magee

unread,

Feb 10, 2011, 7:45:40 AM2/10/11

to django-d...@googlegroups.com

The DB model for cache keys provides for 40 characters, so we can
certainly store a UUID. If you can provide an implementation and can
demonstrate that it won't be prone to key collisions and won't impose
any computational limits (e.g., it's easy to avoid collisions if you
have a global synchronization lock allocating ids), then it's probably
worth looking at.

Yours,
Russ Magee %-)

Tom Evans

unread,

Feb 10, 2011, 8:54:57 AM2/10/11

to django-d...@googlegroups.com

On Thu, Feb 10, 2011 at 12:45 PM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> The DB model for cache keys provides for 40 characters, so we can
> certainly store a UUID. If you can provide an implementation and can
> demonstrate that it won't be prone to key collisions and won't impose
> any computational limits (e.g., it's easy to avoid collisions if you
> have a global synchronization lock allocating ids), then it's probably
> worth looking at.
>
> Yours,
> Russ Magee %-)
>

Hi Russ

I've updated the ticket with a patch against trunk implementing uuid
session keys. IIRC django still supports 2.4, so I've included in the
patch a verbatim copy of the uuid module from python 2.6. This is also
available as a package on pypi, I don't know whether it would be more
correct to do this, or to have uuid as an additional dependency if you
must use python 2.4.

We would probably want to get some second opinions on whether it is
sufficiently unlikely to generate a collision. I'm afraid I'm not a
crypto guru, so I can only go on what I have read, and that says that
for random uuids, you do not need to worry about collisions.
The python implementation uses an OS specific uuid function if it is
available, and otherwise uses srandom, falling back to random, to
generate its random bits.

Cheers

Tom

Tom Evans

unread,

Feb 10, 2011, 3:59:18 PM2/10/11

to django-d...@googlegroups.com

On Thu, Feb 10, 2011 at 1:54 PM, Tom Evans <teva...@googlemail.com> wrote:
> I've updated the ticket with a patch against trunk implementing uuid

> session keys....

One of the reasons why it was coded like this was because you can not
tell the difference in the cache backend between a key collision and
memcache being unavailable (as I read the comments). This is why it
will try 10,000 times to come up with a key before bailing.
With this switching the key to a UUID, the chance of collision is so
tiny as to be discounted, should this logic be changed slightly, so
that if the cache is not there, we don't go to the effort of
generating 10,000 UUIDs and trying to update a non-functioning
memcache instance? Would we still want to try a 'reasonable' number of
times, to allow memcache to recover?

Cheers

Tom

Tim Graham

unread,

Feb 11, 2011, 12:51:33 PM2/11/11

to Django developers

Tom,

That definitely seems reasonable to me. Seems like the risk of key
collision is low enough that we shouldn't have to loop at all? My
only concern regarding uui4 is the risk of collision with multiple web
servers. From what I've read uuid1 would prevent that. That being
said, I'm not an expert on this, so if there's anyone who could weigh
in and ensure we're not overlooking something that would be a big
plus.

Tim

On Feb 10, 3:59 pm, Tom Evans <tevans...@googlemail.com> wrote:

Tom Evans

unread,

Feb 11, 2011, 2:09:51 PM2/11/11

to django-d...@googlegroups.com

On Fri, Feb 11, 2011 at 5:51 PM, Tim Graham <timog...@gmail.com> wrote:
> Tom,
>
> That definitely seems reasonable to me. Seems like the risk of key
> collision is low enough that we shouldn't have to loop at all? My
> only concern regarding uui4 is the risk of collision with multiple web
> servers. From what I've read uuid1 would prevent that. That being
> said, I'm not an expert on this, so if there's anyone who could weigh
> in and ensure we're not overlooking something that would be a big
> plus.
>
> Tim
>

UUID-1 is simply the concatenation of (one of) the computer's MAC
addresses and a time counter from a known point. Since the format is
known, if you know one UUID-1, you can derive the MAC address, and it
is then trivial to predict the UUIDs that would be generated. Probably
a bad idea for session keys! The purpose for generating identifiable
UUIDs like these are to allow an organization to generate UUIDs that
clearly belong to them, and can be "well known".
Versions 2,3 and 5 are built using similar algorithms, and so aren't
particularly opaque. Only version 4 UUIDs contain random data. Each
UUID contains its version number, so a UUID-4 won't ever collide with
a UUID-1.

I'll update the patch on the ticket to remove the loop (which was the
smelly bit of the code anyway!)

Cheers

Tom

Paul McMillan

unread,

Feb 11, 2011, 9:31:38 PM2/11/11

to django-d...@googlegroups.com

Do we absolutely have proof that the original problem is caused for
sure by session key collisions? The assertion here is that we're
generating 10000 md5s and that NONE of those is unique and so the
function is timing out.

This is preposterous.

If this were actually the case, I could generate my own 10,000 md5s
and log into each of those users sessions as an attacker. Sessions
expire in 15 days. MD5 provides a reasonable distribution across the
namespace. If you've generated 10k md5s and NONE of them is unique,
you've got the md5 namespace pretty well covered (or you're using
really bad random input data, which we are not), meaning your session
key database has (conservatively) more than 10^20 petabytes of data.

Right.

The alternative explanation is that the cache backend is timing out.
The 10k loop fix basically masked that issue by trying a bunch of
times till it got a response. The fix for this issue is going to lie
in improving behavior when the cache doesn't respond properly, not in
fixing the key collision problem.

Additionally, UUID4's are only very probably unique. You are proposing
to move to a 103 bit random value (after excluding the fixed bits)
instead of the existing 512 bit value.

I predict that a patch which removes the 10k loop will make the
problem worse, with or without UUIDs.

Sorry to be blunt, but lets tackle the real problem here, rather than
making things harder for ourselves.

Best,
-Paul

Paul McMillan

unread,

Feb 11, 2011, 10:15:26 PM2/11/11

to django-d...@googlegroups.com

Sorry, I stuffed up some of the numbers.

MD5 is 128 bits equivalent value. UUID4 is 113. A table with enough
content to cause collisions for 10k input values would probably be in
the exabyte range, but the point still stands. Cache timeouts are
causing this problem.

-Paul

Tom Evans

unread,

Feb 14, 2011, 11:34:29 AM2/14/11

to django-d...@googlegroups.com

An md5 hash of pure random data might have 128 bits of purely random
data, but our md5 hashes aren't formulated like that. I don't know
what difference that makes...

UUID-4, as per the python implementation, is 122 bits of randomness, 4
bits are used for version, 2 for variant - I'm not sure where you are
getting 103/113 bits from.

I still haven't updated the patch to remove the code smell that is the
10k loop testing whether your cache is available. From what you've
said, it seems unlikely that md5 collisions are occurring in that
number, so I'll update it to remove that loop, and change the error
message to indicate that it is the cache that is failing.