Possible session key creation regression in 1.4

593 views
Skip to first unread message

Tom Evans

unread,
May 16, 2012, 11:09:45 AM5/16/12
to django-d...@googlegroups.com
Hi all

One of our sites was recently upgraded to 1.4 and now some previously
working code is failing.

The code tries to access and utilize the session id from the user, but
if it tries to do this on the request the session is created on, the
session id is None. The equivalent session object in 1.3 always
presents a valid session id.

1.3.1:

>>> import django
>>> from django.http import HttpRequest
>>> from django.conf import settings
>>> from django.utils.importlib import import_module
>>>
>>> engine = import_module(settings.SESSION_ENGINE)
>>> req = HttpRequest()
>>> req.session = engine.SessionStore(None)
>>> print django.VERSION
(1, 3, 1, 'final', 0)
>>> print req.session.session_key
a8b409a6b29ac5a59022350445f8bfa4


1.4:

>>> import django
>>> from django.http import HttpRequest
>>> from django.conf import settings
>>> from django.utils.importlib import import_module
>>>
>>> engine = import_module(settings.SESSION_ENGINE)
>>> req = HttpRequest()
>>> req.session = engine.SessionStore(None)
>>> print django.VERSION
(1, 4, 0, 'final', 0)
>>> print req.session.session_key
None


This is using the DB backend, but perusing the changes to
contrib.session, I think all backends will be similarly affected.

So, is the session key being available part of the API, or is relying
on the session key existing incorrect?

Cheers

Tom

Aymeric Augustin

unread,
May 16, 2012, 11:38:42 AM5/16/12
to django-d...@googlegroups.com
2012/5/16 Tom Evans <teva...@googlemail.com>:
> So, is the session key being available part of the API, or is relying
> on the session key existing incorrect?

Hi Tom,

Accessing the session key before saving the session is incorrect.

Previously, it used to return something, but that something could be
random data instead of the actual session key. Now it returns None,
making the error more obvious.

For more information, see https://code.djangoproject.com/ticket/11555

Best regards,

--
Aymeric.

Tom Evans

unread,
May 16, 2012, 1:22:59 PM5/16/12
to django-d...@googlegroups.com
Hi Aymeric

Looking at the 1.3 code for sessions, I can understand why this
changed. If the generated key collides with a pre-existing session,
new session keys are generated until it does not, and the session key
is changed from the initial one.

However, I do not understand the logic behind the change that fixes this issue.

When we create a session object for a user who does not have a session
id, it seems logical to me that at that point we are creating the
session - the user does not have a session, all users have sessions,
this user must be created a session.

Instead, the session is not created until after the request has been
finished, I presume in order to only create a session when something
is written to it, and to avoid saving the session twice in the initial
request.

Even with this optimization, if I access the session_key of a session
object, it should be apparent that I want the session id of the
current session. If that means that the session must be saved in order
to determine what that session id is, then that is what should happen.
I believe in POLA, and to me it is astonishing that a valid session
does not have an session id when queried, even temporarily.

I find it strange this change was neither documented nor mentioned in
the release notes. In the ticket, you said:

"From the user's point of view, returning None instead of a wrong
session key is probably better, although it might break code that
relies on getting a string." (actually you return None instead of a
correct session id, except when that session id collides with an
existing one)

and

"That is a rather big change, so I'd like to hear from someone else
before working on it."

You had clearly grasped that this was a change that could easily break
peoples code. I'd like to know why that information was not relayed
into the release notes, or anything about the volatility of session
keys added to the docs.

Even though I don't like the API, if it had been in the relnotes, I
wouldn't have spent half a day hunting through the relnotes, docs and
finally diffs to determine what had changed and produce a test case, I
would simply have been able to update the places that refer to session
id to ensure that they existed before use.

Cheers

Tom

PS: I sometimes am accused of coming of as personally confrontational
or critical. This is _absolutely_ not what I'm trying to say here!
Something like this should have been included in the relnotes, so
perhaps a better process can be used in future to ensure that changes
like this are properly documented. To Aymeric's credit, he had
realised when he changed this that it could/would affect people, his
knowledge should have been included in the relnotes!

Aymeric Augustin

unread,
May 16, 2012, 2:49:23 PM5/16/12
to django-d...@googlegroups.com
Hi Tom,

On 16 mai 2012, at 19:22, Tom Evans wrote:
> Even with this optimization, if I access the session_key of a session
> object, it should be apparent that I want the session id of the
> current session. If that means that the session must be saved in order
> to determine what that session id is, then that is what should happen.

When you access the id of an unsaved model, it's apparent that you want the id of the model, but the model isn't saved and you get None.

> I believe in POLA, and to me it is astonishing that a valid session
> does not have an session id when queried, even temporarily.

In my opinion, the current behavior obeys the principle of least surprise:
1) Saving the session as a side effect of accessing its session_key attribute would be very surprising. And generally there's no way to get the session_key without saving the session — you need an atomic create-if-not-exist operation to avoid collisions.
2) As I said above, sessions behave exactly like Django models in this situation.

> You had clearly grasped that this was a change that could easily break
> peoples code. I'd like to know why that information was not relayed
> into the release notes, or anything about the volatility of session
> keys added to the docs.
>

> Even though I don't like the API, if it had been in the relnotes, I
> wouldn't have spent half a day hunting through the relnotes, docs and
> finally diffs to determine what had changed and produce a test case, I
> would simply have been able to update the places that refer to session
> id to ensure that they existed before use.


I'm sorry that you wasted so much time on this problem and I understand your frustration. I admit I underestimated how many people relied on this attribute being a string: this is the second thread on this topic.

The reason why I considered this a bug fix and not a backwards-incompatible change is simple: session_key is an internal API, and internal APIs are subject to change without notice. Well, since then, I added a bit more information in the docs, so it's drifting towards becoming a public API. Maybe we should document its behavior fully and make it a public, stable API.

When I committed the fix, I couldn't find many legitimate uses for accessing session_key directly — except for writing a custom session backend, but when you start doing this sort of things, you're generally aware of the risks.

Best regards,

--
Aymeric.

Tom Evans

unread,
May 18, 2012, 5:51:59 AM5/18/12
to django-d...@googlegroups.com
On Wed, May 16, 2012 at 4:38 PM, Aymeric Augustin
<aymeric....@polytechnique.org> wrote:
> 2012/5/16 Tom Evans <teva...@googlemail.com>:
>> So, is the session key being available part of the API, or is relying
>> on the session key existing incorrect?
>
> Hi Tom,
>
> Accessing the session key before saving the session is incorrect.
>

Accessing the session key before saving the session is incorrect, but
there is nothing in the session API to determine if a session is
saved.

Accessing the session key is a documented feature of sessions.

I don't see a good way to support 1.3 and 1.4, or in 1.3 to work
around the bug that this fixes, without explicitly saving the session
object each time prior to accessing the session key, which is not a
particularly clever way of doing things. This API needs to be looked
at.

Cheers

Tom

Tom Evans

unread,
May 18, 2012, 7:01:56 AM5/18/12
to django-d...@googlegroups.com
Further to this, there is a potential race condition calling
session.save() on a unsaved session using the DB backend.

When the session key is generated, _get_new_session_key() is called,
which generates a session key until one is found that doesn't exist in
the backend store. save() then tries one time to store this session in
the database. If two sessions are being saved simultaneously, and
generate the same (unused) session id, then one of the session saves
will fail.

This behaviour contrasts with create(), which will keep cycling
session ids until it manages to persist one in the backend.

So to sum up:

You can't look at session.session_key without the session having being saved.
You can't determine whether a session has been saved or not.
Saving a session has a race condition, failing with an IntegrityError
in case of key collision.

Cheers

Tom

Aymeric Augustin

unread,
May 18, 2012, 8:38:06 AM5/18/12
to django-d...@googlegroups.com, django-d...@googlegroups.com
Le 18 mai 2012 à 11:51, Tom Evans <teva...@googlemail.com> a écrit :

> On Wed, May 16, 2012 at 4:38 PM, Aymeric Augustin
> <aymeric....@polytechnique.org> wrote:
>> 2012/5/16 Tom Evans <teva...@googlemail.com>:
>>> So, is the session key being available part of the API, or is relying
>>> on the session key existing incorrect?
>>
>> Hi Tom,
>>
>> Accessing the session key before saving the session is incorrect.
>>
>
> Accessing the session key before saving the session is incorrect, but
> there is nothing in the session API to determine if a session is
> saved.

if session.session_key == None:
...

> I don't see a good way to support 1.3 and 1.4 or in 1.3 to work around the bug that this fixes

If I understand correctly, you're writing a pluggable app and you want to make it compatible with Django 1.3 and 1.4.

You didn't know this bug existed in 1.3, so I don't understand your sudden urge to work around it in your application. Just keep your current code as is and if your users complain, show them that it's a bug in Django and tell them that they should upgrade.

> without explicitly saving the session
> object each time prior to accessing the session key, which is not a
> particularly clever way of doing things. This API needs to be looked at.

The following sounds perfectly simple and reasonable to me:

if session.session_key == None:
session.save()

The if clause will match only under Django 1.4, since under 1.3 session_key isn't None when it should. So you support automatically both versions.

I'm sorry, but it's hard to read your criticism in a constructive way when the solution is obvious and doesn't need any changes in Django's APIs.

You raised several related points in your next email; could you create tickets on Trac so we can verify and fix these issues?

Thanks!

--
Aymeric (mobile).

Tom Evans

unread,
May 18, 2012, 10:29:15 AM5/18/12
to django-d...@googlegroups.com
On Fri, May 18, 2012 at 1:38 PM, Aymeric Augustin
<aymeric.au...@polytechnique.org> wrote:
> Le 18 mai 2012 à 11:51, Tom Evans <teva...@googlemail.com> a écrit :
>
>> On Wed, May 16, 2012 at 4:38 PM, Aymeric Augustin
>> <aymeric....@polytechnique.org> wrote:
>>> 2012/5/16 Tom Evans <teva...@googlemail.com>:
>>>> So, is the session key being available part of the API, or is relying
>>>> on the session key existing incorrect?
>>>
>>> Hi Tom,
>>>
>>> Accessing the session key before saving the session is incorrect.
>>>
>>
>> Accessing the session key before saving the session is incorrect, but
>> there is nothing in the session API to determine if a session is
>> saved.
>
> if session.session_key == None:
>    ...
>
>> I don't see a good way to support 1.3 and 1.4 or in 1.3 to work around the bug that this fixes
>
> If I understand correctly, you're writing a pluggable app and you want to make it compatible with Django 1.3 and 1.4.
>
> You didn't know this bug existed in 1.3, so I don't understand your sudden urge to work around it in your application. Just keep your current code as is and if your users complain, show them that it's a bug in Django and tell them that they should upgrade.

That may be fine for open source code, it's not appropriate here. I
need to support my features in Django 1.2 and upwards. We have seen
strange things where the session id that we extract is not correct,
and this bug, which I was not aware of, explains that issue. Therefore
I do need to work around it, now that I am aware of it.

Saying to my users, "oh, and now I don't support the version of Django
you use any more, drop all your other projects and upgrade now" is not
acceptable.

>
>> without explicitly saving the session
>> object each time prior to accessing the session key, which is not a
>> particularly clever way of doing things. This API needs to be looked at.
>
> The following sounds perfectly simple and reasonable to me:
>
> if session.session_key == None:
>    session.save()
>
> The if clause will match only under Django 1.4, since under 1.3 session_key isn't None when it should. So you support automatically both versions.

Grrr - session_key should never be None. This should be part of the
explicit contract of the session API - if I'm looking at the
session_key, I want to know what is OR WILL BE the session id. If the
session needs to be saved in order to generate the key, it should be
saved by the backend.

Or to put it another way; my code is never interested in a None
session_key. A valid session never has a key of None. Therefore I now
have to put code in to explicitly ensure that I have a valid
session_key

The bug that was fixed was that accessing session_key before the
session was saved could potentially give you a different session_key
than the one which eventually ends up being used. The fix was to make
the session_key property return None when unsaved - changing the API -
rather than at that point determining what session_key should be used,
which would have kept the same API.

Due to the API changing, and Django not merging bug fixes back through
branches, I now know that I have an issue with 1.3 and below clients
that must be accounted for, and that I must ensure the session is
saved before accessing it, as this ensures in both 1.3 and 1.4 that
the session.session_key is the key for the users session.

This means that each and every place I access session_key in code that
must work in 1.3 and 1.4, I must put in place code like this:

# We cannot access session.session_key before the session is saved
# Django provides no way of determining if the session has been saved
# so we must always save it here. However, there is a race in the
# session creation that may cause it to fail on key collision.
while True:
try:
request.session.save()
except IntegrityError:
# In <=1.4, session.save() does not generate a new session key
# if it conflicts, therefore we must do it for Django.
if django.VERSION[0] < 2 and django.VERSION[1] < 5:
# There is no API to request a new session key without churning
# the session. Use internal API.
request.session._session_key = request.session._get_new_session_key()
continue
else:
break
session_identifier = SSOIdentifier(identifier=request.session.session_key)

Yeah, that API totally rocks. Totally better than determining what the
session_key is when someone wants to know what the session_key is and
we haven't saved the session yet.

>
> I'm sorry, but it's hard to read your criticism in a constructive way when the solution is obvious and doesn't need any changes in Django's APIs.
>
> You raised several related points in your next email; could you create tickets on Trac so we can verify and fix these issues?
>
> Thanks!

It was only one point. Ticket 18344.

Cheers

Tom

Aymeric Augustin

unread,
May 18, 2012, 2:46:41 PM5/18/12
to django-d...@googlegroups.com, django-d...@googlegroups.com
Le 18 mai 2012 à 16:29, Tom Evans <teva...@googlemail.com> a écrit :

> That may be fine for open source code, it's not appropriate here.

Tom,

I'm sorry, I can't follow your logic here.

Also, given my experience in the area of sessions, I don't believe I could translate your requirements into a consistent, backwards compatible and secure implementation (wrt. session fixation, etc.) Of course that doesn't mean it's impossible.

At this point, I think you should write the code for a better API for sessions. This will give us a solid basis for discussing. It is certainly the best way to move forwards on this topic.

Best regards,

--
Aymeric.

Reply all
Reply to author
Forward
0 new messages