Default session data serializer doesn't support extended data types

423 views
Skip to first unread message

Davide Rizzo

unread,
Sep 19, 2013, 10:46:44 AM9/19/13
to django-d...@googlegroups.com
#20922 introduced the option to choose a custom session data serializer. The default option is to use the new JSONSerializer starting from 1.6, since using pickle would lead to a remote code execution vulnerability when session data is stored in cookies.

While this can be considered a sensible security choice, it becomes inconvenient as the JSON encoder used by JSONSerializer is not the same used elsewhere in Django, as it only support basic data types: string, int/floats, booleans, nested dicts and lists, None.

The inconvenience is breaking compatibility with all third party apps that rely on storing extended data types (such as those supported by DjangoJSONEncoder) with the default settings. Properly serializing datetime (possibly tz-aware) can be hard, and changing the default puts the burden on third party apps coders.

They would have the option to either add two complexity layers (properly serializing/deserializing datetime objects, and not breaking compatibility with the previous versions of the same app), or to break compatibility with Django default settings.

As an example of commonly used data types that can't be stored anymore with default settings:
  • datetime, timedelta objects (supported by DjangoJSONEncoder)
  • decimal objects (supported by DjangoJSONEncoder)
  • arbitrary binary strings
  • Geometry objects
I think the option of reverting the default to pickle should be also considered.

[[I originally posted this as #21124, where it was closed as not a bug. What follows is the response I got, for reference:

by timo (core developer)

Thanks for your thoughts. I think most of the points you've raised were discussed during the implementation of this, either on the ticket (#20922) or on the linked pull request (or the documentation itself). Could you please take a look at the discussion there if you haven't? If after reading that you still have disagreements, please raise the issue on django-developers rather than this ticket tracker. Thanks!

Suggestions for documentations edits or additions would also be welcome.

p.s. To address one of your points, one of the decisions was indeed to put the burden on third party app coders to serialize session data as simple data types like strings which would be compatible with JSON. We made this change to contrib.messages for example.

]]

Tim Graham

unread,
Sep 19, 2013, 12:23:56 PM9/19/13
to django-d...@googlegroups.com
Hi Davide,

Did you take a look at the design decisions as described in the ticket and pull request? We made these decisions in order to push the community toward developing more secure apps and the transition isn't expected to be painless. We had several core developers review the patch and discuss the decisions. It seems like you are asking us to reconsider, but you haven't presented anything we haven't discussed. Would you like me to address anything specific?

Tim 

Florian Apolloner

unread,
Sep 19, 2013, 12:35:40 PM9/19/13
to django-d...@googlegroups.com
Hi Davide,


On Thursday, September 19, 2013 4:46:44 PM UTC+2, Davide Rizzo wrote:
The inconvenience is breaking compatibility with all third party apps that rely on storing extended data types (such as those supported by DjangoJSONEncoder) with the default settings. Properly serializing datetime (possibly tz-aware) can be hard, and changing the default puts the burden on third party apps coders.

In all fairness, we didn't just break third party apps, we broke our code too… We always said that security will trump inconvenience, the only inconvenience I can see here is that users have to switch to the pickle backend if they use old third party apps (although only for those which use the session at all…).

They would have the option to either add two complexity layers (properly serializing/deserializing datetime objects, and not breaking compatibility with the previous versions of the same app), or to break compatibility with Django default settings.

We added those two complexity layers for our code too, it was merely one extra line to ensure backwards compatibility.
 
I think the option of reverting the default to pickle should be also considered.

No, this has been discussed and security will trump minor inconvenience.

Florian

Davide Rizzo

unread,
Sep 20, 2013, 4:24:00 AM9/20/13
to django-d...@googlegroups.com
Hi Tim,

Indeed I looked at the comments in the ticket and pull request. I don't feel like these changes provide a significant security improvement:
- using JSONSerializer over PickleSerializer is only relevant if you are using any non-default configuration that allows session data to be tampered. Incidentally, the whole contrib.auth (project-default) mechanism is compromised in this case anyway, even if you use JSONSerializer.
- using the raw JSONEncoder by default is not offering any significant security advantage over using an extended encoder. I feel like it's going to discourage coders to use JSONSerializer at all.

Pushing the community, as you say it, is a decision with a rather big impact and carries some responsibility. I understand this has been discussed and I respect that the core developers agreed on a decision. But I believe this decision didn't give a realistic weight to the impact on the community.

Davide

Florian Apolloner

unread,
Sep 20, 2013, 8:55:33 AM9/20/13
to django-d...@googlegroups.com


On Friday, September 20, 2013 10:24:00 AM UTC+2, Davide Rizzo wrote:
- using the raw JSONEncoder by default is not offering any significant security advantage over using an extended encoder. I feel like it's going to discourage coders to use JSONSerializer at all.

Btw could it be that you are mixing out Encoder and Serializer? Or are they supposed to be the same, if not please add import names, so one knows which and from where you mean. Personally I don't see any improvement in using an extended encoder -- in the end it's just more work for us and people complaining why we don't support their $magical class. In most if not all cases storing full objects in the session is wrong; what we could have supported would be timestamps, but those are storable as utc seconds easily enough… Why would the current status be discouraging anyone?

But I believe this decision didn't give a realistic weight to the impact on the community.

I still fail to see the issue; 3rd party projects have to adapt, this is why we have deprecation paths and accelerated paths for security related stuff. The net result is the same, at some point you will be forced to use a new feature (although in this case a bit sooner, but again: security and you can reenable the old default); eg newforms->forms, Model.Admin class to newadmin etc. And those last two literally affected everyone, that's what I'd call impact; the current impact is __only__ on people which do use the session (granted that's probably everyone) and did put complex data into the session. This pretty much narrows the list down to a percentage of the whole usergroup, when I audited my projects for this switch I had a few datetimes saved, although most of them saved them as integers already (which makes sense from a performance standpoint).

So all in all I think 3rd party authors will adapt without any problems since they are most of the time not affected, or they'll have to change a few lines of code… Can you provide a big list of 3rd party apps storing Models etc in the session?

Regards,
Florian

Davide Rizzo

unread,
Sep 20, 2013, 9:52:30 AM9/20/13
to django-d...@googlegroups.com


On Friday, September 20, 2013 2:55:33 PM UTC+2, Florian Apolloner wrote:

Btw could it be that you are mixing out Encoder and Serializer?

No, I say Serializer when I mean... well, a serializer, as specified by SESSION_SERIALIZER. I say Encoder when I mean the Encoder class used by JSONSerializer. It could be the one built into the json module, or it could be a derived one, such as DjangoJSONEncoder here https://github.com/django/django/blob/1.6b4/django/core/serializers/json.py#L79

Personally I don't see any improvement in using an extended encoder -- in the end it's just more work for us and people complaining why we don't support their $magical class. In most if not all cases storing full objects in the session is wrong; what we could have supported would be timestamps, but those are storable as utc seconds easily enough… Why would the current status be discouraging anyone?

Easily enough is still more complicated than current solution storing datetime objects, possibly with tzinfo. It's discouraging to the users because, after you find out what the problem is, it's easier to revert to PickleSerializer (with no security compromise, if you don't use SignedCookies) than to circumvent the extra difficulties of adapting one's local status data to JSON. And third party apps devs will have the choice whether to deal with the same difficulties (without breaking compatibility with already existing projects) or just advice users to switch to PickleSerializer. It's just one line in your settings.py and has no inconvenience.

I can see what you mean about people complaining. But it would be mitigating the migration issues implied by switching to JSONSerializer. You are trading a few vocal complaints for a generalized struggle.

[...] that's what I'd call impact; the current impact is __only__ on [...]

The examples you mentioned (forms, admin) were significant improvements themselves, with big benefits to Django users. Switching the default session serializer to JSONSerializer is providing no benefit to any user. It's addressing a security problem (which is already documented) that only a few users have (can you provide a big^H^H^H list of users who use signed cookies but can't add a SESSION_SERIALIZER setting next to their SESSION_COOKIE_HTTPONLY and SESSION_COOKIE_SECURE settings?).

I don't want to lose focus on my proposal. It's either to change the default to PickleSerializer or to mitigate the JSONSerializer issue supporting these data types:
    • datetime, timedelta objects (supported by DjangoJSONEncoder)
    • decimal objects (supported by DjangoJSONEncoder)
    • arbitrary binary strings (b'\xe8')
    • Geometry objects
    This doesn't include Models or other objects which don't have any obvious serialization format.

    As much as I'd hate to see (de)serialization code in views, I wouldn't consider a third option: a mechanism to provide custom serialization for other object types. I see it's already been discarded.

    Davide

    Donald Stufft

    unread,
    Sep 20, 2013, 9:59:47 AM9/20/13
    to django-d...@googlegroups.com

    On Sep 20, 2013, at 9:52 AM, Davide Rizzo <sor...@gmail.com> wrote:

    The examples you mentioned (forms, admin) were significant improvements themselves, with big benefits to Django users. Switching the default session serializer to JSONSerializer is providing no benefit to any user. It's addressing a security problem (which is already documented) that only a few users have (can you provide a big^H^H^H list of users who use signed cookies but can't add a SESSION_SERIALIZER setting next to their SESSION_COOKIE_HTTPONLY and SESSION_COOKIE_SECURE settings?).

    This isn't exactly true. While it *is* true that the biggest risk to using pickle as the session serialization is if you're using the cookie storage engine, it is *not* true that it is the only place that it can be attacked. Particularly it can be used to daisy chain from a compromised database, memcache, redis, etc host (wherever you're storing your sessions) to exploiting the app server.

    A basic tenant in securing systems is that you make each piece of the system responsible for it's own security and you don't have it depend on the security of another system. Moving away from pickle as the default serialization engine ensures this property for the storage of session data.

    -----------------
    Donald Stufft
    PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

    signature.asc

    Florian Apolloner

    unread,
    Sep 20, 2013, 10:35:07 AM9/20/13
    to django-d...@googlegroups.com


    On Friday, September 20, 2013 3:52:30 PM UTC+2, Davide Rizzo wrote:
    On Friday, September 20, 2013 2:55:33 PM UTC+2, Florian Apolloner wrote:
    Btw could it be that you are mixing out Encoder and Serializer?

    No, I say Serializer when I mean... well, a serializer, as specified by SESSION_SERIALIZER. I say Encoder when I mean the Encoder class used by JSONSerializer. It could be the one built into the json module, or it could be a derived one, such as DjangoJSONEncoder here https://github.com/django/django/blob/1.6b4/django/core/serializers/json.py#L79

    Ok, I though as much, but wanted to be sure. 

    It's either to change the default to PickleSerializer

    Most likely not gonna happen since we choose secure by default.

    or to mitigate the JSONSerializer issue supporting these data types:
    • datetime, timedelta objects (supported by DjangoJSONEncoder)
    • decimal objects (supported by DjangoJSONEncoder)
    • arbitrary binary strings (b'\xe8')
    • Geometry objects
    I could live with using DjangoJSONEncoder, but support for arbitrary binary strings and geometry objects would have to come via that instead of beeing specialized for sessions. That said, it's quite late for 1.6, so the patch would have to be rocksolid and would require agreement of the rest of the core devs.
     

    Curtis Maloney

    unread,
    Sep 20, 2013, 10:40:39 AM9/20/13
    to django-d...@googlegroups.com
    I talked with the OP [or someone who talks a _lot_ like the OP:)] on IRC about this issue before recommending they open a ticket... and aside from anything else discussed, since someone already saw fit to include an extended JSONEncoder class in core/serializers, why doesn't the session machinery re-use it?

    All it does is add support for date, time, datetime and Decimal.

    And the answer is: there's no way for a matching Decoder to know when to decode any of these types, since there's no schema available.

    The only "simple" alternative that comes to mind is something like MsgPack, with a bunch of pre-defined Extension types.

    As far as the security benefits, I think Donald has nailed it -- no part of the system should base its security around relying on the integrity of any other part.

    --
    Curtis

    Florian Apolloner

    unread,
    Sep 20, 2013, 12:10:20 PM9/20/13
    to django-d...@googlegroups.com


    On Friday, September 20, 2013 4:40:39 PM UTC+2, Curtis Maloney wrote:
    And the answer is: there's no way for a matching Decoder to know when to decode any of these types, since there's no schema available.

    Good point, it would be doable by serializing into something like '{_type: datetime, value: 2012…}' but I doubt that's really sensible.

    Davide Rizzo

    unread,
    Sep 21, 2013, 3:43:01 AM9/21/13
    to django-d...@googlegroups.com
    On Friday, September 20, 2013 3:59:47 PM UTC+2, Donald Stufft wrote:

    A basic tenant in securing systems is that you make each piece of the system responsible for it's own security and you don't have it depend on the security of another system. Moving away from pickle as the default serialization engine ensures this property for the storage of session data.
     
    Thank you for the response, Donald. This is a point of view I didn't consider. I tend to think of the "backend platform" as unitary and data-centered, implying that compromising the security of the data servers will compromise the entire business. What I didn't consider is that this is just MY point of view. Your point makes perfect sense, the decision coming from a web framework.

    Davide

    Davide Rizzo

    unread,
    Sep 21, 2013, 6:18:35 AM9/21/13
    to django-d...@googlegroups.com
    On Friday, September 20, 2013 4:40:39 PM UTC+2, Curtis Maloney wrote:
    I talked with the OP [or someone who talks a _lot_ like the OP:)]

    Oh, I should meet this animal–pardon, this guy. :)

    And the answer is: there's no way for a matching Decoder to know when to decode any of these types, since there's no schema available.

    One of the solutions, besides storing datatype within JSON (which can still be sensible, given the extra info was already stored with the old serializer, and would allow for an extension mechanism for developers) is to just match string formats. The case for a false positive matching "_datetime(2013-09-21T09:45+1)" does exist, but will be in a minority compared to those who store datetime.

    I guess there is no chance of this getting into 1.6 as a default, right?

    Davide
    Reply all
    Reply to author
    Forward
    0 new messages