Handling cookies that contain illegal values

William Harris

Feb 4, 2016, 1:53:54 PM
to Django users
I recently ran into this problem on a production server, and it was causing my users to lose their sessions.

Many browsers will happily post UTF-8 encoded data in cookie strings. This will result in cookie data such as this, which I captured from my nginx log:

'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'

When Django parses this cookie header, it drops every cookie from "bad" onwards, including sessionid:

Python 2.7.10 (default, Sep 23 2015, 04:34:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.VERSION
(1, 9, 2, 'final', 0)
>>> from django.http.cookie import parse_cookie
>>> cookie = 'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
>>> print cookie
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
django/http/cookie.py:92: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if cookie == '':
{'Good': '1234'}


Python 3.5.1 (default, Dec 26 2015, 18:11:22) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.VERSION
(1, 9, 2, 'final', 0)
>>> from django.http.cookie import parse_cookie
>>> cookie = 'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
>>> print(cookie)
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
{'Good': '1234'}
>>> cookie = b'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'.decode('utf-8')
>>> print(cookie)
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
{'Good': '1234'}

This link on SO has an interesting discussion about encoding in cookies: http://stackoverflow.com/questions/1969232/allowed-characters-in-cookies. The takeaway for me was this statement: "so in practice you cannot use non-ASCII characters in cookies at all".

Unfortunately, in my case my server runs on a subdomain, and another server in the same domain has set a domain-wide cookie containing UTF-8 characters. Since that other server is often the gateway to mine, its cookies are sent to my server as well, and Django discards everything after the illegal characters.

I have resolved this in my instance as follows in django/http/cookie.py:

import re

def parse_cookie(cookie):
    # Collapse each run of characters outside printable ASCII (0x20-0x7e) into a single 'X'.
    cookie = re.sub('[^\x20-\x7e]+', 'X', cookie)
    ...

This replaces each run of characters outside the printable ASCII range (0x20-0x7e) with a single X, so the parser only ever sees printable ASCII. I consider anything else to be a bad use of cookies, and since I have control of my own cookies I'm not worried about this.
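
For anyone who wants to try the substitution outside of Django, here is a minimal standalone sketch of the same idea. The function name sanitize_cookie_header and the placeholder argument are made up for illustration and are not part of Django; only the regular expression comes from the patch above.

```python
import re

# Matches runs of characters outside printable ASCII (0x20-0x7e),
# the same character class used in the patch above.
_NON_PRINTABLE_ASCII = re.compile(r'[^\x20-\x7e]+')


def sanitize_cookie_header(raw, placeholder='X'):
    """Replace each run of non-printable-ASCII characters with `placeholder`.

    Hypothetical helper for illustration; it only rewrites the raw header
    string and does no cookie parsing of its own.
    """
    return _NON_PRINTABLE_ASCII.sub(placeholder, raw)


raw_header = 'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
cleaned = sanitize_cookie_header(raw_header)
print(cleaned)
```

The value of "bad" is lost, but the cookies that follow it (sessionid here) survive in the cleaned string, which is the trade-off described above.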

I'm not sure if this would be considered such an edge case that it's not worthy of a patch, but it might also be considered a DoS vector.

Interested to hear other thoughts or ideas for a better solution.

Will

Daniel Chimeno

Feb 4, 2016, 6:31:30 PM
to Django users
Hello, 

I have resolved this in my instance as follows in django/http/cookie.py:

def parse_cookie(cookie):
    cookie = re.sub('[^\x20-\x7e]+', 'X', cookie)
    ...



It would be preferable to put that code in a middleware rather than in the Django code itself.
Before the middleware that handles the cookie (I guess it would be Session), you can *sanitize* that cookie.

Hope it helps.
 

Will Harris

Feb 5, 2016, 3:13:14 AM
to Django users
Hey Daniel,

Thanks for the reply. Unfortunately, doing this in a custom middleware is not an option, as this processing needs to take place at a very low level, at the point where the Request object is being built. By the time the request is passed to the middleware layers for processing, the cookies would already have been lost.
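
One place that does run before the Request object is built is a plain WSGI wrapper around the Django application. The sketch below is an untested assumption rather than something anyone in this thread has deployed: the class name CookieSanitizerWSGI is hypothetical, and it relies only on the standard WSGI convention that the Cookie header arrives in environ['HTTP_COOKIE'].

```python
import re

# Runs of characters outside printable ASCII (0x20-0x7e).
_NON_PRINTABLE_ASCII = re.compile(r'[^\x20-\x7e]+')


class CookieSanitizerWSGI(object):
    """Hypothetical WSGI wrapper that cleans the Cookie header in the environ.

    It rewrites environ['HTTP_COOKIE'] before calling the wrapped
    application, so the application only sees printable ASCII there.
    """

    def __init__(self, application, placeholder='X'):
        self.application = application
        self.placeholder = placeholder

    def __call__(self, environ, start_response):
        raw = environ.get('HTTP_COOKIE')
        if raw:
            environ['HTTP_COOKIE'] = _NON_PRINTABLE_ASCII.sub(self.placeholder, raw)
        return self.application(environ, start_response)


# Possible wiring in a project's wsgi.py (assumed layout, not run here):
#   from django.core.wsgi import get_wsgi_application
#   application = CookieSanitizerWSGI(get_wsgi_application())
```

This keeps the change out of django/http/cookie.py, at the cost of one more layer in the WSGI stack that has to be maintained alongside the project.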

Will

Tim Graham

Feb 5, 2016, 7:52:34 AM
to Django users
This is caused by a security fix in Python (which Django uses for cookie parsing). I think the issue can be fixed without causing security problems, but I'm not sure. Please follow https://code.djangoproject.com/ticket/26158 and related Python tickets.

Will Harris

Feb 5, 2016, 9:25:05 AM
to Django users
Thanks Tim, fascinating. At least I can tell the big boss the problem was "caused" by the BDFL ;-)

Will