I recently ran into this problem on a production server, and it was causing my users to lose their sessions.
Many browsers will happily send UTF-8 encoded data in cookie strings. This results in Cookie headers such as the following, which I captured from my nginx log:
'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
When Django parses this cookie header, it silently drops every cookie from "bad" onwards, on both Python 2 and Python 3:
Python 2.7.10 (default, Sep 23 2015, 04:34:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.VERSION
(1, 9, 2, 'final', 0)
>>> from django.http.cookie import parse_cookie
>>> cookie = 'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
>>> print cookie
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
django/http/cookie.py:92: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if cookie == '':
{'Good': '1234'}
Python 3.5.1 (default, Dec 26 2015, 18:11:22)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.VERSION
(1, 9, 2, 'final', 0)
>>> from django.http.cookie import parse_cookie
>>> cookie = 'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'
>>> print(cookie)
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
{'Good': '1234'}
>>> cookie = b'Good=1234;bad=\xe6\xb8\x85\xe9\xa2\xa8;sessionid=abc'.decode('utf-8')
>>> print(cookie)
Good=1234;bad=清風;sessionid=abc
>>> parse_cookie(cookie)
{'Good': '1234'}
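As far as I can tell, Django 1.9's parse_cookie hands the header to a SimpleCookie, whose regular-expression parser walks the string one pair at a time and gives up at the first pair it cannot match. That is why "Good" survives but "bad" and everything after it, including sessionid, vanish. The toy model below is not Django's or Python's code; the function name and the pattern (ASCII word characters for names, printable ASCII other than ';' for values) are a hypothetical simplification that reproduces the same failure mode:

```python
import re

# Hypothetical, simplified pair pattern: name = ASCII word characters,
# value = printable ASCII except ';' (0x21-0x3A and 0x3C-0x7E).
_PAIR = re.compile(r"\s*(\w+)=([\x21-\x3a\x3c-\x7e]*)\s*(?:;|$)", re.ASCII)


def toy_parse_cookie(header):
    """Parse name=value pairs left to right, stopping at the first bad pair."""
    result = {}
    pos = 0
    while pos < len(header):
        match = _PAIR.match(header, pos)
        if match is None:
            break  # everything from this point on is silently discarded
        result[match.group(1)] = match.group(2)
        pos = match.end()
    return result
```

With the header from my log, toy_parse_cookie('Good=1234;bad=清風;sessionid=abc') returns only {'Good': '1234'}, just like the real parse_cookie above.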
Unfortunately, in my case my server runs on a subdomain, and another server on the parent domain has set a domain-wide cookie containing UTF-8 characters. Because that other server is often the gateway to mine, my users' browsers send its cookie to my server as well, and Django drops everything after the illegal characters, including the session cookie.
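I have worked around this in my installation by patching parse_cookie in django/http/cookie.py:
import re  # add at the top of the module if it isn't already imported

def parse_cookie(cookie):
    # Replace each run of characters outside printable ASCII (0x20-0x7E) with 'X'
    cookie = re.sub(r'[^\x20-\x7e]+', 'X', cookie)
    ...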
This restricts cookie data to printable 7-bit ASCII characters; each run of any other characters is replaced by a single 'X'. I consider anything else to be a misuse of cookies, and since I control my own cookies, I'm not worried about mangling the foreign ones.
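The substitution in the patch can be checked on its own, outside Django. The helper name sanitize_cookie_header is hypothetical; it applies the same pattern as the patched parse_cookie and shows what the header from my log looks like after masking:

```python
import re

# Same pattern as the patch above: any run of characters outside
# printable ASCII (0x20-0x7E) collapses to a single 'X'.
_NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7e]+")


def sanitize_cookie_header(header):
    """Hypothetical helper: mask non-ASCII cookie data before parsing."""
    return _NON_PRINTABLE_ASCII.sub("X", header)
```

For example, sanitize_cookie_header('Good=1234;bad=清風;sessionid=abc') gives 'Good=1234;bad=X;sessionid=abc', so the foreign cookie's value is mangled but the session cookie after it is parsed normally.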
I'm not sure whether this is too much of an edge case to be worth a patch, but it could also be considered a DoS vector: anything that can set a cookie on a parent domain can effectively log users out of Django sites on its subdomains.
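Rather than rewriting the bytes, a parser could simply split the header on ';' and keep every name=value pair it finds, so that one malformed cookie from another server cannot hide the ones after it. This is a minimal sketch with a hypothetical function name, not Django's API; it assumes values need no unquoting and performs no RFC 6265 validation:

```python
def lenient_parse_cookie(header):
    """Hypothetical alternative: split on ';' so one bad pair cannot hide the rest."""
    cookies = {}
    for chunk in header.split(";"):
        if "=" not in chunk:
            continue  # assumption: skip fragments without a name=value shape
        name, value = chunk.split("=", 1)
        name, value = name.strip(), value.strip()
        if name:
            # Values are kept verbatim: no unquoting, no character validation.
            cookies[name] = value
    return cookies
```

With the header from my log, this returns {'Good': '1234', 'bad': '清風', 'sessionid': 'abc'}, keeping the session cookie without touching the foreign cookie's value.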
I'd be interested to hear other thoughts or ideas for a better solution.
Will