[Django] #31344: Django raises UnicodeEncodeError when there is a cookie with a non-latin character


Django

Mar 5, 2020, 10:00:13 AM
to django-...@googlegroups.com
#31344: Django raises UnicodeEncodeError when there is a cookie with a non-latin
character
-----------------------------------------+------------------------
Reporter: ozgurakcali | Owner: nobody
Type: Bug | Status: new
Component: HTTP handling | Version: 2.2
Severity: Normal | Keywords: cookie
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
I know non-Latin characters are not recommended in cookies, but
when such a cookie is sent with a request, Django raises a
UnicodeEncodeError. It is raised in the get_bytes_from_wsgi function in
wsgi.py, on the following line:

return value.encode('iso-8859-1')

Not sure how this should be handled. 'ignore' could be supplied as the
second parameter to the encode method, but that would change the value of
the cookie silently, and I'm not sure if that would be desired behavior.
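
For illustration, here is a minimal sketch of the failure outside Django. The cookie value `name=ğ` is a made-up sample, and `encode_like_wsgi` is a hypothetical stand-in for the `encode` call quoted above, not Django's actual function. It shows that strict encoding raises, while 'ignore' drops the character without any error:

```python
def encode_like_wsgi(value, errors="strict"):
    # Hypothetical stand-in for the quoted line; not Django's own code.
    return value.encode("iso-8859-1", errors)


# Made-up sample cookie: 'ğ' (U+011F) has no ISO-8859-1 representation.
cookie = "name=\u011f"

try:
    encode_like_wsgi(cookie)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'ignore', the unencodable character is silently dropped.
ignored = encode_like_wsgi(cookie, "ignore")
```

Under these assumptions the 'ignore' variant returns only the `name=` prefix, which is the silent change described above.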

--
Ticket URL: <https://code.djangoproject.com/ticket/31344>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 6, 2020, 3:43:27 AM
to django-...@googlegroups.com
#31344: Django raises UnicodeEncodeError when there is a cookie with a non-latin
character.
-------------------------------+--------------------------------------
Reporter: Ozgur Akcali | Owner: nobody
Type: Bug | Status: closed
Component: HTTP handling | Version: 2.2
Severity: Normal | Resolution: needsinfo
Keywords: cookie | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by felixxm):

* cc: Florian Apolloner (added)
* status: new => closed
* resolution: => needsinfo


Comment:

Non-ASCII values in the WSGI environ are arbitrarily decoded with
ISO-8859-1, that's why Django uses this encoding (see also
[https://www.python.org/dev/peps/pep-0333/ PEP 333]). You shouldn't get a
value in other encodings. Please feel free to reopen this ticket if you
can provide a sample project to reproduce your issue.
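
As a rough sketch of why this works, assuming a server that follows the native-string rule PEP 3333 describes for Python 3: every byte maps to exactly one ISO-8859-1 code point, so the raw header bytes survive the decode/encode round trip whatever encoding the client used. The sample cookie and the variable names are made up for illustration:

```python
# Bytes as a client might send them (assumed here to be UTF-8 encoded).
raw = "name=\u011f".encode("utf-8")

# What a compliant server would place in the WSGI environ on Python 3.
environ_value = raw.decode("iso-8859-1")

# Re-encoding with ISO-8859-1 recovers the original bytes exactly.
recovered = environ_value.encode("iso-8859-1")
```

Every character in `environ_value` is below U+0100, so the encode step cannot fail for a value produced this way.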

--
Ticket URL: <https://code.djangoproject.com/ticket/31344#comment:1>

Django

Mar 6, 2020, 2:23:58 PM
to django-...@googlegroups.com

Comment (by Florian Apolloner):

I am with Mariusz on this one. If your environment contains data that is
**not** encodable to iso-8859-1 then you have an app-server that doesn't
implement WSGI correctly.

> Not sure how this should be handled. 'ignore' could be supplied as the
> second parameter to the encode method, but that would change the value of
> the cookie silently, and I'm not sure if that would be desired behavior.

The value of the cookie is already changed silently. If you were to send
"name=öäü".encode('iso-8859-1') as the literal cookie value over the wire
and execute the following view:

{{{
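from django.core.handlers.wsgi import get_bytes_from_wsgi
from django.http import HttpResponse

def view(request):
    # Raw cookie header bytes, recovered from the WSGI environ.
    print(get_bytes_from_wsgi(request.environ, "HTTP_COOKIE", ""))
    # Cookie value after Django has decoded it to a str.
    print(request.COOKIES["name"])
    return HttpResponse()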
}}}
you'd get:
{{{
b'name=\xf6\xe4\xfc'
���
}}}
As you can see, the bytes in the raw environment are reproduced
correctly, but Django later converts them to a string using UTF-8 with
`replace`. Even if you had any other non-Latin character, the actual byte
sequence would be correct. We'd need a full traceback and a reproducer, as
Mariusz said.
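
The decoding step can be sketched in plain Python, without Django. This is only an approximation of the behavior described above, not Django's actual parsing code, and the variable names are illustrative:

```python
# Cookie header bytes as sent over the wire in the example above.
raw_cookie = "name=öäü".encode("iso-8859-1")

# Take the value part after the first '='.
value_bytes = raw_cookie.split(b"=", 1)[1]

# These bytes are not valid UTF-8, so decoding with 'replace' yields
# one U+FFFD replacement character per invalid byte.
decoded = value_bytes.decode("utf-8", errors="replace")
```

The three replacement characters match the second line of the output shown above.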

--
Ticket URL: <https://code.djangoproject.com/ticket/31344#comment:2>
