[Django] #34709: charset should be ignored for the application/x-www-form-urlencoded content type.

7 views
Skip to first unread message

Django

unread,
Jul 13, 2023, 4:00:21 AM7/13/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz | Owner: Mariusz Felisiak
Felisiak |
Type: Bug | Status: assigned
Component: HTTP | Version: 4.2
handling |
Severity: Normal | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
`charset` parameter is
[https://github.com/django/django/blob/02966a30dd31d2b9d35f8c481a448b9bf377895e/django/http/request.py#L379-L383
used] in `application/x-www-urlencoded` content type. However, per
[https://url.spec.whatwg.org/#application/x-www-form-urlencoded, the
current spec] (check out RFC 1866) the content type `application/x-www-
form-urlencoded` does not have a `charset` and should be treated as UTF-8.

Thanks Eki Xu for the report.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 13, 2023, 4:34:05 AM7/13/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz

| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:

Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

See related ticket #5076.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:1>

Django

unread,
Jul 13, 2023, 4:34:41 AM7/13/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Mariusz Felisiak:

Old description:

> `charset` parameter is
> [https://github.com/django/django/blob/02966a30dd31d2b9d35f8c481a448b9bf377895e/django/http/request.py#L379-L383
> used] in `application/x-www-urlencoded` content type. However, per
> [https://url.spec.whatwg.org/#application/x-www-form-urlencoded, the
> current spec] (check out RFC 1866) the content type `application/x-www-
> form-urlencoded` does not have a `charset` and should be treated as
> UTF-8.
>
> Thanks Eki Xu for the report.

New description:

`charset` parameter is
[https://github.com/django/django/blob/02966a30dd31d2b9d35f8c481a448b9bf377895e/django/http/request.py#L379-L383
used] in `application/x-www-urlencoded` content type. However, per

[https://url.spec.whatwg.org/#application/x-www-form-urlencoded the


current spec] (check out RFC 1866) the content type `application/x-www-
form-urlencoded` does not have a `charset` and should be treated as UTF-8.

Thanks Eki Xu for the report.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:2>

Django

unread,
Jul 13, 2023, 5:00:37 AM7/13/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

The current behavior, even if not correct, is documented and tested.
Should we use a deprecation path?

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:3>

Django

unread,
Jul 13, 2023, 2:45:38 PM7/13/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

I think it would be difficult to provide a sensible deprecation path
(visible by devs, I mean), do you have a plan in mind?

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:4>

Django

unread,
Jul 14, 2023, 12:11:20 AM7/14/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

Replying to [comment:4 Claude Paroz]:


> I think it would be difficult to provide a sensible deprecation path
(visible by devs, I mean), do you have a plan in mind?

We could raise a warning when `self._encoding` is not `utf-8`, that it
will be ignored in Django 6.0. I'm just not sure it's worth doing.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:5>

Django

unread,
Jul 15, 2023, 6:12:21 AM7/15/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Shamil Abdulaev):

* cc: Shamil Abdulaev (added)


Comment:

Greetings to you! I would like to tackle this bug and solve it!)

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:6>

Django

unread,
Jul 15, 2023, 1:41:29 PM7/15/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

Shamil, this ticket is already assign to me. We're discussing an
acceptable approach.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:7>

Django

unread,
Jul 17, 2023, 6:25:40 AM7/17/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Replying to [comment:5 Mariusz Felisiak]:


> We could raise a warning when `self._encoding` is not `utf-8`, that it
will be ignored in Django 6.0. I'm just not sure it's worth doing.

The warning will probably only be ever raised on production servers, so
hardly visible in practice.

Did you explore what will happen in practice if an incoming request is
encoded in a different encoding and we try to decode it with `utf-8`?
Server error (500)? Bad request(400)?

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:8>

Django

unread,
Jul 17, 2023, 7:05:28 AM7/17/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

Replying to [comment:8 Claude Paroz]:


> Did you explore what will happen in practice if an incoming request is
encoded in a different encoding and we try to decode it with `utf-8`?
Server error (500)? Bad request(400)?

I was only able to achieve a badly decoded string, no crash 🤔

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:9>

Django

unread,
Jul 17, 2023, 8:50:12 AM7/17/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Indeed, by default `parse_sql` is calling `decode()` with
`errors='replace'`, which will produce � (U+FFFD, the official REPLACEMENT
CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be
a real improvement over the current situation…

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:10>

Django

unread,
Jul 17, 2023, 1:30:49 PM7/17/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

Replying to [comment:10 Claude Paroz]:


> Indeed, by default `parse_sql` is calling `decode()` with
`errors='replace'`, which will produce � (U+FFFD, the official REPLACEMENT
CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be
a real improvement over the current situation…

We could raise 400 instead of silently ignoring a custom charset.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:11>

Django

unread,
Jul 19, 2023, 5:47:31 PM7/19/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Replying to [comment:11 Mariusz Felisiak]:


> We could raise 400 instead of silently ignoring a custom charset.

Sure, I'd prefer that, failing loudly, instead of silently getting
wrongly-encoded input.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:12>

Django

unread,
Aug 16, 2023, 5:59:28 PM8/16/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Adam Johnson):

A 400 response sounds sensible to me too.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:13>

Django

unread,
Aug 22, 2023, 5:11:51 PM8/22/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/17184 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:14>

Django

unread,
Aug 23, 2023, 8:45:47 AM8/23/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

While starting to review the patch, and looking for more recent
considerations than the old 1866 RFC, I read
https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is
worth a read. Quoting a note:

`A legacy server-oriented implementation might have to support encodings
other than UTF-8 as well as have special logic for tuples of which the
name is ``_charset``. Such logic is not described here as only UTF-8 is
conforming.`

I don't necessarily re-question our previous discussions/decisions,
however we might be prepared to receive some complaints as it may be that
non-conforming agents start to produce BadRequest errors. Difficult to say
before going to production!

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:15>

Django

unread,
Aug 23, 2023, 9:18:39 AM8/23/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:16>

Django

unread,
Aug 23, 2023, 11:35:05 PM8/23/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* cc: Shai Berger (added)


Comment:

Replying to [comment:15 Claude Paroz]:


> While starting to review the patch, and looking for more recent
considerations than the old 1866 RFC, I read
https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is
worth a read. Quoting a note:
>
> `A legacy server-oriented implementation might have to support encodings
other than UTF-8 as well as have special logic for tuples of which the
name is ``_charset``. Such logic is not described here as only UTF-8 is
conforming.`
>
> I don't necessarily re-question our previous discussions/decisions,
however we might be prepared to receive some complaints as it may be that
non-conforming agents start to produce BadRequest errors. Difficult to say
before going to production!

Unfortunately, I don't see a way to support this with a loud crash at the
same time.

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:17>

Django

unread,
Aug 24, 2023, 2:58:55 AM8/24/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak):

I found some issues/PRs to remove `charset` for the `application/x-www-
form-urlencoded` content type in other libraries. Even explicitly passing
`charset=utf-8` caused issues. As far as I'm aware, we can move it
forward:

- https://github.com/requests/requests-oauthlib/issues/437
- https://github.com/request/request/issues/701

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:18>

Django

unread,
Aug 25, 2023, 10:55:58 AM8/25/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: assigned
Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Sure, let's go ahead!

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:19>

Django

unread,
Aug 25, 2023, 3:27:32 PM8/25/23
to django-...@googlegroups.com
#34709: charset should be ignored for the application/x-www-form-urlencoded content
type.
-------------------------------------+-------------------------------------
Reporter: Mariusz Felisiak | Owner: Mariusz
| Felisiak
Type: Bug | Status: closed

Component: HTTP handling | Version: 4.2
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by GitHub <noreply@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"11920e77959deaa65eb86ccc5d39da903fd3dd41" 11920e7]:
{{{
#!CommitTicketReference repository=""
revision="11920e77959deaa65eb86ccc5d39da903fd3dd41"
Fixed #34709 -- Raised BadRequest for non-UTF-8 requests with the
application/x-www-form-urlencoded content type.

Thanks Eki Xu for the report.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:20>

Reply all
Reply to author
Forward
0 new messages