Thanks Eki Xu for the report.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by Mariusz Felisiak):
See related ticket #5076.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:1>
Old description:
> `charset` parameter is
> [https://github.com/django/django/blob/02966a30dd31d2b9d35f8c481a448b9bf377895e/django/http/request.py#L379-L383
> used] in `application/x-www-urlencoded` content type. However, per
> [https://url.spec.whatwg.org/#application/x-www-form-urlencoded, the
> current spec] (check out RFC 1866) the content type `application/x-www-
> form-urlencoded` does not have a `charset` and should be treated as
> UTF-8.
>
> Thanks Eki Xu for the report.
New description:
`charset` parameter is
[https://github.com/django/django/blob/02966a30dd31d2b9d35f8c481a448b9bf377895e/django/http/request.py#L379-L383
used] in `application/x-www-urlencoded` content type. However, per
[https://url.spec.whatwg.org/#application/x-www-form-urlencoded the
current spec] (check out RFC 1866) the content type `application/x-www-
form-urlencoded` does not have a `charset` and should be treated as UTF-8.
Thanks Eki Xu for the report.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:2>
Comment (by Mariusz Felisiak):
The current behavior, even if not correct, is documented and tested.
Should we use a deprecation path?
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:3>
Comment (by Claude Paroz):
I think it would be difficult to provide a sensible deprecation path
(visible by devs, I mean), do you have a plan in mind?
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:4>
Comment (by Mariusz Felisiak):
Replying to [comment:4 Claude Paroz]:
> I think it would be difficult to provide a sensible deprecation path
(visible by devs, I mean), do you have a plan in mind?
We could raise a warning when `self._encoding` is not `utf-8`, that it
will be ignored in Django 6.0. I'm just not sure it's worth doing.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:5>
* cc: Shamil Abdulaev (added)
Comment:
Greetings to you! I would like to tackle this bug and solve it!)
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:6>
Comment (by Mariusz Felisiak):
Shamil, this ticket is already assign to me. We're discussing an
acceptable approach.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:7>
Comment (by Claude Paroz):
Replying to [comment:5 Mariusz Felisiak]:
> We could raise a warning when `self._encoding` is not `utf-8`, that it
will be ignored in Django 6.0. I'm just not sure it's worth doing.
The warning will probably only be ever raised on production servers, so
hardly visible in practice.
Did you explore what will happen in practice if an incoming request is
encoded in a different encoding and we try to decode it with `utf-8`?
Server error (500)? Bad request(400)?
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:8>
Comment (by Mariusz Felisiak):
Replying to [comment:8 Claude Paroz]:
> Did you explore what will happen in practice if an incoming request is
encoded in a different encoding and we try to decode it with `utf-8`?
Server error (500)? Bad request(400)?
I was only able to achieve a badly decoded string, no crash 🤔
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:9>
Comment (by Claude Paroz):
Indeed, by default `parse_sql` is calling `decode()` with
`errors='replace'`, which will produce � (U+FFFD, the official REPLACEMENT
CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be
a real improvement over the current situation…
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:10>
Comment (by Mariusz Felisiak):
Replying to [comment:10 Claude Paroz]:
> Indeed, by default `parse_sql` is calling `decode()` with
`errors='replace'`, which will produce � (U+FFFD, the official REPLACEMENT
CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be
a real improvement over the current situation…
We could raise 400 instead of silently ignoring a custom charset.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:11>
Comment (by Claude Paroz):
Replying to [comment:11 Mariusz Felisiak]:
> We could raise 400 instead of silently ignoring a custom charset.
Sure, I'd prefer that, failing loudly, instead of silently getting
wrongly-encoded input.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:12>
Comment (by Adam Johnson):
A 400 response sounds sensible to me too.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:13>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/17184 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:14>
Comment (by Claude Paroz):
While starting to review the patch, and looking for more recent
considerations than the old 1866 RFC, I read
https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is
worth a read. Quoting a note:
`A legacy server-oriented implementation might have to support encodings
other than UTF-8 as well as have special logic for tuples of which the
name is ``_charset``. Such logic is not described here as only UTF-8 is
conforming.`
I don't necessarily re-question our previous discussions/decisions,
however we might be prepared to receive some complaints as it may be that
non-conforming agents start to produce BadRequest errors. Difficult to say
before going to production!
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:15>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:16>
* cc: Shai Berger (added)
Comment:
Replying to [comment:15 Claude Paroz]:
> While starting to review the patch, and looking for more recent
considerations than the old 1866 RFC, I read
https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is
worth a read. Quoting a note:
>
> `A legacy server-oriented implementation might have to support encodings
other than UTF-8 as well as have special logic for tuples of which the
name is ``_charset``. Such logic is not described here as only UTF-8 is
conforming.`
>
> I don't necessarily re-question our previous discussions/decisions,
however we might be prepared to receive some complaints as it may be that
non-conforming agents start to produce BadRequest errors. Difficult to say
before going to production!
Unfortunately, I don't see a way to support this with a loud crash at the
same time.
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:17>
Comment (by Mariusz Felisiak):
I found some issues/PRs to remove `charset` for the `application/x-www-
form-urlencoded` content type in other libraries. Even explicitly passing
`charset=utf-8` caused issues. As far as I'm aware, we can move it
forward:
- https://github.com/requests/requests-oauthlib/issues/437
- https://github.com/request/request/issues/701
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:18>
Comment (by Claude Paroz):
Sure, let's go ahead!
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:19>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"11920e77959deaa65eb86ccc5d39da903fd3dd41" 11920e7]:
{{{
#!CommitTicketReference repository=""
revision="11920e77959deaa65eb86ccc5d39da903fd3dd41"
Fixed #34709 -- Raised BadRequest for non-UTF-8 requests with the
application/x-www-form-urlencoded content type.
Thanks Eki Xu for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34709#comment:20>