Re: [Django] #35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked

23 views
Skip to first unread message

Django

unread,
Oct 14, 2024, 11:53:55 AM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Changes (by Carlton Gibson):

* cc: Carlton Gibson (added)
* component: Uncategorized => HTTP handling

--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 14, 2024, 12:10:54 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Description changed by Klaas van Schelven:

Old description:

> Django's request.read() returns 0 bytes when there's no Content-Length
> header.
> i.e. it silently fails.
>
> But not having a Content-Length header is perfectly fine when there's a
> HTTP/1.1 Transfer-Encoding: Chunked request.
>
> WSGI servers like gunicorn and mod_wsgi are able to handle this just
> fine, i.e. Gunicorn handles the hexidecimally encoded lengths and just
> passes you the right chunks and Apache's mod_wsgi does the same I
> believe.
>
> Discussions/docs over at Gunicorn / mod_wsgi:
>
> * https://github.com/benoitc/gunicorn/issues/1264
> * https://github.com/benoitc/gunicorn/issues/605
> * https://github.com/benoitc/gunicorn/issues/2947
> * https://modwsgi.readthedocs.io/en/develop/configuration-
> directives/WSGIChunkedRequest.html
>
> The actual single line of code that's problematic is this one:
> https://github.com/django/django/blob/97c05a64ca87253e9789ebaab4b6d20a1b2370cf/django/core/handlers/wsgi.py#L77
>
> My personal reason I ran into this:
>
> * https://github.com/bugsink/bugsink/issues/9
>
> which is basically solved by using a lot of lines to undo the effects of
> that single line of code:
>
> {{{
> import django
> from django.core.handlers.wsgi import WSGIHandler, WSGIRequest
>
> os.environ.setdefault('DJANGO_SETTINGS_MODULE', '....
>

> class MyWSGIRequest(WSGIRequest):
>
> def __init__(self, environ):
> super().__init__(environ)
>
> if "CONTENT_LENGTH" not in environ and "HTTP_TRANSFER_ENCODING"
> in environ:
> # "unlimit" content length
> self._stream = self.environ["wsgi.input"]
>

> class MyWSGIHandler(WSGIHandler):
> request_class = MyWSGIRequest
>

> def my_get_wsgi_application():
> # Like get_wsgi_application, but returns a subclass of WSGIHandler
> that uses a custom request class.
> django.setup(set_prefix=False)
> return MyWSGIHandler()
>

> application = my_get_wsgi_application()
>
> }}}
>
> But I'd rather not solve this in my own application only, but have it be
> a Django thing. In the Gunicorn links, there's some allusion to "this
> won't happen b/c wsgi spec", but that seems like a bad reason from my
> perspective. At the very least request.read() should not just silently
> give a 0-length answer. And having tools available such that you don't
> need to make a "MyXXX" hierarchy would also be nice.
>
> That's the bit for "actually getting request.read() to work when behind
> gunicorn".
>
> There's also the part where this doesn't work for the debugserver. Which
> is fine, given its limited scope. But an error would be better than
> silently returning nothing (again).
>
> My solution for that one is the following middleware:
>

> {{{
> class DisallowChunkedMiddleware:
> def __init__(self, get_response):
> self.get_response = get_response
>
> def __call__(self, request):
> if request.META.get("HTTP_TRANSFER_ENCODING").lower() ==
> "chunked" and \
> not request.META.get("wsgi.input_terminated"):
>
> # If we get here, it means that the request has a Transfer-
> Encoding header with a value of "chunked", but we
> # have no wsgi-layer handling for that. This probably means
> that we're running the Django development
> # server, and as such our fixes for the Gunicorn/Django
> mismatch that we put in wsgi.py are not in effect.
> raise ValueError("This server is not configured to support
> Chunked Transfer Encoding (for requests)")
>
> return self.get_response(request)
> }}}
>

> Some links:
>
> * This one seemed related, but probably isn't (it's about forms):
> https://code.djangoproject.com/ticket/35289
> * This seemed related, but is about uwsgi and uses special uwsgi
> features: https://github.com/btimby/uwsgi-
> chunked/blob/master/uwsgi_chunked/chunked.py
>
> All of the above is when using WSGI (I did not test/think about ASGI)

New description:

Django's request.read() returns 0 bytes when there's no Content-Length
header.
i.e. it silently fails.

But not having a Content-Length header is perfectly fine when there's a
HTTP/1.1 Transfer-Encoding: Chunked request.

WSGI servers like gunicorn and mod_wsgi are able to handle this just fine,
i.e. Gunicorn handles the hexidecimally encoded lengths and just passes
you the right chunks and Apache's mod_wsgi does the same I believe.

Discussions/docs over at Gunicorn / mod_wsgi:

* https://github.com/benoitc/gunicorn/issues/1264
* https://github.com/benoitc/gunicorn/issues/605
* https://github.com/benoitc/gunicorn/issues/2947
* https://modwsgi.readthedocs.io/en/develop/configuration-
directives/WSGIChunkedRequest.html

The actual single line of code that's problematic is this one:
https://github.com/django/django/blob/97c05a64ca87253e9789ebaab4b6d20a1b2370cf/django/core/handlers/wsgi.py#L77

My personal reason I ran into this:

* https://github.com/bugsink/bugsink/issues/9

which is basically solved by using a lot of lines to undo the effects of
that single line of code:

{{{
import django
from django.core.handlers.wsgi import WSGIHandler, WSGIRequest

os.environ.setdefault('DJANGO_SETTINGS_MODULE', '....


class MyWSGIRequest(WSGIRequest):

def __init__(self, environ):
super().__init__(environ)

if "CONTENT_LENGTH" not in environ and "HTTP_TRANSFER_ENCODING" in
environ:
# "unlimit" content length
self._stream = self.environ["wsgi.input"]


class MyWSGIHandler(WSGIHandler):
request_class = MyWSGIRequest


def my_get_wsgi_application():
# Like get_wsgi_application, but returns a subclass of WSGIHandler
that uses a custom request class.
django.setup(set_prefix=False)
return MyWSGIHandler()


application = my_get_wsgi_application()

}}}

But I'd rather not solve this in my own application only, but have it be a
Django thing. In the Gunicorn links, there's some allusion to "this won't
happen b/c wsgi spec", but that seems like a bad reason from my
perspective. At the very least request.read() should not just silently
give a 0-length answer. And having tools available such that you don't
need to make a "MyXXX" hierarchy would also be nice.

That's the bit for "actually getting request.read() to work when behind
gunicorn".

There's also the part where this doesn't work for the debugserver. Which
is fine, given its limited scope. But an error would be better than
silently returning nothing (again).

My solution for that one is the following middleware:


{{{
class DisallowChunkedMiddleware:
def __init__(self, get_response):
self.get_response = get_response

def __call__(self, request):
if "HTTP_TRANSFER_ENCODING" in request.META and \
request.META["HTTP_TRANSFER_ENCODING"].lower() ==
"chunked" and \
not request.META.get("wsgi.input_terminated"):

# If we get here, it means that the request has a Transfer-
Encoding header with a value of "chunked", but we
# have no wsgi-layer handling for that. This probably means
that we're running the Django development
# server, and as such our fixes for the Gunicorn/Django
mismatch that we put in wsgi.py are not in effect.
raise ValueError("This server is not configured to support
Chunked Transfer Encoding (for requests)")

return self.get_response(request)
}}}


Some links:

* This one seemed related, but probably isn't (it's about forms):
https://code.djangoproject.com/ticket/35289
* This seemed related, but is about uwsgi and uses special uwsgi features:
https://github.com/btimby/uwsgi-
chunked/blob/master/uwsgi_chunked/chunked.py

All of the above is when using WSGI (I did not test/think about ASGI)

--
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:3>

Django

unread,
Oct 14, 2024, 1:47:50 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by Klaas van Schelven):

Side-note: I found stumbling upon this bug quite surprising: you would
think that requests w/ chunked transfer-encoding are used often enough for
this to have surfaced more broadly _at some point_, but I couldn't find
much more than the things I already linked. Perhaps it's not broadly used?
Or perhaps it is, but in contexts where it is Django is always behind some
proxy that removes it, and sends the WSGI server (and Django) something
"more plain"?
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:4>

Django

unread,
Oct 14, 2024, 2:14:05 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Changes (by bcail):

* cc: bcail (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:5>

Django

unread,
Oct 14, 2024, 2:37:14 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by bcail):

[https://peps.python.org/pep-3333/#other-http-features Here's] a relevant
section of the WSGI spec.
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:6>

Django

unread,
Oct 14, 2024, 2:44:40 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by Klaas van Schelven):

> WSGI servers must handle any supported inbound “hop-by-hop” headers on
their own, such as by decoding any inbound Transfer-Encoding, including
chunked encoding if applicable.

That's the part that's relevant to the present discussion (I think it's
also the only part, but I might be wrong)

What do you take that to say w/ regards to the above? Because e.g.
Gunicorn "decodes" just fine (it parses the hex-coded chunk-lenghts and
removes those from the stream as seen by Django) as long as you don't
expect "decoding" to mean "also adds fake Content-Length header"
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:7>

Django

unread,
Oct 14, 2024, 3:13:22 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by bcail):

Here's the first part of that paragraph:
> However, because WSGI servers and applications do not communicate via
HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI
internal communications. WSGI applications must not generate any “hop-by-
hop” headers, attempt to use HTTP features that would require them to
generate such headers, or rely on the content of any incoming “hop-by-hop”
headers in the environ dictionary.

There's also this quote, which is actually in a different section of the
spec:
> Applications and middleware are forbidden from using HTTP/1.1 “hop-by-
hop” features or headers...

There's also
[https://github.com/benoitc/gunicorn/issues/1264#issuecomment-254183925
these]
[https://github.com/benoitc/gunicorn/pull/1265#issuecomment-219251058
three]
[https://github.com/benoitc/gunicorn/pull/1265#issuecomment-219312733
quotes] from the issues you listed.

It seems like the idea from the spec is that the WSGI servers would de-
chunk the requests, so Django would see the full request, not chunks.

Maybe it would be good for Django to not follow the spec here, but if so,
it would probably be good to keep that in mind and explicitly say so in
the docs.
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:8>

Django

unread,
Oct 14, 2024, 4:32:15 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by Klaas van Schelven):

> first part of that paragraph

but:

* "applications must not generate" does not apply here (that's for
Response Transfer-Encoding)
* "attempt to use [..] generate such headers" same
* "rely on the content" I don't think my proposed direction for Django is
relying on anything other than _not_ relying on Content-Length

Regarding the quotes: admittedly I didn't have the full discussions top-
of-mind after spending a day debugging. I suppose I also discounted some
points I didn't agree with. But if I reread those whole threads, I mostly
get the sense of Benoit asking "why can't we just fix this" and it falling
on deaf ears... btw the "FWIW" by Graham actually points to him agreeing
with Benoit.

By the way, I can't actually find the part of the spec that says Content-
Length is required; nor can I find the part that says "if it's not there,
just assume 0". I'm using ^F... perhaps it's implied through something
else?

having said all that, I have an open point and an open question:

1. _even if_ one would say that Django cannot read from a request's body
on the application-side of WSGI, as per the WSGI spec, that does not imply
Django should just silently throw away the data for that case. Would
raising an exception not be much better?

2. does the "solution" (workaround) that I proposed (and for my own
project: implemented) have any actual downsides other than not following
the WSGI spec? I have personal reasons to ask, but the answer would also
be generally useful in telling us whether it's a viable general solution
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:9>

Django

unread,
Oct 14, 2024, 4:37:19 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by Klaas van Schelven):

> you would think that requests w/ chunked transfer-encoding are used
often enough for this to have surfaced more broadly

though probably not much by browsers
[https://github.com/bugsink/bugsink/issues/8#issuecomment-2411500753 but I
ran into this in the context of a non-browser client]
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:10>

Django

unread,
Oct 14, 2024, 4:56:25 PM10/14/24
to django-...@googlegroups.com
#35838: request.read() returns empty for Requests w/ Transfer-Encoding: Chunked
------------------------------------+--------------------------------------
Reporter: Klaas van Schelven | Owner: (none)
Type: Uncategorized | Status: new
Component: HTTP handling | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Comment (by bcail):

> _even if_ one would say that Django cannot read from a request's body on
the application-side of WSGI, as per the WSGI spec, that does not imply
Django should just silently throw away the data for that case. Would
raising an exception not be much better?
Yes, I agree - if Django can see that there is data, but it's not correct
according to the spec, it would be better to raise an exception than
silently throw away that data.

> does the "solution" (workaround) that I proposed (and for my own
project: implemented) have any actual downsides other than not following
the WSGI spec? I have personal reasons to ask, but the answer would also
be generally useful in telling us whether it's a viable general solution
I don't know - it would need to be analyzed, tested, ... One immediate
question to answer would be whether it causes any problems for WSGI
servers that implement the spec correctly. (Note: if you think that the
WSGI spec **does** allow for chunked transfer encoding at the Django app
level, you can also try to convince people of that.)

One thing you could try is to open a PR that implements the change you
want to make in Django, and let people give feedback on it and see if they
like it.
--
Ticket URL: <https://code.djangoproject.com/ticket/35838#comment:11>
Reply all
Reply to author
Forward
0 new messages