Hello Tom,
I'll do my best to remember my intentions when this code was first
written. But anyway it might need some fresh thinking. Thanks!
I've decided not to wrap streams in LimitedStream indefinitely because
it would affect performance. I confess guilty for never actually
measuring the hit but reading from connection seemed to me a hot enough
place to operate as close to the socket as possible, especially because
all the thing was made for performance-sensitive cases with potentially
big request payloads.
In most practical situations request.read() was working just fine
without the wrapper: mod_wsgi, mod_python, flup. The only exception at
that time was the Django's runserver that would just hang when trying to
read past content_length bytes. Actually there's a big comment on this
matter right before LimitedStream wrapping in WSGIRequest.__init__. Also
I thought that one could use LimitedStream explicitly in the user code
if needed.
> 1. This code should not raise an Assertion Error...
>
> >>> from django.test.client import RequestFactory
> >>> req=RequestFactory().post('/', {'foo':'bar'})
> >>> req.read(9999)
> Traceback (most recent call last):
> File "<console>", line 1, in <module>
> File
> "/Users/tomchristie/workspace/django-rest-framework/env/lib/python2.6/site-packages/django/http/__init__.py",
> line 296, in read
> return self._stream.read(*args, **kwargs)
> File
> "/Users/tomchristie/workspace/django-rest-framework/env/lib/python2.6/site-packages/django/test/client.py",
> line 51, in read
> assert self.__len >= num_bytes, "Cannot read more than the available
> bytes from the HTTP incoming data."
> AssertionError: Cannot read more than the available bytes from the HTTP
> incoming data.
Judging by svn logs this mock object (FakePayload) was written back at
the days of uploads refactoring. Judging by its comment it is a safety
measure to fail early and loud to protect client code from even
attempting to read input stream past content_length.
So from all these here are my thoughts:
- I still believe that not using the limitator for everything is a good
thing (doesn't break real code, doesn't degrade performance)
- It may be a good idea for a FakePayload to imitate real-world data
providers and just return empty buffers when there's no data left.
Thoughts?
> 2. Isn't the use of LimitBytes in MultipartParser.parse() now redundant?
I didn't even know another such thing exists though I remember searching
for it before writing LimitedStream. Now they just seem to be duplicates
and it's a good idea to ditch LimitBytes (since LimitedStream implements
a readline() too) and move it into some common place. http.utils seems a
good fit for it.
On 04/03/2011 07:03 AM, Tom Christie wrote:
> It's not very obvious from the docs or source if HttpRequest.read() can
> always be safely treated as a limited input stream, or if the developer
> needs to respect HttpRequest.META['CONTENT_LENGTH'].
>
> As far as I can tell the intention is that it can always be treated as a
> limited stream, that seems to at least be the implication in
> WSGIRequest.__init__, in which case it looks to me like there are two bugs.Hello Tom,
I'll do my best to remember my intentions when this code was first
written. But anyway it might need some fresh thinking. Thanks!I've decided not to wrap streams in LimitedStream indefinitely because
it would affect performance. I confess guilty for never actually
measuring the hit but reading from connection seemed to me a hot enough
place to operate as close to the socket as possible, especially because
all the thing was made for performance-sensitive cases with potentially
big request payloads.In most practical situations request.read() was working just fine
without the wrapper: mod_wsgi, mod_python, flup. The only exception at
that time was the Django's runserver that would just hang when trying to
read past content_length bytes. Actually there's a big comment on this
matter right before LimitedStream wrapping in WSGIRequest.__init__. Also
I thought that one could use LimitedStream explicitly in the user code if needed.
Last night I actually did test it :-). You're right the difference in
performance is less than a statistical deviation between different
uploads over network.
Tom Christie:
> Even so, presumably we wouldn't want WSGIRequest to create a
> LimitedStream object for _every_ single incoming HTTP request,
> regardless of if it actually has a payload or not?
>
> In that case deferring the LimitedStream creation, but always
> ensuring that HttpRequest._stream is a LimitedStream rather than the
> underlying stream would be a sensible thing to do.
And this one looks like another premature optimization :-). Creating a
single wrapper object is a negligible hit but the code bloat from making
it lazy is very real. So I'm -1 on laziness.
> #15785 - HttpRequest.read(NUM_BYTES) can read beyond the end of
> wsgi.input stream.
>
> There is a patch (with tests) attached. [2]
Apart from the laziness issue your patch looks fine to me. Thanks!