Returning large files from Django

TP

Aug 13, 2008, 6:17:50 PM
to Django developers
I've recently been returning lots of large files from a web app (and
yes, I need to do this).

If I use Apache with mod_wsgi to directly invoke my Python script I
can return files with about 30x less CPU than if I use Django to
return the same files.

I believe this is likely because mod_wsgi gives access to sendfile on
Linux, while with Django the best I can do is use HttpResponse() with
an iterator that returns data in big blocks (i.e.,
HttpResponse(FileWrapper(filehandle), content_type='image/jpeg'), where
FileWrapper returns 32 KB blocks from the file).
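The block-iterator approach described above can be sketched with a plain generator. The standard library's `wsgiref.util.FileWrapper` works the same way. This is a minimal illustration, not Django's own FileWrapper: `iter_file_blocks` and the 32 KB default are assumptions chosen to match the description. Every block is still read into Python and handed back to the server, and that per-request cost is what sendfile() avoids.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 32 * 1024  # 32 KB blocks, matching the size mentioned above


def iter_file_blocks(fileobj: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield successive blocks from fileobj until EOF, then close it.

    Each block is copied into Python memory before the server writes it
    to the socket, which is the overhead sendfile() would avoid.
    """
    try:
        while True:
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        fileobj.close()


if __name__ == "__main__":
    data = io.BytesIO(b"x" * 100_000)
    print([len(b) for b in iter_file_blocks(data)])  # [32768, 32768, 32768, 1696]
```

In Django, an iterator like this is what gets passed as the response body in the HttpResponse(FileWrapper(filehandle), ...) call above.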

I see that ticket 2131 http://code.djangoproject.com/ticket/2131 was
closed as wontfix due to "django not being designed to serve static
files". While I know this is true in general, I can think of a lot of
specific cases where it's not true too.

Is there any hope (post-1.0) of resurrecting this patch and getting it
or something like it in?

Graham Dumpleton

Aug 13, 2008, 7:53:22 PM
to Django developers
On Aug 14, 8:17 am, TP <thomaspinckn...@gmail.com> wrote:
> I've recently been returning lots of large files from a web app (and
> yes, I need to do this).
>
> If I use Apache with mod_wsgi to directly invoke my Python script I
> can return files with about 30x less CPU than if I use Django to
> return the same files.
>
> I believe this is likely because mod_wsgi gives access to sendfile on
> Linux, while with Django the best I can do is use HttpResponse() with
> an iterator that returns data in big blocks (i.e.,
> HttpResponse(FileWrapper(filehandle), content_type='image/jpeg'), where
> FileWrapper returns 32 KB blocks from the file).
>
> I see that ticket 2131 http://code.djangoproject.com/ticket/2131 was
> closed as wontfix due to "django not being designed to serve static
> files". While I know this is true in general, I can think of a lot of
> specific cases where it's not true too.
>
> Is there any hope (post-1.0) of resurrecting this patch and getting it
> or something like it in?

It should be pointed out that the patch also isn't mod_wsgi specific.
Technically it should work on any WSGI hosting mechanism which
implements the wsgi.file_wrapper extension. Whether a WSGI hosting
mechanism actually implements wsgi.file_wrapper with something as
efficient as sendfile() is a different matter, but the important thing
is that it is a standard API for returning file objects efficiently
when using WSGI.
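A minimal WSGI application using the `wsgi.file_wrapper` extension (the optional file-handling hook defined in the WSGI specification, PEP 333/3333) might look like the sketch below. It falls back to `wsgiref.util.FileWrapper` when the server does not provide the extension. `make_file_app` and its parameters are hypothetical names for this illustration, not part of the patch.

```python
import os
from wsgiref.util import FileWrapper


def make_file_app(path, content_type="application/octet-stream", block_size=32 * 1024):
    """Return a WSGI app that serves the single file at `path` (hypothetical helper)."""

    def app(environ, start_response):
        size = os.path.getsize(path)
        fileobj = open(path, "rb")
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(size)),
        ])
        # Prefer the server's own wrapper (a server such as mod_wsgi may use
        # sendfile() here); otherwise iterate in Python with the stdlib wrapper.
        wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
        return wrapper(fileobj, block_size)

    return app
```

The application code is identical on every server; only the object returned by `environ["wsgi.file_wrapper"]` decides whether the bytes go through Python or through a zero-copy path.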

If this is the same patch someone discussed with me previously and I
gave input on, the only bit which is mod_wsgi specific is that it
knows that on mod_wsgi, wsgi.file_wrapper is also implemented
somewhat more correctly than in other WSGI hosting mechanisms: it
recognises the Content-Length header returned by the application and
only returns that amount of data, which also allows range requests to
be handled efficiently. Other WSGI hosting mechanisms can get this
wrong and return more content from the file than intended and
specified by the Content-Length header, so for range requests on those
you are stuck with the Python version of the file wrapper instead.
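The Content-Length issue can be illustrated with a pure-Python fallback that honours a byte range. It seeks to the start offset and never yields more than the requested length, so the body always matches the advertised Content-Length. `iter_range` is a hypothetical helper for illustration, not part of the patch or of mod_wsgi.

```python
import io
from typing import BinaryIO, Iterator


def iter_range(fileobj: BinaryIO, offset: int, length: int,
               block_size: int = 32 * 1024) -> Iterator[bytes]:
    """Yield exactly `length` bytes starting at `offset` (fewer only at EOF)."""
    fileobj.seek(offset)
    remaining = length
    while remaining > 0:
        block = fileobj.read(min(block_size, remaining))
        if not block:  # file shorter than expected
            break
        remaining -= len(block)
        yield block


if __name__ == "__main__":
    f = io.BytesIO(bytes(range(256)) * 4)  # 1024 bytes
    # "Range: bytes=100-199" (inclusive) maps to offset=100, length=100
    body = b"".join(iter_range(f, offset=100, length=100))
    print(len(body), body[0])  # 100 100
```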

The patch also supports the Django developers' preferred deployment
platform, mod_python, so overall I thought it would have been an
easy decision to see the benefit in including it. It seems, though,
that not enough thought was put into how it might be used before
dismissing it. :-(

Graham

Jeremy Dunck

Aug 13, 2008, 8:05:40 PM
to django-d...@googlegroups.com
On Wed, Aug 13, 2008 at 6:53 PM, Graham Dumpleton
<Graham.D...@gmail.com> wrote:
...

> The patch also supports the Django developers' preferred deployment
> platform, mod_python, so overall I thought it would have been an
> easy decision to see the benefit in including it. It seems, though,
> that not enough thought was put into how it might be used before
> dismissing it. :-(

FWIW, I think the general consensus is that mod_wsgi wins over
mod_python these days. I don't speak for core devs, but that's my
thought. Thanks for your work, Graham.

Simon Willison

Aug 14, 2008, 3:23:27 AM
to Django developers
On Aug 13, 11:17 pm, TP <thomaspinckn...@gmail.com> wrote:
> I've recently been returning lots of large files from a web app (and
> yes, I need to do this).

Are the files static or dynamically generated? If they are static and
you're serving them using Django so that you can use Django logic to
decide if the user is allowed to download the file or not then a much
more efficient solution is to put Django behind nginx and use nginx's
special custom header for "send the user this protected file". That
way Django only needs to be involved long enough to decide which file
to send; nginx does all the heavy lifting leaving your Django
processes free to handle more requests.

That said, my vote is for addressing this issue post-1.0 to cover the
case of serving large dynamic files (db dumps, CSV exports, on-demand
zip and jar files, etc.).
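nginx's header for this is `X-Accel-Redirect`: the application checks permissions and returns only a header naming an internal nginx location, and nginx then sends the file itself. The sketch below is a plain WSGI app rather than a Django view so that it runs without Django. The `/protected/` prefix (assumed to map to an nginx location declared `internal`) and the `is_allowed` callback are hypothetical.

```python
import mimetypes
import posixpath

# Hypothetical nginx location, assumed to be declared with the `internal`
# directive so clients cannot request it directly.
PROTECTED_PREFIX = "/protected/"


def make_protected_app(is_allowed):
    """Return a WSGI app that authorizes in Python and lets nginx send the bytes.

    is_allowed(environ, name) -> bool is a hypothetical permission check.
    """

    def app(environ, start_response):
        # Normalise the requested name and refuse anything escaping the directory.
        name = posixpath.normpath(environ.get("PATH_INFO", "").lstrip("/"))
        if name in (".", "..") or name.startswith("../"):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        if not is_allowed(environ, name):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden"]
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        start_response("200 OK", [
            ("Content-Type", content_type),
            # nginx acts on this header and serves the file itself, so the
            # Python process is finished as soon as it returns.
            ("X-Accel-Redirect", PROTECTED_PREFIX + name),
        ])
        return [b""]

    return app
```

In a Django view the same idea is a single header assignment on an empty HttpResponse, analogous to the X-Sendfile snippet later in the thread.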

Justin Fagnani

Aug 14, 2008, 7:25:52 AM
to django-d...@googlegroups.com
On Thu, Aug 14, 2008 at 12:23 AM, Simon Willison
<si...@simonwillison.net> wrote:
>
> On Aug 13, 11:17 pm, TP <thomaspinckn...@gmail.com> wrote:
>> I've recently been returning lots of large files from a web app (and
>> yes, I need to do this).
>
> Are the files static or dynamically generated? If they are static and
> you're serving them using Django so that you can use Django logic to
> decide if the user is allowed to download the file or not then a much
> more efficient solution is to put Django behind nginx and use nginx's
> special custom header for "send the user this protected file". That
> way Django only needs to be involved long enough to decide which file
> to send; nginx does all the heavy lifting leaving your Django
> processes free to handle more requests.

Isn't this basically what wsgi.file_wrapper does? With this patch
HttpResponseSendFile (should probably be HttpResponseFileWrapper) is
used to hand the file off to the wsgi server which takes over and
leaves Django to continue on with other requests.

Justin

Tim Chase

Aug 14, 2008, 8:41:14 AM
to django-d...@googlegroups.com
>> Are the files static or dynamically generated? If they are static and
>> you're serving them using Django so that you can use Django logic to
>> decide if the user is allowed to download the file or not then a much
>> more efficient solution is to put Django behind nginx and use nginx's
>> special custom header for "send the user this protected file". That
>> way Django only needs to be involved long enough to decide which file
>> to send; nginx does all the heavy lifting leaving your Django
>> processes free to handle more requests.
>
> Isn't this basically what wsgi.file_wrapper does? With this patch
> HttpResponseSendFile (should probably be HttpResponseFileWrapper) is
> used to hand the file off to the wsgi server which takes over and
> leaves Django to continue on with other requests.

Is there a way to abstract this server-specific method of
returning large-but-authenticated files? I have an item on my
plate of things to do that includes returning media (images,
audio, and video) to authenticated users, and I'd like to keep my
code fairly server-agnostic. Solutions were described above for
nginx and wsgi, which mostly leaves mod_python as the remaining
"big" way of serving for which a solution hasn't been listed (if
there is one). For the smaller content, I can just use
authenticated static-media serving. But for the big content, it
could drag a server to its knees pretty quickly with a 500 meg
video file. :)
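One way to keep view code server-agnostic is to hide the choice of mechanism behind a single helper selected by configuration. In this sketch the backend names, the `internal_prefix` and `root` arguments and the `send_file_response` helper are all hypothetical. `X-Sendfile` is the header used by lighttpd and Apache with mod_xsendfile, and `X-Accel-Redirect` is nginx's equivalent. With no front-end support, the helper falls back to streaming through `wsgi.file_wrapper`, which also works with a development server.

```python
import os
from wsgiref.util import FileWrapper


def send_file_response(environ, start_response, path, content_type,
                       backend="stream", internal_prefix="/protected/", root="."):
    """Serve `path` with whichever offload mechanism the front end supports.

    backend: "xsendfile" (lighttpd, Apache + mod_xsendfile),
             "nginx" (X-Accel-Redirect) or "stream" (no front-end help).
    """
    headers = [("Content-Type", content_type)]
    if backend == "xsendfile":
        # The front end reads an absolute filesystem path from the header.
        headers.append(("X-Sendfile", os.path.abspath(path)))
        start_response("200 OK", headers)
        return [b""]
    if backend == "nginx":
        # The front end maps an internal URI onto its own protected location.
        relative = os.path.relpath(path, root).replace(os.sep, "/")
        headers.append(("X-Accel-Redirect", internal_prefix + relative))
        start_response("200 OK", headers)
        return [b""]
    # Fallback: stream the bytes from Python.
    headers.append(("Content-Length", str(os.path.getsize(path))))
    start_response("200 OK", headers)
    wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
    return wrapper(open(path, "rb"), 32 * 1024)
```

Authorization stays in application code either way; only the last step changes with the deployment.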

Yeah, I know the proximity to 1.0 doesn't make this immediately
likely, but maybe post-1.0? It's certainly new stuff, not "get
existing stuff right for 1.0".

-tim


TP

Aug 14, 2008, 9:01:42 AM
to Django developers


On Aug 14, 8:41 am, Tim Chase <django.us...@tim.thechases.com> wrote:
> >> Are the files static or dynamically generated? If they are static and
> >> you're serving them using Django so that you can use Django logic to

We are using our DB as the content distribution system for our
web servers. When a request for an image comes in, if it is the first
request the web server has ever received for this image, our app pulls
the image out of the DB, thumbnails it if necessary, stores it in
the local filesystem and then streams the contents back to the client.
Subsequent requests can bypass Django and be served directly by the
web server.
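The first-request flow described above (fetch from the DB, optionally thumbnail, write to the local filesystem, then serve) can be sketched as follows. `fetch_image_bytes` stands in for the database query and thumbnailing, and `cache_dir` for the directory the web server serves directly; both are hypothetical. Writing to a temporary file and renaming it with `os.replace` means the web server never sees a half-written file.

```python
import os
import tempfile


def ensure_cached(cache_dir, name, fetch_image_bytes):
    """Return the local path for `name`, creating it on the first request.

    fetch_image_bytes(name) -> bytes is a hypothetical callable that loads the
    image from the database and thumbnails it if necessary. `name` is assumed
    to be a trusted, already-validated relative path.
    """
    final_path = os.path.join(cache_dir, name)
    if os.path.exists(final_path):
        return final_path  # the web server can also serve this path directly
    data = fetch_image_bytes(name)
    target_dir = os.path.dirname(final_path)
    os.makedirs(target_dir, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place so
    # a concurrent request never sees a partially written image.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path
```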

>
> > Isn't this basically what wsgi.file_wrapper does? With this patch
> > HttpResponseSendFile (should probably be HttpResponseFileWrapper) is
> > used to hand the file off to the wsgi server which takes over and
> > leaves Django to continue on with other requests.
>
> Is there a way to abstract this server-specific method of
> returning large-but-authenticated files?  I have an item on my

The patch associated with this ticket works for mod_python and WSGI
containers.

Gábor Farkas

Aug 15, 2008, 2:58:30 AM
to django-d...@googlegroups.com
On Thu, Aug 14, 2008 at 3:01 PM, TP <thomasp...@gmail.com> wrote:
>
>
>
> On Aug 14, 8:41 am, Tim Chase <django.us...@tim.thechases.com> wrote:
>> >> Are the files static or dynamically generated? If they are static and
>> >> you're serving them using Django so that you can use Django logic to
>
> We are using our DB as the content distribution system for our
> web servers. When a request for an image comes in, if it is the first
> request the web server has ever received for this image, our app pulls
> the image out of the DB, thumbnails it if necessary, stores it in
> the local filesystem and then streams the contents back to the client.
> Subsequent requests can bypass Django and be served directly by the
> web server.

hi,

if that's the case, isn't it possible to end the "processing" path
with a redirect?

i mean, when you finish the "processing", is it really necessary to
serve the file?
couldn't you just send back a http-redirect?

that way even for the first request, you could serve the image
directly by the webserver
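The redirect suggestion amounts to finishing the processing in Python and then answering with a redirect to the URL under which the web server serves the result. A minimal WSGI sketch, where `ensure_file` (the processing step) and `MEDIA_URL` are hypothetical:

```python
MEDIA_URL = "/media/cache/"  # hypothetical URL prefix served directly by the web server


def make_redirect_app(ensure_file):
    """WSGI app: do the processing once, then let the web server serve the result.

    ensure_file(name) is a hypothetical callable that makes sure the processed
    file exists on disk; `name` is assumed to be validated already.
    """

    def app(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        ensure_file(name)
        # The client pays for a second round trip, but that request never
        # reaches Python.
        start_response("302 Found", [("Location", MEDIA_URL + name),
                                     ("Content-Type", "text/plain")])
        return [b""]

    return app
```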

gabor

TP

Aug 15, 2008, 11:47:53 AM
to Django developers

> if that's the case, isn't it possible to end the "processing" path
> with a redirect?
>

Yes, this is one of the alternatives we're considering. The others
are:

1) Just write a separate application outside of Django
2) Hack Django to give access to the underlying mod_wsgi file_wrapper
interface

I think #1 or #2 is slightly better for us since it doesn't require
the latency of two round trips for the client.

Scott Moonen

Aug 15, 2008, 2:12:04 PM
to django-d...@googlegroups.com
> if that's the case, isn't it possible to end the "processing" path with a redirect?

Gábor, that works, but depending on your security requirements it may not be desirable. The reason is that access to the file itself is no longer under direct control of the Django app; someone could cut and paste the target URL and share it with others.

> Yes, this is one of the alternatives we're considering. The others are:
>
> 1) Just write a separate application outside of Django
> 2) Hack Django to give access to the underlying mod_wsgi file_wrapper
> interface

Thomas, what server are you using?  I believe that lighttpd supports the X-Sendfile header, and Apache does as well if you're running with mod_xsendfile.  If you've got such a setup you can serve up the file by doing something like:

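import os
from django.http import HttpResponse

def send_with_xsendfile(file_name, file_mime):  # hypothetical helper called from a view
    # file_name must be an absolute path the front-end server may read;
    # lighttpd or mod_xsendfile replaces the empty body with the file itself.
    response = HttpResponse(content_type=file_mime)
    response['X-Sendfile'] = file_name
    response['Content-Length'] = str(os.path.getsize(file_name))
    # response['Content-Disposition'] = 'attachment; filename="%s"' % os.path.basename(file_name)
    return response
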
Perhaps a little more hacky than exploiting mod_python, but it's something to consider, at least.

  -- Scott

--
http://scott.andstuff.org/ | http://truthadorned.org/

Robert Coup

Aug 16, 2008, 1:57:05 AM
to django-d...@googlegroups.com
On Sat, Aug 16, 2008 at 6:12 AM, Scott Moonen <smo...@andstuff.org> wrote:
>> Yes, this is one of the alternatives we're considering. The others are:
>>
>> 1) Just write a separate application outside of Django
>> 2) Hack Django to give access to the underlying mod_wsgi file_wrapper
>> interface
>
> Thomas, what server are you using? I believe that lighttpd supports the X-Sendfile header, and Apache does as well if you're running with mod_xsendfile. If you've got such a setup you can serve up the file by doing something like:

Perlbal has something similar: http://www.danga.com/perlbal/
Internal redirection to file or URL(s)
  • Big one for us; a backend can instruct Perlbal to fetch the user's data from a completely separate server and port and URL, 100% transparent to the user
  • Can actually give Perlbal a list of URLs to try. Perlbal will find one that's alive. Again, the end user sees no redirects happening.
  • Can also redirect to a file, which Perlbal will serve non-blocking. See webserver mode above.
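Perlbal's internal redirection is driven by response headers from the backend. The sketch below assumes the `X-REPROXY-URL` header, which takes a space-separated list of URLs for Perlbal to try; check the Perlbal documentation for the exact header names before relying on them. `locate_replicas` is a hypothetical lookup returning the storage URLs that hold a copy of the file.

```python
def make_reproxy_app(locate_replicas):
    """WSGI app: tell Perlbal where to fetch the bytes; the client sees no redirect.

    locate_replicas(name) -> list of URLs is a hypothetical lookup
    (e.g. storage nodes holding a copy of the file).
    """

    def app(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        urls = locate_replicas(name)
        if not urls:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # Assumed header: Perlbal tries each URL in turn until one responds.
        start_response("200 OK", [("X-REPROXY-URL", " ".join(urls))])
        return [b""]

    return app
```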
HTH,

Rob :)

Julien Phalip

Aug 16, 2008, 2:21:16 AM
to Django developers
> Thomas, what server are you using?  I believe that lighttpd supports the
> X-Sendfile header, and Apache does as well if you're running with
> mod_xsendfile <http://tn123.ath.cx/mod_xsendfile/>.

A while back, John Hensley had given some good hints on the user-list
about mod_xsendfile:
http://groups.google.com/group/django-users/browse_thread/thread/b4ceae1956e003e5/

At the time I hadn't tried hard enough and finally gave up... I
remember, though, that I couldn't find a way to make it work seamlessly
both on Apache and with the development server.

Gábor Farkas

Aug 16, 2008, 6:34:22 AM
to django-d...@googlegroups.com
On Fri, Aug 15, 2008 at 8:12 PM, Scott Moonen <smo...@andstuff.org> wrote:
>>> if that's the case, isn't it possible to end the "processing" path with a
>>> redirect?
>
> Gábor, that works, but depending on your security requirements it may not be
> desirable. The reason is that access to the file itself is no longer under
> direct control of the Django app; someone could cut and paste the target URL
> and share it with others.

yes, i understand. but i assumed, based on the original poster's comments,
that it's ok, because he said:

>> Subsequent requests can bypass Django and be served directly by the
>> webserver.

> Thomas, what server are you using? I believe that lighttpd supports the
> X-Sendfile header, and Apache does as well if you're running with
> mod_xsendfile.

nginx can do something very similar too:

http://wiki.codemongers.com/NginxXSendfile

gabor

Patryk Zawadzki

Aug 21, 2008, 2:53:07 PM
to django-d...@googlegroups.com
On Fri, Aug 15, 2008 at 8:58 AM, Gábor Farkas <ga...@nekomancer.net> wrote:
> if that's the case, isn't it possible to end the "processing" path
> with a redirect?
>
> i mean, when you finish the "processing", is it really necessary to
> serve the file?
> couldn't you just send back a http-redirect?
>
> that way even for the first request, you could serve the image
> directly by the webserver

I'm in a similar situation where authorized users can generate a lot
of different reports in PDF format. The files are clearly not static,
and while I could probably just save them with random names so that
random-URI attacks are hard to perform, I would then need to set up a
cron job to remove old files (and an individual could easily DoS the
service by performing a lot of requests until the machine runs out of
disk space).
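The cron-job half of this concern might look like the sketch below, which deletes generated reports older than a maximum age. `purge_old_files` and its parameters are hypothetical. On its own it does not prevent the disk-filling scenario described above; it only bounds how long files stick around.

```python
import os
import time


def purge_old_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Age is measured from the modification time. Returns the sorted names of
    the removed files. Subdirectories and symlinks are left alone.
    """
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        stale = [
            entry for entry in entries
            if entry.is_file(follow_symlinks=False)
            and now - entry.stat(follow_symlinks=False).st_mtime > max_age_seconds
        ]
    removed = []
    for entry in stale:
        try:
            os.remove(entry.path)
        except FileNotFoundError:
            continue  # already deleted, e.g. by an overlapping run
        removed.append(entry.name)
    return sorted(removed)
```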

--
Patryk Zawadzki
