[web2py] Caching downloads


Mariano Reingart

May 4, 2010, 7:11:42 PM
to web...@googlegroups.com
To cache images, I'm trying to do:

@cache(request.env.path_info,60,cache.ram)
def download(): return response.download(request,db)

But it seems that it is not working:
http://www.web2py.com.ar/raf10dev/default/index
(see the images in the sidebar; if you quickly reload the page, they fail)

The book says something about response.render, but nothing about download...
Anyway, I'm not sure if this is a good use of @cache. Is there any other way?

BTW, why Cache-Control: no?...

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

mdipierro

May 4, 2010, 9:23:43 PM
to web2py-users
Caching downloads does not make sense. This is because the role of
download is to check permissions to download a file (if they are set).
If you cache it, then the check is skipped. If you do not need the check, do
not use download. Use

import os

def mydownload():
    # Stream the file straight from uploads/, with no permission check
    path = os.path.join(request.folder, 'uploads', request.args(0))
    return response.stream(open(path, 'rb'))

or, better, use the web server to serve the uploaded files.

Mariano Reingart

May 4, 2010, 10:04:59 PM
to web...@googlegroups.com
I thought so,

I had to modify mydownload so browsers do client-side caching,
speeding up the web-page load:

import os, time

def fast_download():
    # very basic security: only sponsor logos bypass the permission check
    if not request.args(0).startswith("sponsor.logo"):
        return download()
    # remove headers that prevent caching, add one that favors it
    del response.headers['Cache-Control']
    del response.headers['Pragma']
    del response.headers['Expires']
    filename = os.path.join(request.folder, 'uploads', request.args(0))
    # HTTP dates are expressed in GMT, so use gmtime rather than localtime
    response.headers['Last-Modified'] = time.strftime(
        "%a, %d %b %Y %H:%M:%S +0000", time.gmtime(os.path.getmtime(filename)))
    return response.stream(open(filename, 'rb'))

TODO: handle If-Modified-Since (returning 304 if not modified), but as
you said, let the browser do that if so much performance is needed (so
far, fast_download is working fine for me now :-)

Thanks very much for your help, and please let me know if there is
anything wrong with this approach,

Best regards,

Mariano Reingart
http://www.web2py.com.ar
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

mdipierro

May 4, 2010, 10:25:58 PM
to web2py-users
response.stream (which you use) handles if-modified-since and range
requests automatically.
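
For readers curious what an If-Modified-Since check involves, here is a minimal standalone sketch. It is not web2py's implementation; the helper names `http_date` and `not_modified` are hypothetical and use only the Python standard library.

```python
from email.utils import formatdate, mktime_tz, parsedate_tz


def http_date(timestamp):
    """Format a POSIX timestamp as an HTTP date in GMT."""
    return formatdate(timestamp, usegmt=True)


def not_modified(if_modified_since, mtime):
    """Return True when the client's copy is still current (a 304 reply fits)."""
    if not if_modified_since:
        return False
    parsed = parsedate_tz(if_modified_since)
    if parsed is None:
        return False  # unparsable header: send the file as usual
    since = mktime_tz(parsed)
    # HTTP dates have one-second resolution, so drop the fraction of mtime.
    return int(mtime) <= since
```

A handler would call `not_modified(request_header_value, os.path.getmtime(filename))` and answer 304 with an empty body when it returns True, otherwise send the file with a `Last-Modified` header built by `http_date`.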


Thadeus Burgess

May 4, 2010, 10:55:49 PM
to web...@googlegroups.com
What webserver are you using?

You could use the X-Sendfile header if it supports it. This way the
webserver will send cache headers and web2py does not have to serve
them.
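
A rough sketch of the X-Sendfile idea, assuming a front-end server that honors the header (for Apache, that means the mod_xsendfile module is installed and enabled). The helper `xsendfile_headers` is hypothetical; in a web2py action its result would be merged into response.headers and an empty body returned.

```python
import os


def xsendfile_headers(uploads_dir, name):
    """Build a header asking the front-end server to send uploads_dir/name itself."""
    root = os.path.abspath(uploads_dir)
    path = os.path.abspath(os.path.join(root, name))
    # Refuse names that escape the uploads directory (e.g. '../secret').
    if os.path.commonpath([root, path]) != root:
        raise ValueError('path escapes uploads directory: %r' % name)
    return {'X-Sendfile': path}
```

The application still decides who may download the file; only the byte transfer is handed to the web server.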

--
Thadeus

Mariano Reingart

May 4, 2010, 11:39:07 PM
to web...@googlegroups.com
I'm using Apache & mod_wsgi

I'm looking at the other thread where Massimo suggests changes to
apache.conf, but after using fast_download (changing headers and using
stream) it runs really quickly!

(I know, serving through Apache would be even faster, but in this case
I prefer portability and an easy configuration)

You can see how it's running here:

http://www.pyday.com.ar/rafaela2010/

(look at images at the sidebar)

Thanks so much,
http://reingart.blogspot.com

mdipierro

May 4, 2010, 11:47:20 PM
to web2py-users
+1


Iceberg

May 6, 2010, 3:49:13 AM
to web2py-users
It seems Mariano's story has a happy ending. Congratulations. But on
second thought, can anyone explain why "if you quickly reload pages,
they fail" in the very first caching-download version? Caching
download can improve speed, albeit with the side effect of bypassing
the privilege check, but no matter what, it should not cause content
to fail to load.

I remember I once tried @cache(...) but encountered similar problems,
and then I gave up. :-( It would be nice to pick it up again if
someone can shed some light. Thanks!

Regards,
iceberg


mdipierro

May 6, 2010, 9:15:19 AM
to web2py-users
Can you provide an example of code that causes cache failure?
Remember that you cannot @cache def download because of range
requests.
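
To illustrate why range requests and a path-keyed cache do not mix, here is a toy model. It is not web2py's code: `serve` and `cached_serve` are hypothetical, and only the `bytes=start-end` and `bytes=start-` forms are handled.

```python
def serve(data, range_header=None):
    """Return (status, body) for a full request or a single byte range."""
    if range_header and range_header.startswith('bytes='):
        start, _, end = range_header[len('bytes='):].partition('-')
        start = int(start)
        end = int(end) if end else len(data) - 1
        return 206, data[start:end + 1]
    return 200, data


cache = {}


def cached_serve(path, data, range_header=None):
    """Naive cache keyed on the path alone, ignoring the Range header."""
    if path not in cache:
        cache[path] = serve(data, range_header)
    return cache[path]


first = cached_serve('/img', b'0123456789', 'bytes=0-3')  # partial request
second = cached_serve('/img', b'0123456789')              # full request
```

Here `second` replays the partial reply stored for `first`, so a client asking for the whole file receives only four bytes. The response depends on the request headers, not just on the path used as the cache key.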

Chris S

Jun 30, 2010, 2:57:33 PM
to web2py-users
I've had this bookmarked and have been looking over it recently. I
added a c_download (cached download) function as described above to
allow local caching of files. The above code did not get me there,
though; I ended up using:

import os, time

def c_download():
    # note: c and f come from the query string and are not validated here
    controller = request.vars.c
    file = request.vars.f
    response.headers['Cache-Control'] = 'private'
    del response.headers['Content-Type']
    del response.headers['Pragma']
    del response.headers['Expires']
    filename = os.path.join(request.folder, 'static', controller, file)
    # HTTP dates are expressed in GMT, so use gmtime rather than localtime
    response.headers['Last-Modified'] = time.strftime(
        "%a, %d %b %Y %H:%M:%S +0000", time.gmtime(os.path.getmtime(filename)))
    return response.stream(open(filename, 'rb'))

The key difference being I found I had to set the 'Cache-Control'
header, just deleting it didn't do the trick.
What I'm not clear on is why this is necessary. From the book:

When static files are downloaded, web2py does not create a session,
nor does it issue a cookie or execute the models. web2py always
streams static files in chunks of 1MB, and sends PARTIAL CONTENT when
the client sends a RANGE request for a subset of the file. web2py
also supports the IF_MODIFIED_SINCE protocol, and does not send the
file if it is already stored in the browser's cache and if the file
has not changed since that version.

Link:
http://web2py.com/book/default/section/4/2?search=supports+the+IF_MODIFIED_SINCE+protocol%2C+and+does+not+send+the+file+if+it+is+already+stored+in+the+browser%27s+cache+and+if+the+file+has+not+changed+since+that+version.

So then, if I serve a style.css file from static, or build a link from
URL() to a file in static, why do these files get downloaded every
time the page is loaded?

Here's an example. Using http://127.0.0.1:8080/welcome/static/menu.gif
running on the GAE development server I get:
Header:
HTTP/1.0 200
Server: Development/1.0
Date: Wed, 30 Jun 2010 18:37:05 GMT
Content-Type: image/gif
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 264

Cache:
Last Modified Wed Jun 30 2010 13:37:06 GMT-0500 (Central Daylight Time)
Last Fetched Wed Jun 30 2010 13:37:06 GMT-0500 (Central Daylight Time)
Expires Wed Dec 31 1969 18:00:00 GMT-0600 (Central Standard Time)
Data Size 264
Fetch Count 7
Device disk
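
As a rough way to read a header dump like the one above, the following sketch checks whether a response carries explicit permission to be reused from cache. The function `allows_caching` is hypothetical and deliberately simplified: header names are matched case-sensitively and heuristic caching based on Last-Modified alone is ignored.

```python
from email.utils import mktime_tz, parsedate_tz


def allows_caching(headers):
    """Rough check: does the response give the browser explicit freshness?"""
    cache_control = headers.get('Cache-Control', '')
    directives = {d.strip().lower() for d in cache_control.split(',')}
    if directives & {'no-cache', 'no-store'}:
        return False
    for directive in directives:
        if directive.startswith('max-age='):
            value = directive[len('max-age='):]
            return value.isdigit() and int(value) > 0
    # Fall back to Expires, which must lie after the response Date.
    expires = parsedate_tz(headers.get('Expires', ''))
    date = parsedate_tz(headers.get('Date', ''))
    if expires and date:
        return mktime_tz(expires) > mktime_tz(date)
    return False


gae_response = {
    'Date': 'Wed, 30 Jun 2010 18:37:05 GMT',
    'Cache-Control': 'no-cache',
    'Expires': 'Fri, 01 Jan 1990 00:00:00 GMT',
}
```

For the development-server response quoted above, `allows_caching(gae_response)` is False on two counts: the no-cache directive, and an Expires date in the past.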


Is this working as intended? I *can* wrap every single download in a
function call to c_download, but should that be necessary? Am I just
missing a configuration option somewhere? I feel like I'm
re-inventing the wheel, since 'static' files were, in my understanding,
not meant to change often anyway.


mdipierro

Jun 30, 2010, 4:01:08 PM
to web2py-users
Unfortunately, setting cache-control breaks IE with SSL

http://support.microsoft.com/kb/316431

Chris S

Jun 30, 2010, 4:25:09 PM
to web2py-users
I'm not sure I understand the comment. Following the link it says
that Microsoft file formats can not be opened if the cache-control is
set to no-cache.

What I'm seeing is cache-control is *always* set to no-cache when I
expected it to allow caching of files in /static.
It seems that with the above support issue IE would be unable to open
any Microsoft document served by Web2py because the cache-control is
always being set to no-cache.

Shouldn't files in static always be served with caching enabled?

mdipierro

Jun 30, 2010, 4:35:21 PM
to web2py-users
I may have sent the wrong link. There are two issues:

1) We tried to set a cache for static files in the past and we ran
into problems with SSL and IE. This was discussed at length in an old
thread but I cannot find it now.
2) It is unclear whether serving static content should cache. In a
production environment with Apache, yes. I am in favor of caching as
long as the expire time is small, since we do not have a mechanism for
setting it.

If you send me a patch and we try it on different browsers with and
without SSL, then we can include it.

Chris S

Jun 30, 2010, 4:42:31 PM
to web2py-users
I'll take a look and see what I can do.

Can you point me to where this is happening? I see a streamer.py but
nowhere does it set Cache-Control = no-cache.
Where is that decision being made?
I'm assuming in the same gluon module I'll find the logic behind the
auto-stream of /static/filename?

mdipierro

Jun 30, 2010, 4:56:19 PM
to web2py-users
If I understand correctly, you are talking about normal static files.
That is done in gluon/main.py

static_file = parse_url(request, environ)
if static_file:
    if request.env.get('query_string', '')[:10] == 'attachment':
        response.headers['Content-Disposition'] = 'attachment'
    response.stream(static_file, request=request)

Chris S

Jul 1, 2010, 12:45:29 AM
to web2py-users
Got it. That was driving me nuts.

By default the static folder is handled by app.yaml on GAE, but no
expiration date was set in the default file provided with web2py.
Adding an expiration date causes static files to start properly
caching again. I also tested removing the "static" section of
app.yaml and that allows web2py to handle the files if you prefer.

I've e-mailed this to Mdipierro, but here's what it looks like for
anyone that wants to enable the caching on their current GAE
applications.

You really just need to add an expiration time, here I've chosen 90
days.
----------Patch---------------
diff -r a7af8604b5e4 app.yaml
--- a/app.yaml Tue Jun 29 17:13:00 2010 -0500
+++ b/app.yaml Wed Jun 30 23:22:58 2010 -0500
@@ -9,6 +9,7 @@
   static_files: applications/\1/static/\2
   upload: applications/(.+?)/static/(.+)
   secure: optional
+  expiration: "90d"

 - url: /admin-gae/.*
   script: $PYTHON_LIB/google/appengine/ext/admin
---------/Patch---------------
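
To see what an expiration such as "90d" amounts to in seconds (the unit used by a Cache-Control max-age), here is a small sketch. It assumes the expiration value is a space-separated list of integers with a d, h, m or s suffix; `expiration_seconds` is a hypothetical helper, not part of App Engine or web2py.

```python
import re

UNIT_SECONDS = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}


def expiration_seconds(spec):
    """Convert an app.yaml-style expiration such as '90d' or '4d 5h' to seconds."""
    total = 0
    for part in spec.split():
        match = re.fullmatch(r'(\d+)([dhms])', part)
        if match is None:
            raise ValueError('bad expiration component: %r' % part)
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total
```

With the value from the patch, `expiration_seconds("90d")` gives 7776000 seconds. A long lifetime like this suits files whose URL changes whenever their content changes; otherwise browsers may keep a stale copy until it expires.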