[cherrypy-devel] Serving files when using sessions

Lida Tang

Feb 4, 2008, 11:56:43 PM
to cherryp...@googlegroups.com

When serving files, say a few hundred megs of video, cherrypy will use a
generator so you don't incur the memory and speed costs of trying to write
all the data at once. However, if you are using a session, due to the fix to
http://www.cherrypy.org/ticket/594, the generator will be collapsed before
the session is saved.

This delays the response to the request by many seconds while Python is busy
doing file I/O and memcpys.
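
To make the cost concrete, here is a minimal sketch in plain Python of the difference between a streamed body and a collapsed one. The names `stream_file` and `collapse` are hypothetical stand-ins for illustration, not CherryPy functions, and an in-memory buffer stands in for the video file.

```python
import io


def stream_file(fileobj, chunk_size=65536):
    """Yield the file in chunks; nothing is read until iteration starts."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def collapse(body):
    """Join every chunk into one bytes object (hypothetical stand-in)."""
    return b"".join(body)


data = b"x" * 1000000  # pretend this is a large video file

# Streamed: the first chunk is available after a single read.
streamed = stream_file(io.BytesIO(data))
first_chunk = next(streamed)

# Collapsed: the whole file is read and copied before anything can be sent.
collapsed = collapse(stream_file(io.BytesIO(data)))
```

With a few hundred megabytes on disk, the collapsed variant has to finish all reads and copies before the first byte can go out, which matches the delay described above.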

I am not sure why saving of a session must be after the response has been
collapsed, and why you can't wait until on_end_request. I guess there could
be some timing problem that could occur if the generator is modifying session
data? And waiting until on_end_request would lock the session object for too
long causing performance issues?

For now, I have just commented out the collapse_body call, since I am not
depending on complex session data at the moment.
--
View this message in context: http://www.nabble.com/Serving-files-when-using-sessions-tp15283573p15283573.html
Sent from the cherrypy-devel mailing list archive at Nabble.com.

Robert Brewer

Feb 5, 2008, 12:40:05 AM
to cherryp...@googlegroups.com
Lida Tang wrote:
> When serving files, say a few hundred megs of video, cherrypy
> will use a generator so you don't incur the memory and speed
> costs of trying to write all the data at once. However, if you
> are using a session, due to the fix to ticket 594, the generator
> will be collapsed before the session is saved.
>
> This delays the response to the request by many seconds while
> python is busy doing file IO and memcpys.
>
> I am not sure why saving of a session must be after the response
> has been collapsed, and why you can't wait until on_end_request.

According to http://www.cherrypy.org/changeset/1426, you *can* wait
until on_end_request if you set response.stream to True. Have you tried
doing that?

> I guess there could be some timing problem that could occur if the
> generator is modifying session data? And waiting until
> on_end_request would lock the session object for too long
> causing performance issues?

Right on both counts. However, setting response.stream = True should
allow you to declare you're aware of those risks and have mitigated
them.


Robert Brewer
fuma...@aminus.org

Lida Tang

Feb 5, 2008, 1:06:29 AM
to cherryp...@googlegroups.com

Aren't there other consequences of setting response.stream to True, such as
Content-Length not being set?

Why isn't the save done in on_end_request for all requests? Does locking the
session mean that no other threads will run during that time?

I want to make sure I understand all the risks involved before deciding if I
want to set streaming to true or live with the hack I made.
--
View this message in context: http://www.nabble.com/Serving-files-when-using-sessions-tp15283573p15284082.html

Robert Brewer

Feb 5, 2008, 1:24:31 AM
to cherryp...@googlegroups.com
Lida Tang wrote:

> Robert Brewer wrote:
> > Lida Tang wrote:
> >> When serving files, say a few hundred megs of video, cherrypy
> >> will use a generator so you don't incur the memory and speed
> >> costs of trying to write all the data at once. However, if you
> >> are using a session, due to the fix to ticket 594, the generator
> >> will be collapsed before the session is saved.
> >>
> >> This delays the response to the request by many seconds while
> >> python is busy doing file IO and memcpys.
> >>
> >> I am not sure why saving of a session must be after the response
> >> has been collapsed, and why you can't wait until on_end_request.
> >
> > According to http://www.cherrypy.org/changeset/1426, you *can* wait
> > until on_end_request if you set response.stream to True. Have you
> > tried doing that?
> >
> >> I guess there could be some timing problem that could occur if the
> >> generator is modifying session data? And waiting until
> >> on_end_request would lock the session object for too long
> >> causing performance issues?
> >
> > Right on both counts. However, setting response.stream = True should
> > allow you to declare you're aware of those risks and have mitigated
> > them.
>
> Aren't there other consequences of setting response.stream to True,
> such as Content-Length not being set?

No, at least not in trunk:

    if self.stream:
        if dict.get(headers, 'Content-Length') is None:
            dict.pop(headers, 'Content-Length', None)

The Content-Length header is only popped off if it's set to None, and it
won't be None because serve_file sets it to the file length.
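
The same check can be tried outside CherryPy. The sketch below mirrors the quoted lines using a plain dict; `finalize_content_length` is a hypothetical helper written for illustration, not part of CherryPy, and the header values are made up.

```python
def finalize_content_length(headers, stream):
    """Mirror the quoted check on a plain dict (hypothetical helper)."""
    if stream:
        # Only a Content-Length that is missing or None gets removed.
        if headers.get('Content-Length') is None:
            headers.pop('Content-Length', None)
    return headers


# A length that was already set (as serve_file is said to do) survives.
kept = finalize_content_length({'Content-Length': '1048576'}, stream=True)

# A placeholder of None is dropped when streaming.
dropped = finalize_content_length({'Content-Length': None}, stream=True)
```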

> Why isn't the save done in on_end_request for all requests? Does locking
> the session mean that no other threads will run during that time?

Threads which are trying to access the same session will be blocked.
That may seem like a low order of probability, but so many apps run
multiple requests in parallel nowadays, it's best to release the session
lock as soon as possible (but no sooner).
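
As a rough illustration of that blocking, here is a sketch using only the standard library. A single `threading.Lock` stands in for a per-session lock, and both request functions are hypothetical; this is not how CherryPy implements session locking, only the general effect of holding a lock while a slow response is produced.

```python
import threading
import time

session_lock = threading.Lock()  # stand-in for one session's lock
acquired = threading.Event()
events = []


def slow_request():
    # Holds the session lock for the whole (simulated) file transfer.
    with session_lock:
        events.append("slow: acquired")
        acquired.set()
        time.sleep(0.2)
        events.append("slow: released")


def quick_request():
    # Same session, so this waits until the slow request lets go.
    with session_lock:
        events.append("quick: acquired")


slow = threading.Thread(target=slow_request)
quick = threading.Thread(target=quick_request)
slow.start()
acquired.wait()  # make sure the slow request holds the lock first
quick.start()
slow.join()
quick.join()
```

The quick request cannot proceed until the slow one has finished, even though other sessions would be unaffected.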

> I want to make sure I understand all the risks involved before
> deciding if I want to set streaming to true or live with the hack
> I made.

There aren't many other risks (grep -R \.stream cherrypy is pretty
short). You should also read http://www.cherrypy.org/wiki/ReturnVsYield.
But it sounds to me like your use case is exactly what response.stream
was designed for.
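
For reference, a minimal configuration sketch follows. It assumes the `response.stream` config entry has the same effect as setting `cherrypy.response.stream = True` inside the handler; the `/video` path and the `video_config` name are hypothetical, and the dict is only built here, not mounted.

```python
# Hypothetical path; only the two config keys matter for this thread.
video_config = {
    '/video': {
        'tools.sessions.on': True,  # sessions stay enabled
        'response.stream': True,    # assumed: leave the generator body uncollapsed
    }
}
```

Keeping the flag scoped to the path that serves large files leaves every other handler with the default, collapsed behaviour.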


Robert Brewer
fuma...@aminus.org
