Hook reentry safety

1 view
Skip to first unread message

Robert Brewer

unread,
Nov 7, 2006, 5:20:34 PM11/7/06
to cherryp...@googlegroups.com
Lots of hooks, builtins and user-defined, need to guard against being
run twice. See http://www.cherrypy.org/wiki/HooksAndTools for a diagram
of potential loops in the request process, and
http://www.cherrypy.org/ticket/599 and
http://www.cherrypy.org/ticket/577 for just a couple of examples of
related problems.

We need to discuss ways to make reentry protection easier to
declare--the current tools all do their own ad-hoc protection, usually
by setting and inspecting custom attributes on cherrypy.request. For
example:

def virtual_host(use_x_forwarded_host=True, **domains):
...
# Guard against running twice.
if hasattr(request, "virtual_prefix"):
return
request.virtual_prefix = domains.get(domain, "")

It would be preferable, in my opinion, to write that this way:

def virtual_host(use_x_forwarded_host=True, **domains):
...
virtual_host.runonce = True

However, "runonce" isn't sufficient, because different hooks/tools have
different circumstances under which they are safe to run twice; several
would be considered broken if they did *not* run twice. I audited all of
the builtin tools for CP 3 yesterday, trying to determine this for each
hook. Here's the initial results just so I don't lose them:


Type 0: Really, truly should only run once regardless of looping
Type 1: Should run once per respond() call (i.e. once per
InternalRedirect)
Type 2: Should run every time response.body changes (but shouldn't run
again on its own output; you don't want to re-gzip a body that's already
gzipped).

S: Safe to run repeatedly. These might not need protection, because
their operation is not dangerous if executed more than once.


S _d.session_auth = SessionAuthTool(cptools.session_auth)
Not sure if "del headers['Content-Length']" is really
safe.
S _d.proxy = Tool('before_request_body', cptools.proxy,
priority=30)
S _d.response_headers = Tool('on_start_resource',
cptools.response_headers)
0 _d.virtual_host = Tool('on_start_resource',
cptools.virtual_host, priority=40)
0 _d.log_tracebacks = Tool('before_error_response',
cptools.log_tracebacks)
0 _d.log_headers = Tool('before_error_response',
cptools.log_request_headers)
0 _d.err_redirect = ErrorTool(cptools.redirect)
1 _d.etags = Tool('before_finalize', cptools.validate_etags)
0 S _d.decode = Tool('before_handler', encoding.decode)
# the order of encoding, gzip, caching is important
2 _d.encode = Tool('before_finalize', encoding.encode,
priority=70)
2 _d.gzip = Tool('before_finalize', encoding.gzip,
priority=80)
1 _d.staticdir = HandlerTool(static.staticdir)
1 _d.staticfile = HandlerTool(static.staticfile)
S _d.expires = Tool('before_finalize', _caching.expires)
2 _d.tidy = Tool('before_finalize', tidy.tidy)
2 _d.nsgmls = Tool('before_finalize', tidy.nsgmls)
0 S _d.ignore_headers = Tool('before_request_body',
cptools.ignore_headers)
0 S _d.referer = Tool('before_request_body', cptools.referer)
0 S _d.basic_auth = Tool('on_start_resource', auth.basic_auth)
0 S _d.digest_auth = Tool('on_start_resource', auth.digest_auth)
0 _d.trailing_slash = Tool('before_handler',
cptools.trailing_slash)
2 _d.flatten = Tool('before_finalize', cptools.flatten)
1 _d.accept = Tool('on_start_resource', cptools.accept)
# _sessions.init must be bound after headers are read
_d.sessions = SessionTool('before_request_body',
_sessions.init)
0 _d.xmlrpc = XMLRPCTool()
1 _d.wsgiapp = WSGIAppTool(_wsgiapp.run)
1 _d.caching = CachingTool('before_handler', _caching.get,
'caching')
2 caching.tee_output

I'd like some verification of my analysis, so if someone could
spot-check my assignments of type-0, 1, and 2, I'd really appreciate it.
And I didn't do sessions yet; a quick look tells me the session hooks
are going to be different types.

So the question is: are the three types sufficient (thinking about
user-provided tools, both real and imaginary)? If they are, we should
provide builtin helpers for this kind of protection for CP 3 final. I'd
like to implement this as attributes on the functions themselves (like
"func.runonce = True"), but I'm not sure if that's the best way. IMO,
they eventually need to be attributes on the Hook objects, and
HookMap.run should do the actual checking and guarding (as it already
does for failsafe).


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org

Reply all
Reply to author
Forward
0 new messages