First cut at improving BehindApache recipe for 2.2

Jason Earl

Feb 9, 2006, 5:06:23 PM
to cherryp...@googlegroups.com

Here's my first cut at improving the Behind Apache recipe so that it
works with CherryPy 2.2. I am planning on changing the bit about
''Locating your application away from the root of the host'', but I
need to do a little testing first to see exactly how it should work.
I think that it can be done much more simply than the example given in
the recipe. More importantly, unless I am mistaken, example code with
stuff like cpg.request... doesn't actually work any more, does it?

I also would like to add something about SSL connections, but I am
still trying to figure out what exactly I want to say and whether or
not it goes on this particular page. Currently, when I ask my test
server for https://jearl.dsl.xmission.com/data I get redirected to
http://jearl.dsl.xmission.com/data/ instead of an https URL.
Apparently CherryPy is not aware that it is serving up secure content
when you place it behind Apache. That's probably the sort of thing
that you want a recipe for. If there is a standard way to do this
then I would be happy to write about it (assuming someone tells me how
to do it). If there isn't a standard way then I will play with
Apache's mod_headers and see if I can't find a way to pass that
information to CherryPy.
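
A minimal sketch of what the CherryPy side of that could look like,
assuming Apache is configured to add a request header naming the
original scheme. The header name X-Forwarded-Proto and the helper
fix_redirect_scheme are assumptions for illustration, not something
CherryPy 2.2 provides, and the code is plain Python 3 with no CherryPy
dependency:

```python
from urllib.parse import urlsplit, urlunsplit


def fix_redirect_scheme(location, headers, header_name="X-Forwarded-Proto"):
    """Rewrite the scheme of a redirect URL from a proxy-supplied header.

    `headers` is a plain dict of request headers. `header_name` is an
    assumed convention; Apache would have to be told to set it.
    """
    scheme = headers.get(header_name, "").strip().lower()
    if scheme not in ("http", "https"):
        # Header missing or unexpected: leave the redirect untouched.
        return location
    parts = urlsplit(location)
    # Keep host, path, query and fragment; swap only the scheme.
    return urlunsplit((scheme,) + tuple(parts[1:]))


print(fix_redirect_scheme("http://jearl.dsl.xmission.com/data/",
                          {"X-Forwarded-Proto": "https"}))
```

With the header present the redirect target keeps its host and path
but comes back as an https URL; without it the URL is returned
unchanged.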

Jason

cherrypy-wiki.txt

Sylvain Hellegouarch

Feb 10, 2006, 11:23:54 AM
to cherry...@googlegroups.com
Hi Jason,

Cheers for that. I am currently working on the new doc system we said we
would put online soon, so I might not put your code online straight away.

If you have other updates, forward them to this list :)

Thanks again
- Sylvain

Jason Earl wrote:

> ------------------------------------------------------------------------
>
> = Running CherryPy behind Apache through mod_rewrite =
>
> == Configuring Apache ==
>
> [http://httpd.apache.org/docs/mod/mod_rewrite.html mod_rewrite documentation]
>
> Let's assume that the CP application is listening on port 8000. I added the following lines to Apache's config file (usually /etc/apache/httpd.conf or /etc/httpd/conf/httpd.conf; mod_rewrite works just as well from .htaccess if you cannot edit your httpd.conf):
> {{{
> RewriteEngine on
> RewriteRule ^(.*) http://127.0.0.1:8000$1 [P]
> }}}
> in the proper !VirtualHost directive. Be careful with Directory directives because Apache will strip the directory prefix for pattern matching and not add it back. So the above configuration would result in Apache trying to proxy
> {{{
> http://127.0.0.1:8000page
> }}}
> instead of
> {{{
> http://127.0.0.1:8000/page
> }}}
> You would remedy this situation by adding a '/' or whatever prefix you need into the rewrite rule. For example:
> {{{
> RewriteEngine on
> RewriteRule ^(.*) http://127.0.0.1:8000/$1 [P]
> }}}
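
As a rough illustration of why the slash goes missing, here is a toy Python model of the rewrite step. It is not how mod_rewrite is implemented; the function name proxy_target and its strip_prefix parameter are made up for this sketch, and it only handles a single $1 back-reference.

```python
import re


def proxy_target(path, pattern, substitution, strip_prefix=""):
    """Toy model of a RewriteRule with the [P] flag (hypothetical helper)."""
    # In per-directory context Apache removes the directory prefix
    # before matching and does not put it back.
    if strip_prefix and path.startswith(strip_prefix):
        path = path[len(strip_prefix):]
    match = re.match(pattern, path)
    if match is None:
        return None
    # Only the $1 back-reference is modelled here.
    return substitution.replace("$1", match.group(1))


# Server or virtual-host context: the leading slash survives.
print(proxy_target("/page", r"^(.*)", "http://127.0.0.1:8000$1"))
# Directory context: the prefix is stripped and the slash is lost.
print(proxy_target("/page", r"^(.*)", "http://127.0.0.1:8000$1", strip_prefix="/"))
# Directory context with the '/' added back in the substitution.
print(proxy_target("/page", r"^(.*)", "http://127.0.0.1:8000/$1", strip_prefix="/"))
```

The second call reproduces the broken `http://127.0.0.1:8000page` target, and the third shows the effect of adding the '/' to the rule.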
>
>
> If you want to configure Apache to serve all your static files directly (and thus free CherryPy from this task), use a configuration like this:
> {{{
> RewriteEngine on
> RewriteRule ^/static/(.*) /home/user/files/static/$1 [L]
> RewriteRule ^(.*) http://127.0.0.1:8000$1 [P]
> }}}
>
> If you don't want to (or cannot) use Apache's Virtual Hosts, just add one !RewriteCond line after !RewriteEngine. For example, to map requests for the www.example.info host to your !CherryPy application, you get:
> {{{
> RewriteEngine on
> RewriteCond %{HTTP_HOST} www\.example\.info
> RewriteRule ^(.*) http://127.0.0.1:8000$1 [P]
> }}}
>
> If your application is not running and a user tries to access it, Apache will return a 502 Proxy Error. There is an easy way to start the application at that point: add an !ErrorDocument directive that runs a CGI script which starts your application and redirects to it. You will also need to disable mod_rewrite for that script (otherwise Apache would try to fetch the CGI script from your CP application and get another 502 error). So I added two more lines to my configuration, and it now looks like this:
> {{{
> RewriteEngine on
> RewriteCond %{SCRIPT_FILENAME} !autostart\.cgi$
> RewriteCond %{HTTP_HOST} www\.example\.info
> RewriteRule ^(.*) http://127.0.0.1:8000/$1 [P]
> ErrorDocument 502 /cgi-bin/autostart.cgi
> }}}
>
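> The autostart.cgi file is a short Python script:
> {{{
> #!python
> #!/usr/local/bin/python
> import os
>
> # Send a page that asks the browser to retry the site root after one second.
> print "Content-type: text/html\r\n"
> print """<html><head><META HTTP-EQUIV="Refresh" CONTENT="1; URL=/"></head><body>Restarting site ... <a href="/">click here</a></body></html>"""
> # Move into a separate process group (see the note below).
> os.setpgid(os.getpid(), 0)
> # Start the CherryPy application in the background.
> os.system('/usr/local/bin/python2.4 webserver.py &')
> }}}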
>
> If you get ''"Forbidden - You don't have permission to access / on this server"'' errors, try enabling the proxy module.
>
> Note: The "os.setpgid(os.getpid(), 0)" line seems to prevent Apache from killing the CP process after a period of inactivity (many thanks to Matt Lewis for this trick).
>
> == Getting the right Host header in CherryPy ==
>
> One problem with this setup is that requests that arrive at !CherryPy will look like they're coming from "localhost" (the "Host" header will say "localhost:port").
>
> This is not a problem if you're only using relative or absolute URLs (the browser will do the right thing), but it is a problem if you're using canonical URLs (URLs that include the protocol and domain name) generated by !CherryPy.
>
> This is also a problem if you want to issue a redirect because a redirect should include a canonical URL.
>
> Warning: This is an especially important issue because !CherryPy will create redirects for you in cases where the URL is missing a final '/'. Redirecting users to http://127.0.0.1:8080/data/ when they typed in http://www.example.info/data is not likely to win you many friends.
>
> The way to work around this is to use the !BaseUrlFilter, which tells !CherryPy to use a different "Host" header than the one coming from the request.
>
> For '''Apache 1.x''', you can tell !CherryPy what the actual "Host" should be, like this:
> {{{
> #!python
> from cherrypy.lib.filter.baseurlfilter import BaseUrlFilter
> class Root:
>     _cpFilterList = [BaseUrlFilter(baseUrl='http://mydomain.com')]
>     ....
> }}}
>
> Update: This recipe has changed for !CherryPy 2.2. Setting the base url is now much simpler.
>
> Instead of mucking about with baseurlfilter, simply set a few configuration options. To be precise, you need to set the ''base_url_filter.on'' option and the ''base_url_filter.base_url'' option. I tend to do this in my code like this:
>
> {{{
> #!python
> import cherrypy
>
> cherrypy.config.update({'base_url_filter.on': True})
> cherrypy.config.update({'base_url_filter.base_url':
>                         "http://www.example.info"})
> cherrypy.server.start()
> }}}
>
> You can do the same thing by modifying the configuration file so that it looks like:
>
> Warning: I haven't actually tried this :)
>
> {{{
> [global]
> server.socketPort = 8080
> server.threadPool = 10
> server.environment = "production"
> base_url_filter.on = True
> base_url_filter.base_url = "http://www.example.info"
> }}}
>
> Or, if you're running your site behind '''Apache 2.x''' (or newer 1.x - works for 1.3.33), you can tell !CherryPy to use the '''X-Forwarded-Host''' header provided by Apache, like this:
> {{{
> #!python
> from cherrypy.lib.filter.baseurlfilter import BaseUrlFilter
> class Root:
>     _cpFilterList = [BaseUrlFilter(useXForwardedHost=True)]
>     ....
> }}}
>
> Update: This is also easier in !CherryPy 2.2.
>
> In !CherryPy 2.2 this is also a configuration option ''base_url_filter.use_x_forwarded_host''. You can set it in your code using:
>
> {{{
> #!python
> import cherrypy
>
> cherrypy.config.update({'base_url_filter.on': True})
> cherrypy.config.update({'base_url_filter.use_x_forwarded_host': True})
> cherrypy.server.start()
> }}}
>
> You can set it via configuration file with:
>
> {{{
> [global]
> server.socketPort = 8080
> server.threadPool = 10
> server.environment = "production"
> base_url_filter.on = True
> base_url_filter.use_x_forwarded_host = True
> }}}
>
> TIP: If you are using Apache 2.x but also want to view your site directly (i.e. without going through Apache), you can use both baseUrl and X-Forwarded-Host together. For example, if your CP server is running on port 8080:
> {{{
> #!python
> from cherrypy.lib.filter.baseurlfilter import BaseUrlFilter
> class Root:
>     _cpFilterList = [BaseUrlFilter(baseUrl='localhost:8080', useXForwardedHost=True)]
>     ....
> }}}
> Some people call this abuse of the baseUrl; others think it's cool. Regardless, you'll need something like this if you're viewing directly and want redirects to work correctly.
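
The selection logic described above can be sketched in plain Python. This is a simplified model, not CherryPy's actual filter code; the function name effective_base and its parameters are invented for this illustration, and it assumes the forwarded header wins whenever it is present.

```python
def effective_base(headers, base_url=None, use_x_forwarded_host=False):
    """Pick the base URL used for canonical URLs and redirects (hypothetical model)."""
    # Start from the configured base URL, else fall back to the Host header.
    base = base_url or headers.get("Host", "localhost")
    if use_x_forwarded_host:
        forwarded = headers.get("X-Forwarded-Host", "")
        # With several proxies the header may hold a comma-separated list;
        # the first entry is the host the client originally asked for.
        first = forwarded.split(",")[0].strip()
        if first:
            base = first
    # Assume plain http when no scheme was given.
    if "://" not in base:
        base = "http://" + base
    return base


# Request proxied by Apache: the forwarded host is used.
print(effective_base({"Host": "127.0.0.1:8080",
                      "X-Forwarded-Host": "www.example.info"},
                     base_url="localhost:8080", use_x_forwarded_host=True))
# Direct request: no forwarded header, so the configured base URL is used.
print(effective_base({"Host": "localhost:8080"},
                     base_url="localhost:8080", use_x_forwarded_host=True))
```

In this model the same settings serve both the proxied and the direct case, which is the point of the tip above.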
>
> Redirects will still break if Apache is serving HTTPS, instead of HTTP. This patch forces the canonical URL to be HTTPS:
>
> {{{
> --- _cphttptools.py.orig
> +++ _cphttptools.py
> @@ -166,7 +166,7 @@
>  parseFirstLine(requestLine)
>  cookHeaders(clientAddress, remoteHost, headers, requestLine)
>
> -cpg.request.base = "http://" + cpg.request.headerMap['Host']
> +cpg.request.base = "https://" + cpg.request.headerMap['Host']
>  cpg.request.browserUrl = cpg.request.base + cpg.request.browserUrl
>  cpg.request.isStatic = False
>  cpg.request.parsePostData = True
> }}}
>
> == Locating your application away from the root of the host ==
>

> If you want to locate your application at a subdirectory of the URL space - for example, you want {{{http://my.domain.com/myapp/}}} to be the location of your application - you can use mod_proxy from within a {{{<Location>}}} element.
>
> ''Note'': This only works for Apache 2. It also doesn't require mod_rewrite.
>
> The following snippet from the Apache configuration file shows how to set this up:
>
> {{{
> <Location /myapp>
> ProxyPass http://localhost:8080
> ProxyPassReverse http://localhost:8080
> RequestHeader set CP-Location /myapp
> </Location>
> }}}
>
> The !ProxyPass and !ProxyPassReverse directives make Apache pass any requests for URLs below /myapp onto the !CherryPy server.
>
> The !RequestHeader directive adds a custom header to all requests going to !CherryPy. While this is not strictly necessary, it makes it possible for the application to use absolute URLs without too much difficulty.
>
> The problem with absolute URLs can be illustrated as follows. If your application has a login page located at "/login", you quite likely ''don't'' want to refer to it using a relative URL, as this would mean that the URL would have to be changed when you change the structure of your site. But if you use the URL "/login", this will work with the standalone !CherryPy server, but not from behind Apache (as /login is not located inside the /myapp location).
>
> So, I added a !RequestHeader directive to let the request "know" where the base of the application is. You can find the base by using {{{cpg.request.headerMap.get("Cp-Location","")}}} and then add that to the beginning of any absolute URL. The following function automates this:
>
> {{{
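> #!python
> from cherrypy import cpg  # pre-2.2 style access to the request object
>
> def build_url(url):
>     # Prefix absolute URLs with the location that Apache passed along.
>     if url.startswith('/'):
>         location = cpg.request.headerMap.get("Cp-Location", "")
>         return location + url
>     # Relative URLs are returned unchanged.
>     return url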
> }}}
>
> You may also be able to get !BaseUrlFilter to help here - but so far, my attempts to do so have failed. You certainly can't use a request header, as the filter is created before you have a request to use. So you'd have to hard-code the location (or get it from a config file) which means that you wouldn't have something that would work unchanged (and, indeed, simultaneously) both directly (using {{{http://localhost:8080}}} URLs) and behind Apache. This isn't likely to be a huge issue in practice, though.
>
> == Beware the encoding bug ==
>
> URLs that are requested via HTTP must be escaped (%xx-encoded) before they are sent, but Apache2's mod_rewrite unescapes path information, which may generate invalid HTTP requests. In particular, spaces (which should be escaped as "%20") are not re-escaped. If CherryPy receives a request with a raw space character in the URL, it chokes, because spaces are used to delimit the three parts of a request line (like "GET /path/to%20my/page HTTP/1.1"). A workaround is to add the following to your Apache configuration:
>
> {{{
> # this cannot be on .htaccess (only on httpd.conf)
> RewriteMap escape int:escape
>
> #and when writing RewriteRule:
> RewriteRule ^(.*)$ http://localhost:6674/${escape:$1} [P]
> #(i.e., use ${escape:$1} instead of $1)
> }}}
>
> AFAIK, this is a bug in mod_rewrite/Apache: I've researched the HTTP/1.1 and URI RFCs, and they all state that there must be only 2 spaces on the HTTP request line, i.e., CherryPy is parsing the request line correctly and Apache is sending invalid HTTP requests. Either way, I think this workaround will help people using CherryPy behind Apache's mod_rewrite. I've only tested this on Apache2; I don't know if RewriteMap int:escape exists in older versions of mod_rewrite. The Apache people seem to be aware of this bug, though: http://issues.apache.org/bugzilla/show_bug.cgi?id 265
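
To see why a raw space is fatal, consider how a request line splits on spaces. The following plain Python 3 sketch is only an illustration: request_line is a hypothetical helper, and urllib.parse.quote stands in for the escaping that int:escape performs on the Apache side.

```python
from urllib.parse import quote


def request_line(method, path, version="HTTP/1.1"):
    """Join the three parts of an HTTP request line with single spaces."""
    return " ".join((method, path, version))


raw = request_line("GET", "/path/to my/page")
escaped = request_line("GET", quote("/path/to my/page"))

# Four fields: a parser expecting method, path and version is confused.
print(raw.split(" "))
# Three fields, with the space sent as %20.
print(escaped.split(" "))
```

The unescaped line yields four space-separated fields instead of three, which is the situation the workaround above avoids.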
>

Jason Earl

Feb 10, 2006, 5:02:49 PM
to cherry...@googlegroups.com
Sylvain Hellegouarch <s...@defuze.org> writes:

> Hi Jason,
>
> Cheers for that. I am currently working on the new doc system we
> said we would put online soon so I might not put your code online
> straight away.

That's fine. I already know how this works, and now so does Google
:).

Playing with Apache integration has been fun. Last night I played
around with setting headers in Apache so that I could tell if a
connection was secure or not (so that I could redirect if it wasn't
secure), and today I will spend a little time figuring out how to use
Apache2's mod_deflate to gzip output instead of CherryPy. As I sort
this stuff out I will try and make updates as appropriate. I am also
playing around with using Apache in front of several different
application servers.

> If you have other updates, forward them to this list :)

I'll keep the group updated. I am very interested to see what the new
documentation stuff looks like.

> Thanks again
> - Sylvain

Thank you,
Jason
