using Tornado as a proxy

Bill Janssen

Feb 4, 2011, 1:35:07 PM
to Tornado Web Server
I'm writing an application in which some calls need to authenticate
the user, then do some work on behalf of that user, but most calls are
simply passed on to other services on behalf of the authenticated
user. So I'd like to have a real programming language available to do
the work, but I'd also like to use a framework which makes it easy to
forward or proxy calls to other servers.

Tornado looks like a good bet; Python for programming, good auth
library, httpclient module.

However, I don't see any built-in support for proxying an instance of
httpserver.HTTPRequest to another server. Ideally, I'd like a
function that transforms an instance of httpserver.HTTPRequest to an
instance of httpclient.HTTPRequest, doing appropriate proxy mods to
the headers. It should handle POST with multipart/form-data, too.
And, of course, a matching function that takes an instance of
httpclient.HTTPResponse and an instance of web.RequestHandler, and
appropriately returns that response back to the original browser (or
caller).

Anyone have such code lying around? Thanks.

Bill
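The header-rewriting half of that wish list can be sketched without Tornado at all. This is a minimal sketch that assumes the "appropriate proxy mods" mean what HTTP proxies usually do. It drops the hop-by-hop headers listed in RFC 7230 section 6.1, plus any header named in `Connection`, and appends the caller's address to `X-Forwarded-For`. `proxy_headers` and `HOP_BY_HOP` are hypothetical names. The input is assumed to be a plain mapping of header name to a single value:

```python
# Hop-by-hop headers (RFC 7230, section 6.1) describe one connection only
# and should not be forwarded by a proxy.
HOP_BY_HOP = frozenset([
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
])


def proxy_headers(incoming, client_ip):
    """Hypothetical helper: return a new dict of headers to send upstream.

    `incoming` is any mapping of header name -> value; `client_ip` is the
    address of the original caller.
    """
    # Any header named in the Connection header is hop-by-hop too.
    drop = set(HOP_BY_HOP)
    for name, value in incoming.items():
        if name.lower() == "connection":
            drop.update(tok.strip().lower()
                        for tok in value.split(",") if tok.strip())

    out = {}
    for name, value in incoming.items():
        if name.lower() not in drop:
            out[name] = value

    # Append the caller to any existing X-Forwarded-For chain.
    prior = None
    for name in list(out):
        if name.lower() == "x-forwarded-for":
            prior = out.pop(name)
    out["X-Forwarded-For"] = "%s, %s" % (prior, client_ip) if prior else client_ip
    return out
```

The resulting dict could be passed as `headers=` when building the `httpclient.HTTPRequest`. Headers that legitimately repeat (such as `Set-Cookie` on the way back) would need a multi-value structure instead of a plain dict.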

Bill Janssen

Feb 5, 2011, 9:41:20 PM
to Tornado Web Server
Thought I'd share what I've come up with so far (which seems to
work). Apologies for the line-wrapping; I'd think that Google of all
companies would have some way to avoid it.

Bill

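import logging
import sys
import traceback

import tornado.httpclient
import tornado.web


def note(fmt, *args):
    # note() is a logging helper not shown in the post; logging stands in.
    logging.info(fmt, *args)


class ForwardingRequestHandler(tornado.web.RequestHandler):

    def internal_error(self, detail):
        self.set_status(500)
        self.write("Internal server error:\n" + detail)
        self.finish()

    def handle_response(self, response):
        # 599 is Tornado's code for timeouts and other non-HTTP failures;
        # it is not a status that can be relayed to the browser.
        if response.code == 599 or (
                response.error and not isinstance(
                    response.error, tornado.httpclient.HTTPError)):
            note("response has error %s", response.error)
            self.internal_error(str(response.error))
        else:
            # Relay the upstream status (including 4xx/5xx), a whitelist
            # of its headers, and the body.
            self.set_status(response.code)
            for header in ("Date", "Cache-Control", "Server", "Content-Type",
                           "Location"):
                v = response.headers.get(header)
                if v:
                    self.set_header(header, v)
            if response.body:
                self.write(response.body)
            self.finish()

    def forward(self, port=None, host=None):
        # Call this from a handler method decorated with
        # @tornado.web.asynchronous: the reply is sent from the callback.
        method = self.request.method
        # Tornado's client rejects a body on methods other than POST/PATCH/PUT.
        body = self.request.body if method in ("POST", "PATCH", "PUT") else None
        try:
            tornado.httpclient.AsyncHTTPClient().fetch(
                tornado.httpclient.HTTPRequest(
                    url="%s://%s:%s%s" % (
                        self.request.protocol, host or "127.0.0.1",
                        port or 80, self.request.uri),
                    method=method,
                    body=body,
                    headers=self.request.headers,
                    follow_redirects=False),
                self.handle_response)
        except tornado.httpclient.HTTPError as x:
            note("tornado signalled HTTPError %s", x)
            if getattr(x, "response", None):
                self.handle_response(x.response)
            else:
                # No upstream response to relay; don't leave the client hanging.
                self.internal_error(str(x))
        except Exception:
            # Any other failure (e.g. a CurlError): report the traceback.
            note("error while forwarding request")
            self.internal_error(
                ''.join(traceback.format_exception(*sys.exc_info())))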

Srini

Feb 6, 2011, 8:30:29 AM
to Tornado Web Server
This seems like a nice use of the httpclient module. Thanks for
sharing.

Jacob Kristhammar

Feb 6, 2011, 12:07:15 PM
to python-...@googlegroups.com
If you're running Tornado behind nginx and don't need to intercept the response before sending it back to the client, the nginx XSendfile [1] feature is great.

Basically it lets you process the incoming request and do whatever you like, e.g. authentication.
After that you write an empty response back to nginx with the special X-Accel-Redirect header set to a new location.
nginx will intercept your response and pass the original request on to the new location.

-- Jacob
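A framework-neutral sketch of the application side of this pattern follows. It assumes authentication has already produced a `user` (or `None`), and that nginx has a location under the hypothetical prefix `/protected` marked `internal`, so clients cannot request it directly. `accel_redirect` is an illustrative name, not an nginx or Tornado API:

```python
def accel_redirect(user, path, internal_prefix="/protected"):
    """Hypothetical helper: build the response the app hands back to nginx.

    `user` is the result of your auth step (None if not authenticated);
    `internal_prefix` is an assumed nginx location marked `internal`.
    Returns (status, headers, body).
    """
    if user is None:
        return 403, {}, b"Forbidden"
    # Normalise the path and refuse anything that tries to climb out of
    # the internal prefix.
    segments = [s for s in path.split("/") if s not in ("", ".")]
    if ".." in segments:
        return 400, {}, b"Bad path"
    target = internal_prefix.rstrip("/") + "/" + "/".join(segments)
    # Empty body: nginx performs an internal redirect to `target` instead.
    return 200, {"X-Accel-Redirect": target}, b""
```

In a Tornado handler the 200 case amounts to `self.set_header("X-Accel-Redirect", target)` followed by `self.finish()`. nginx then serves or proxies `target` itself, so large bodies never pass through the Python process.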

David Birdsong

Feb 7, 2011, 12:37:03 AM
to python-...@googlegroups.com
On Sun, Feb 6, 2011 at 12:07 PM, Jacob Kristhammar
<krist...@gmail.com> wrote:
> If you're running Tronado behind nginx and don't need to intercept the
> response before sending it back to the client, the nginx XSendfile [1]
> feature is great.
> Basically it will let you process the incoming request and do whatever you
> like, e.g. authentication.
> After that you write and empty response back to nginx with the special
> X-Accel-Redirect header set to a new location.
> nginx will intercept you're response and pass the original request to the
> new location.
> -- Jacob

++ i used nginx and tornado to front mogilefs and served images for a
major image hosting site this way. tornado was the 'app' layer that
helped key into the correct sharded mogilefs cluster and then spoke to
the trackers using iostreams (fully non-blocking). once all the
sharding + actual db lookup was complete, tornado kicked out an
X-Accel-Redirect containing all the info nginx needed to proxy the
file from a remote mogilefs storage node.

this is of course also useful for locally stored files.
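The "key into the correct sharded cluster" step can be illustrated with a stable hash. This sketch assumes a fixed list of clusters and an nginx `internal` location (hypothetical prefix `/reproxy`) that proxies `/reproxy/<host:port>/<path>` on to that storage node. MogileFS trackers hand back file locations as HTTP URLs on storage nodes. The shard names, prefix and function names below are made up for the sketch:

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical shard list: one entry per MogileFS cluster (tracker address).
SHARDS = ["tracker-a:7001", "tracker-b:7001", "tracker-c:7001"]


def shard_for(key, shards=SHARDS):
    """Pick a cluster with a stable hash, so a key always maps to one shard."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def reproxy_header(storage_url, internal_prefix="/reproxy"):
    """Turn a storage-node URL from the tracker lookup into an
    X-Accel-Redirect value for the assumed internal nginx location."""
    parts = urlsplit(storage_url)
    return "%s/%s%s" % (internal_prefix, parts.netloc, parts.path)
```

Plain modulo hashing moves most keys whenever a shard is added; a lookup table or consistent hashing avoids that reshuffle.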

Bill Janssen

Feb 8, 2011, 4:28:02 PM
to Tornado Web Server
Good build, Jacob. Thank you.

Bill

pivt

Aug 29, 2011, 4:54:26 AM
to python-...@googlegroups.com
Hi, Bill

I want to use Tornado as a proxy. In my case, I need to specify an IP for each request.
My script takes two arguments: the IP (a GET parameter) and the raw request (a multipart POST parameter), and I have to parse the POST parameter.
When I run this script, I get an assertion error:
    assert not self._finished
AssertionError
ERROR:root:Cannot send error response after headers written

I don't know what's going on. My code is:
from tornado.simple_httpclient import SimpleAsyncHTTPClient
from tornado.httpclient import HTTPRequest
from tornado import httputil
from tornado.escape import utf8, native_str, parse_qs_bytes
import tornado.ioloop
import logging
import tornado.web

class _BadRequestException(Exception):
    """Exception class for malformed HTTP requests."""
    pass

class ProxyHandler(tornado.web.RequestHandler):
    def post(self):
        ip = self.get_argument('ip')
        req = self.get_argument('request')
        _host = self._parse_host(req)
        req_obj = self._gen_request(req, _host)
        client = SimpleAsyncHTTPClient(hostname_mapping = {_host : ip})
        client.fetch(req_obj, self._callback)

    def _callback(self, response):
        if response.error and not isinstance(response.error, tornado.httpclient.HTTPError):
            logging.info('response has error %s' % response.error)
            self.set_status(500)
            self.write('Internal server error:\n' + str(response.error))
            self.finish()
        else:
            self.set_status(response.code)
            for header in ("Date", "Cache-Control", "Server", "Content-Type", "Location"): 
                v = response.headers.get(header)
                if v:
                    self.set_header(header, v)
            if response.body:
                self.write(response.body)
            self.finish()

    def _parse_host(self, request):
        return request[request.find('\r\nHost: ') + 8 : request.find('\r\n', request.find('Host: '))]

    def _gen_request(self, request, host):
        try:
            data = native_str(request.decode('latin1'))
            eol = data.find('\r\n')
            eol2 = data.find('\r\n\r\n')
            start_line = data[:eol]
            try:
                _method, _uri, _version = start_line.split(' ')
            except ValueError:
                raise _BadRequestException('Malformed HTTP request line')
            if not _version.startswith('HTTP/'):
                raise _BadRequestException('Malformed HTTP version in HTTP Request-Line')
            _headers = httputil.HTTPHeaders.parse(data[eol:eol2])
            content_length = _headers.get("Content-Length")
            _body = None  # stays None for body-less requests such as GET
            if content_length:
                _body = data[eol2+4:]
            http_request = HTTPRequest(
                    url = "http://%s:80%s" % (host, _uri), 
                    method = _method, 
                    body = _body, 
                    headers = _headers, 
                    follow_redirects=False)
            return http_request
 
        except _BadRequestException, e:
            logging.info("Malformed HTTP request. %s", e)
            return

if __name__ == '__main__':
    application = tornado.web.Application([(r'/proxy', ProxyHandler),])
    application.listen(8081)
    tornado.ioloop.IOLoop.instance().start()

Andrew Fort

Aug 29, 2011, 10:30:31 AM
to python-...@googlegroups.com
On Mon, Aug 29, 2011 at 1:54 AM, pivt <piv...@gmail.com> wrote:
> hi, Bill
> I want to use tornado as a proxy, in my case, I need to specific a IP for
> that request.
> So my script gets 2 arguments: IP(GET parameter) and request(Multipart POST
> parameter), and I have to parse the POST parameter.
> When I run this script, there is a assertion:
>     assert not self._finished
> AssertionError
> ERROR:root:Cannot send error response after headers written

> class ProxyHandler(tornado.web.RequestHandler):
>     def post(self):

You have an asynchronous-style handler (i.e., you have the HTTP client
fire your callback, where you must manually call self.finish()), but
the request method is not decorated that way, so self.finish() is
called twice (once automatically by Tornado when post() returns, once
at the end of _callback()), hence the assertion failure that
self._finished is already true.

To fix that, add the decorator to your post method, like so:

@tornado.web.asynchronous
def post(self):

Cheers,
Andrew
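The double finish can be reproduced with a toy model of the mechanism, with no Tornado involved. The framework finishes a request automatically when the handler method returns, unless the method has switched that off. The Tornado of this era implemented `@tornado.web.asynchronous` along these lines by clearing an auto-finish flag. All class and function names below are made up for illustration:

```python
import functools


def asynchronous(method):
    """Toy stand-in for @tornado.web.asynchronous: disable auto-finish."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self._auto_finish = False
        return method(self, *args, **kwargs)
    return wrapper


class ToyHandler(object):
    def __init__(self):
        self._auto_finish = True
        self._finished = False
        self.pending = []  # callbacks a fake HTTP client will fire later

    def finish(self):
        assert not self._finished  # the assertion from the traceback
        self._finished = True

    def execute(self, method_name):
        # Framework side: run the handler method, then finish unless told not to.
        getattr(self, method_name)()
        if self._auto_finish and not self._finished:
            self.finish()


class SyncStyle(ToyHandler):
    def post(self):
        self.pending.append(self.finish)  # like client.fetch(req, self._callback)


class AsyncStyle(ToyHandler):
    @asynchronous
    def post(self):
        self.pending.append(self.finish)


def run(handler_cls):
    h = handler_cls()
    h.execute("post")
    try:
        for cb in h.pending:  # later, the "response" arrives
            cb()
    except AssertionError:
        return "AssertionError: finished twice"
    return "ok"
```

`run(SyncStyle)` returns `"AssertionError: finished twice"`, while `run(AsyncStyle)` returns `"ok"`, mirroring the fix above.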

pivt

Sep 1, 2011, 4:18:27 AM
to python-...@googlegroups.com
Now there is another problem with my proxy script.
The first request I send to my proxy server works, but the next one fails with this log:
WARNING:root:uncaught exception
Traceback (most recent call last):
  File "/home/pivd/bin/python/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/simple_httpclient.py", line 259, in cleanup
    yield
  File "/home/pivd/bin/python/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/simple_httpclient.py", line 162, in __init__
    0, 0)
gaierror: [Errno -2] Name or service not known
INFO:root:response has error [Errno -2] Name or service not known
ERROR:root:500 POST /proxy?ip=192.168.0.2 (192.168.0.3) 1.79ms

When I restart the script, it works again. I don't know what's going on :(

Ben Darnell

Sep 2, 2011, 12:47:06 PM
to python-...@googlegroups.com
You're being bitten by the fact that AsyncHTTPClient instances are magically reused (in order to provide e.g. limits on the number of simultaneous connections). Specifically, you can't really change the hostname_mapping; it's assumed to be a process-wide constant analogous to /etc/hosts. I recommend creating a single AsyncHTTPClient instance at startup and reusing it. You'll need to specify what host/ip you're connecting to in the request rather than modifying hostname_mapping.

-Ben
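The "magically reused" behaviour can be modelled as construction that is cached per event loop, so keyword arguments such as `hostname_mapping` only take effect the first time. This is a simplified stand-in written for illustration, not Tornado's code. `ReusedClient` is a hypothetical name, and `loop` is any hashable object playing the role of the IOLoop:

```python
class ReusedClient(object):
    """Simplified model of an HTTP client cached per event loop."""
    _instances = {}

    def __new__(cls, loop="default-loop", **kwargs):
        if loop in cls._instances:
            # Later "constructions" return the cached object; kwargs are ignored.
            return cls._instances[loop]
        instance = super(ReusedClient, cls).__new__(cls)
        instance.hostname_mapping = dict(kwargs.get("hostname_mapping") or {})
        cls._instances[loop] = instance
        return instance

    def resolve(self, host):
        # Hosts missing from the mapping fall through to ordinary DNS.
        return self.hostname_mapping.get(host, host)
```

In this model a second `ReusedClient(hostname_mapping={"other.com": ...})` hands back the first object, so `other.com` still goes to DNS. That is consistent with the `gaierror` in the log above.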

Drew Whitehouse

Sep 2, 2011, 7:48:50 PM
to python-...@googlegroups.com
I'm doing something like this by sharing tornado's cookie_secret with the other services. The other services can then decode the cookie and get the user_id to authenticate. I'm using mongrel2 in front of tornado, it proxies through to tornado for login etc but ajax services are served directly via mongrel2 handlers without a trip through tornado. 

-Drew
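This scheme relies on every service being able to verify a cookie signed with the shared secret. The sketch shows the idea with a simple `value|timestamp|signature` HMAC-SHA256 format of its own. It is not byte-compatible with Tornado's secure cookies. For real interoperability, other Python services can call Tornado's own `tornado.web.decode_signed_value` (available in current releases) with the same `cookie_secret`. The function names below are hypothetical:

```python
import base64
import hashlib
import hmac
import time


def _sign(secret, *parts):
    mac = hmac.new(secret.encode("utf-8"), digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part.encode("utf-8"))
        mac.update(b"|")  # separator, so ("ab", "c") and ("a", "bc") differ
    return mac.hexdigest()


def sign_cookie(secret, name, value, now=None):
    """Produce `b64(value)|timestamp|signature` (format made up for this sketch)."""
    ts = str(int(now if now is not None else time.time()))
    b64 = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return "|".join([b64, ts, _sign(secret, name, b64, ts)])


def verify_cookie(secret, name, cookie, max_age=31 * 86400, now=None):
    """Return the original value, or None if the cookie is forged or expired."""
    try:
        b64, ts, sig = cookie.split("|")
    except ValueError:
        return None
    expected = _sign(secret, name, b64, ts)
    # Constant-time comparison, done on bytes to tolerate odd input.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        return None
    now = now if now is not None else time.time()
    if int(ts) < now - max_age:
        return None
    return base64.b64decode(b64).decode("utf-8")
```

Binding the cookie `name` into the signature stops a value signed for one cookie from being replayed as another.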

pivt

Sep 4, 2011, 10:38:24 PM
to python-...@googlegroups.com
Thanks Ben, but I can't find an attribute for the host/IP I'm connecting to in the request object. The source code of the HTTPRequest object is:
class HTTPRequest(object):
    """A single HTTP request.

    .. attribute:: method

       HTTP request method, e.g. "GET" or "POST"

    .. attribute:: uri

       The requested uri.

    .. attribute:: path

       The path portion of `uri`

    .. attribute:: query

       The query portion of `uri`

    .. attribute:: version

       HTTP version specified in request, e.g. "HTTP/1.1"

    .. attribute:: headers

       `HTTPHeader` dictionary-like object for request headers.  Acts like
       a case-insensitive dictionary with additional methods for repeated
       headers.

    .. attribute:: body

       Request body, if present.

    .. attribute:: remote_ip

       Client's IP address as a string.  If `HTTPServer.xheaders` is set,
       will pass along the real IP address provided by a load balancer
       in the ``X-Real-Ip`` header

    .. attribute:: protocol

       The protocol used, either "http" or "https".  If `HTTPServer.xheaders`
       is seet, will pass along the protocol used by a load balancer if
       reported via an ``X-Scheme`` header.

    .. attribute:: host

       The requested hostname, usually taken from the ``Host`` header.

    .. attribute:: arguments

       GET/POST arguments are available in the arguments property, which
       maps arguments names to lists of values (to support multiple values
       for individual names). Names and values are both unicode always.

    .. attribute:: files

       File uploads are available in the files property, which maps file
       names to list of files. Each file is a dictionary of the form
       {"filename":..., "content_type":..., "body":...}. The content_type
       comes from the provided HTTP header and should not be trusted
       outright given that it can be easily forged.

    .. attribute:: connection

       An HTTP request is attached to a single HTTP connection, which can
       be accessed through the "connection" attribute. Since connections
       are typically kept open in HTTP/1.1, multiple requests can be handled
       sequentially on a single connection.
    """
 

pivt

Sep 4, 2011, 10:56:34 PM
to python-...@googlegroups.com
Oh, I just realized I was reading the wrong class (the server-side request). The client-side HTTPRequest code is:
class HTTPRequest(object):
    """HTTP client request object."""
    def __init__(self, url, method="GET", headers=None, body=None,
                 auth_username=None, auth_password=None,
                 connect_timeout=20.0, request_timeout=20.0,
                 if_modified_since=None, follow_redirects=True,
                 max_redirects=5, user_agent=None, use_gzip=True,
                 network_interface=None, streaming_callback=None,
                 header_callback=None, prepare_curl_callback=None,
                 proxy_host=None, proxy_port=None, proxy_username=None,
                 proxy_password='', allow_nonstandard_methods=False,
                 validate_cert=True, ca_certs=None,
                 allow_ipv6=None):
        """Creates an `HTTPRequest`.

        All parameters except `url` are optional.

        :arg string url: URL to fetch
        :arg string method: HTTP method, e.g. "GET" or "POST"
        :arg headers: Additional HTTP headers to pass on the request
        :type headers: `~tornado.httputil.HTTPHeaders` or `dict`
        :arg string auth_username: Username for HTTP "Basic" authentication
        :arg string auth_password: Password for HTTP "Basic" authentication
        :arg float connect_timeout: Timeout for initial connection in seconds
        :arg float request_timeout: Timeout for entire request in seconds
        :arg datetime if_modified_since: Timestamp for ``If-Modified-Since``
           header
        :arg bool follow_redirects: Should redirects be followed automatically
           or return the 3xx response?
        :arg int max_redirects: Limit for `follow_redirects`
        :arg string user_agent: String to send as ``User-Agent`` header
        :arg bool use_gzip: Request gzip encoding from the server
        :arg string network_interface: Network interface to use for request
        :arg callable streaming_callback: If set, `streaming_callback` will
           be run with each chunk of data as it is received, and 
           `~HTTPResponse.body` and `~HTTPResponse.buffer` will be empty in 
           the final response.
        :arg callable header_callback: If set, `header_callback` will
           be run with each header line as it is received, and 
           `~HTTPResponse.headers` will be empty in the final response.
        :arg callable prepare_curl_callback: If set, will be called with
           a `pycurl.Curl` object to allow the application to make additional
           `setopt` calls.
        :arg string proxy_host: HTTP proxy hostname.  To use proxies, 
           `proxy_host` and `proxy_port` must be set; `proxy_username` and 
           `proxy_pass` are optional.  Proxies are currently only support 
           with `curl_httpclient`.
        :arg int proxy_port: HTTP proxy port
        :arg string proxy_username: HTTP proxy username
        :arg string proxy_password: HTTP proxy password
        :arg bool allow_nonstandard_methods: Allow unknown values for `method` 
           argument?
        :arg bool validate_cert: For HTTPS requests, validate the server's
           certificate?
        :arg string ca_certs: filename of CA certificates in PEM format,
           or None to use defaults.  Note that in `curl_httpclient`, if
           any request uses a custom `ca_certs` file, they all must (they
           don't have to all use the same `ca_certs`, but it's not possible
           to mix requests with ca_certs and requests that use the defaults.
        :arg bool allow_ipv6: Use IPv6 when available?  Default is false in 
           `simple_httpclient` and true in `curl_httpclient`
        """

But once again, I can't find the host/ip attribute. 

Phil Whelan

Sep 5, 2011, 1:32:27 AM
to python-...@googlegroups.com
Hi pivt,


On Sun, Sep 4, 2011 at 7:56 PM, pivt <piv...@gmail.com> wrote:
> But once again, I can't find the host/ip attribute. 

I think you can just specify it in the URL...


           http_request = HTTPRequest(
                   url = "http://%s:80%s" % (host, _uri),

change to...

           http_request = HTTPRequest(
                   url = "http://%s:80%s" % (ip, _uri),

or "ip" could be the actual hostname specific to that IP. Currently you seem to use the hostname of the proxy here, which I do not understand the reasoning behind.

Would that work for you?

Cheers,
Phil
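This works because the URL decides where the TCP connection goes, while the `Host` header (already present in pivt's parsed `_headers`) tells the server which site is wanted. Tornado's simple_httpclient should only fill in `Host` from the URL when the request doesn't already carry one. Below is a small sketch of building both pieces, with a hypothetical helper name. In pivt's handler the result would feed `HTTPRequest(url=..., headers=..., ...)` on one client created at startup, as Ben suggests:

```python
def target_for(ip, host, uri, port=80, headers=None):
    """Hypothetical helper: connect to `ip` but ask for virtual host `host`.

    Returns (url, headers) for building an httpclient request.
    """
    out = dict(headers or {})
    # Drop any existing Host header regardless of case, then set the wanted one.
    for name in [n for n in out if n.lower() == "host"]:
        del out[name]
    out["Host"] = host if port == 80 else "%s:%d" % (host, port)
    url = "http://%s:%d%s" % (ip, port, uri)
    return url, out
```

This is for plain HTTP only. With HTTPS, connecting by IP interferes with certificate validation, so it would need a different approach.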



pivt

Sep 5, 2011, 4:13:26 AM
to python-...@googlegroups.com
Thank you, Phil & Ben :)

Now I understand what you mean.