ZeroMQ based tornado request handlers

ellisonbg

Feb 18, 2012, 12:21:46 AM
to Tornado Web Server
Hi,

I am one of the developers of PyZMQ, the Python bindings to ZeroMQ.
PyZMQ has a tornado based eventloop that fully integrates ZeroMQ
sockets with the tornado event loop. This allows you to build web
servers with ZeroMQ based backends. I wanted to let the group know
that we are currently reviewing a pull request on github:

https://github.com/zeromq/pyzmq/pull/177

That expands the integration between PyZMQ and tornado even further.
The basic idea is that this enables one or more tornado request
handlers to proxy their requests over ZeroMQ sockets to backend
processes in a scalable, load balanced manner. This is done using
ZeroMQ ROUTER/DEALER sockets. No changes to your request handler
classes are needed; you just put them in a separate process with a
small amount of support code to hook everything up. This provides
fine grained scalability in the sense that you can leave some (fast)
handlers in the main tornado web.Application, and move slow/expensive
ones to other load balanced processes.
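
For readers unfamiliar with the pattern, a minimal sketch of DEALER-to-ROUTER
load balancing with pyzmq might look like the following. It is not the code
from the pull request: the inproc endpoint, the worker names and the
[msg_id, body] framing are assumptions made for illustration, and threads
stand in for separate backend processes.

```python
import threading

import zmq

ENDPOINT = "inproc://backend"  # hypothetical endpoint; real backends would use tcp://


def worker(ctx, name, stop):
    """Backend side: a ROUTER socket that answers whatever it is handed."""
    sock = ctx.socket(zmq.ROUTER)
    sock.connect(ENDPOINT)
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    while not stop.is_set():
        if poller.poll(50):  # wait up to 50 ms so the stop flag is noticed
            # ROUTER prepends the sender's identity to the frames it receives
            ident, msg_id, body = sock.recv_multipart()
            sock.send_multipart([ident, msg_id, name + b" handled " + body])
    sock.close(linger=0)


def run_demo(n_requests=4):
    ctx = zmq.Context()
    # Frontend side: DEALER distributes outgoing messages across its peers
    frontend = ctx.socket(zmq.DEALER)
    frontend.bind(ENDPOINT)

    stop = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(ctx, b"worker-%d" % i, stop))
        for i in range(2)
    ]
    for t in threads:
        t.start()

    for i in range(n_requests):
        frontend.send_multipart([b"%d" % i, b"GET /slow"])

    # Collect replies; they are matched to requests by msg_id, not by order
    replies = {}
    poller = zmq.Poller()
    poller.register(frontend, zmq.POLLIN)
    while len(replies) < n_requests and poller.poll(2000):
        msg_id, body = frontend.recv_multipart()
        replies[msg_id] = body

    stop.set()
    for t in threads:
        t.join()
    frontend.close(linger=0)
    ctx.term()
    return replies


if __name__ == "__main__":
    for msg_id, body in sorted(run_demo().items()):
        print(msg_id, body)
```

Which worker answers a given request depends on when each worker connects,
so the sketch only relies on every request getting exactly one reply.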

Seeing that this code integrates so closely with tornado, we wanted to
let the community know and ask for help in reviewing the code. We
create custom subclasses of many of the core classes in tornado.web
and tornado.httpserver. This means we are highly dependent on the
internals of these classes, not just the public APIs.

Happy coding and thanks for an awesome project!

Cheers,

Brian

Shane Spencer

Feb 18, 2012, 1:20:59 AM
to python-...@googlegroups.com
Very cool sir... very cool.

Alexey Kachayev

Feb 18, 2012, 7:54:25 AM
to python-...@googlegroups.com
Hi everybody! 

Actually, I use the Tornado + PyZMQ stack in all my projects. I take a different approach to separating jobs from the main process: I have a special Job abstraction, and I can explicitly tell my application whether to push a request to a zmq socket or handle it locally. That's convenient, for example, for invalid requests and for situations where I need to fan a request out. What benefits do we get from your approach? An external HTTP balancer would do the same thing, am I right (we would just run separate Tornado instances on different ports)?

I didn't read the full code carefully, but I see one inconvenience: how can I set a timeout per request? In running applications it is very common that, after waiting some time for a backend response, you want to take another action (e.g. finish the request with "Internal timeout error"). As far as I understand the code, it can be done here:

https://github.com/zeromq/pyzmq/commit/f261a5f9f8dcd75ebe6e9f1f1885bdf00fc05214#L1R102

We should just check whether Timeout or TimeoutCallback is set (or both) and create a DelayCallback to call it if the request is not finished yet.
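
A minimal sketch of that idea, using asyncio's call_later from the standard library as a stand-in for the delayed callback (the code under review would use the tornado/pyzmq event loop instead). ProxiedRequest, on_reply and the timeout message are hypothetical names for illustration, not part of the pull request:

```python
import asyncio


class ProxiedRequest:
    """Hypothetical stand-in for a request that was forwarded to a backend."""

    def __init__(self, loop, timeout, timeout_callback):
        self.finished = False
        self.result = None
        self._timeout_callback = timeout_callback
        # Schedule the timeout as soon as the request is handed off
        self._handle = loop.call_later(timeout, self._on_timeout)

    def _on_timeout(self):
        # Only act if the backend has not answered yet
        if not self.finished:
            self._finish(self._timeout_callback())

    def on_reply(self, body):
        # A reply that arrives after the timeout fired is ignored
        if not self.finished:
            self._handle.cancel()
            self._finish(body)

    def _finish(self, body):
        self.finished = True
        self.result = body


async def demo(reply_after, timeout):
    loop = asyncio.get_running_loop()
    request = ProxiedRequest(loop, timeout, lambda: "Internal timeout error")
    # Simulate the backend answering after `reply_after` seconds
    loop.call_later(reply_after, request.on_reply, "backend reply")
    await asyncio.sleep(max(reply_after, timeout) + 0.05)
    return request.result


if __name__ == "__main__":
    print(asyncio.run(demo(reply_after=0.01, timeout=0.2)))
    print(asyncio.run(demo(reply_after=0.2, timeout=0.01)))
```

The timeout is cancelled as soon as a reply arrives, and a late reply cannot overwrite the timeout response, so each request is finished exactly once.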

There are also some questions about the code. E.g. do we really need the local variables here:

https://github.com/zeromq/pyzmq/commit/f261a5f9f8dcd75ebe6e9f1f1885bdf00fc05214#L1R255

BR, Alexey.

On Saturday, February 18, 2012, Shane Spencer wrote:


--
Kind regards,
Alexey S. Kachayev, Senior Software Engineer
Cogniance Inc.
----------
http://codemehanika.org
Skype: kachayev
Tel: +380-996692092

Brian Granger

Feb 18, 2012, 7:13:38 PM
to python-...@googlegroups.com
Alexey,

2012/2/18 Alexey Kachayev <kach...@gmail.com>:


> Hi everybody!
>
> Actually, I use the Tornado + PyZMQ stack in all my projects. I take a
> different approach to separating jobs from the main process: I have a
> special Job abstraction, and I can explicitly tell my application whether
> to push a request to a zmq socket or handle it locally. That's convenient,
> for example, for invalid requests and for situations where I need to fan a
> request out. What benefits do we get from your approach? An external HTTP
> balancer would do the same thing, am I right (we would just run separate
> Tornado instances on different ports)?

The advantage of the new approach is that you can start out and develop
your tornado based web app as you normally would (without zeromq).
You just start writing RequestHandler classes and putting them in a
single tornado web.Application. As you discover that some of the
RequestHandlers are becoming bottlenecks, you can simply run them in
other processes in a load balanced manner. To accomplish this, the
changes needed to the code are almost trivial - no changes to your
RequestHandlers are needed.

But, there are definitely times where you want to scale a web
application by using ZeroMQ sockets in the manner you are describing.
We actually have another pull request under review that adds some
helper classes to PyZMQ to cover these use cases:

https://github.com/zeromq/pyzmq/pull/174

This is basically a simple ZeroMQ based RPC setup that again uses
ROUTER/DEALER sockets for load balancing. If you need this type of
thing we would love your comments on that PR as well. We use both of
these approaches in building tornado based web apps.

> I didn't read the full code carefully, but I see one inconvenience: how
> can I set a timeout per request? In running applications it is very common
> that, after waiting some time for a backend response, you want to take
> another action (e.g. finish the request with "Internal timeout error").
> As far as I understand the code, it can be done here:
>
> https://github.com/zeromq/pyzmq/commit/f261a5f9f8dcd75ebe6e9f1f1885bdf00fc05214#L1R102

Great point! I will look into this.

> We should just check whether Timeout or TimeoutCallback is set (or both)
> and create a DelayCallback to call it if the request is not finished yet.
>
> There are also some questions about the code. E.g. do we really need the
> local variables here:
>
> https://github.com/zeromq/pyzmq/commit/f261a5f9f8dcd75ebe6e9f1f1885bdf00fc05214#L1R255

Probably not, I will clean that up.

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu and elli...@gmail.com

Ben Darnell

Feb 19, 2012, 5:02:11 AM
to python-...@googlegroups.com
I don't condone the monkey-patching at the end of this file (i.e.
assignment to attributes of tornado.web). And there's an unfortunate
amount of duplicated code that seems likely to break over time. Let's
look at this from a different direction: What changes would be needed
on the tornado side to implement this cleanly? I'm thinking that the
cleanest solution may be for the worker to use unmodified Application
and RequestHandler classes, but with a subclass of HTTPServer that
uses zmq instead of real sockets. Things look better on the proxy
side, although it's not clear why you're checking xsrf cookies on the
proxy side rather than just passing it through. (also, could you do
this in prepare() instead of _execute()? See FallbackHandler for
example).

-Ben

Maria Kremnitzer

Feb 19, 2012, 5:10:35 AM
to python-...@googlegroups.com
Thank you very much!!!

From: Ben Darnell <b...@bendarnell.com>
To: python-...@googlegroups.com
Sent: 11:02 Sunday, February 19, 2012
Subject: Re: [tornado] ZeroMQ based tornado request handlers

Brian Granger

Feb 20, 2012, 1:47:34 PM
to python-...@googlegroups.com
Ben,

Thanks for looking at this.

On Sun, Feb 19, 2012 at 2:02 AM, Ben Darnell <b...@bendarnell.com> wrote:
> I don't condone the monkey-patching at the end of this file (i.e.
> assignment to attributes of tornado.web).  And there's an unfortunate
> amount of duplicated code that seems likely to break over time.  Let's
> look at this from a different direction:  What changes would be needed
> on the tornado side to implement this cleanly?

I agree the monkey patching is not great. There are two ways we could
address this. The reason the monkey patching is there is that there
are methods inside tornado.web that use some of the existing request
handler classes (like RedirectHandler). I am currently not overriding
those methods, but I need them to use the ZMQ versions of those
request handlers. Two approaches:

* I could just override those methods, leaving the logic the same but
using the ZMQ versions of the request handlers. This requires no
changes to tornado, but will mean there is more code duplication in
our subclasses.
* Make those request handlers class attributes that our subclasses can
set. This would require only a minimal change to tornado, but future
tornado releases would have to keep supporting this pattern. I could
submit a PR for this.

I am fine with either of these; which would you prefer?

>  I'm thinking that the
> cleanest solution may be for the worker to use unmodified Application
> and RequestHandler classes, but with a subclass of HTTPServer that
> uses zmq instead of real sockets.  Things look better on the proxy
> side, although it's not clear why you're checking xsrf cookies on the
> proxy side rather than just passing it through.  (also, could you do
> this in prepare() instead of _execute()?  See FallbackHandler for
> example).

I will have a look at the xsrf stuff, I may be able to clean it up.

Cheers,

Brian

--

Ben Darnell

Feb 20, 2012, 11:09:10 PM
to python-...@googlegroups.com
It probably makes sense to make ErrorHandler, RedirectHandler and the
rest application settings like we did for static_handler_class.
However, I think it's cleaner overall to keep zeromq at the transport
layer and leave the web layer unmodified. The difficulty in doing
this is largely our fault; WSGIApplication set a bad precedent and
kept us from cleaning up the interface between HTTPServer and
Application.
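
To illustrate the settings-based override in isolation, here is a minimal
sketch. It does not reproduce tornado's code: MiniApplication and both
handler classes are hypothetical stand-ins, and redirect_handler_class is
used only as an example setting name, modelled on static_handler_class.

```python
class RedirectHandler:
    """Hypothetical default handler used by the application."""
    transport = "http"


class ZMQRedirectHandler(RedirectHandler):
    """Hypothetical replacement that a ZeroMQ-backed setup would supply."""
    transport = "zmq"


class MiniApplication:
    """Hypothetical application that looks handler classes up in settings."""

    def __init__(self, **settings):
        self.settings = settings

    def redirect_handler(self):
        # Fall back to the built-in class when no override is configured,
        # so nothing needs to be monkey-patched at module level.
        handler_class = self.settings.get(
            "redirect_handler_class", RedirectHandler)
        return handler_class()


if __name__ == "__main__":
    default_app = MiniApplication()
    zmq_app = MiniApplication(redirect_handler_class=ZMQRedirectHandler)
    print(default_app.redirect_handler().transport)
    print(zmq_app.redirect_handler().transport)
```

Because the override lives in each application's own settings, two
applications in the same process can use different handler classes, which
assigning to module attributes cannot offer.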

-Ben

Brian Granger

Feb 22, 2012, 2:50:23 PM
to python-...@googlegroups.com
Ben,

OK I have created a tornado PR that implements the
redirect_handler_class and error_handler_class settings:

https://github.com/facebook/tornado/pull/467

I will also remove the monkey patching code from PyZMQ.

Thanks!

Brian

Brian Granger

Feb 22, 2012, 6:36:38 PM
to python-...@googlegroups.com
Ben,

I have figured out a way to implement all of this without subclassing
RequestHandler on the backend side of things. This means the tornado
PR I put up earlier today is no longer needed. The code is much
cleaner now and also supports flush.

Cheers,

Brian

Shane Spencer

Feb 22, 2012, 8:01:57 PM
to python-...@googlegroups.com
Share Share!

Brian Granger

Feb 24, 2012, 4:40:26 PM
to python-...@googlegroups.com
OK, my branch has a mostly final version of this. It has changed
quite a bit since the initial PR was put up. Some of the changes:

* The backend can use regular web.RequestHandlers - no custom
ZMQRequestHandler is needed. This means your existing handler classes
will just work.
* Different choices for how the frontend and backend communicate. This
includes a new streaming mode. See the examples.
* More testing and improved examples.
* Our subclasses do much less. We are trying to use the logic of the
base tornado classes as much as possible.
* Timeouts are implemented.

Cheers,

Brian

Shane Spencer

Jun 27, 2012, 12:21:18 AM
to python-...@googlegroups.com
So.. how did this all end up? I'm familiar with both Tornado and
ZeroMQ. Are the proposed changes official at this point and in tagged
versions? I'd love to start using both together without tying in
Brubeck or Mongrel2 any more.

- Shane

Shane Spencer

Jun 28, 2012, 5:07:49 PM
to python-...@googlegroups.com
On Wed, Jun 27, 2012 at 9:24 PM, James Dennis <jde...@gmail.com> wrote:
> Hello.
>
> This discussion makes me think of a few things that I'd like to share.
>
> I tried to build something around this idea a while back, but I used a
> simple ZMQ REQ/REP socket. In doing so, I also wrote a ZMQMixin, not a
> special subclass of RequestHandler, that let me send a message to some zmq
> socket. The application logic was basically just to create a message id and
> put the callback in a dictionary, keyed by the message id. Then, when a
> response was given, look up the id in the dict and off you go.
>
> It was a simple experiment and not really robust, but maybe something can be
> learned from the idea. I call it Dillinger (after Dillinger Escape
> Plan): https://github.com/j2labs/dillinger/ - You'll see that there's some
> code to replace Tornado's IOLoop with something that imports the pyzmq one
> instead. This always felt kinda dirty, though. It's called surgery.py.
>
> This discussion also sounds similar to what Mongrel2 does. I see Shane
> mentioned using it and Brubeck (I'm the author of Brubeck btw). I've learned
> some things from working with Mongrel2 and Brubeck that might be worth
> sharing.
>
> One example, IPC sockets are tricky for users. You can have a relative path,
> like ipc://some/socket or an absolute path like ipc:///some/socket. The
> difference is subtle, but combine this nuance with the mailbox features in
> ZeroMQ and you end up with confused users. A safe default that I've started
> using is to just use TCP sockets in all the examples.
>
> In some cases, you will have to greatly rethink the way you handle requests.
> If you off load heavy lifting to backends, you are waiting to hear back on
> the front-end, turning the front-end into something of a sockets proxy. ZMQ
> is also interesting because you're not bound to using Tornado on the other
> side, if you don't want to. You could use an entirely different language.
>
> The design of Brubeck's WebMessageHandler is very similar to what's being
> described here. Some URL routes to one of these handlers via a regex match.
> There is a lower level with just a MessageHandler that is used for arbitrary
> ZMQ message formats. If the user should be authenticated, a decorator
> decides if they can access the resource, all that usual stuff, but it's
> based entirely on a python dictionary containing the request data, instead
> of some specially designed HTTP object. I imagine that's similar to what you'd
> send to the ZMQ workers, anyway, perhaps as a JSON string or msgpack'd
> thing.
>
> The ZMQ guide can be a bit overwhelming, so I have another link to offer in
> hopes of helping: https://github.com/j2labs/zmq_examples - These are some
> examples that kept coming up in conversation when I first started playing
> with ZeroMQ.
>
> And, I found it was actually somewhat easy to build an abstraction around
> WSGI or a Mongrel2 connection. I don't know if it will translate properly to
> the challenges Tornado will face, but here is where I did
> that: https://github.com/j2labs/brubeck/blob/master/brubeck/connections.py
>
> I hope some of this information is helpful while everyone thinks on ZMQ.
> It's a really powerful system, but it has some slippery nuances.
>
> James
>
>
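
The message-id bookkeeping James describes above (store the callback under a fresh id, look it up when the response arrives) can be sketched without any ZeroMQ code. PendingCalls and its send hook are hypothetical names for illustration and are not taken from Dillinger:

```python
import itertools


class PendingCalls:
    """Match responses to requests through a dict keyed by message id."""

    def __init__(self, send):
        self._send = send  # hypothetical transport hook: send(msg_id, payload)
        self._callbacks = {}
        self._ids = itertools.count(1)

    def request(self, payload, callback):
        msg_id = str(next(self._ids))
        # Remember who is waiting before the message leaves the process
        self._callbacks[msg_id] = callback
        self._send(msg_id, payload)
        return msg_id

    def on_response(self, msg_id, payload):
        # pop() means each id is answered at most once
        callback = self._callbacks.pop(msg_id, None)
        if callback is None:
            return False  # unknown or already answered id
        callback(payload)
        return True


outbox = []   # stands in for the zmq socket
results = []
calls = PendingCalls(send=lambda msg_id, payload: outbox.append((msg_id, payload)))
first = calls.request("resize image", results.append)
second = calls.request("render report", results.append)

# Responses may come back in any order; the id picks the right callback
calls.on_response(second, "report done")
calls.on_response(first, "image done")
```

A real version would also need to expire entries whose response never arrives, otherwise the dict grows without bound.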

Awesome work on Brubeck James.. I talked to you a few times about it
in the past. I find there is a lot of usefulness of ZeroMQ as an
asynchronous client within a request handler as well as a request
dispatcher. The only real problem I have at this point when using a
ZeroMQ socket within a request in a single Tornado thread is that I
can't run Tornado in debug mode. Otherwise I feel as though the
ZeroMQ event loop mostly works for not so complex projects. It wasn't
until I tried Brubeck and Mongrel2 that I realized that ZeroMQ could
be seen as an adequate request dispatcher.

I'm actually always a bit up in the air, personally, on if I should
use a socket or a request dispatcher for many of the things I do.

The end goal with integration between Tornado and ZeroMQ for me would
be having an appropriate event loop that passed all of the tests as
well as simple ZeroMQ socket support on top of possibly a generic
request dispatcher that ZeroMQ could be a part of.

This is a bit rambly and whiny so I'm sorry. I was poking this thread
to see if any handshakes had gone on between both projects to form a
widely acceptable concrete solution with adequate documentation.

For what it's worth, I ended up writing several low level socket
clients to connect to ZeroMQ listeners somewhere in order to retain
the original event loop. It wasn't worth it in the end.

- Shane

Joe Bowman

Jul 17, 2012, 9:20:27 AM
to python-...@googlegroups.com
Why can't you use debug mode? And are you building ZeroMQ sockets for each request? Why wouldn't you build one socket for the application to each endpoint you need to reach? For example, some quick sample code from chatfor.us:

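import json

import zmq
from bson import json_util  # assumed source of json_util (ships with pymongo)
from zmq.eventloop import ioloop, zmqstream

# Assumed setup, not shown in the original snippet: a shared pyzmq IOLoop
# and a registry of chat name -> {connection id -> {"callback": callable}}.
my_ioloop = ioloop.IOLoop.instance()
active_chats = {}


def process_msg(message):
    # message is the list of frames delivered by ZMQStream.on_recv
    msg = json.loads(message[0], object_hook=json_util.object_hook)
    if msg["type"] == "ping":
        # ping and broadcast go to everyone (broadcast not implemented yet)
        for chat in active_chats:
            for connection in active_chats[chat].values():
                connection["callback"](msg)
    elif msg["chat"] in active_chats:
        # deliver to every connection in the addressed chat
        for connection in active_chats[msg["chat"]].values():
            connection["callback"](msg)


if __name__ == "__main__":
    ctx = zmq.Context()

    # One long-lived socket per direction, shared by every request
    pusher = ctx.socket(zmq.PUSH)
    pusher.connect('tcp://127.0.0.1:5555')
    pusher_stream = zmqstream.ZMQStream(pusher, my_ioloop)

    receiver = ctx.socket(zmq.PULL)
    receiver.connect('tcp://127.0.0.1:5556')
    receiver_stream = zmqstream.ZMQStream(receiver, my_ioloop)
    receiver_stream.on_recv(process_msg)

    my_ioloop.start()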


That's two sockets: one for sending, one for receiving. In my architecture, the web servers all talk to a simple router application, which gets the chat messages sent to all the other web servers and also stores them in a DBMS. The pusher is available to every request, so I never build sockets in requests.

Using this approach I've never had a problem running in debug mode. 