Re: [tornado] tornado.gen: New library for asynchronous generators


Alek Storm

Sep 4, 2011, 3:36:11 PM
to python-...@googlegroups.com
Is this a typo?  From reading the contents of gen.py, I think

response = yield gen.Task(http_client.fetch("http://example.com"))

should be replaced with 

response = yield gen.Task(http_client.fetch, "http://example.com")

So that the function isn't called prematurely.  More feedback to come; I have to run out the door.
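A minimal sketch of why the two spellings differ, using a hypothetical stand-in class (DeferredCall) and a fake fetch function rather than Tornado's actual gen.Task: the callable and its arguments are stored, and the call only happens once a callback is available.

```python
class DeferredCall(object):
    """Hypothetical stand-in for gen.Task: stores a callable plus arguments."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def start(self, callback):
        # The wrapped function is only invoked here, with callback= added.
        self.func(*self.args, callback=callback, **self.kwargs)


calls = []


def fake_fetch(url, callback=None):
    # Stand-in for http_client.fetch; records when it is actually invoked.
    calls.append(url)
    if callback is not None:
        callback("response for " + url)


task = DeferredCall(fake_fetch, "http://example.com")
assert calls == []  # constructing the task has not called fake_fetch

results = []
task.start(results.append)  # now the call happens, with a callback attached
```

Writing DeferredCall(fake_fetch("http://example.com")) instead would run fake_fetch immediately, without a callback, and hand its return value (None here) to the constructor.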

On Sun, Sep 4, 2011 at 1:07 PM, Ben Darnell <b...@bendarnell.com> wrote:
I've submitted a new module that lets you write asynchronous code in a straight-line synchronous style using generators.  It is similar in spirit to libraries such as adisp, monocle, and swirl; its major distinguishing feature is the option to split a single asynchronous call into two steps (Callback/Wait), which affords greater flexibility in control flow and parallelism.  

Please check it out; I'd appreciate feedback on the interface before everything gets frozen for 2.1.

-Ben


"""``tornado.gen`` is a generator-based interface to make it easier to
work in an asynchronous environment.  Code using the ``gen`` module
is technically asynchronous, but it is written as a single generator
instead of a collection of separate functions.

For example, the following asynchronous handler::

    class AsyncHandler(RequestHandler):
        @asynchronous
        def get(self):
            http_client = AsyncHTTPClient()
            http_client.fetch("http://example.com",
                              callback=self.on_fetch)

        def on_fetch(self, response):
            do_something_with_response(response)
            self.render("template.html")

could be written with ``gen`` as::

    class GenAsyncHandler(RequestHandler):
        @asynchronous
        @gen.engine
        def get(self):
            http_client = AsyncHTTPClient()
            response = yield gen.Task(http_client.fetch("http://example.com"))
            do_something_with_response(response)
            self.render("template.html")

`Task` works with any function that takes a ``callback`` keyword argument
(and runs that callback with zero or one arguments).  For more complicated
interfaces, `Task` can be split into two parts: `Callback` and `Wait`::

    class GenAsyncHandler2(RequestHandler):
        @asynchronous
        @gen.engine
        def get(self):
            http_client = AsyncHTTPClient()
            http_client.fetch("http://example.com",
                              callback=(yield gen.Callback("key")))
            response = yield gen.Wait("key")
            do_something_with_response(response)
            self.render("template.html")

The ``key`` argument to `Callback` and `Wait` allows for multiple
asynchronous operations to proceed in parallel: yield several
callbacks with different keys, then wait for them once all the async
operations have started.
"""
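
The Callback/Wait flow described in the docstring can be illustrated with a toy driver. This is only a sketch under stated assumptions: the Callback, Wait, and engine names below are simplified stand-ins defined locally, fake_fetch is hypothetical, and a deque stands in for the IOLoop. It is not how tornado.gen is implemented.

```python
import collections

# Stand-in for the IOLoop: callbacks that are scheduled but have not run yet.
pending = collections.deque()


class Callback(object):
    def __init__(self, key):
        self.key = key


class Wait(object):
    def __init__(self, key):
        self.key = key


def fake_fetch(url, callback):
    # Pretend to be asynchronous: the response is delivered later.
    pending.append(lambda: callback("body of " + url))


def engine(func):
    """Toy decorator that drives a generator yielding Callback/Wait objects."""
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        results = {}

        def make_callback(key):
            def callback(result=None):
                results[key] = result
            return callback

        value = None
        try:
            while True:
                point = gen.send(value)
                if isinstance(point, Callback):
                    # Hand a real callable back into the generator.
                    value = make_callback(point.key)
                elif isinstance(point, Wait):
                    # Run pending callbacks until this key has a result.
                    while point.key not in results:
                        pending.popleft()()
                    value = results.pop(point.key)
                else:
                    raise TypeError("unexpected yield: %r" % (point,))
        except StopIteration:
            pass
    return wrapper


log = []


@engine
def handler():
    fake_fetch("http://a.example", callback=(yield Callback("a")))
    fake_fetch("http://b.example", callback=(yield Callback("b")))
    # Both requests are in flight before either result is awaited.
    log.append((yield Wait("b")))
    log.append((yield Wait("a")))


handler()
```

Because results are stored by key, the generator can wait for them in any order, independent of the order in which the operations were started.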

Ben Darnell

Sep 4, 2011, 3:07:54 PM
to Tornado Mailing List

Jeremy Kelley

Sep 4, 2011, 4:17:24 PM
to python-...@googlegroups.com
I've just skimmed gen.py quickly, and having wrestled quite a bit with this issue (my inadequate attempts are public record :) ) I can say that this seems to be a pretty straightforward and elegant solution. 

-j

Sent from my iPhone

Jon Parise

Sep 4, 2011, 5:34:16 PM
to python-...@googlegroups.com
On Sun, Sep 4, 2011 at 12:07 PM, Ben Darnell <b...@bendarnell.com> wrote:

> Please check it out; I'd appreciate feedback on the interface before
> everything gets frozen for 2.1.

I think my only feedback has to do with the name 'gen' itself. Given
the module's usage, the fact that it's generator-based is an important
detail but not a descriptor unto itself.

I think the names would be a bit more obvious to newcomers if the
module was simply called "tornado.async":

@gen.engine => @async.function
gen.Task => async.Task
gen.Wait => async.Wait

"tornado.async" is my current preference, but I appreciate how that
could be considered ambiguous alongside the existing @asynchronous
decorator. Perhaps someone else can think of a better name (hopefully
without this turning into a "naming-the-thing" thread <g>).

Other than that detail, this looks quite good, and I appreciate the
upfront effort you put into including such descriptive exceptions.

--
Jon Parise (jon of indelible.org)  ::  "Scientia potentia est"

HENG

Sep 5, 2011, 8:36:29 AM
to python-...@googlegroups.com
Nice! Ben, you are a `Gen`ius.....

Thank you!

2011/9/5 Jon Parise <j...@indelible.org>



--
--------------------------------------------------------------------
HengZhou
---------------------------------------------------------------------
--

Aleksandar Radulovic

Sep 5, 2011, 8:58:13 AM
to python-...@googlegroups.com
Hi,

I agree with Jon: the module name should describe its purpose (and
function), not implementation details.

+1 for tornado.async instead of tornado.gen

-alex

--
a lex 13 x
http://www.a13x.info

Phil Whelan

Sep 5, 2011, 12:57:11 PM
to python-...@googlegroups.com
Looks awesome.
 
> Perhaps someone else can think of a better name (hopefully
> without this turning into a "naming-the-thing" thread <g>).

How about "asynctask"?

Cheers,
Phil

Frank Smit

Sep 5, 2011, 1:58:55 PM
to python-...@googlegroups.com
Why not just keep `gen` or `generator`, as it's based on a generator?
Using `async*` would be confusing, because you already wrap your
request handler method with `tornado.web.asynchronous`.

I haven't tried it yet, but I guess it can be used as a replacement
for `adisp`? I'll try to implement this into Momoko after I'm done
with the blocking connection part.

Ovidiu Predescu

Sep 5, 2011, 2:25:43 PM
to python-...@googlegroups.com
Looks good Ben!

I like how it handles multiple async operations, much better than adisp.

What happens with exceptions thrown in the async handler? It would be
nice to have them appear back in the original context that started the
operation. One potential problem is exceptions in functions that
issue multiple async operations in parallel.

I echo others' comments about renaming the module to async; it's a
better name for what the module does.

Ovidiu

On Sun, Sep 4, 2011 at 12:07 PM, Ben Darnell <b...@bendarnell.com> wrote:

Ben Darnell

Sep 5, 2011, 2:43:25 PM
to python-...@googlegroups.com
Naming this kind of thing is tricky, and I'm open to suggestions, but I definitely prefer "gen" to "async".  This module needs a more specific name than "async" because I could easily imagine other things going into an "async" module (e.g. https://gist.github.com/741041), many of which are alternatives to the generator approach rather than complementary (in addition to the potential confusion with @tornado.web.asynchronous and the now-deprecated async_callback methods).  I think "gen" is a reasonable descriptor since the salient difference to the application developer between using this module and manually passing callbacks around is that you're writing a generator.  

On Mon, Sep 5, 2011 at 11:25 AM, Ovidiu Predescu <ovi...@gmail.com> wrote:
> What happens with exceptions thrown in the async handler? It would be
> nice to have them appear back in the original context that started the
> operation. One potential problem is exceptions in functions that
> issue multiple async operations in parallel.

Exception handling is not as graceful as it could be; mostly things just bubble up to the StackContext.  I'd like exceptions in the gen module to be thrown back into the generator, but in my first attempt at that the stack trace came out weird, so I'll need to look at it again later.
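
For illustration, "thrown back into the generator" can be sketched with the standard generator throw() method. The runner below is a toy, not the gen module's code: it assumes every yielded value is a zero-argument callable, and failing_operation is a hypothetical stand-in for an async call that fails.

```python
def failing_operation():
    # Stand-in for an asynchronous operation that fails.
    raise IOError("connection refused")


def handler(log):
    try:
        result = yield failing_operation
        log.append(result)
    except IOError as e:
        # The error surfaces here, at the yield that started the operation.
        log.append("handled: %s" % e)


def run(gen):
    """Toy runner: each yielded value is a zero-argument callable."""
    value = None
    exc = None
    try:
        while True:
            if exc is not None:
                # Re-raise the failure inside the generator's own frame.
                operation = gen.throw(exc)
            else:
                operation = gen.send(value)
            value, exc = None, None
            try:
                value = operation()
            except Exception as e:
                exc = e
    except StopIteration:
        pass


log = []
run(handler(log))
```

With this approach an ordinary try/except around the yield is enough for the handler to deal with the failure; an exception the generator does not catch simply propagates out of run().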

-Ben

Jon Parise

Sep 5, 2011, 2:49:30 PM
to python-...@googlegroups.com
On Monday, September 5, 2011 at 11:43 AM, Ben Darnell wrote:
> Naming this kind of thing is tricky, and I'm open to suggestions, but I definitely prefer "gen" to "async".  This module needs a more specific name than "async" because I could easily imagine other things going into an "async" module (e.g. https://gist.github.com/741041), many of which are alternatives to the generator approach rather than complementary (in addition to the potential confusion with @tornado.web.asynchronous and the now-deprecated async_callback methods).  I think "gen" is a reasonable descriptor since the salient difference to the application developer between using this module and manually passing callbacks around is that you're writing a generator.

That's true.  I also liked the "asynctask" (or just "task" or "tasks") suggestion.

I think my particular bias against "gen" is that I've used software components named that in the past, and they were all focused on generating files or code templates.

Nicholas Dudfield

Sep 5, 2011, 3:01:18 PM
to python-...@googlegroups.com

> Naming this kind of thing is tricky, and I'm open to suggestions, but I definitely prefer "gen" to "async".

@tornado.scheduler.scheduled  ?

gen is nice and short

Alek Storm

Sep 7, 2011, 1:44:27 PM
to python-...@googlegroups.com
The Callback/Wait pairing is a great idea, but the syntax is quite different from Task, not to mention a bit arcane, especially for new Python programmers:

http_client.fetch("http://example.com",
                  callback=(yield gen.Callback("key")))
response = yield gen.Wait("key")

compared to

response = yield gen.Task(http_client.fetch, "http://example.com")

How about we change it to the following:

yield gen.Callback("key", http_client.fetch, "http://example.com")
response = yield gen.Wait("key")

Ben Darnell

Sep 8, 2011, 2:20:00 AM
to python-...@googlegroups.com
This is largely a matter of taste, but personally I find the Callback/Wait syntax more natural since it looks like what you'd write if generators weren't involved, whereas the Task syntax is awkward since you have to remember to leave out the parens.  

The current Callback behavior has an important role in supporting situations where you need to do something other than pass the callback as a kwarg to another function.  I think the behavior you're proposing would fit better with a different name, maybe gen.KeyedTask?  Whatever it's called, it's a bit of a departure from any of the existing yieldable objects since it doesn't return a value, it just has a side effect.  

There are a couple other approaches that would give Task-style syntax without limiting you to one outstanding task at a time:
  key = yield gen.DeferredTask(http_client.fetch, "http://example.com")
  response = yield gen.Wait(key)
or:
  response_list = yield gen.Multi([gen.Task(http_client.fetch, "http://example.com")])
I've also thought of doing the latter if you yield a list of YieldPoints, rather than requiring an explicit wrapper object.  It's not quite as flexible as the variants that use an explicit Wait, but it's simple and handles the common cases well.
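
A rough sketch of the "list of tasks" idea, with hypothetical names throughout (run_multi, fake_fetch, and a deque standing in for the IOLoop): every task is started first, and the results come back in the order the tasks were listed, not the order in which they completed.

```python
import collections

# Stand-in for the IOLoop: callbacks scheduled but not yet run.
pending = collections.deque()
completion_order = []


def fake_fetch(url, callback):
    def deliver():
        completion_order.append(url)
        callback("body of " + url)
    # appendleft makes later requests complete first, to show that the
    # order of results does not depend on the order of completion.
    pending.appendleft(deliver)


def run_multi(tasks):
    """Start every (func, args) task, then return results in list order."""
    results = {}

    def make_callback(index):
        def callback(result=None):
            results[index] = result
        return callback

    for index, (func, args) in enumerate(tasks):
        func(*args, callback=make_callback(index))
    # Drain pending callbacks until every task has reported a result.
    while len(results) < len(tasks):
        pending.popleft()()
    return [results[i] for i in range(len(tasks))]


responses = run_multi([
    (fake_fetch, ("http://a.example",)),
    (fake_fetch, ("http://b.example",)),
])
```

This is less flexible than an explicit Wait per key, since the caller gets nothing back until every task has finished.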

-Ben

Alek Storm

Sep 12, 2011, 6:16:28 AM
to python-...@googlegroups.com
Yes, I considered those variants as well - I just wanted you to agree to a change in Callback first :).  If the deferring functionality doesn't generalize to all function signatures, we're going to continually get requests like https://github.com/facebook/tornado/issues/351.  There are two pieces of information that must be made available to the caller: the deferring function's return value and the arguments it passes to the callback (unfortunately, Python makes this awkward by storing positional and keyword arguments in separate collections).  There are several ways to present the interface.

Proposal 1
----------------

key, retval = yield gen.Callback(httpclient.fetch, "www.example.com")
args, kwargs = yield key

In fact, Python 3's extended list unpacking can make this syntax look even more like a function definition:

(name, address, *args), kwargs = yield key

If the user has multiple keys but doesn't need the callback arguments, they can do:

key1, retval1 = yield gen.Callback(httpclient.fetch, "www.example1.com")
key2, retval2 = yield gen.Callback(httpclient.fetch, "www.example2.com")
yield (key1, key2)

Proposal 2
----------------

key1 = yield gen.Callback(httpclient.fetch, "www.example1.com")
# key1.retval
key2 = yield gen.Callback(httpclient.fetch, "www.example2.com")
# key2.retval
yield (key1, key2)
name, address, *args = key1.args
# key1.kwargs

This has the advantage of a consistent interface, whether one key is yielded or several, without the unwieldiness of an (arg, kwargs) tuple.  With a bit of magic, we can allow both the above form and:

key1, key2 = yield gen.Multi(gen.Callback(httpclient.fetch, "www.example1.com"), gen.Callback(httpclient.fetch, "www.example2.com"))
# key1.retval, key1.args, key1.kwargs

which is strictly equivalent to:

key1, key2 = yield ((yield gen.Callback(httpclient.fetch, "www.example1.com")), (yield gen.Callback(httpclient.fetch, "www.example2.com")))

And gen.Task becomes syntactic sugar:

key = yield gen.Task(httpclient.fetch, "www.example.com")
# key.retval, key.args, key.kwargs

which is equivalent to:

key = yield (yield gen.Callback(httpclient.fetch, "www.example.com"))

I favor proposal 2, though I'm sure there are third and fourth options I haven't considered.
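
To make proposal 2 concrete, here is a small sketch of a key object that records both the return value and the callback arguments. The Key class, the start helper, and the lookup function are all hypothetical names used for illustration; nothing here exists in tornado.gen.

```python
class Key(object):
    """Hypothetical handle for one started operation."""

    def __init__(self):
        self.retval = None
        self.args = ()
        self.kwargs = {}
        self.done = False

    def __call__(self, *args, **kwargs):
        # The key itself is passed as the callback and records what it gets.
        self.args = args
        self.kwargs = kwargs
        self.done = True


def start(func, *args, **kwargs):
    key = Key()
    key.retval = func(*args, callback=key, **kwargs)
    return key


def lookup(user_id, callback):
    # Stand-in operation: it returns a handle and also passes several
    # positional and keyword values to its callback.
    callback("Ada", "1 Main St", user_id, source="cache")
    return "request-%d" % user_id


key = start(lookup, 7)
name, address = key.args[0], key.args[1]
```

The same attributes are available whether one key or several are in flight, which is the consistency argument made above.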

Ben Darnell

Sep 13, 2011, 12:25:15 AM
to python-...@googlegroups.com
I consider the ability for application code to get its hands on the callback object directly to be an important feature of the design, so the overall flow of the Callback/Wait interface is going to stay more or less the same, rather than being reworked to be more like Task.  

There's a tradeoff between generality and convenience here.  It's important that the common cases be simple - it would suck to have to do "response = key1.args[0]" after every http_client.fetch call.  But at the same time, the generator interface as a whole should be flexible enough that you don't have to rewrite the whole handler with manual callbacks to do certain things.  I've been thinking of Callback/Wait as the "advanced" interface, with Task as the "simple" interface.  I don't think it's a problem that in the rare case that the asynchronous function has a useful return value you need to use Callback/Wait to see it (just like how you need to use Callback/Wait if the asynchronous function doesn't take its callback as a kwarg named "callback").  

Something like your proposal 2 makes sense (with e.g. s/gen.Callback/gen.Start/), although I'm wary of adding too many options to the interface before we know how it's going to be used.  I'm inclined to keep it simple for now, and encourage the use of gen.Task (and lists of gen.Tasks) for new users of the framework, and introduce new variations based on need in practice.

-Ben

Ben Darnell

Aug 11, 2012, 5:19:41 PM
to pranjal5215, python-...@googlegroups.com
You need a new subclass of gen.YieldPoint. Using gen.WaitAll as an
example, something like this should work (but I haven't tested it):

class WaitAny(YieldPoint):
    def __init__(self, keys):
        self.keys = keys

    def start(self, runner):
        self.runner = runner

    def is_ready(self):
        return any(self.runner.is_ready(key) for key in self.keys)

    def get_result(self):
        for key in self.keys:
            if self.runner.is_ready(key):
                return (key, self.runner.pop_result(key))
        raise Exception("no results found")

Usage would be something like this:

keys = set(["google", "tornado", "python"])
while keys:
    key, response = yield WaitAny(keys)
    keys.remove(key)
    # do something with response
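
The WaitAny logic can be exercised without Tornado by substituting stand-ins. In this sketch YieldPoint is an empty local placeholder and FakeRunner is a hypothetical object that offers only the two methods WaitAny calls (is_ready and pop_result); the real runner in tornado.gen does considerably more.

```python
class YieldPoint(object):
    """Local placeholder so the sketch runs without Tornado installed."""


class WaitAny(YieldPoint):
    def __init__(self, keys):
        self.keys = keys

    def start(self, runner):
        self.runner = runner

    def is_ready(self):
        return any(self.runner.is_ready(key) for key in self.keys)

    def get_result(self):
        for key in self.keys:
            if self.runner.is_ready(key):
                return (key, self.runner.pop_result(key))
        raise Exception("no results found")


class FakeRunner(object):
    """Hypothetical runner exposing only what WaitAny needs."""

    def __init__(self):
        self.results = {}

    def set_result(self, key, result):
        self.results[key] = result

    def is_ready(self, key):
        return key in self.results

    def pop_result(self, key):
        return self.results.pop(key)


runner = FakeRunner()
keys = set(["google", "tornado", "python"])

point = WaitAny(keys)
point.start(runner)
ready_before = point.is_ready()  # no operation has finished yet

runner.set_result("tornado", "tornado response")  # one operation finishes
ready_after = point.is_ready()
first_key, first_response = point.get_result()
keys.remove(first_key)
```

Removing the returned key from the set before waiting again is what keeps the loop from returning the same result twice.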

-Ben

On Thu, Aug 9, 2012 at 1:00 PM, pranjal5215 <pranj...@gmail.com> wrote:
> I am new to Tornado, and maybe this is not the best place to ask, but I
> have a question that continues the ongoing discussion.
>
> From what I understand from the docs, tornado.gen.Task is composed of
> tornado.gen.Callback and tornado.gen.Wait, with each Callback/Wait pair
> associated with a unique key ...
>
>
> def get(self):
>     http_client = AsyncHTTPClient()
>     http_client.fetch("http://google.com",
>                       callback=(yield tornado.gen.Callback("google")))
>
>     http_client.fetch("http://python.org",
>                       callback=(yield tornado.gen.Callback("python")))
>
>     http_client.fetch("http://tornadoweb.org",
>                       callback=(yield tornado.gen.Callback("tornado")))
>     response = yield [tornado.gen.Wait("google"),
>                       tornado.gen.Wait("tornado"),
>                       tornado.gen.Wait("python")]
>
>     do_something_with_response(response)
>     self.render("template.html")
>
> So the above code will get all responses from the different URLs.
> What I actually need to accomplish is to return a response as soon as
> one http_client returns its data. So if 'tornadoweb.org' returns data
> first, the handler should do a self.write(response), and a loop in
> def get() should keep waiting for the other http_clients to complete.
> Any ideas on how to write this using the tornado.gen interface?
>
>
>
>
> A very rough (and syntactically incorrect) sketch of what I am trying
> to do would be something like this:
>
> class GenAsyncHandler2(tornado.web.RequestHandler):
>     @tornado.web.asynchronous
>     @tornado.gen.engine
>     def get(self):
>         http_client = AsyncHTTPClient()
>         http_client.fetch("http://google.com",
>                           callback=(yield tornado.gen.Callback("google")))
>
>         http_client.fetch("http://python.org",
>                           callback=(yield tornado.gen.Callback("python")))
>
>         http_client.fetch("http://tornadoweb.org",
>                           callback=(yield tornado.gen.Callback("tornado")))
>
>         while True:
>             response = self.get_response()
>             if response:
>                 self.write(response)
>                 self.flush()
>             else:
>                 break
>         self.finish()
>
>     def get_response(self):
>         for key in tornado.gen.availableKeys():
>             if key.is_ready:
>                 value = tornado.gen.pop(key)
>                 return value
>         return None

pranjal pandit

Aug 12, 2012, 3:02:50 PM
to Ben Darnell, python-...@googlegroups.com
Hi,

Making a subclass works just fine.
I have tested it; the following code works with the above implementation of WaitAny.

Should WaitAny be added as a standard part of the tornado.gen module?
I think it would be a useful utility to provide in tornado.gen.



class GenAsyncHandler2(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.engine
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch("http://google.com",
                          callback=(yield tornado.gen.Callback("google")))

        http_client.fetch("http://python.org",
                          callback=(yield tornado.gen.Callback("python")))

        http_client.fetch("http://tornadoweb.org",
                          callback=(yield tornado.gen.Callback("tornado")))
        keys = set(["google", "tornado", "python"])
        while keys:
            key, response = yield WaitAny(keys)
            keys.remove(key)
            # do something with response
            self.write(str(key)+"        ")
            self.flush()
        self.finish()
--
Regards
Pranjal Pandit


Don't communicate by sharing memory; share memory by communicating.