Deconstructing Deferred

Deconstructing Deferred Guido van Rossum 8/18/13 7:17 AM
Glyph seemed upset that I claimed I didn't understand Deferred. I was honest, but while on vacation without Internet access I felt remorse -- I couldn't seriously reject Deferred if I don't even understand it! Since I happened to have a somewhat aged Twisted checkout on my laptop, I perused the source and docs in tandem. I learned a lot. I figured that one of the reasons I hadn't understood Deferred before was that the available docs are a poor fit for someone like me -- they are aimed either at absolute beginners or at experienced Twisted users.

Here's my attempt to explain Deferred's big ideas (and there are a lot of them) to advanced Python users with no previous Twisted experience. I also assume you have thought about asynchronous calls before. Just to annoy Glyph, I am using a 5-star system to indicate the importance of ideas, where 1 star is "good idea but pretty obvious" and 5 stars is "brilliant".

I am showing a lot of code snippets, because some ideas are just best expressed that way -- but I intentionally leave out lots of details, and sometimes I show code that has bugs, if fixing them would obscure the idea behind the code. (I will point out such bugs.) I am using Python 3.

Notes specifically for Glyph: (a) Consider this a draft for a blog post. I'd be more than happy to take corrections and suggestions for improvements. (b) This does not mean I am going to change Tulip to a more Deferred-like model; but that's for a different thread.

Idea 1: Return a special object instead of taking a callback argument

When designing APIs that produce results asynchronously, you find that you need a system for callbacks. Usually the first design that comes to mind is to pass in a callback function that will be called when the async operation is complete. I've even seen designs where if you don't pass in a callback the operation is synchronous -- that's bad enough I'd give it zero stars. But even the one-star version pollutes all APIs with extra arguments that have to be passed around tediously. Twisted's first big idea then is that it's better to return a special object to which the caller can add a callback after receiving it. I give this three stars because from it sprout so many of the other good ideas. It is of course similar to the idea underlying the Futures and Promises found in many languages and libraries, e.g. Python's concurrent.futures (PEP 3148, closely following Java Futures, both of which are meant for a threaded world) and now Tulip (PEP 3156, using a similar design adapted for thread-less async operation).
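To make the contrast concrete, here is a minimal sketch of the two API styles. All names in it (fetch_with_callback, fetch_deferred, Handle) are hypothetical and invented for this illustration, and the "async" operation is faked so the snippet stays runnable:

```python
# Hypothetical illustration; none of these names come from Twisted.

def fetch_with_callback(url, callback):
    """Callback-argument style: every API grows an extra parameter."""
    callback("contents of " + url)  # Pretend the I/O finished here


class Handle:
    """Stand-in for the 'special object' handed back to the caller."""

    def __init__(self, url):
        self.url = url
        self.callbacks = []

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def fire(self, result):
        for cb in self.callbacks:
            cb(result)


def fetch_deferred(url):
    """Return-an-object style: the signature stays clean."""
    return Handle(url)


seen = []
fetch_with_callback("http://example.com/a", seen.append)

h = fetch_deferred("http://example.com/b")
h.add_callback(seen.append)        # Callback attached after the call
h.fire("contents of " + h.url)     # Pretend the I/O finished later
```

In the second style the caller decides after the fact what to do with the result, and intermediate functions can pass the object along without knowing about the callbacks.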

Idea 2: Pass results from callback to callback

I think it's best to show some code first:

class Deferred:
    def __init__(self):
        self.callbacks = []
    def addCallback(self, callback):
        self.callbacks.append(callback)  # Bug here
    def callback(self, result):
        for cb in self.callbacks:
            result = cb(result)

The most interesting bits are the last two lines: the result of each callback is passed to the next. This is different from how things work in concurrent.futures and Tulip, where the result (once set) is fixed as an attribute of the Future. Here the result can be modified by each callback.

This enables a new pattern when one function returning a Deferred calls another one and transforms its result, and this is what earns this idea three stars. For example, suppose we have an async function that reads a set of bookmarks, and we want to write an async function that calls this and then sorts the bookmarks. Instead of inventing a mechanism whereby one async function can wait for another (which we will do later anyway :-), the second async function can simply add a new callback to the Deferred returned by the first one:

def read_bookmarks_sorted():
    d = read_bookmarks()
    d.addCallback(sorted)
    return d

The Deferred returned by this function represents a sorted list of bookmarks. If its caller wants to print those bookmarks, it must add another callback:

d = read_bookmarks_sorted()
d.addCallback(print)

In a world where async results are represented by Futures, this same example would require two separate Futures: one returned by read_bookmarks() representing the unsorted list, and a separate Future returned by read_bookmarks_sorted() representing the sorted list.
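Here is a self-contained sketch that runs the bookmarks example end to end. The `pending` list is a hypothetical stand-in for whatever I/O layer would eventually fire the Deferred, and the bookmark values are made up:

```python
class Deferred:
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)  # Same late-add bug as in the post

    def callback(self, result):
        for cb in self.callbacks:
            result = cb(result)  # Each result feeds the next callback


pending = []  # Hypothetical stand-in for an event loop's pending operations


def read_bookmarks():
    d = Deferred()
    pending.append(d)  # The "I/O layer" will fire this later
    return d


def read_bookmarks_sorted():
    d = read_bookmarks()
    d.addCallback(sorted)
    return d


received = []
d = read_bookmarks_sorted()
d.addCallback(received.append)

# Later, the "I/O layer" fires the Deferred with the unsorted data.
pending.pop().callback(["python.org", "example.com", "twistedmatrix.com"])
```

Only one Deferred object exists here; by the time the last callback runs, the value it sees has already been transformed by sorted().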

There is one non-obvious bug in this version of the class: if addCallback() is called after the Deferred has already fired (i.e. its callback() method was called) then the callback added by addCallback() will never be called. It's easy enough to fix this, but tedious, and you can look it up in the Twisted source code. I'll carry this bug through successive examples -- just pretend that you live in a world where the result is never ready too soon. There are other problems with this design too, but I'd rather call the solutions improvements than bugfixes.
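For the curious, one way to close this hole can be sketched as follows. This is a simplified illustration, not Twisted's actual code; the attribute names `called` and `result` and the helper `_run_callbacks` are assumptions made for this sketch. The idea is to remember the current result and run late callbacks immediately:

```python
class Deferred:
    def __init__(self):
        self.callbacks = []
        self.called = False   # Has callback() been invoked yet?
        self.result = None    # Most recent result once fired

    def addCallback(self, callback):
        self.callbacks.append(callback)
        if self.called:
            self._run_callbacks()  # Already fired: run the newcomer now

    def callback(self, result):
        self.called = True
        self.result = result
        self._run_callbacks()

    def _run_callbacks(self):
        # Consume callbacks so that each one runs exactly once.
        while self.callbacks:
            cb = self.callbacks.pop(0)
            self.result = cb(self.result)


d = Deferred()
d.callback([3, 1, 2])       # Fires before anyone is listening
d.addCallback(sorted)       # Still runs, on the stored result
late = []
d.addCallback(late.append)  # Sees the sorted list
```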

Aside: Twisted's poor choices of terminology

I don't know why, but, starting with the project's own name, Twisted often rubs me the wrong way with its choice of names for things. For example, I really like the guideline that class names should be nouns. But 'Deferred' is an adjective, and not just any adjective, it's a verb's past participle (and an overly long one at that :-). And why is it in a module named twisted.internet?

Then there is 'callback', which is used for two related but distinct purposes: it is the preferred term used for a function that will be called when a result is ready, but it is also the name of the method you call to "fire" the Deferred, i.e. set the (initial) result.

Don't get me started on the neologism/portmanteau that is 'errback', which leads us to...

Idea 3: Integrated error handling

This idea gets only two stars (which I'm sure will disappoint many Twisted fans) because it confused me a lot. I've also noted that the Twisted docs have some trouble explaining how it works -- in this case particularly I found that reading the code was more helpful than the docs.

The basic idea is simple enough: what if the promise of firing the Deferred with a result can't be fulfilled? When we write

d = pod_bay_doors.open()
d.addCallback(lambda _: pod.launch())

how is HAL 9000 supposed to say "I'm sorry, Dave. I'm afraid I can't do that" ?

And even if we don't care for that answer, what should we do if one of the callbacks raises an exception?

Twisted's solution is to bifurcate each callback into a callback and an 'errback'. But that's not all -- in order to deal with exceptions raised by callbacks, it also introduces a new class, 'Failure'. I'd actually like to introduce the latter first, without introducing errbacks:

import sys

class Failure:
    def __init__(self):
        # Capture the exception currently being handled
        self.exception = sys.exc_info()[1]

(By the way, great class name. And I mean this, I'm not being sarcastic.)

Now we can rewrite the callback() method as follows:

    def callback(self, result):
        for cb in self.callbacks:
            try:
                result = cb(result)
            except:
                result = Failure()

This in itself I'd give two stars; the callback can use isinstance(result, Failure) to tell regular results apart from failures.
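As a small runnable sketch of that isinstance() check: the parse() and report() functions below are hypothetical, and I have narrowed the bare except to `except Exception` so that KeyboardInterrupt is not swallowed (an assumption of this sketch, not something the snippet above does):

```python
import sys


class Failure:
    def __init__(self):
        self.exception = sys.exc_info()[1]


class Deferred:
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)

    def callback(self, result):
        for cb in self.callbacks:
            try:
                result = cb(result)
            except Exception:
                result = Failure()  # Captures the exception just raised


def parse(text):
    return int(text)  # Raises ValueError for non-numeric input


log = []


def report(result):
    # Tell regular results apart from failures.
    if isinstance(result, Failure):
        log.append("failed: " + type(result.exception).__name__)
    else:
        log.append("ok: %d" % result)
    return result


d = Deferred()
d.addCallback(parse)
d.addCallback(report)
d.callback("not a number")
```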

By the way, in Python 3 it might be possible to do away with the separate Failure class encapsulating exceptions, and just use the built-in BaseException class. From reading the comments in the code, Twisted's Failure class mostly exists so that it can hold all the information returned by sys.exc_info(), i.e. exception class/type, exception instance, and traceback. But in Python 3, exception objects already hold a reference to the traceback. There is some debug stuff that Twisted's Failure class does which standard exceptions don't, but still, I think most reasons for introducing a separate class have been addressed.
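A quick sketch of that Python 3 behavior, using only the standard library (the risky() function is made up for the example): the exception instance alone gives access to its type and, via the `__traceback__` attribute, to the traceback.

```python
import traceback


def risky():
    raise ValueError("pod bay doors are stuck")


try:
    risky()
except ValueError as exc:
    saved = exc  # Keep the exception alive beyond the except block

# No sys.exc_info() needed: type, value and traceback are all reachable
# from the exception object itself.
exc_type = type(saved)
tb = saved.__traceback__
frames = [frame.name for frame in traceback.extract_tb(tb)]
```

The last entry in `frames` is the function that raised, which is the kind of information Failure had to store separately in Python 2.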

But let's not forget about the errbacks. We change the list of callbacks to a list of pairs of callback functions, and we rewrite the callback() method again, as follows:


    def callback(self, result):
        for (cb, eb) in self.callbacks:
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except:
                result = Failure()

For convenience we also add an errback() method:

    def errback(self, fail=None):
        if fail is None:
            fail = Failure()
        self.callback(fail)

(The real errback() function has a few more special cases: it can be called with either an exception or a Failure as argument, and the Failure class takes an optional exception argument to prevent it from using sys.exc_info(). But none of that is essential and it makes the code snippets more complicated.)

In order to ensure that self.callbacks is a list of pairs we must also update addCallback() (it still doesn't work right when called after the Deferred has fired):

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r
        self.callbacks.append((callback, errback))

If this is called with just a callback function, the errback will be a dummy that passes the result (i.e. a Failure instance) through unchanged. This preserves the error condition for a subsequent error handler. To make it easy to add an error handler without also handling a regular result, we add addErrback(), as follows:

    def addErrback(self, errback):
        self.addCallback(lambda r: r, errback)

Here, the callback half of the pair will pass the (non-Failure) result through unchanged to the next callback.

If you want the full motivation, read Twisted's Introduction to Deferreds; I'll just end by noting that an errback can substitute a regular result for a Failure just by returning a non-Failure value (including None).
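Putting the pieces of this section together, here is a runnable sketch of such a recovery. The class is the simplified one from this post (with `except Exception` in place of the bare except, my choice for the sketch), and the missing-file scenario is invented:

```python
import sys


class Failure:
    def __init__(self):
        self.exception = sys.exc_info()[1]


class Deferred:
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r  # Pass failures through unchanged
        self.callbacks.append((callback, errback))

    def addErrback(self, errback):
        self.addCallback(lambda r: r, errback)

    def callback(self, result):
        for cb, eb in self.callbacks:
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except Exception:
                result = Failure()

    def errback(self, fail=None):
        if fail is None:
            fail = Failure()
        self.callback(fail)


seen = []
d = Deferred()
d.addCallback(sorted)              # Skipped: the result is a Failure
d.addErrback(lambda failure: [])   # Recover with an empty list
d.addCallback(seen.append)         # Sees [], not the Failure

try:
    raise IOError("bookmark file is missing")
except IOError:
    d.errback()                    # Wraps the active exception
```

After the errback returns the empty list, the chain is back on the "success" track, much like an except clause that handles an error and lets execution continue.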

Before I move on to the next idea, let me point out that there are more niceties in the real Deferred class. For example, you can specify additional arguments to be passed to the callback and errback. But in a pinch you can do this with lambdas, so I'm leaving it out, because the extra code for doing the administration doesn't elucidate the basic ideas.
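For instance, with the minimal class from Idea 2, extra arguments can be bound using a lambda or functools.partial. The tag() function and its parameters are hypothetical:

```python
from functools import partial


class Deferred:  # Minimal version from Idea 2
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)

    def callback(self, result):
        for cb in self.callbacks:
            result = cb(result)


def tag(result, label, sep=": "):
    """Hypothetical callback that needs more than just the result."""
    return label + sep + result


out = []
d = Deferred()
d.addCallback(lambda r: tag(r, "bookmark"))             # Bind with a lambda
d.addCallback(partial(tag, label="first", sep=" | "))   # Or with partial
d.addCallback(out.append)
d.callback("python.org")
```

In both cases the result still arrives as the first positional argument; the extra values are fixed at the time the callback is added.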

Idea 4: Chaining Deferreds

This is a five-star idea! Sometimes it really is necessary for a callback to wait for an additional async event before it can produce the desired result. For example, suppose we have two basic async operations, read_bookmarks() and sync_bookmarks(), and we want a combined operation. If this was synchronous code, we could write:

def sync_and_read_bookmarks():
    sync_bookmarks()
    return read_bookmarks()

But how do we write this if all operations return Deferreds? With the idea of chaining, we can do it as follows:

def sync_and_read_bookmarks():
    d = sync_bookmarks()
    d.addCallback(lambda unused_result: read_bookmarks())
    return d

The lambda is needed because all callbacks are called with a result value, but read_bookmarks() takes no arguments.

--
--Guido van Rossum (python.org/~guido)
Re: Deconstructing Deferred Guido van Rossum 8/18/13 7:18 AM
[Whoops, I sent that before its time. I'll post the ending later]
Re: Deconstructing Deferred Guido van Rossum 8/18/13 9:44 AM
I apologize for the incomplete post. I pressed the wrong button and the first half of this post went out by accident, somewhat unreviewed. I'm happy enough about what got sent for ideas 1-3, although I had wanted to insert some more links into the Twisted docs and code for Deferred. Here is the rest, starting with the section on Chaining, which I hadn't fully fleshed out when I accidentally posted.

Idea 4: Chaining Deferreds


This is a five-star idea! Sometimes it really is necessary for a callback to wait for an additional async event before it can produce the desired result. For example, suppose we have two basic operations, read_bookmarks() and sync_bookmarks(), and we want a combined operation. If this was synchronous code, we could write:

def sync_and_read_bookmarks():
    sync_bookmarks()
    return read_bookmarks()

But how do we write this if all operations return Deferreds? With the idea of chaining, we can do it as follows:

def sync_and_read_bookmarks():
    d = sync_bookmarks()
    d.addCallback(lambda unused_result: read_bookmarks())
    return d

What is happening here? We call sync_bookmarks() and get a Deferred back that fires when the bookmarks have synced. Its return value is None, since it is invoked for its side-effect of syncing the bookmarks (it affects the local store of bookmarks, from which the next operation must read). This is the Deferred that we will return. But before returning it we add a callback that will read the bookmarks. Now, that callback itself returns a Deferred as its result! I will show next how this can be implemented.

(A note about the ugly lambda: it is needed because all callbacks are called with a result value, but read_bookmarks() takes no arguments. Conveniently the result of sync_bookmarks() is None, so we can safely ignore it.)

Here's the code. As a first approximation, we can just add a few lines after each callback (cb) is called, as follows:

    def callback(self, result):
        for (cb, eb) in self.callbacks:
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except:
                result = Failure()
            if isinstance(result, Deferred):
                result.addCallback(self.resume)
                break

But how to implement the resume() method? The idea is that it should continue calling callbacks where the current loop left off. But how do we know where that is? The solution chosen by Twisted is to consume the callbacks as they are called, so the resume operation can just be the same code as callback(). We do this by using self.callbacks as a queue -- new callbacks are appended to the end, and consumed callbacks are popped off at the front. The contents of the queue always represent the callbacks that have yet to be called. So here's the improved callback():

    def callback(self, result):
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except:
                result = Failure()
            if isinstance(result, Deferred):
                result.addCallback(self.callback)
                break

It's important to get your head around what happens in these last two lines. We stop calling the callbacks of this Deferred (that's what the 'break' is for) and we add ourselves to the other Deferred (i.e., the one returned by the most recent callback or errback) so that when that one fires, it calls us back, and we will continue calling our own list of callbacks, starting with the result of the other Deferred.

The beauty of this scheme is that the second Deferred may in turn wait for a third Deferred, so there's a whole chain of pending Deferreds, all of which will be woken up (resumed) when the last one in the chain fires.

(By the way, I find 'chaining' a somewhat unfortunate term, because the implementation concerns itself with which Deferred is chained to which other, and the verb does not make the direction clear. In the above, is 'self' chained to the other Deferred, or is the other Deferred chained to self? It turns out it is the former: self._chainedTo links to the Deferred on which self is currently waiting. The other Deferred is linked only indirectly to self, through the bound method 'self.callback'.)

Now there are a few practical complications that I've left out. First of all, remember the bug I've refused to fix, about when addCallback() is called after the Deferred has already fired? Well, it just got harder to fix, and I'm still not showing the solution.

Second, we really should provide the other Deferred with an errback too, so that if it ends up with a failure we get a chance to handle it. We do this by calling its addCallback() method as follows:

                result.addCallback(self.callback, self.callback)

Now if the second Deferred has a Failure, it passes that Failure to our callback, which will use an errback to handle it. Failure or not, after all our callbacks/errbacks have run, the second Deferred (to which all this was a single callback/errback) will receive control back and start calling the rest of its callback lists. So the final result returned by our list of callbacks and errbacks becomes the next result for the second Deferred.

(The complication here for errbacks is one of the reasons I gave the errback idea only two stars. I should also point out that the signatures in my code don't exactly match those in Twisted; what I call addCallback() is called addCallbacks() there, since addCallback() is reserved for setting only the callback (leaving the errback the default pass-through), optionally providing additional arguments which I have also left out of my version, as I mentioned earlier.)

Dizzy? Re-read the code and perhaps try to follow it through with pen and paper. Here it is one more time including the last change:

    def callback(self, result):
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except:
                result = Failure()
            if isinstance(result, Deferred):
                result.addCallback(self.callback, self.callback)
                break

You may wonder, where is the result (that's being passed from callback to callback) being held while the first Deferred is waiting for the second one? The answer is that there is no result yet! It will be produced by the second Deferred in due time and passed into self.callback() at that time. This elegance is part of what earns this idea five stars.
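To follow it through without pen and paper, here is a runnable sketch of the bookmarks example using the simplified class. The module-level `sync_d` and `read_d` Deferreds are hypothetical stand-ins for pending I/O, fired by hand, and the bare except is narrowed to `except Exception` for this sketch:

```python
import sys


class Failure:
    def __init__(self):
        self.exception = sys.exc_info()[1]


class Deferred:
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r
        self.callbacks.append((callback, errback))

    def callback(self, result):
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb  # Use errback
            try:
                result = cb(result)
            except Exception:
                result = Failure()
            if isinstance(result, Deferred):
                # Suspend; the other Deferred resumes us when it fires.
                result.addCallback(self.callback, self.callback)
                break


sync_d, read_d = Deferred(), Deferred()  # Hypothetical pending operations


def sync_bookmarks():
    return sync_d


def read_bookmarks():
    return read_d


def sync_and_read_bookmarks():
    d = sync_bookmarks()
    d.addCallback(lambda unused_result: read_bookmarks())
    return d


events = []
d = sync_and_read_bookmarks()
d.addCallback(events.append)

sync_d.callback(None)               # Sync done; now waiting on read_d
events_after_sync = list(events)    # Nothing delivered yet
read_d.callback(["python.org"])     # Resumes d with the bookmarks
```

Between the two firings there is indeed no result stored anywhere; the first Deferred just sits with its remaining callbacks in the queue.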

The real Twisted code has a complication worth mentioning (but not showing) here: if you have a truly long chain of Deferreds, when the last one fires, all the callbacks end up calling each other, and effectively each Deferred in the chain takes up two Python stack frames. So with a truly long chain you could run into stack overflow. The Twisted folks have a solution for this, where the second Deferred is made to recognize that it is executing the callback of the first Deferred and just executes its callback/errback-calling loop in-line, maintaining a stack of Deferreds as a Python list instead of using up precious Python stack frames. If you're interested in the details, read the Twisted code (search for _CONTINUE).

There's also another (even less interesting) complication: a Deferred should really only be fired once, and we want callback() to fail an assertion when it is called for a second time. But we do want to be able to resume a Deferred that was waiting for another. The solution is to refactor things a bit so that the actual loop is in a helper method. The public callback() method contains the assertion and then calls the helper; the helper is used directly as the callback for the second Deferred. (See how confusing it can become to have two different uses for the term 'callback'?)
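That refactoring might look roughly like the sketch below. The names `called` and `_run_callbacks` are my own for this illustration and are not necessarily what Twisted uses:

```python
import sys


class Failure:
    def __init__(self):
        self.exception = sys.exc_info()[1]


class Deferred:
    def __init__(self):
        self.callbacks = []
        self.called = False

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r
        self.callbacks.append((callback, errback))

    def callback(self, result):
        # Public entry point: may be used only once.
        assert not self.called, "Deferred fired twice"
        self.called = True
        self._run_callbacks(result)

    def _run_callbacks(self, result):
        # Helper: also serves as the resume hook for a chained Deferred.
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb
            try:
                result = cb(result)
            except Exception:
                result = Failure()
            if isinstance(result, Deferred):
                result.addCallback(self._run_callbacks, self._run_callbacks)
                break


outer, inner = Deferred(), Deferred()
got = []
outer.addCallback(lambda _: inner)   # Chain to inner
outer.addCallback(got.append)
outer.callback(1)                    # Fires outer; it now waits on inner
inner.callback("done")               # Resumes outer via the helper

try:
    outer.callback(2)                # Second firing is rejected
    fired_twice = True
except AssertionError:
    fired_twice = False
```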

But while you research all that, I'll go on to the final idea I want to recognize.

Idea 5: Cancellation

Sometimes the party interested in a Deferred's result loses interest. It would be nice to be able to tell the source of the result to stop bothering and clean up instead. Twisted solves this using a new, optional callback function, called a 'canceller', that is passed to the Deferred's constructor. It is normally the party responsible for providing the result that creates the Deferred, and it must also provide the canceller function. The ultimate recipient of the result (say, the party that added the last callback to the Deferred) may call the Deferred's cancel() method to indicate that it has lost interest. This will then call the canceller with one argument, the Deferred being cancelled.

The canceller function can do any resource cleanup needed. Its specific responsibility to the Deferred is twofold:
  • It has to promise not to call the Deferred's callback or errback at a later time
  • Optionally it may call the callback or errback now
If the canceller doesn't call the callback or errback, cancel() will call the errback, with a Failure wrapping a CancelledError exception. So, in the end, calling cancel() will cause all callbacks/errbacks that were added to the Deferred to be called, and this provides an opportunity for the receiving party to do its own cleanup. Moreover, unless one of the callbacks/errbacks returns a Deferred, the whole list will be called before cancel() returns.

There are two more important details to cancellation, having to do with what happens if the Deferred has already fired. In this case, nothing happens: it's too late to cancel anything, since we have already got the result. However, there's an important exception: if we are waiting for another Deferred (see idea 4 above) the chained Deferred is cancelled.

Twisted also has another complication here: mostly for reasons of backwards compatibility, the canceller function is optional. If it is not specified, cancel() calls the callback chain starting with a Failure wrapping a CancelledError, but now the source of the result can't be held to the first bullet above. So eventually the Deferred will be fired again. Normally, this would produce an assertion failure (due to the complication I explained at the end of the previous section). So Twisted sets a flag on the Deferred to suppress this assertion. I don't really want to show all that, but here is some pseudo-code for cancel() that assumes the canceller is not None:

def cancel(self):
    if <already fired>:
        if <waiting for a chained Deferred>:
            <cancel the chained Deferred>
    else:
        # Not already fired
        self.canceller(self)
        if not <already fired>:
            # The canceller didn't call callback/errback
            self.errback(Failure(CancelledError()))

Note that this elegantly handles the case where a Deferred is cancelled twice: the second time, the Deferred will surely be marked as already fired, so the canceller will never be called more than once.
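For readers who prefer something executable, the pseudo-code can be fleshed out as below. This is a sketch under my own assumptions: the attributes `called` and `waiting_on`, the local CancelledError class, and the optional Failure argument are stand-ins for this illustration, not Twisted's implementation.

```python
import sys


class CancelledError(Exception):
    pass


class Failure:
    def __init__(self, exception=None):
        if exception is None:
            exception = sys.exc_info()[1]
        self.exception = exception


class Deferred:
    def __init__(self, canceller):
        self.callbacks = []
        self.canceller = canceller
        self.called = False
        self.waiting_on = None  # Deferred we are chained to, if any

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r
        self.callbacks.append((callback, errback))

    def callback(self, result):
        self.called = True
        self.waiting_on = None
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb
            try:
                result = cb(result)
            except Exception:
                result = Failure()
            if isinstance(result, Deferred):
                self.waiting_on = result
                result.addCallback(self.callback, self.callback)
                break

    def errback(self, fail):
        self.callback(fail)

    def cancel(self):
        if self.called:
            if self.waiting_on is not None:
                self.waiting_on.cancel()
        else:
            self.canceller(self)
            if not self.called:
                # The canceller didn't call callback/errback
                self.errback(Failure(CancelledError()))


cleanup, outcome = [], []
d = Deferred(canceller=lambda deferred: cleanup.append("socket closed"))
d.addCallback(outcome.append,
              lambda f: outcome.append(type(f.exception).__name__))
d.cancel()
d.cancel()  # Second cancel: already fired and not waiting, so a no-op
```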

Also, if a callback/errback called by cancel() or by the canceller returns another Deferred, causing more chaining, it does make sense to cancel a second time, and then the chained Deferred will be cancelled. Thus, cancelling is a bit like hitting ^C multiple times to stop a stuck UNIX process -- if the process starts some cleanup in response to your first ^C, and the cleanup itself gets stuck, you can hit ^C again to interrupt the cleanup itself. And in Twisted, just as in the UNIX case, it is possible to have an arrangement of callbacks that keep creating and returning new Deferreds in response to being cancelled, so your program may never actually terminate. That's why UNIX has SIGKILL. (AFAIK there is no explicit Twisted equivalent, but I suppose you can use SIGKILL on the process running your runaway Twisted daemon. :-)

Another interesting observation is that cancellation propagates down the list of callbacks/errbacks in the same order as regular results or failures. Thus, parties that add callbacks/errbacks to a Deferred have the option of treating cancellation the same way they treat other failures -- including the option to mask the failure and return a regular (perhaps dummy) result, so that the next callback doesn't see a failure at all.

So how many stars do I give cancellation? I'm thinking between two and three. I want to give it three stars because I find the cancellation of chained Deferreds quite nice, but I feel like taking off a star because of the messiness around whether the canceller exists or not, and if it exists, the need for checking twice whether the Deferred has fired.

More good ideas

I really want to end this already interminable post, but I want to call out one more good idea, worth at least two stars: DeferredList. This is what you use if you want to wait for multiple Deferreds at the same time (as opposed to the bookmarks example, which must first wait for the sync operation to complete before it can start reading the bookmarks). It's a pretty straightforward application of the basic Deferred API; if it didn't already exist, you could easily write DeferredList yourself using just Deferred's public API (I think).

The DeferredList API is a bit clunkier than I'd like, because it has to deal with multiple error sources, and there are multiple choices for when to consider the result complete (you can wait for the first non-failure result, or for the first failure, or collect all results and errors). For full generality under all these options, the result is a list of (success, result) tuples. (Even though I'd think that success must always be equal to not isinstance(result, Failure)?)
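To back up the claim that this can be built on the public API alone, here is a sketch of the "collect everything" variant on top of the simplified Deferred from this post. The gather() function is hypothetical and is not Twisted's DeferredList; in particular, with the late-add bug still present, an empty input list would never fire.

```python
class Failure:
    def __init__(self, exception):
        self.exception = exception


class Deferred:
    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback, errback=None):
        if errback is None:
            errback = lambda r: r
        self.callbacks.append((callback, errback))

    def callback(self, result):
        while self.callbacks:
            cb, eb = self.callbacks.pop(0)
            if isinstance(result, Failure):
                cb = eb
            result = cb(result)


def gather(deferreds):
    """Fire with a list of (success, result) tuples once all inputs fired."""
    combined = Deferred()
    results = [None] * len(deferreds)
    remaining = len(deferreds)

    def record(index, success, result):
        nonlocal remaining
        results[index] = (success, result)
        remaining -= 1
        if remaining == 0:
            combined.callback(results)
        return result  # Pass the value on to later callbacks

    for i, d in enumerate(deferreds):
        d.addCallback(lambda r, i=i: record(i, True, r),
                      lambda f, i=i: record(i, False, f))
    return combined


a, b = Deferred(), Deferred()
got = []
gather([a, b]).addCallback(got.append)
b.callback("B")                           # Order of firing doesn't matter
a.callback(Failure(ValueError("boom")))   # Simulate a failed operation
```

In this sketch the success flag is by construction the negation of isinstance(result, Failure).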

A somewhat simpler API built on top of DeferredList is gatherResults(), whose result is just the list of results of the arguments. But even it takes an optional flag, consumeErrors, which defaults to False but which the docs recommend you always set to True. (More backward compatibility?)

There are more things, ranging from useful to brilliant: there are lots of debug features, perhaps the most important being that if a Deferred is garbage-collected when the most recent result is a Failure, that Failure is logged. Deferreds can be paused; there are numerous convenience functions and methods; and of course there is @inlineCallbacks, which I have to give five stars because I independently reinvented it as Tulip coroutines. :-) But that's another story, for another time. It's time to do some laundry.

--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Deconstructing Deferred Glyph Lefkowitz 8/18/13 10:56 PM

On Aug 18, 2013, at 7:17 AM, Guido van Rossum <gu...@python.org> wrote:

Glyph seemed upset that I claimed I didn't understand Deferred.

I'll hopefully be replying to some of the points here in detail as the week goes on, but to the extent that I am the, uh, "aggrieved" party here, let me just say a few things quickly:

  1. Welcome back!  I hope you had a good vacation.  (And I hope that this wasn't on your mind too much.)
  2. Wow!  What a writeup!  This is an incredibly thorough, thoughtful, and detailed architectural review of Deferreds.  I don't think anyone has ever done something like it before.  Thank you for doing it.  Reading it (even just skimming it) was like re-learning about Deferreds for the first time myself.  As I was doing so, I thought of at least 3 bugs that I should file to improve things about Deferreds, at least 1 of which was a completely new idea :-).
  3. You seem to have gotten pretty much everything on the nose.  I reserve the right to correct minor points, but you covered a number of things which are poorly, incompletely, or not at all documented, as well as all the major basics; I'm pretty sure you hit everything that I think is a good idea in Deferred.
  4. In fact, I think we might like to steal some of your prose here verbatim for the Twisted documentation.  Do you mind?  Particularly, I just (finally) this weekend approved some rudimentary narrative cancellation documentation to land, and your explanation of cancellation is actually better than what's in the docs, so it might be good to integrate it there.
  5. You see my difficulty now - hard to keep a post about Deferreds under 5000 words, isn't it? ;-)

Thanks again,

-glyph


Re: [python-tulip] Deconstructing Deferred Guido van Rossum 8/19/13 8:23 AM
You're welcome! This is the kind of project that could only happen during a vacation. :-)

Feel free to borrow from my prose, or link to it (I will post this to my blog once the dust has settled). I did the research for my own benefit, but decided to write it up for the benefit of future Twisted users (as well as people who would like to argue with me about Tulip).

Everyone: the discussion is happening in a Google Doc, which is more suitable for commenting on such a huge post. The URL is here; members of the python-tulip list can comment: https://docs.google.com/document/d/10WOZgLQaYNpOrag-eTbUm-JUCCfdyfravZ4qSOQPg1M

I am going to do a separate write-up explaining why Tulip doesn't need Deferred (based on what I learned and wrote about it), but it may take a while before I have the whole argument complete. In the meantime, I am also going back to the discussion about cancellation, now that I understand Deferred's approach.

--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Re: Deconstructing Deferred Gustavo Carneiro 8/20/13 5:24 AM
On Sun, Aug 18, 2013 at 5:44 PM, Guido van Rossum <gu...@python.org> wrote:
I apologize for the incomplete post. I pressed the wrong button and the first half of this post went out by accident, somewhat unreviewed. I'm happy enough about what got sent for ideas 1-3, although I had wanted to insert some more links into the Twisted docs and code for Deferred. Here is the rest, starting with the section on Chaining, which I hadn't fully fleshed out when I accidentally posted.

Idea 4: Chaining Deferreds



This is a five-star idea! Sometimes it really is necessary for a callback to wait for an additional async event before it can produce the desired result. For example, suppose we have two basic operations, read_bookmarks() and sync_bookmarks(), and we want a combined operation. If this was synchronous code, we could write:

def sync_and_read_bookmarks():
    sync_bookmarks()
    return read_bookmarks()

But how do we write this if all operations return Deferreds? With the idea of chaining, we can do it as follows:

def sync_and_read_bookmarks():
    d = sync_bookmarks()
    d.addCallback(lambda unused_result: read_bookmarks())
    return d



I should point out here that Tulip coroutines clearly beat Deferred chaining in terms of readability.  As a Tulip coroutine, this would be:

def sync_and_read_bookmarks():
    yield from sync_bookmarks()
    return (yield from read_bookmarks())

The "chaining" here is explicit.  You don't need to learn any API docs to understand what is happening here.  You could add a sort of "conditional chaining" code here, that depending on the return value of sync_bookmarks() you would call or not call read_bookmarks():

def sync_and_read_bookmarks():
    if (yield from sync_bookmarks()) > 0:
        return (yield from read_bookmarks())

Doing the same thing with chaining callbacks would be more complicated and less readable.
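For comparison, a callback-based version might look like the sketch below, using a stripped-down chaining Deferred (no error handling). The module-level `sync_d` and `read_d` objects and the assumption that sync_bookmarks() fires with a count of changed bookmarks are hypothetical:

```python
class Deferred:
    """Simplified chaining Deferred from the post (no error handling)."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)

    def callback(self, result):
        while self.callbacks:
            result = self.callbacks.pop(0)(result)
            if isinstance(result, Deferred):
                result.addCallback(self.callback)
                break


sync_d, read_d = Deferred(), Deferred()  # Hypothetical pending operations


def sync_bookmarks():
    return sync_d   # Assumed to fire with the number of changed bookmarks


def read_bookmarks():
    return read_d


def sync_and_read_bookmarks():
    d = sync_bookmarks()

    def maybe_read(changed):
        if changed > 0:
            return read_bookmarks()  # Returning a Deferred chains it
        return None                  # Nothing changed: result is None

    d.addCallback(maybe_read)
    return d


results = []
d = sync_and_read_bookmarks()
d.addCallback(results.append)
sync_d.callback(2)                 # Two bookmarks changed, so we chain
read_d.callback(["python.org"])    # The read result reaches our callback
```

The condition has to move into a nested function, and the reader has to know that returning a Deferred from a callback triggers chaining.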

You can't beat this code in terms of ease of reading and writing.  Deferreds are ugly by comparison, IMHO.  If this is a 5 star idea, then Tulip has to be 6.

I think Twisted does great things within the callback programming style that was available at the time.  However, I think "yield from" is a game changer, and opens up new possibilities.  You cannot pretend that all the design decisions behind Twisted still apply, because these design decisions didn't take into account "yield from".

Well, I guess this turned out to be another "coroutines vs callbacks" post.  Sorry about that :P

-- Gustavo (now on vacations).
Re: Deconstructing Deferred Ben Lesh 8/20/13 6:33 AM
Just a tiny tidbit here:

 For example, I really like the guideline that class names should be nouns. But 'Deferred' is an adjective, and not just any adjective, it's a verb's past participle (and an overly long one at that :-). And why is it in a module named twisted.internet?

The name "Deferred" is a standard in the future/promise pattern vocabulary that stems primarily from JavaScript's Q library and later JQuery. They were referred to as "Deferred Objects", but I guess DeferredObject as a name seemed a little redundant. I've also seen them called "deferred callbacks" and "deferred promises". So is it a noun? It is now. At least in the programming world it is. Thank Kris Kowal and John Resig. "Deferment" might have been better? But that ship has probably sailed.

Language is an evolutionary thing, you of all people should understand that. ;)

Thanks for all of your work.
Re: Deconstructing Deferred Terry Jones 8/21/13 9:48 AM
Hi Benjamin

I think you've got your history a bit the wrong way around. Twisted deferreds pre-date the existence of jQuery, Q, etc. As I understand it, the history of Deferreds in JavaScript begins with Dojo deferreds (based explicitly on Twisted), and then the Promises/A and Promises/A+ efforts, which led to many things (jQuery deferreds, Q, when, rsvp, etc.), all of which are MUCH more recent than Twisted's deferreds.


Terry
Re: Deconstructing Deferred Guido van Rossum 8/21/13 10:25 AM

On Wed, Aug 21, 2013 at 9:32 AM, Terry Jones <terry...@gmail.com> wrote:
Hi Guido - that's great you spent time digging into Twisted deferreds.

Well, it was mostly so I can have an informed opinion when stating that Tulip won't be adopting Deferred's model. :-)
 
I have a few comments.


Idea 1: Return a special object instead of taking a callback argument

When designing APIs that produce results asynchronously, you find that you need a system for callbacks. Usually the first design that comes to mind is to pass in a callback function that will be called when the async operation is complete. I've even seen designs where if you don't pass in a callback the operation is synchronous -- that's bad enough I'd give it zero stars. But even the one-star version pollutes all APIs with extra arguments that have to be passed around tediously. Twisted's first big idea then is that it's better to return a special object to which the caller can add a callback after receiving it.

Did you see this recent well-written article about node.js? Callbacks are imperative, promises are functional: Node’s biggest missed opportunity (http://bit.ly/1bTd9A1)?

Didn't see it -- I'm not really following node.js. I really can't handle callbacks in JavaScript, so I guess I have to agree with that author.
 
It has the following:

By contrast, promise-based functions always let you treat the result of the function as a value in a time-independent way. When you invoke a callback-based function, there is some time between you invoking the function and its callback being invoked during which there is no representation of the result anywhere in the program.

Yeah, this is basically "idea 1" right?

The niceness of having a first-class object representing the future result is illustrated by an example I was recently writing about: memoization. Without deferreds, if you're memoizing a function that takes a while to compute, then any requests that arrive while the first call for a given argument is being made will result in additional identical calls (exactly the thing you're trying to prevent!). With deferreds, you can instantly memoize the result even though you don't have the result yet. Only one call is made. I think this is a really good example, one that should be given in explanations of why having a first-class object helps. (There are other reasons too, as you know.)

Same with Futures. The example you're using is also known as the "dogpile effect" in the database world. But I don't think it's a particularly good motivating example for "idea 1" -- it requires a fair amount of set-up to pull it off. (Although I think the set-up is less with Futures than with Deferreds. :-)
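To make the memoization point concrete, here is a minimal sketch using asyncio (the standard-library descendant of Tulip) rather than Twisted. The names memoize_async and slow_square are hypothetical, and the function is assumed to take a single hashable argument. The cache stores the pending Task itself, so callers that arrive while the first call is still running share it instead of starting a duplicate.

```python
import asyncio


def memoize_async(func):
    """Cache the in-flight Task per argument so concurrent callers share one call."""
    cache = {}

    def wrapper(arg):
        if arg not in cache:
            # Store the Task immediately, before any result exists.
            cache[arg] = asyncio.get_running_loop().create_task(func(arg))
        return cache[arg]

    return wrapper


calls = []  # records every real invocation of the slow function


@memoize_async
async def slow_square(n):
    """Hypothetical expensive computation."""
    calls.append(n)
    await asyncio.sleep(0.01)
    return n * n


async def main():
    first = slow_square(7)   # starts the computation
    second = slow_square(7)  # arrives while the first call is still pending
    return first is second, await first, await second


same_object, a, b = asyncio.run(main())
```

Both callers receive the same Task and the slow function runs once. Note that in this sketch a failed call stays cached as well; a real cache would probably want to evict on error.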
 
Idea 2: Pass results from callback to callback

I think it's best to show some code first:

class Deferred:
    def __init__(self):
        self.callbacks = []
    def addCallback(self, callback):
        self.callbacks.append(callback)  # Bug here
    def callback(self, result):
        for cb in self.callbacks:
            result = cb(result)

The most interesting bits are the last two lines: the result of each callback is passed to the next. This is different from how things work in concurrent.futures and Tulip, where the result (once set) is fixed as an attribute of the Future. Here the result can be modified by each callback.

There are cases for wanting to pass along a modified result or not wanting to. A problem with doing things this way is that you can't always safely pass the same deferred back to multiple independent callers. I've been spending a lot of time looking at deferred packages in JavaScript (there are many of them), and these tend to follow the Promises/A+ spec. In that setup you can always pass a deferred (a promise, as they call it) to any number of independent callers and it is impossible for them to interfere with one another (by adding callbacks). You can also get the modified chained result behavior by calling .then, which (as you note) creates a new promise. The creation of a new object is an overhead of course, but I think on balance I prefer that model as it gives you both options and has the ironclad safety.

Yes, this is closer to Tulip's Futures. Contrary to Promises/A (what a terrible name :-) Tulip Futures (like the PEP 3148 Futures they emulate) don't have separate errbacks -- the intention is that you do error handling in try/except clauses in the generators/coroutines that you should use for the majority of your code (when using Tulip in its "native" form).
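For contrast with the toy Deferred above, here is a small sketch of the "result is fixed once set" behavior using concurrent.futures.Future from the standard library. The Future is created by hand purely for illustration (normally an executor creates it), and the callback names are made up.

```python
from concurrent.futures import Future

fut = Future()  # created by hand only for illustration
seen = []


def careless(f):
    seen.append(("careless", f.result()))
    return None  # the return value of a done-callback is ignored


def later(f):
    seen.append(("later", f.result()))


fut.add_done_callback(careless)
fut.set_result(42)
# Added after completion: the callback is invoked immediately.
fut.add_done_callback(later)
```

Every callback receives the Future itself and reads the same stored result, so one callback cannot change what the next one sees.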
 
There is one non-obvious bug in this version of the class: if addCallback() is called after the Deferred has already fired (i.e. its callback() method was called) then the callback added by addCallback() will never be called.

That's incorrect, I'm not sure how you got that impression. This is an important/fantastic part of the point of deferreds: you don't need to know if they have already fired or not.

You misunderstood me. Note that I start with "in this version of the class". The bug is in the code that I show throughout the article, not in Twisted. I'll try to update the words to make this clear.
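One way the toy class could be repaired is sketched below. This is an assumption about the shape of the fix, not Twisted's actual implementation: the class remembers that it has fired and keeps the current result, so a callback added late runs immediately.

```python
class Deferred:
    """Toy class from the thread with the late-addCallback bug repaired."""

    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, callback):
        if self.called:
            # Already fired: run the callback right away on the current result.
            self.result = callback(self.result)
        else:
            self.callbacks.append(callback)

    def callback(self, result):
        assert not self.called, "a Deferred can only fire once"
        self.called = True
        for cb in self.callbacks:
            result = cb(result)
        self.result = result
        self.callbacks = []


seen = []
d = Deferred()
d.addCallback(lambda r: r + 1)
d.callback(1)               # fires: the stored result becomes 2
d.addCallback(seen.append)  # added after firing, still gets called
```

The caller no longer needs to know whether the Deferred has already fired, which is the property Terry describes.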

I could say a bit more about the rest of your post, but I think others would (and will) do a better job.

Terry

Could you add comments to the Google Doc I made of it?
https://docs.google.com/document/d/10WOZgLQaYNpOrag-eTbUm-JUCCfdyfravZ4qSOQPg1M


--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Re: Deconstructing Deferred Guido van Rossum 8/21/13 10:28 AM

On Wed, Aug 21, 2013 at 9:48 AM, Terry Jones <terry...@gmail.com> wrote:
Hi Benjamin

I think you've got your history a bit the wrong way around. Twisted deferreds pre-date the existence of jQuery, Q, etc. As I understand it, the history of Deferreds in JavaScript begins with Dojo deferreds (based explicitly on Twisted), and then the Promises/A and Promises/A+ efforts, which led to many things (jQuery deferreds, Q, when, rsvp, etc.), all of which are MUCH more recent than Twisted's deferreds.

Also in many cases using sufficiently different semantics that you can only conclude that the authors of those JS packages never really understood Twisted's Deferred.



--
--Guido van Rossum (python.org/~guido)
Re: Deconstructing Deferred Terry Jones 8/21/13 10:42 AM

Well, it was mostly so I can have an informed opinion when stating that Tulip won't be adopting Deferred's model. :-)

OK, good to know you went in with an open mind :-)
 
It has the following:

By contrast, promise-based functions always let you treat the result of the function as a value in a time-independent way. When you invoke a callback-based function, there is some time between you invoking the function and its callback being invoked during which there is no representation of the result anywhere in the program.

Yeah, this is basically "idea 1" right?

Yes.
 
Same with Futures. The example you're using is also known as the "dogpile effect" in the database world.

Ah, thanks - I didn't know that.
 
But I don't think it's a particularly good motivating example for "idea 1" -- it requires a fair amount of set-up to pull it off. (Although I think the set-up is less with Futures than with Deferreds. :-)

cache[arg] = defer.maybeDeferred(func, arg)

is pretty succinct :-)   Well, I am assuming that those who receive the deferred don't interfere with one another (see my last post), and for simplicity that the func takes just one arg. The above would be perfectly safe in most JS implementations.

BTW, another thing normal in the JS world is being able to separate the promise from the deferred (or the "resolver"), which I also like. I'll have to go look at the Tulip proposal properly, maybe I'm just mentioning things that are already there.

Yes, this is closer to Tulip's Futures. Contrary to Promises/A (what a terrible name :-)

Agreed!
 
You misunderstood me. Note that I start with "in this version of the class". The bug is in the code that I show throughout the article, not in Twisted. I'll try to update the words to make this clear.

Ah, ok, that's a relief :-)

Could you add comments to the Google Doc I made of it? 
https://docs.google.com/document/d/10WOZgLQaYNpOrag-eTbUm-JUCCfdyfravZ4qSOQPg1M

Sure.

Terry

Re: [python-tulip] Re: Deconstructing Deferred Terry Jones 8/21/13 10:47 AM
On Wednesday, August 21, 2013 6:28:30 PM UTC+1, Guido van Rossum wrote:

Also in many cases using sufficiently different semantics that you can only conclude that the authors of those JS packages never really understood Twisted's Deferred.

What seem to be the most popular JS packages (Q, when, rsvp) all follow Promises/A+ (note the plus), which I think is good.

A glaring exception is jQuery deferreds which I understand to have been loosely based on the earlier and much less well specified Promises/A. The error processing in jQuery deferreds leaves something to be desired. I've been trying to find out if there's a plan to bring jQuery into line with Promises/A+. I'm almost done co-writing an O'Reilly book on jQuery deferreds, trying to bring deferreds to wider attention (and to then point people to other JS implementations).

Terry

Re: [python-tulip] Re: Deconstructing Deferred Guido van Rossum 8/21/13 11:54 AM
On Wed, Aug 21, 2013 at 10:42 AM, Terry Jones <terry...@gmail.com> wrote:
 
But I don't think it's a particularly good motivating example for "idea 1" -- it requires a fair amount of set-up to pull it off. (Although I think the set-up is less with Futures than with Deferreds. :-)

cache[arg] = defer.maybeDeferred(func, arg)

is pretty succinct :-)   Well, I am assuming that those who receive the deferred don't interfere with one another (see my last post), and for simplicity that the func takes just one arg. The above would be perfectly safe in most JS implementations.

Maybe that's because most JS implementations don't implement idea 2 (passing values between callbacks). With Twisted Deferred, having the Deferred object in hand doesn't give you the value -- you must use addCallback() to add your callback that will be notified when the value is ready (which may be immediately if the Deferred is already complete). But the *next* user of the same Deferred object will have to add their own callback (there's no documented API to get the result from the Deferred object) and they will see whatever return value *your* callback produced. So if you write your callback carelessly (e.g. returning None), you spoil the result for everyone after you.

With Tulip Futures or most other types of Promises and Futures that's not a problem. As Ben Darnell writes in a comment on my Google doc:

"""[...] led me to think of them as "Futures with some extra complexity" when in fact the identity of the Deferred is really about the queue instead of an individual result."""
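The "spoiling" problem can be reproduced with the toy Deferred class quoted earlier in the thread (again, this is the simplified class from the article, not Twisted's). The callback names below are made up for the illustration.

```python
class Deferred:
    """The simplified class from the article; each result feeds the next callback."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)

    def callback(self, result):
        for cb in self.callbacks:
            result = cb(result)


seen = []


def careless(result):
    seen.append(("careless", result))
    # Forgets to return the result, so the next callback receives None.


def next_user(result):
    seen.append(("next_user", result))
    return result


d = Deferred()
d.addCallback(careless)
d.addCallback(next_user)
d.callback(42)
```

The second user of the same Deferred sees None rather than 42, because the callback list behaves as a pipeline (a queue) rather than as a set of independent observers of one value.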

--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Re: Deconstructing Deferred Guido van Rossum 8/21/13 11:57 AM
Any chance of getting the JS world to use a word other than "deferred" for those promise/future APIs that don't follow Twisted's design of passing results from callback to callback (i.e., what I identified as idea 2)? It seems terribly confusing to adopt such a weird term *and* then not even to implement the original intention properly when better terms (either future or promise) are available and match the semantics they actually implement better.
--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Re: Deconstructing Deferred Terry Jones 8/21/13 12:15 PM
On Wed, Aug 21, 2013 at 7:54 PM, Guido van Rossum <gu...@python.org> wrote:
Maybe that's because most JS implementations don't implement idea 2 (passing values between callbacks). With Twisted Deferred, having the Deferred object in hand doesn't give you the value -- you must use addCallback() to add your callback that will be notified when the value is ready (which may be immediately if the Deferred is already complete). But the *next* user of the same Deferred object will have to add their own callback (there's no documented API to get the result from the Deferred object) and they will see whatever return value *your* callback produced. So if you write your callback carelessly (e.g. returning None), you spoil the result for everyone after you.

Right, that's what I meant.  I prefer the immutable approach, given that there is also a way to construct a chain of Deferreds (or whatever you call them) to pass on a modified result. It seems better/simpler to take that approach, despite its overhead, because the separation is clean, the underlying code is simpler, and the pass-on-a-modified-result behavior can be built simply using another Deferred.  I have a feeling we're saying exactly the same thing, though perhaps for different reasons.
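The immutable approach described here can be sketched in a few lines of Python. This toy Promise class is hypothetical and deliberately incomplete: it has no error handling, and it runs callbacks synchronously, whereas Promises/A+ requires them to be called asynchronously. It only illustrates that then() returns a new object, so a modified result travels down its own chain while the original value stays untouched.

```python
class Promise:
    """Toy single-assignment promise; NOT Promises/A+ compliant."""

    _PENDING = object()

    def __init__(self):
        self._value = Promise._PENDING
        self._waiters = []

    def resolve(self, value):
        if self._value is not Promise._PENDING:
            raise RuntimeError("already resolved")
        self._value = value
        waiters, self._waiters = self._waiters, []
        for waiter in waiters:
            waiter(value)

    def then(self, fn):
        """Return a NEW promise for fn(value); this promise's value never changes."""
        child = Promise()

        def run(value):
            child.resolve(fn(value))

        if self._value is Promise._PENDING:
            self._waiters.append(run)
        else:
            run(self._value)
        return child


seen = []
p = Promise()
p.then(lambda v: None)         # careless: only its own child promise gets None
doubled = p.then(lambda v: v * 2)
doubled.then(lambda v: seen.append(("doubled", v)))
p.then(lambda v: seen.append(("independent", v)))
p.resolve(21)
```

The careless callback cannot interfere with the other users of p, and the modified result (42) is available to anyone holding the new promise returned by then().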

Terry
Re: [python-tulip] Re: Deconstructing Deferred Terry Jones 8/21/13 12:44 PM
On Wednesday, August 21, 2013 7:57:32 PM UTC+1, Guido van Rossum wrote:
Any chance of getting the JS world to use a word other than "deferred" for those promise/future APIs that don't follow Twisted's design of passing results from callback to callback (i.e., what I identified as idea 2)? It seems terribly confusing to adopt such a weird term *and* then not even to implement the original intention properly when better terms (either future or promise) are available and match the semantics they actually implement better.

The JS world almost exclusively uses "Promises", and there is strong momentum behind the Promises/A+ spec (which is very well spelled out, see http://promisesaplus.com/). The page at https://github.com/promises-aplus/promises-spec/blob/master/implementations.md lists 34(!) JS packages that all pass the community-standardized Promises/A+ test suite.

Notable exceptions using "Deferred" are Dojo (which took its language directly from Twisted, AFAIK) but which is *very* close to Promises/A+ in behavior (see http://dojo-toolkit.33424.n3.nabble.com/Becoming-promises-aplus-compliant-td3995934.html) and jQuery, which is like Promises/A (sans plus!) and which has bad error handling.

So I think the JS world pretty much has its house in order. 34 compliant (and therefore interoperable) Promise implementations is pretty remarkable.

Terry

Re: [python-tulip] Re: Deconstructing Deferred Guido van Rossum 8/21/13 1:44 PM
Why on earth does the JS world need 34 distinct implementations of the same spec? Because there's no JS standard library? And given jQuery's popularity, shouldn't it follow suit and drop the "deferred" moniker?
--
--Guido van Rossum (python.org/~guido)
Re: [python-tulip] Re: Deconstructing Deferred Terry Jones 8/21/13 2:02 PM
On Wed, Aug 21, 2013 at 9:44 PM, Guido van Rossum <gu...@python.org> wrote:
Why on earth does the JS world need 34 distinct implementations of the same spec? Because there's no JS standard library?

I'm guessing that people write implementations in part just for the fun of it. I find the whole idea of promises so attractive and elegant that I find it easy to understand why people would do it for the pleasure (but it's a complete guess, I have no hard data). Also, the Promises/A+ spec only specifies that your promises implement *one* function, then(), though it very carefully spells out the dynamics around callbacks and errbacks.

And given jQuery's popularity, shouldn't it follow suit and drop the "deferred" moniker?

I hope it will, and I'm trying a little to find out the answer to that question (and whether they'll fix error processing and follow Promises/A+).

Confusingly, jQuery calls its objects "deferreds" but after you make one of them what you pass back to your caller is a "promise" (which you get by calling the promise() method on the deferred).  Unfortunately this is completely backwards from the way Wikipedia talks about Promises and Futures:

Specifically, when usage is distinguished, a future is a read-only placeholder view of a variable, while a promise is a writable, single assignment container which sets the value of the future.
Re: [python-tulip] Re: Deconstructing Deferred Guido van Rossum 8/21/13 2:12 PM
On Wed, Aug 21, 2013 at 2:02 PM, Terry Jones <terry...@gmail.com> wrote:
Confusingly, jQuery calls its objects "deferreds" but after you make one of them what you pass back to your caller is a "promise" (which you get by calling the promise() method on the deferred).  Unfortunately this is completely backwards from the way Wikipedia talks about Promises and Futures:

Specifically, when usage is distinguished, a future is a read-only placeholder view of a variable, while a promise is a writable, single assignment container which sets the value of the future.

Indeed. The sentence you quote is clear as mud to me, and the few sentences following it on the wiki page don't add anything to help me understand. What is a "view of a variable"? How can the *container* *set* the future's value?

That whole wiki page seems a hodgepodge edited by fans of various mostly-forgotten or rarely-used languages that claim or once claimed to implement the "true" concept of futures or promises, interspersed with the occasional random JavaScript link. :-(

--
--Guido van Rossum (python.org/~guido)
Re: Deconstructing Deferred Alberto Berti 8/23/13 4:24 AM
>>>>> "Terry" == Terry Jones <terry...@gmail.com> writes:

    Terry> On Wednesday, August 21, 2013 7:57:32 PM UTC+1, Guido van Rossum wrote:

    Terry> The JS world almost exclusively uses "Promises", and there is

    Terry> ...

    Terry> Notable exceptions using "Deferred" are Dojo (which took its
    Terry> language directly from Twisted, AFAIK) but which is *very*
    Terry> close to Promises/A+ in behavior (see
    Terry> http://dojo-toolkit.33424.n3.nabble.com/Becoming-promises-aplus-compliant-td3995934.html)
    Terry> and jQuery, which is like Promises/A (sans plus!) and which
    Terry> has bad error handling.

Another exception is MochiKit.Async.Deferred, which may predate Dojo's.
It is a direct port of Twisted's Deferred to JS:


http://mochi.github.io/mochikit/doc/html/MochiKit/Async.html

Re: [python-tulip] Re: Deconstructing Deferred Alex Gaynor 8/23/13 5:22 AM
MochiKit's Deferred does predate Dojo's. It was written by Bob Ippolito, who many will recognize as a long-time Pythonista and Twisted person.

Alex
--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084