Calling coroutines within asyncio.Protocol.data_received

Tobias Oberstein

Dec 23, 2013, 12:52:55 PM12/23/13
to python...@googlegroups.com
Hi,

I am wondering how to call coroutines within asyncio.Protocol.data_received:

http://stackoverflow.com/questions/20746619/calling-coroutines-in-asyncio-protocol-data-received/20748559

I am coming from Twisted, where doing asynchronous stuff within the corresponding callback ("dataReceived") is ok.

Now, Twisted does internal buffering of data incoming from a socket.

Am I supposed to do that myself and decouple the (asynchronous) processing via a receive queue and a Future to signal availability of received data?

Like so:

https://github.com/oberstet/scratchbox/blob/master/python/asyncio/client.py
https://github.com/oberstet/scratchbox/blob/master/python/asyncio/server.py

?

Thanks
/Tobias

Guido van Rossum

Dec 23, 2013, 4:07:52 PM12/23/13
to Tobias Oberstein, python-tulip

Protocol callbacks don't have a return value and can't (or shouldn't) block. You can use async() or Task() to spawn a coroutine. You might also look at open_connection(), which has a lot of the logic you describe.
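
For illustration, a minimal sketch of that pattern (process() is a
made-up handler; async() was renamed ensure_future() in later Python
versions):

import asyncio

class MyProtocol(asyncio.Protocol):

    def data_received(self, data):
        # Can't block or yield here, so spawn the coroutine as a Task
        # and return immediately; it runs on a later loop iteration.
        asyncio.async(self.process(data))

    @asyncio.coroutine
    def process(self, data):
        # Free to yield from here.
        yield from asyncio.sleep(0)  # stand-in for real async work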

Tobias Oberstein

Dec 23, 2013, 4:25:59 PM12/23/13
to gu...@python.org, python-tulip
On 23.12.2013 22:07, Guido van Rossum wrote:
> Protocol callbacks don't have a return value and can't (or shouldn't)
> block. You can use async() or Task() to spawn a coroutine. You might

Thanks for making this clear. Just from looking at the signature of
data_received vs Twisted dataReceived, I first (naively) assumed similar
behavior. It's clear now, and in fact I got it working in the meantime
(using a deque() and a Future to signal).

I have now another issue. I am _extending_ support of a WebSocket
framework (https://github.com/tavendo/AutobahnPython) from Twisted/Py2
to Py3 and asyncio. I am _nearly_ there. 90% of the code is shared between
Twisted and asyncio. But, in the shared code, I have multiple places:

if PY3 and self._coroutines:
    yield from self.consumeData()
else:
    self.consumeData()

I guess that works on Py3/Twisted (if Twisted would support Py3 some
time), but it breaks Py2 .. "yield from" is a syntax error, even if PY3
is False. There is no #ifndef in Python.

Any hint on what to do?

The problem is that to make coroutines work for user code, the whole
call-chain _within_ the framework needs to be using "yield from".

/Tobias

> also look at open_connection(), which has a lot of the logic you describe.
>
> On Dec 23, 2013 7:52 AM, "Tobias Oberstein" <tobias.o...@gmail.com

Antoine Pitrou

Dec 23, 2013, 4:39:17 PM12/23/13
to python...@googlegroups.com
On Mon, 23 Dec 2013 22:25:59 +0100
Tobias Oberstein
<tobias.o...@gmail.com> wrote:
> On 23.12.2013 22:07, Guido van Rossum wrote:
> > Protocol callbacks don't have a return value and can't (or shouldn't)
> > block. You can use async() or Task() to spawn a coroutine. You might
>
> Thanks for making this clear. Just from looking at the signature of
> data_received vs Twisted dataReceived, I first (naively) assumed similar
> behavior. It's clear now, and in fact I got it working in the meantime
> (using a deque() and a Future to signal).

I'm curious: what is the difference in behaviour between Twisted's
dataReceived and asyncio's data_received?

> I have now another issue. I am _extending_ support of a WebSocket
> framework (https://github.com/tavendo/AutobahnPython) from Twisted/Py2
> to Py3 and asyncio. I am _nearly_ there. 90% of the code is shared between
> Twisted and asyncio. But, in the shared code, I have multiple places:
>
> if PY3 and self._coroutines:
>     yield from self.consumeData()
> else:
>     self.consumeData()

I don't really understand. If consumeData() isn't supposed to wait on
I/O, you don't need to "yield from" it.

> I guess that works on Py3/Twisted (if Twisted would support Py3 some
> time), but it breaks Py2 .. "yield from" is a syntax error, even if PY3
> is False. There is no #ifndef in Python.

If you want to write code that's compatible with Python 2, you
shouldn't use "yield from" at all. Just write callback-style code,
rather than coroutine-style.

If you're a bit stuck, you can take a look at Obelus:
https://pypi.python.org/pypi/obelus/

Regards

Antoine.


Tobias Oberstein

Dec 23, 2013, 5:14:45 PM12/23/13
to Antoine Pitrou, python...@googlegroups.com
On 23.12.2013 22:39, Antoine Pitrou wrote:
> On Mon, 23 Dec 2013 22:25:59 +0100
> Tobias Oberstein
> <tobias.o...@gmail.com> wrote:
>> On 23.12.2013 22:07, Guido van Rossum wrote:
>>> Protocol callbacks don't have a return value and can't (or shouldn't)
>>> block. You can use async() or Task() to spawn a coroutine. You might
>>
>> Thanks for making this clear. Just from looking at the signature of
>> data_received vs Twisted dataReceived, I first (naively) assumed similar
>> behavior. It's clear now, and in fact I got it working in the meantime
>> (using a deque() and a Future to signal).
>
> I'm curious: what is the difference in behaviour between Twisted's
> dataReceived and asyncio's data_received?

Twisted does do internal buffering and allows me to use the Twisted
coroutine analog "inlineCallbacks" within dataReceived. With asyncio, I
do the buffering myself and trigger a coroutine waiting on a Future to
be signaled.

In short: I can't call a coroutine from data_received, but I can call an
inlineCallbacks-decorated function in dataReceived. This is a major
difference.

>
>> I have now another issue. I am _extending_ support of a WebSocket
>> framework (https://github.com/tavendo/AutobahnPython) from Twisted/Py2
>> to Py3 and asyncio. I am _nearly_ there. 90% of the code is shared between
>> Twisted and asyncio. But, in the shared code, I have multiple places:
>>
>> if PY3 and self._coroutines:
>>     yield from self.consumeData()
>> else:
>>     self.consumeData()
>
> I don't really understand. If consumeData() isn't supposed to wait on
> I/O, you don't need to "yield from" it.

consumeData() will ultimately call into user code, and I want that user
code to be able to be a co-routine.

In fact, I am using a maybe_yield idiom:

http://stackoverflow.com/a/20742763/884770

"similar" to Twisted's maybeDeferred.

>
>> I guess that works on Py3/Twisted (if Twisted would support Py3 some
>> time), but it breaks Py2 .. "yield from" is a syntax error, even if PY3
>> is False. There is no #ifndef in Python.
>
> If you want to write code that's compatible with Python 2, you
> shouldn't use "yield from" at all. Just write callback-style code,
> rather than coroutine-style.

I am writing a framework, and I want to leave it to user's choice to use
coroutines.

>
> If you're a bit stuck, you can take a look at Obelus:
> https://pypi.python.org/pypi/obelus/

Does that allow users to write app code as coroutines?

/Tobias

>
> Regards
>
> Antoine.
>
>

Antoine Pitrou

Dec 23, 2013, 5:22:22 PM12/23/13
to python...@googlegroups.com
On Mon, 23 Dec 2013 23:14:45 +0100
Tobias Oberstein
<tobias.o...@gmail.com> wrote:
>
> Twisted does do internal buffering and allows me to use the Twisted
> coroutine analog "inlineCallbacks" within dataReceived. With asyncio, I
> do the buffering myself and trigger a coroutine waiting on a Future to
> be signaled.
>
> In short: I can't call a coroutine from data_received, but I can call an
> inlineCallbacks-decorated function in dataReceived. This is a major
> difference.

I'm willing to bet that you're doing something wrong here. Can you post
code snippets?

> >> I have now another issue. I am _extending_ support of a WebSocket
> >> framework (https://github.com/tavendo/AutobahnPython) from Twisted/Py2
> >> to Py3 and asyncio. I am _nearly_ there. 90% of the code is shared between
> >> Twisted and asyncio. But, in the shared code, I have multiple places:
> >>
> >> if PY3 and self._coroutines:
> >>     yield from self.consumeData()
> >> else:
> >>     self.consumeData()
> >
> > I don't really understand. If consumeData() isn't supposed to wait on
> > I/O, you don't need to "yield from" it.
>
> consumeData() will ultimately call into user code, and I want that user
> code to be able to be a co-routine.

So how is it supposed to work in the Python 2 case if you're simply
calling `self.consumeData()` without either yielding it or registering a
Deferred somewhere?

> > If you want to write code that's compatible with Python 2, you
> > shouldn't use "yield from" at all. Just write callback-style code,
> > rather than coroutine-style.
>
> I am writing a framework, and I want to leave it to user's choice to use
> coroutines.

That's completely orthogonal to whether your framework uses
coroutines for its internal implementation.

Coroutines are a convenience layer over plain callbacks; actually,
asyncio.Task is a subclass of asyncio.Future.

> > If you're a bit stuck, you can take a look at Obelus:
> > https://pypi.python.org/pypi/obelus/
>
> Does that allow users to write app code as coroutines?

I've never tried, but there's no reason for it not to allow it. Some
glue code might be necessary, which I wasn't interested in writing.

Regards

Antoine.


Tobias Oberstein

Dec 23, 2013, 5:25:58 PM12/23/13
to Antoine Pitrou, python...@googlegroups.com
>> In short: I can't call a coroutine from data_received, but I can call an
>> inlineCallbacks-decorated function in dataReceived. This is a major
>> difference.
>
> I'm willing to bet that you're doing something wrong here. Can you post
> code snippets?

http://stackoverflow.com/questions/20746619/calling-coroutines-in-asyncio-protocol-data-received

Antoine Pitrou

Dec 23, 2013, 5:29:19 PM12/23/13
to python...@googlegroups.com
On Mon, 23 Dec 2013 23:25:58 +0100
Tobias Oberstein
I think you should instead write your coroutine as a separate function
or method and then schedule it from data_received() using e.g.
asyncio.async().

Perhaps it would be more convenient to be able to write data_received()
as a coroutine; that's a possible enhancement.

Regards

Antoine.


Tobias Oberstein

Dec 23, 2013, 6:43:51 PM12/23/13
to Antoine Pitrou, python...@googlegroups.com
On 23.12.2013 23:29, Antoine Pitrou wrote:
> On Mon, 23 Dec 2013 23:25:58 +0100
> Tobias Oberstein
> <tobias.o...@gmail.com> wrote:
>>>> In short: I can't call a coroutine from data_received, but I can call an
>>>> inlineCallbacks-decorated function in dataReceived. This is a major
>>>> difference.
>>>
>>> I'm willing to bet that you're doing something wrong here. Can you post
>>> code snippets?
>>
>> http://stackoverflow.com/questions/20746619/calling-coroutines-in-asyncio-protocol-data-received
>
> I think you should instead write your coroutine as a separate function
> or method and then schedule it from data_received() using e.g.
> asyncio.async().

I can't see how that would help with "yield from" proliferation. I guess
I can sum up my issue like this:

If I want users to be able to write their app level protocol handlers as
coroutines, that means I have to use "yield from" everywhere inside my
framework. And that does not work on Python 2. So I cannot have one
(mostly) shared codebase. Twisted is far less intrusive. Or maybe I am
just too dumb.

>
> Perhaps it would be more convenient to be able to write data_received()
> as a coroutine; that's a possible enhancement.

That would be nice, but is a minor problem. I do already have a working
solution for this.

Thanks for your comments!
/Tobias

>
> Regards
>
> Antoine.
>
>

Antoine Pitrou

Dec 23, 2013, 6:47:12 PM12/23/13
to python...@googlegroups.com
On Tue, 24 Dec 2013 00:43:51 +0100
Tobias Oberstein
<tobias.o...@gmail.com> wrote:
> On 23.12.2013 23:29, Antoine Pitrou wrote:
> > On Mon, 23 Dec 2013 23:25:58 +0100
> > Tobias Oberstein
> > <tobias.o...@gmail.com> wrote:
> >>>> In short: I can't call a coroutine from data_received, but I can call an
> >>>> inlineCallbacks-decorated function in dataReceived. This is a major
> >>>> difference.
> >>>
> >>> I'm willing to bet that you're doing something wrong here. Can you post
> >>> code snippets?
> >>
> >> http://stackoverflow.com/questions/20746619/calling-coroutines-in-asyncio-protocol-data-received
> >
> > I think you should instead write your coroutine as a separate function
> > or method and then schedule it from data_received() using e.g.
> > asyncio.async().
>
> I can't see how that would help with "yield from" proliferation. I guess
> I can sum up my issue like this:
>
> If I want users to be able to write their app level protocol handlers as
> coroutines, that means I have to use "yield from" everywhere inside my
> framework.

No, that doesn't mean that. Did you misunderstand what I wrote?

Regards

Antoine.


Tobias Oberstein

Dec 23, 2013, 7:12:57 PM12/23/13
to Antoine Pitrou, python...@googlegroups.com
>> If I want users to be able to write their app level protocol handlers as
>> coroutines, that means I have to use "yield from" everywhere inside my
>> framework.
>
> No, that doesn't mean that. Did you misunderstand what I wrote?
>

Alright, I guess I misunderstood what you said.

So you're saying I should call the user-level functions (which may be
coroutines) like this from within my protocol implementation?

res = self.onMessage(message)
if PY3 and (isinstance(res, asyncio.futures.Future) or
            inspect.isgenerator(res)):
    asyncio.async(res)

=> onMessage is the user code

This seems to work .. and is far less intrusive!

Now if that is what you meant, could you please confirm?

Also: will asyncio.async schedule via reentering the event loop or how?
I am asking since these user-level functions will get called for each
and every received WebSocket message .. very often. Any performance caveats?

Thanks so much!

/Tobias

Antoine Pitrou

Dec 27, 2013, 12:35:24 PM12/27/13
to python...@googlegroups.com
On Tue, 24 Dec 2013 01:12:57 +0100
Tobias Oberstein
<tobias.o...@gmail.com> wrote:
>
> So you say I call the user-level functions (which may be coroutines)
> like this from within my protocol implementation?
>
> res = self.onMessage(message)
> if PY3 and (isinstance(res, asyncio.futures.Future) or
>             inspect.isgenerator(res)):
>     asyncio.async(res)
>
> => onMessage is the user code

Yes. Of course, ideally there should be a terser way of saying
"(isinstance(res, asyncio.futures.Future) or inspect.isgenerator(res))"
:-)

> Also: will asyncio.async schedule via reentering the event loop or how?

Off the top of my head, no. But better confirm by reading the source
code :-)

Regards

Antoine.


Guido van Rossum

Dec 27, 2013, 1:21:07 PM12/27/13
to Antoine Pitrou, python-tulip
On Fri, Dec 27, 2013 at 9:35 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Tue, 24 Dec 2013 01:12:57 +0100
> Tobias Oberstein
> <tobias.o...@gmail.com> wrote:
>>
>> So you say I call the user-level functions (which may be coroutines)
>> like this from within my protocol implementation?
>>
>> res = self.onMessage(message)
>> if PY3 and (isinstance(res, asyncio.futures.Future) or
>>             inspect.isgenerator(res)):
>>     asyncio.async(res)
>>
>> => onMessage is the user code
>
> Yes. Of course, ideally there should be a terser way of saying
> "(isinstance(res, asyncio.futures.Future) or inspect.isgenerator(res))"
> :-)

You shouldn't use inspect.isgenerator() though -- you should use
asyncio.iscoroutine(). (It's currently undocumented but we should fix
that.)

The reason there's no more compact way to express this is probably
that in most cases you shouldn't dynamically have to know this.
Usually you already know that something is waitable, and then you just
use "yield from" on it without the test, or you need a Future anyway
(e.g. to add a callback) and then you just use asyncio.async().
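
With that, the dynamic dispatch above becomes something like (a sketch):

res = self.onMessage(message)
if asyncio.iscoroutine(res):
    # Fire and forget: wrap it in a Task; it runs on a later
    # event loop iteration.
    asyncio.async(res)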

>> Also: will asyncio.async schedule via reentering the event loop or how?
>
> Off the top of my head, no. But better confirm by reading the source
> code :-)

Not sure what the question means. But when you call async() you are
still responsible for yielding to the event loop -- there is no second
event loop that runs background tasks for you in a separate thread.
:-)

(Although an interesting area of research might be to figure out if we
can invent a mixture of threads and event loops to emulate Go's
goroutines, which are such a mixture supported by special syntax in
the compiler.)

--
--Guido van Rossum (python.org/~guido)

Tobias Oberstein

Dec 28, 2013, 8:09:03 AM12/28/13
to Antoine Pitrou, python...@googlegroups.com
Hi Antoine,

In the meanwhile, I got Autobahn running on Python3/asyncio.

Here is a complete example of WebSocket client and server, running on
asyncio

https://github.com/tavendo/AutobahnPython/tree/master/examples/asyncio/websocket/echo

and running on Twisted

https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/echo

Autobahn (http://autobahn.ws/) is a feature-rich, compliant and
performant WebSocket and WAMP client/server framework. And it's exciting
that we can support both Twisted and asyncio at the same time;)

>> Also: will asyncio.async schedule via reentering the event loop or how?
>
> Off the top of my head, no. But better confirm by reading the source
> code :-)

The code seems to indicate that the Task will only run after the event
loop is reentered

http://code.google.com/p/tulip/source/browse/asyncio/tasks.py#157

Or do I get this wrong?

Anyway, I will do some performance benchmarking to compare Twisted and
asyncio ..

/Tobias

Tobias Oberstein

Dec 28, 2013, 8:20:52 AM12/28/13
to gu...@python.org, Antoine Pitrou, python-tulip
> You shouldn't use inspect.isgenerator() though -- you should use
> asyncio.iscoroutine(). (It's currently undocumented but we should fix
> that.)

Thanks! I adjusted my code accordingly.

> The reason there's no more compact way to express this is probably
> that in most cases you shouldn't dynamically have to know this.
> Usually you already know that something is waitable, and then you just
> use "yield from" on it without the test, or you need a Future anyway
> (e.g. to add a callback) and then you just use asyncio.async().

I see. In my case (a WebSocket implementation), I don't know in advance,
since the code called is user-supplied (e.g. user/app code that runs when
a WebSocket message has been received/parsed).

>>> Also: will asyncio.async schedule via reentering the event loop or how?
>>
>> Off the top of my head, no. But better confirm by reading the source
>> code :-)
>
> Not sure what the question means. But when you call async() you are
> still responsible for yielding to the event loop -- there is no second
> event loop that runs background tasks for you in a separate thread.
> :-)

I wasn't thinking of threads, but more wondering about the following.

I am buffering data received in asyncio.Protocol.data_received in
a deque

https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L80

and the data buffered is then processed in a (recurring) Task

https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L71

That Task will parse the raw octets, and fire user code. This firing
will wrap the user code calling in asyncio.async()

https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L98

_if_ the user code yields.

And I was wondering: _if_ the user code is wrapped in asyncio.async,
how/when will it run?

Since it seems Tasks use call_soon() to schedule their execution

http://code.google.com/p/tulip/source/browse/asyncio/tasks.py#157

I guess the user code wrapped in Task will run upon the next event loop
iteration?
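
In outline, the design looks like this (a simplified sketch of the
linked code, not the code itself):

import asyncio
from collections import deque

class WebSocketAdapter(asyncio.Protocol):

    def connection_made(self, transport):
        self.transport = transport
        self.receive_queue = deque()
        self._waiter = None
        asyncio.async(self._consume())      # start the recurring Task

    def data_received(self, data):
        self.receive_queue.append(data)
        if self._waiter is not None and not self._waiter.done():
            self._waiter.set_result(None)   # signal the consumer

    @asyncio.coroutine
    def _consume(self):
        while self.receive_queue:
            data = self.receive_queue.popleft()
            # ... parse octets; wrap user coroutines in asyncio.async() ...
        self._waiter = asyncio.Future()
        yield from self._waiter             # wait until data_received fires
        asyncio.async(self._consume())      # re-arm the recurring Task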

Also: Autobahn now works with above design (and I have 95% shared code
between Twisted and asyncio), but is this how you intended asyncio to be
used, or am I misusing / not following best practice in some way? I am
totally new to asyncio, coming from Twisted ..

/Tobias

Aymeric Augustin

Dec 28, 2013, 9:38:40 AM12/28/13
to Tobias Oberstein, Guido van Rossum, Antoine Pitrou, python-tulip
On 28 Dec 2013, at 14:20, Tobias Oberstein <tobias.o...@gmail.com> wrote:

> Also: Autobahn now works with above design (and I have 95% shared code between Twisted and asyncio), but is this how you intended asyncio to be used, or am I misusing / not following best practice in some way? I am totally new to asyncio, coming from Twisted ..

Hi Tobias,

To the best of my understanding, libraries based on asyncio often exhibit the following design:
1) The “low level” is implemented in “push” mode with callbacks.
2) The “high level” is exposed in “pull” mode with coroutines.
3) At some point in-between, there’s a switch from callback-style to coroutine-style.

In this regard Autobahn strongly contrasts with my own websockets library.

I’m switching immediately to coroutine-style with a StreamReader:
https://github.com/aaugustin/websockets/blob/a13be1f1/websockets/protocol.py#L344-L355
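
In essence (a sketch, not the actual websockets code):

import asyncio

class Bridge(asyncio.Protocol):

    def connection_made(self, transport):
        self.reader = asyncio.StreamReader()
        self.reader.set_transport(transport)
        asyncio.async(self.run())

    def data_received(self, data):
        self.reader.feed_data(data)   # push side: callbacks feed the reader

    def eof_received(self):
        self.reader.feed_eof()

    @asyncio.coroutine
    def run(self):
        # Pull side: from here on everything reads sequentially.
        header = yield from self.reader.readexactly(2)
        # ... parse the frame, read the payload, and so on ...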

If I understand correctly, you’re staying in callback-style up to the user level:
https://github.com/tavendo/AutobahnPython/blob/3a8c87fc/examples/asyncio/websocket/echo/client.py#L40-L44

(Obviously, this makes sense when you’re attempting to share as much code as possible with Twisted.)

Antoine finds the lack of a callback-based API in websockets regrettable. (It’s in the archives of this mailing list.) Autobahn is nicely filling this gap.

However, to provide a “native” asyncio experience, a coroutine-based API would be nice, and it shouldn’t be particularly hard to implement.

Here’s the kind of API I’m talking about: http://aaugustin.github.io/websockets/#example. (Most of my effort on websockets went into designing the API.)

I hope this helps,

--
Aymeric.

PS: thanks a lot for the Autobahn test suite, it’s been incredibly helpful to validate websockets.

Tobias Oberstein

Dec 28, 2013, 10:44:02 AM12/28/13
to Aymeric Augustin, Guido van Rossum, Antoine Pitrou, python-tulip
Hi Aymeric,

On 28.12.2013 15:38, Aymeric Augustin wrote:
> On 28 Dec 2013, at 14:20, Tobias Oberstein <tobias.o...@gmail.com> wrote:
>
>> Also: Autobahn now works with above design (and I have 95% shared code between Twisted and asyncio), but is this how you intended asyncio to be used, or am I misusing / not following best practice in some way? I am totally new to asyncio, coming from Twisted ..
>
> Hi Tobias,
>
> To the best of my understanding, libraries based on asyncio often exhibit the following design:
> 1) The “low level” is implemented in “push” mode with callbacks.
> 2) The “high level” is exposed in “pull” mode with coroutines.
> 3) At some point in-between, there’s a switch from callback-style to coroutine-style.

Interesting. I see.

> In this regard Autobahn strongly contrasts with my own websockets library.
>
> I’m switching immediately to coroutine-style with a StreamReader:
> https://github.com/aaugustin/websockets/blob/a13be1f1/websockets/protocol.py#L344-L355

Autobahn's asyncio integration essentially does the same as StreamReader
under the hood:

https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L80

The reason is that - currently - firing coroutines from data_received
seems not to be possible. And that's probably fine. In any case, the above
buffering works. I'll see what the performance impact is.

> If I understand correctly, you’re staying in callback-style up to the user level:
> https://github.com/tavendo/AutobahnPython/blob/3a8c87fc/examples/asyncio/websocket/echo/client.py#L40-L44

I guess yes. But users can implement their stuff as coroutines also:

https://github.com/tavendo/AutobahnPython/blob/master/examples/asyncio/websocket/echo/client_coroutines.py#L31

This also translates 1:1 to Twisted:

https://github.com/tavendo/AutobahnPython/blob/master/examples/twisted/websocket/echo/client_coroutines.py#L36

This is still a "push-like" API, not "pull".

> (Obviously, this makes sense when you’re attempting to share as much code as possible with Twisted.)

I'd say it's more a consequence of exposing a "push API" - you could do
a "pull" API in Twisted as well.

> Antoine finds the lack of a callback-based API in websockets regrettable. (It’s in the archives of this mailing list.) Autobahn is nicely filling this gap.

To be honest, I agree on that. A "pull style" API for WebSocket feels
unnatural to me. But it's a matter of taste, sure.

Autobahn's API was designed along the lines of the WebSocket API in
browsers, which is "push style". As are many other APIs in other
languages, e.g. Java JSR356.

> However, to provide a “native” asyncio experience, a coroutine-based API would be nice, and it shouldn’t be particularly hard to implement.

As mentioned above, users can implement their hooks (onMessage() etc) as
coroutines. I don't have plans to implement a pull-style API, since ..
yeah, unnatural;)

>
> Here’s the kind of API I’m talking about: http://aaugustin.github.io/websockets/#example. (Most of my effort on websockets went into designing the API.)

Interesting. This somehow reminds me of the difference between Unix
poll-based networking (notify when stuff _can_ be done, then do the
stuff and fire callbacks) vs Windows IOCP: do stuff asynchronously and
come back when done.

>
> I hope this helps,
>

Yes, definitely brings in some perspective! Thanks.

Regarding the Autobahn test suite: good to hear it was of use to you! Btw:
do you have reports online? From a short skim of the code, I'd be
interested .. in particular (or also) in the UTF8 stuff;)

/Tobias

Tobias Oberstein

Dec 28, 2013, 10:58:43 AM12/28/13
to gu...@python.org, Antoine Pitrou, python-tulip
> You shouldn't use inspect.isgenerator() though -- you should use
> asyncio.iscoroutine(). (It's currently undocumented but we should fix
> that.)

Is that "iscoroutine" brand new?

oberstet@vbox-ubuntu1310:~/scm/AutobahnPython/examples/asyncio/websocket/testee$
~/python340b1/bin/python3
Python 3.4.0b1 (default, Dec 28 2013, 15:29:21)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import asyncio
>>> asyncio.iscoroutine
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'iscoroutine'


Aymeric Augustin

Dec 28, 2013, 11:39:01 AM12/28/13
to Tobias Oberstein, Guido van Rossum, Antoine Pitrou, python-tulip
On 28 Dec 2013, at 16:44, Tobias Oberstein <tobias.o...@gmail.com> wrote:

> To be honest, I agree on that. A "pull style" API for WebSocket feels unnatural to me. But it's a matter of taste, sure.

Coroutines seemed easier to me, but that’s probably because I was new to async programming and never wrote significant amounts of callback-based code.

> Rgd Autobahn testsuite: good to hear it was of use to you! Btw: do you have reports online?

I just re-ran the test suite and uploaded the reports at https://myks.org/stuff/websockets-1.0-client/ and https://myks.org/stuff/websockets-1.0-server/

(I didn’t attempt to make my implementation fast, I’m sure there are some easy wins.)

> From a short skimming over the code, I'd be interested .. in particular (or also) the UTF8 stuff;)

You may be thinking about this: https://github.com/aaugustin/websockets/tree/master/compliance#conformance-notes. I’m not sure about the current status; apparently these tests are reported as not strict with version 0.5.2 of the test suite.

--
Aymeric.

Tobias Oberstein

Dec 28, 2013, 12:36:08 PM12/28/13
to Aymeric Augustin, Guido van Rossum, Antoine Pitrou, python-tulip
>> From a short skimming over the code, I'd be interested .. in particular (or also) the UTF8 stuff;)
>
> You may be thinking about this: https://github.com/aaugustin/websockets/tree/master/compliance#conformance-notes. I’m not sure about the current status; apparently these tests are reported as not strict with version 0.5.2 of the test suite.
>

I am surprised that Python 3 with

codecs.getincrementaldecoder('utf-8')(errors='strict')
https://github.com/aaugustin/websockets/blob/master/websockets/protocol.py#L242

now seems to be able to actually pass all UTF8 tests of Autobahn.

This would be the first language/run-time I know of being able to do so
natively;)
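
For illustration, the incremental decoder both buffers incomplete
sequences and rejects invalid bytes immediately:

import codecs

dec = codecs.getincrementaldecoder('utf-8')(errors='strict')
dec.decode(b'\xc3')   # returns '' - incomplete sequence, decoder waits
dec.decode(b'\xa9')   # returns 'é' - the sequence is now complete
dec.decode(b'\xff')   # raises UnicodeDecodeError right away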

Regarding the 6.4 tests: your implementation seems to do per-frame
incremental validation (but is not fully incremental intra-frame).
So it passes those tests non-strictly - which is correct.

/Tobias

Guido van Rossum

Dec 28, 2013, 1:13:59 PM12/28/13
to Tobias Oberstein, Antoine Pitrou, python-tulip
On Sat, Dec 28, 2013 at 5:20 AM, Tobias Oberstein
<tobias.o...@gmail.com> wrote:
>>>> Also: will asyncio.async schedule via reentering the event loop or how?

>>> Off the top of my head, no. But better confirm by reading the source
>>> code :-)

>> Not sure what the question means. But when you call async() you are
>> still responsible for yielding to the event loop -- there is no second
>> event loop that runs background tasks for you in a separate thread.
>> :-)

> I wasn't thinking of threads, but more wondering about the following.
>
> I am buffering data received in asyncio.Protocol.data_received in
> a deque
>
> https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L80
>
> and the data buffered is then processed in a (recurring) Task
>
> https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L71

Why not use an asyncio queue so you can just have a loop in a task
instead of a recurring task?
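
I.e. something along these lines (a sketch):

import asyncio

class QueuedProtocol(asyncio.Protocol):

    def connection_made(self, transport):
        self.transport = transport
        self.queue = asyncio.Queue()
        asyncio.async(self.consumer())   # one long-lived Task

    def data_received(self, data):
        self.queue.put_nowait(data)      # push from the callback side

    @asyncio.coroutine
    def consumer(self):
        while True:
            data = yield from self.queue.get()   # waits until data arrives
            # ... parse and dispatch ...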

> That Task will parse the raw octets, and fire user code. This firing will
> wrap the user code calling in asyncio.async()
>
> https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L98
>
> _if_ the user code yields.
>
> And I was wondering: _if_ the user code is wrapped in asyncio.async,
> how/when will it run?
>
> Since it seems Tasks use call_soon() to schedule their execution
>
> http://code.google.com/p/tulip/source/browse/asyncio/tasks.py#157
>
> I guess the user code wrapped in Task will run upon the next event loop
> iteration?

Right.

> Also: Autobahn now works with above design (and I have 95% shared code
> between Twisted and asyncio), but is this how you intended asyncio to be
> used, or am I misusing / not following best practice in some way? I am
> totally new to asyncio, coming from Twisted ..

Seems to me as if perhaps you are trying to find ways to combine
asyncio operations to implement the primitives you are familiar with
from Twisted, rather than figuring out how to best solve your
underlying problem using asyncio. Using asyncio, your best approach is
to think of how you would do it in a sequential world using blocking
I/O, then use coroutines and yield-from for the blocking I/O, then
think about how to introduce some parallelism where you have two
independent blocking operations that don't depend on each other.

Guido van Rossum

Dec 28, 2013, 1:29:35 PM12/28/13
to Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Sat, Dec 28, 2013 at 7:44 AM, Tobias Oberstein
<tobias.o...@gmail.com> wrote:
> Autobahn's asyncio integration essentially does the same as StreamReader
> under the hood:
>
> https://github.com/tavendo/AutobahnPython/blob/master/autobahn/autobahn/asyncio/websocket.py#L80
>
> The reason is that - currently - firing coroutines from data_received seems
> not to be possible.

I'm not sure what you mean. Are you still worried about the event loop
round trip? But why worry? If I have a coroutine that needs to be started
from a Protocol callback method, I just wrap it in Task() or async()
and forget about it.

> And that's probably fine. In any case, the above buffering
> works. I'll see what the performance impact is.

Please do let us know!

[...]
> To be honest, I agree on that. A "pull style" API for WebSocket feels
> unnatural to me. But it's a matter of taste, sure.

Does a pull-style HTTP server also feel unnatural to you?

I only vaguely recall what Websockets is used for, but isn't it a
cumbersome implementation (constrained by HTTP architecture) for an
elegant abstraction (long-living bidirectional streams) which are
typically used to send requests "upstream" (i.e. from the server to
the client) and send responses back?

I would find it totally natural to have loop in the client that pulls
one request from the stream (blocking until it is ready), processes
it, and sends a response back -- and similar, in the server, to
occasionally send a request to the stream and block until you have the
response. (The latter could all be done by a coroutine designed to
send one request and receive one response.)

> Autobahn's API was designed along the API for WebSocket in browsers. Which
> is "push style". As are many other APIs in other languages, e.g. Java
> JSR356.

As is asyncio when you go low enough in the stack. But my philosophy
is that writing push-style code is a pain, so it should be taken care
of once by the framework, and the application should be written in a
more convenient pull-style, using coroutines with yield-from to wrap
operations that block at the OS level (or are implemented using
push-style callbacks and waiting for Futures).

>> However, to provide a “native” asyncio experience, a coroutine-based API
>> would be nice, and it shouldn’t be particularly hard to implement.

> As mentioned above, users can implement their hooks (onMessage() etc) as
> coroutines.

Hm, that would seem not to benefit fully from asyncio's abstractions.

> I don't have plans to implement a pull-style API, since .. yeah,
> unnatural;)

Please reconsider.

>> Here’s the kind of API I’m talking about:
>> http://aaugustin.github.io/websockets/#example. (Most of my effort on
>> websockets went into designing the API.)
>
> Interesting. This somehow reminds me of the difference between Unix
> poll-based networking (notify when stuff _can_ be done, then do the stuff and
> fire callbacks) vs Windows IOCP: do stuff asynchronously and come back when
> done.

It's a holy war that asyncio stays out of at the low level by
supporting either. But at the high level asyncio's opinion is that in
terms of coding convenience, pull-style is better, because it's more
familiar to Python programmers. Asyncio (and e.g. StreamReader) put in
the work to bridge the interface gap for you.

Guido van Rossum

Dec 28, 2013, 1:30:56 PM12/28/13
to Tobias Oberstein, Antoine Pitrou, python-tulip
No, but I neglected to export it. That's now fixed in the repos, but
for better compatibility with Python 3.4b1 I recommend that you use

from asyncio.tasks import iscoroutine

Tobias Oberstein

Dec 28, 2013, 3:44:18 PM12/28/13
to gu...@python.org, Antoine Pitrou, python-tulip
> Why not use an asyncio queue so you can just have a loop in a task
> instead of a recurring task?

Ah, ok. Thanks! I need to read up on asyncio queues.

>> Also: Autobahn now works with above design (and I have 95% shared code
>> between Twisted and asyncio), but is this how you intended asyncio to be
>> used, or am I misusing / not following best practice in some way? I am
>> totally new to asyncio, coming from Twisted ..
>
> Seems to me as if perhaps you are trying to find ways to combine
> asyncio operations to implement the primitives you are familiar with
> from Twisted, rather than figuring out how to best solve your

Yeah, I've grown up event-driven: from C++/ACE, C++/Boost/ASIO to Twisted.

> underlying problem using asyncio. Using asyncio, your best approach is
> to think of how you would do it in a sequential world using blocking
> I/O, then use coroutines and yield-from for the blocking I/O, then
> think about how to introduce some parallelism where you have two
> independent blocking operations that don't depend on each other.
>

Indeed event-driven ("push style") feels more "natural" for me. Writing
"synchronous" ("pull-style") code is a stretch to me. In fact, other
stuff I work with (besides Twisted) is also event-driven, like
JavaScript or Android.

/Tobias

Tobias Oberstein

Dec 28, 2013, 4:06:54 PM12/28/13
to gu...@python.org, Aymeric Augustin, Antoine Pitrou, python-tulip
>> To be honest, I agree on that. A "pull style" API for WebSocket feels
>> unnatural to me. But it's a matter of taste, sure.
>
> Does a pull-style HTTP server also feel unnatural to you?

In short, yes.

The way I most often do "classic" Web stuff on Python is via Flask:

@app.route('/')
def page_home():
    return render_template('home.html')

The decorators route URLs to callbacks which render Jinja templates.

This feels natural.

Doing something like

while True:
    request = yield from http.block_until_request()
    if request.url == "/":
        request.write(render_template('home.html'))

would seem strange to me.

In fact, Autobahn provides a protocol layer above WebSocket for RPC and
PubSub, and the RPC lets you expose Python functions that can be called
via WebSocket (e.g. from browser JavaScript):

@exportRpc("com.myapp.myfun")
def myfun(a, b):
    return a + b

and in JS:

session.call("com.myapp.myfun", 2, 3).then(
    function (res) {
        console.log(res); // prints 5
    }
);

session.call returns a Promise.

Btw: Promises are coming natively to browsers .. they are specified in
ECMAScript6.

>
> I only vaguely recall what Websockets is used for, but isn't it a
> cumbersome implementation (constrained by HTTP architecture) for an
> elegant abstraction (long-living bidirectional streams) which are
> typically used to send requests "upstream" (i.e. from the server to
> the client) and send responses back?

WebSocket is essentially an HTTP-compatible opening handshake that, when
finished, establishes a bidirectional, full-duplex, reliable,
message-based channel.

>
> I would find it totally natural to have loop in the client that pulls
> one request from the stream (blocking until it is ready), processes
> it, and sends a response back -- and similar, in the server, to
> occasionally send a request to the stream and block until you have the
> response. (The latter could all be done by a coroutine designed to
> send one request and receive one response.)

If I understand right, that API to WebSocket is more like what Aymeric has:

while True:
    msg = yield from websocket.recv()
    # do something with msg, which is a WebSocket message

>> I don't have plans to implement a pull-style API, since .. yeah,
>> unnatural;)
>
> Please reconsider.

One problem is that this would be totally intrusive into my existing
codebase _and_ API. Another one: Autobahn has multiple implementations
in other languages (JS and Android currently) - and those APIs are
event-driven style also, and having APIs similar across environments is a
plus. Another (personal) issue: the synchronous style feels weird to
me;) Maybe I am spoiled already.

> It's a holy war that asyncio stays out of at the low level by
> supporting either. But at the high level asyncio's opinion is that in
> terms of coding convenience, pull-style is better, because it's more
> familiar to Python programmers. Asyncio (and e.g. StreamReader) put in
> the work to bridge the interface gap for you.
>

Thanks a lot for pointing this out! I now understand better. Asyncio at
high-level is opinionated towards "pull-style". This is fine - and
important to recognize.

Regarding Autobahn: I think it's already quite cool that we can now
support Twisted and asyncio at the same time - even if "push style" only ..

Thanks for your hints and comments!

/Tobias


Guido van Rossum

Dec 29, 2013, 9:18:17 PM12/29/13
to Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Sat, Dec 28, 2013 at 11:06 AM, Tobias Oberstein
<tobias.o...@gmail.com> wrote:
>>> To be honest, I agree on that. A "pull style" API for WebSocket feels
>>> unnatural to me. But it's a matter of taste, sure.

>> Does a pull-style HTTP server also feel unnatural to you?

> In short, yes.
>
> The way I most often do "classic" Web stuff on Python is via Flask:
>
> @app.route('/')
> def page_home():
>     return render_template('home.html')
>
> The decorators route URLs to callbacks which render Jinja templates.
>
> This feels natural.

You got me there. :-)

What I was thinking of was actually a lower level of the HTTP
infrastructure, where you have to write something that parses the HTTP
protocol (and things like form input or other common request content
types). I'm personally pretty happy with the pull style HTTP parsing I
wrote as an example
(http://code.google.com/p/tulip/source/browse/examples/fetch3.py#112
-- note that this is a client but the server-side header parsing is
about the same).

I thought about this a bit more and I think in the end it comes down
to one's preferred style for writing parsers. Take a parser for a
language like Python -- you can write it in pull style (the lexer
reads a character or perhaps a line at a time, the parser asks the
lexer for a token at a time) or in push style, using a parser
generator (like CPython's parser does). Actually, even there, you can
use one style for the lexer and another style for the parser.

Maybe it's just six of one, half a dozen of the other. Here's an
observation to ponder (for those who haven't made up their mind yet
:-). In either style, you're probably implementing some kind of state
machine.

Using push style, the state machine ends up being represented
explicitly in the form of state variables, e.g. "am I parsing the
status line", "am I parsing the headers", "have I seen the end of the
headers", in addition to some buffers holding a representation of the
stuff you've already parsed (completed headers, request
method/path/version) and the stuff you haven't parsed yet (e.g. the
next incomplete line). Typically those have to be represented as
instance variables on the Protocol (or some higher-level object with a
similar role).

Using pull style, you can often represent the state implicitly in the
form of program location; e.g. an HTTP request/response parser could
start with a readline() call to read the initial request/response,
then a loop reading the headers until a blank line is found, perhaps
an inner loop to handle continuation lines. The buffers may be just
local variables.
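
Concretely, such a parser might look like this (a sketch):

import asyncio

@asyncio.coroutine
def read_request(reader):
    # The parser state ("request line read", "inside the headers") lives
    # in the program counter; the buffers are plain local variables.
    request_line = yield from reader.readline()
    headers = []
    while True:
        line = yield from reader.readline()
        if line in (b'\r\n', b'\n', b''):
            break                 # blank line (or EOF) ends the headers
        headers.append(line)
    return request_line, headers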

I've noticed in the past that some people see state machines
everywhere, and others don't see them even if the problem at hand begs
for one. :-) Personally I probably fall a little on the latter side of
the spectrum -- I have a strong antipathy for Boolean local variables
and often try to get rid of them by rewriting the logic. I also feel
that a lot of programs written in push style end up not handling
errors particularly well -- the nadir of this is your typical
JavaScript embedded in a web page which usually just hangs silently
when the web server times out or sends an unexpected response.

I've only written a small amount of Android code but I sure remember
that it felt nearly impossible to follow the logic of a moderately
complex Android app -- whereas in pull style your abstractions nicely
correspond to e.g. classes or methods (or just functions), in Android
even the simplest logic seemed to be spread across many different
classes, with the linkage between them often expressed separately
(sometimes even in XML or some other dynamic configuration that
requires the reader to switch languages). But I'll add that this was
perhaps due to being a beginner in the Android world (and I haven't
taken it up since).

> Doing something like
>
> while True:
>     request = yield from http.block_until_request()
>     if request.url == "/":
>         request.write(render_template('home.html'))
>
> would seem strange to me.

Yeah, at that level it makes little sense. :-) It seems clear to me
that pull-style and push-style are often combined at different levels
of a software stack.

> In fact, Autobahn provides a protocol layer above WebSocket for RPC and
> PubSub, and the RPC lets you expose Python functions that can be called via
> WebSocket (eg from browser JavaScript):
>
> @exportRpc("com.myapp.myfun")
> def myfun(a, b):
>     return a + b

That's nice.

> and in JS:
>
> session.call("com.myapp.myfun", 2, 3).then(
>     function (res) {
>         console.log(res); // prints 5
>     }
> );

I can't read that. :-(

> session.call returns a Promise.
>
> Btw: Promises are coming natively to browsers .. they are specified in
> ECMAScript6.

Well, in the browser world there's little choice but to continue on
the push-based path that was started over a decade ago. That doesn't
mean it's the best programming paradigm. :-)

>> I only vaguely recall what Websockets is used for, but isn't it a
>> cumbersome implementation (constrained by HTTP architecture) for an
>> elegant abstraction (long-living bidirectional streams) which are
>> typically used to send requests "upstream" (i.e. from the server to
>> the client) and send responses back?

> WebSocket is essentially an HTTP-compatible opening handshake that, when
> finished, establishes a bidirectional, full-duplex, reliable,
> message-based channel.

So would it make sense to build a (modified) transport/protocol
abstraction on top of that for asyncio? It seems the API can't be the
same as the standard asyncio transport/protocol, because message
framing needs to be preserved, but you could probably start with (or
use a variant of) DatagramTransport and DatagramProtocol.
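
For instance, the protocol side of such an abstraction might look like
this (a sketch; the names are invented):

class MessageProtocol:
    """Like asyncio.DatagramProtocol, but for a reliable, ordered,
    message-framed channel such as WebSocket."""

    def connection_made(self, transport):
        """Handshake done; the transport can now send framed messages."""

    def message_received(self, payload, isbinary):
        """Called once per complete (reassembled) message."""

    def connection_lost(self, exc):
        """The channel was closed."""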

>> I would find it totally natural to have loop in the client that pulls
>> one request from the stream (blocking until it is ready), processes
>> it, and sends a response back -- and similar, in the server, to
>> occasionally send a request to the stream and block until you have the
>> response. (The latter could all be done by a coroutine designed to
>> send one request and receive one response.)

> If I understand right, that API to WebSocket is more like what Aymeric has:
>
> while True:
>     msg = yield from websocket.recv()
>     # do something with msg, which is a WebSocket message

Yeah, probably.

>>> I don't have plans to implement a pull-style API, since .. yeah,
>>> unnatural;)

Understood, and no problem.

>> Please reconsider.

> One problem is that this would be totally intrusive into my existing
> codebase _and_ API. Another one: Autobahn has multiple implementations in
> other languages (JS and Android currently) - and those APIs are event-driven
> style also, and having APIs similar across environments is a plus. Another
> (personal) issue: the synchronous style feels weird to me;) Maybe I am
> spoiled already.

Different strokes for different folks.

>> It's a holy war that asyncio stays out of at the low level by
>> supporting either. But at the high level asyncio's opinion is that in
>> terms of coding convenience, pull-style is better, because it's more
>> familiar to Python programmers. Asyncio (and e.g. StreamReader) put in
>> the work to bridge the interface gap for you.

> Thanks a lot for pointing this out! I now understand better. Asyncio at
> high-level is opinionated towards "pull-style". This is fine - and important
> to recognize.

Yup.

> Regarding Autobahn: I think it's already quite cool that we can now support
> Twisted and asyncio at the same time - even if "push style" only ..

Yeah, asyncio uses push-style at the lower levels specifically for
improved interoperability with other frameworks.

> Thanks for your hints and comments!

You're welcome. It's an interesting discussion for me as well.

Tobias Oberstein

Jan 2, 2014, 7:49:39 AM1/2/14
to gu...@python.org, Aymeric Augustin, Antoine Pitrou, python-tulip
> What I was thinking of was actually a lower level of the HTTP
> infrastructure, where you have to write something that parses the HTTP
> protocol (and things like form input or other common request content
> types). I'm personally pretty happy with the pull style HTTP parsing I
> wrote as an example
> (http://code.google.com/p/tulip/source/browse/examples/fetch3.py#112
> -- note that this is a client but the server-side header parsing is
> about the same).

Looks nice and clean. So this relies on StreamReader for the translation
of the push-style that comes out of the low-level transport
("data_received") to pull-style asyncio.StreamReader.readXXX(). Nice.
>
> I thought about this a bit more and I think in the end it comes down
> to one's preferred style for writing parsers. Take a parser for a
> language like Python -- you can write it in pull style (the lexer
> reads a character or perhaps a line at a time, the parser asks the
> lexer for a token at a time) or in push style, using a parser
> generator (like CPython's parser does). Actually, even there, you can
> use one style for the lexer and another style for the parser.

Interesting analogy. Yes, it seems parsing a language/syntax from a file
is similar to parsing a protocol from a wire-level stream transport. I
wonder about the "sending leg": with language parsers, this would
probably be the AST. With network protocols, it's more about producing a
2nd stream conforming to the same "syntax" for sending to the other peer.

> Using push style, the state machine ends up being represented
> explicitly in the form of state variables, e.g. "am I parsing the
> status line", "am I parsing the headers", "have I seen the end of the
> headers", in addition to some buffers holding a representation of the
> stuff you've already parsed (completed headers, request
> method/path/version) and the stuff you haven't parsed yet (e.g. the
> next incomplete line). Typically those have to be represented as
> instance variables on the Protocol (or some higher-level object with a
> similar role).
>
> Using pull style, you can often represent the state implicitly in the
> form of program location; e.g. an HTTP request/response parser could
> start with a readline() call to read the initial request/response,
> then a loop reading the headers until a blank line is found, perhaps
> an inner loop to handle continuation lines. The buffers may be just
> local variables.

The ability to represent state machine states implicitly in program
location instead of explicit variables indeed seems higher-level / more
abstracted. I have never looked at it that way .. very interesting.

I am wondering what happens if you take timing constraints into account.
E.g. with WebSocket, for DoS protection, one might want to have the
initial opening handshake finished in <N seconds. Hence you want to
check after N seconds if state "HANDSHAKE_FINISHED" has been reached. A

yield from socket.read_handshake()

(simplified) will however just "block" infinitely. So I need a 2nd
coroutine for the timeout. And the timeout will need to check .. an
instance variable for state. Or can I have a timing out yield from?

> I've only written a small amount of Android code but I sure remember
> that it felt nearly impossible to follow the logic of a moderately
> complex Android app -- whereas in pull style your abstractions nicely
> correspond to e.g. classes or methods (or just functions), in Android
> even the simplest logic seemed to be spread across many different
> classes, with the linkage between them often expressed separately
> (sometimes even in XML or some other dynamic configuration that
> requires the reader to switch languages). But I'll add that this was
> perhaps due to being a beginner in the Android world (and I haven't
> taken it up since).

That's also my experience (but I too have limited exposure to Android):
it can get unwieldy pretty quickly.

How would you do a pull-style UI toolkit? Would you transform each
push-style callback for UI widgets into pull-style code like

yield from button1.onclick()
# handle button1 click

or

evt = yield from ui.onevent()
if evt.target == "button1" and evt.type == "click":
    # handle button1 click

The latter leads to one massive, monolithic code block handling all UI
interaction. The former leads to many small "sequential" looking code
pieces .. similar to callbacks. And those "distributed" code pieces
somehow need to interact with each other.

FWIW, the - for me - most comfortable and manageable way of doing UI is
via "reactive programming", e.g. in JavaScript http://knockoutjs.com/

Eg. say some "x" is changing asynchronously (like a UI input field
widget) and some "y" needs to be changed _whenever_ "x" changes (like a
UI label).

In reactive programming, I can basically write code

y = f(x)

and the reactive engine will _analyze_ that code, and hook up push-style
callback code under the hood, so that _whenever_ x changes, f() is
_automatically_ reapplied.

Probably better explained here:
http://peak.telecommunity.com/DevCenter/Trellis

MS also seems to like RP: http://rxpy.codeplex.com/

In UI, this, combined with data-binding for widgets from view models, is
a really clean and abstract approach.

> Well, in the browser world there's little choice but to continue on
> the push-based path that was started over a decade ago. That doesn't
> mean it's the best programming paradigm. :-)

No, it doesn't;)

I'd put it like this: the classical UI is push-style. As mentioned
above, it would be interesting to see what pull-style would look like.
But the reactive style gives me a superior way .. again, just my personal
experience/exposure/taste ..

>> WebSocket is essentially an HTTP-compatible opening handshake that, when
>> finished, establishes a bidirectional, full-duplex, reliable, message-based
>> channel.
>
> So would it make sense to build a (modified) transport/protocol
> abstraction on top of that for asyncio? It seems the API can't be the

I have done that for Twisted:
https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/wrapping

This allows you to run _any_ stream-based protocol _over_ WebSocket. E.g.
here is a terminal session to Linux from a browser:

http://picpaste.com/Clipboard24-LLwRCPKG.png

In fact, Twisted endpoints are so cool (sorry for mentioning this here)
that I can also run WebSocket over any stream transport, like Unix domain
sockets, pipes, serial:

https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/echo_endpoints

> same as the standard asyncio transport/protocol, because message
> framing needs to be preserved, but you could probably start with (or
> use a variant of) DatagramTransport and DatagramProtocol.

That's the 2nd possibility: capture the essence of WebSocket in a
ReliableOrderedFullDuplexDatagramTransport (just kidding about the name;)

The thing is: the abstracted WebSocket semantics of "reliable, ordered
datagram" fit neither TCP nor UDP.

It does fit SCTP:

http://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol

SCTP is available natively

http://www.freebsd.org/cgi/man.cgi?query=sctp&sektion=4

in which case you need to use a new kernel API (POSIX sockets don't have
it), and it can be layered over UDP, which is used e.g. with the upcoming
WebRTC HTML5 standard:

http://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-06

So I think it really could be worthwhile to define an abstract interface
in asyncio with the appropriate semantics for transports to implement.
The implementing transports could then be WebSocket, SCTP, or even
shared-memory message queues ..

Happy new year,
/Tobias

Guido van Rossum

Jan 3, 2014, 5:13:01 PM1/3/14
to Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
[Trimming...]

On Thu, Jan 2, 2014 at 2:49 AM, Tobias Oberstein
<tobias.o...@gmail.com> wrote:

> Interesting analogy. Yes, it seems parsing a language/syntax from a file is
> similar to parsing a protocol from a wire-level stream transport. I wonder
> about the "sending leg": with language parsers, this would probably be the
> AST. With network protocols, it's more about producing a 2nd stream
> conforming to the same "syntax" for sending to the other peer.

But reading and writing are almost always asymmetrical. E.g. for HTTP
I don't see how you could reuse the push code for the input side
(say, the part that parses the headers) to write the response. The
process is just very different.

> I am wondering what happens if you take timing constraints into account. Eg.
> with WebSocket, for DoS protection, one might want to have the initial
> opening handshake finished in <N seconds. Hence you want to check after N
> seconds if state "HANDSHAKE_FINISHED" has been reached. A
>
> yield from socket.read_handshake()
>
> (simplified) will however just "block" infinitely. So I need a 2nd coroutine
> for the timeout. And the timeout will need to check .. an instance variable
> for state. Or can I have a timing out yield from?

You'd write this

yield from asyncio.wait_for(socket.read_handshake(), <timeout_in_seconds>)

The implementation uses a Future instead of a coroutine. When the
timeout happens, the yield-from will raise asyncio.TimeoutError and
asyncio.CancelledError will be thrown into the read_handshake
coroutine (and perhaps deeper, into whatever is *actually* blocking at
this precise point).
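
For example (a sketch; read_handshake() is the simplified name from
above):

import asyncio

@asyncio.coroutine
def open_with_timeout(sock):
    try:
        yield from asyncio.wait_for(sock.read_handshake(), 5.0)
    except asyncio.TimeoutError:
        sock.close()   # handshake took too long: drop the connection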

> How would you do a pull-style UI toolkit? Transforming each push-style
> callback for UI widgets into pull-style code
>
> yield from button1.onclick()
> # handle button1 click
>
> or
>
> evt = yield from ui.onevent()
> if evt.target == "button1" and evt.type == "click":
> # handle button1 click
>
> The latter leads to one massive, monolithic code block handling all UI
> interaction. The former leads to many small "sequential" looking code pieces
> .. similar to callbacks. And those "distributed" code pieces somehow need to
> interact with each other.

I honestly haven't figured this one out. It seems all popular UI
frameworks use push style. I recall hearing about a system called NeWS
(http://en.wikipedia.org/wiki/NeWS) that IIRC used a pull-style API
for a mouse-based UI. This was in the '80s, before the necessary
abstractions were widely agreed on. E.g. the mouse-tracking logic for
drawing a shape or selecting objects was done using a loop that
blocked until the mouse button was released. I don't think it was very
successful though.

> FWIW, the - for me - most comfortable and managable way of doing UI is via
> "reactive programming", e.g. in JavaScript http://knockoutjs.com/
>
> Eg. say some "x" is changing asynchronously (like a UI input field widget)
> and some "y" needs to be changed _whenever_ "x" changes (like a UI label).
>
> In reactive programming, I can basically write code
>
> y = f(x)
>
> and the reactive engine will _analyze_ that code, and hook up push-style
> callback code under the hood, so that _whenever_ x changes, f() is
> _automatically_ reapplied.
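
A toy sketch of the mechanics behind such a reactive engine --
push-style callbacks hidden behind an observable value (illustrative
only, not modeled on any particular framework):

class Observable:
    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, fn):
        self._subscribers.append(fn)
        fn(self._value)  # apply once on subscription

    def set(self, value):
        self._value = value
        for fn in self._subscribers:
            fn(value)

state = {}
x = Observable(1)
x.subscribe(lambda v: state.update(y=2 * v))  # y = f(x)
x.set(3)  # state['y'] is recomputed automatically: now 6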

This idea has been around the block a few times too. Some of my former
colleagues on the ABC project (Python's predecessor) went on to do
something like that, and eventually their research morphed into XForms
(http://en.wikipedia.org/wiki/XForms). More recently the Elm
programming language (http://elm-lang.org/) also uses this paradigm.
I'd say it's worth knowing about and nice when it's what you need,
but, like Haskell, limited by the non-pure nature of many practical
problems.

> In fact, Twisted endpoints are so cool (sorry for mentioning this here), I
> can also run WebSocket over any stream transport, like Unix domain sockets,
> pipes, serial:
>
> https://github.com/tavendo/AutobahnPython/tree/master/examples/twisted/websocket/echo_endpoints

I'm not entirely sure what Twisted endpoints are and why they're so
cool. (Glyph kept telling me they were cool but not quite ready or
something, so I never had a good look.) But most of what you describe
here should be possible equally well with asyncio's protocol/transport
abstractions (which are pretty close to those in Twisted, if you
aren't too attached to Deferred). So what's missing? (Probably nothing
that couldn't be added on top of asyncio as a 3rd party package.)

> The thing is: the abstracted WebSocket semantics of "reliable, ordered
> datagram" neither fit TCP nor UDP.

True, but that doesn't mean it doesn't fit the datagram
transport/protocol APIs -- there's nothing in their definition that
says the underlying transport can't be reliable and ordered. :-)

> It does fit SCTP:
>
> http://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol
>
> SCTP is available natively
>
> http://www.freebsd.org/cgi/man.cgi?query=sctp&sektion=4
>
> in which case you need to use new kernel API (Posix sockets don't have it)
> and it can be layered over UDP, which is used eg with the upcoming WebRTC
> HTML5 standard:
>
> http://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-06

I'm not sure that the actual implementation is relevant here --
following Twisted's example, asyncio's transports and protocols don't
require a specific underlying internet protocol. (E.g. the common
stream protocol has TCP and TLS implementations, and shares a lot of
APIs with pipes, named or unnamed.)

> So I think it really could be worth to define an abstract interface in
> asyncio with the appropiate semantics for transports to implement. And
> implementing transports then could be WebSocket and SCTP or even
> shared-memory message queues ..

A concrete question: what APIs are missing in asyncio's datagram
transport and protocol classes to support SCTP and WebSocket? (Apart
from the constructor -- but the constructor is explicitly not part of
the transport/protocol abstraction, just like the __init__ signature
is not part of most ABCs.)

> Happy new year,

Same to you and anyone else who read this far!

Tobias Oberstein

unread,
Jan 4, 2014, 4:33:40 PM1/4/14
to gu...@python.org, Aymeric Augustin, Antoine Pitrou, python-tulip
> You'd write this
>
> yield from asyncio.wait_for(socket.read_handshake(), <timeout_in_seconds>)
>

Thanks! That's handy ..

> I'm not entirely sure what Twisted endpoints are and why they're so

My understanding: they completely decouple the creation of
application-specific factories/protocols from the creation of the
network clients/servers that use them.

Clients/servers can be created from string descriptors. This allows you
to write programs that let users determine on the command line whether
to use e.g. TCP, a Unix domain socket, or even a serial port.

> cool. (Glyph kept telling me they were cool but not quite ready or
> something, so I never had a good look.) But most of what you describe
> here should be possible equally well with asyncio's protocol/transport
> abstractions (which are pretty close to those in Twisted, if you
> aren't too attached to Deferred). So what's missing? (Probably nothing
> that couldn't be added on top of asyncio as a 3rd party package.)

What's missing is probably something like this (which I guess could be
added to asyncio):

coro = loop.create_server(EchoServer, stream_server_descriptor)

where

stream_server_descriptor = "tcp:127.0.0.1:8000"

or

stream_server_descriptor = "unix:/tmp/myserver"
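
A minimal sketch of such a helper (hypothetical, not part of asyncio;
note that create_unix_server wasn't in Tulip at the time of this thread):

import asyncio

def create_server_from_descriptor(loop, protocol_factory, descriptor):
    # In the spirit of Twisted's serverFromString, but made up here.
    kind, _, rest = descriptor.partition(':')
    if kind == 'tcp':
        host, _, port = rest.rpartition(':')
        return loop.create_server(protocol_factory, host, int(port))
    elif kind == 'unix':
        return loop.create_unix_server(protocol_factory, rest)
    raise ValueError('unknown endpoint type: %r' % kind)

# loop = asyncio.get_event_loop()
# server = loop.run_until_complete(
#     create_server_from_descriptor(loop, EchoServer, 'tcp:127.0.0.1:8000'))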

> A concrete question: what APIs are missing in asyncio's datagram
> transport and protocol classes to support SCTP and WebSocket? (Apart

DatagramProtocol.datagram_received and DatagramTransport.sendto take an
`addr` parameter.

However, SCTP and WebSocket are _connected_ protocols.

Hence, nothing is actually missing, but `addr` is in excess.

So for sendto, the addr would either need to be ignored, or an error
raised when it is non-None.

And for datagram_received, the addr would either be the fixed addr of
the connected peer, or be set None.

As a corollary: how is addr handled with connected UDP?

FWIW: https://twistedmatrix.com/documents/current/core/howto/udp.html#auto3

Finally: yes, syntactically, there is nothing which would distinguish an
ordered, reliable datagram interface from an unordered, unreliable
datagram interface (at least a connected one).

Mmh, but shouldn't the interface _identity_ also express semantics?

If I get a transport, and want to adjust behavior according to whether
the transport is unreliable, even today I am not supposed to do that
via isinstance(transport, DatagramTransport), since DatagramTransport
doesn't imply any semantics, but only specifies a programmatic interface
(on a syntactical level)?

And a transport deriving from asyncio.transports.Transport (a stream)
might be in fact unreliable?

What are the exact semantics implied by Transport and DatagramTransport
- beyond the mere existence of certain functions syntactically?

Cheers,
/Tobias

Glyph

unread,
Jan 5, 2014, 2:32:10 AM1/5/14
to Guido van Rossum, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Jan 3, 2014, at 2:13 PM, Guido van Rossum <gu...@python.org> wrote:

I'm not entirely sure what Twisted endpoints are

Endpoints are really very simple.  They're two interfaces for listening and connecting; .listen(factory) for servers and .connect(factory) for clients.  They each return a Deferred, which fires with an IListeningPort and a connected IProtocol instance, respectively.

As the documentation puts it, "there's not a lot to it": <https://twistedmatrix.com/documents/current/core/howto/endpoints.html#auto2>

and why they're so cool.

They're cool because they decouple the part of the program which figures out what to listen on or connect to from the part of the program that figures out when to listen or connect.

This allows you to transparently support running a protocol over an outbound TCP connection, outbound TLS connection, or something fancier, like an SSH command tunnel.  Similarly, since 'listen()' returns a Deferred, you can do asynchronous set-up when your program opens a new listening port (like, for example, executing firewall rules or manipulating a load balancer's configuration).

Also, there's the thing that Tobias was referring to, the ability to parse a string into an endpoint.  This involves a plug-in interface so that third party packages can add new endpoint types, so you can run things over new types of transports without writing any new code at all (as long as applications use serverFromString or clientFromString to parse their command-line arguments, and an increasing number of the servers built in to Twisted do).
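
For instance, a tiny server whose transport is chosen entirely by a
descriptor string (a sketch using Twisted's serverFromString):

from twisted.internet import protocol, reactor
from twisted.internet.endpoints import serverFromString

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

factory = protocol.Factory()
factory.protocol = Echo

# 'tcp:8000' could just as well be 'unix:/tmp/echo' -- taken e.g.
# from a command-line argument -- with no code changes.
endpoint = serverFromString(reactor, 'tcp:8000')
endpoint.listen(factory)
reactor.run()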

(Glyph kept telling me they were cool but not quite ready or something, so I never had a good look.)

The thing that wasn't ready was <https://twistedmatrix.com/trac/ticket/4859>, which is an example of how to implement the 'happy eyeballs' RFC, in terms of the endpoint API.  It's done in the most recent Twisted release :-).

As I recall, when I said they weren't quite ready, I was trying to suggest that tulip not even have an API like create_connection, and just have connect() that only took IP addresses, not hostnames, and put the name-resolution bits exclusively into the endpoint, but we hadn't proven that concept yet with Twisted.  We still haven't quite, because the "low-level" API that HostnameEndpoint.connect() calls is still connectTCP(), which can in fact take hostnames (although it does something a lot less useful, TCP4 resolution only).

But most of what you describe here should be possible equally well with asyncio's protocol/transport
abstractions (which are pretty close to those in Twisted, if you
aren't too attached to Deferred). So what's missing? (Probably nothing
that couldn't be added on top of asyncio as a 3rd party package.)

Indeed, none of this is terribly complicated.  It would be useful to put them into Tulip proper though, even if no functionality beyond what create_connection offers is available, so that application code doesn't use low-level loop interfaces like create_connection and create_server, since those aren't extensible.  If applications habitually use this simple little bit of indirection, you can get the same sort of no-new-code flexibility for new transports that Twisted offers, at least at the library-code level.

Hope that was useful,

-glyph

Guido van Rossum

unread,
Jan 5, 2014, 2:46:21 AM1/5/14
to Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Sat, Jan 4, 2014 at 11:33 AM, Tobias Oberstein
<tobias.o...@gmail.com> wrote:
[endpoints]
> My understanding: they completely decouple the creation of
> application-specific factories/protocols from the creation of the
> network clients/servers that use them.
>
> Clients/servers can be created from string descriptors. This allows you to
> write programs that let users determine on the command line whether to use
> e.g. TCP, a Unix domain socket, or even a serial port.
[...]
> What's missing is probably something like this (which I guess could be added
> to asyncio):
>
> coro = loop.create_server(EchoServer, stream_server_descriptor)
>
> where
>
> stream_server_descriptor = "tcp:127.0.0.1:8000"
>
> or
>
> stream_server_descriptor = "unix:/tmp/myserver"

OK, that makes some sense, but it can easily be added as a 3rd party
add-on. (And eventually integrated back.) I've added this idea to the
Tulip tracker: http://code.google.com/p/tulip/issues/detail?id=97

(Oh, I just saw Glyph replied to the same question. I may have to
update that tracker item. :-)

This is similar to the URL-ish syntax I've seen in database drivers
for specifying the database server. I suppose we could support the
Twisted syntax (if it's documented well enough) or make up our own,
possibly borrowing from URLs (which are another variant of the same
idea).

>> A concrete question: what APIs are missing in asyncio's datagram
>> transport and protocol classes to support SCTP and WebSocket? (Apart

> DatagramProtocol.datagram_received and DatagramTransport.sendto take an
> `addr` parameter.
>
> However, SCTP and WebSocket are _connected_ protocols.

On UNIX, UDP can also be connected, and Tulip supports this. :-)

> Hence, nothing is actually missing, but `addr` is in excess.
>
> So for sendto, the addr would either need to be ignored, or an error
> raised when it is non-None.

If you specify a destination when creating the datagram connection,
that's exactly how it works.
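
A minimal sketch (create_datagram_endpoint with remote_addr gives you a
connected datagram endpoint; the port number is made up):

import asyncio

class ConnectedUdpClient(asyncio.DatagramProtocol):
    def connection_made(self, transport):
        # Connected endpoint: no addr argument needed for sendto().
        transport.sendto(b'ping')

    def datagram_received(self, data, addr):
        # addr is the fixed address of the connected peer.
        print('received %r from %s' % (data, addr))

loop = asyncio.get_event_loop()
transport, proto = loop.run_until_complete(
    loop.create_datagram_endpoint(ConnectedUdpClient,
                                  remote_addr=('127.0.0.1', 9999)))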

> And for datagram_received, the addr would either be the fixed addr of the
> connected peer, or be set None.

Right.

> As a corollary: how is addr handled with connected UDP?

Um, yes. :-)

> FWIW: https://twistedmatrix.com/documents/current/core/howto/udp.html#auto3
>
> Finally: yes, syntactically, there is nothing which would distinguish an
> ordered, reliable datagram interface from an unordered, unreliable datagram
> interface (at least a connected one).
>
> Mmh, but shouldn't the interface _identity_ also express semantics?

That seems an idea from Twisted. I don't want to go that way. I prefer
giving the transport object an inquiry method if necessary; that's how
asyncio does it for the capability to use write_eof() (which works for
TCP but not for SSL/TLS).

> If I get a transport, and want to adjust behavior according to whether the
> transport is unreliable, even today I am not supposed to do that
> via isinstance(transport, DatagramTransport), since DatagramTransport
> doesn't imply any semantics, but only specifies a programmatic interface (on
> a syntactical level)?

You're already writing your protocol code specific to either a
datagram or a stream transport, because the protocol API is
(intentionally) different. So there's never a question about which
kind of transport you have.

> And a transport deriving from asyncio.transports.Transport (a stream) might
> be in fact unreliable?

I don't see any useful semantics for that.

> What are the exact semantics implied by Transport and DatagramTransport -
> beyond the mere existence of certain functions syntactically?

You probably have to ask a more specific question.

The datagram API promises that if the sender sends two packets, and
both arrive, the receiver will see two packets, though not necessarily
in the same order (and one or both may be lost).

The stream API promises that the receiver sees the bytes that the
sender sent, in the same order, until the connection is terminated or
broken, but it makes no promises about whether you'll get a separate
data_received() call for each byte or a single call for all bytes. In
particular there's no promise that the framing implied by the sender's
sequence of send() or write() calls is preserved -- the network may
repackage the bytes in arbitrary blocks.
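
Which is why message-oriented protocols add their own framing on top of
the stream. A minimal sketch of length-prefix framing in an asyncio
protocol (the 4-byte big-endian prefix is just one common choice):

import asyncio
import struct

class LengthPrefixedProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self._buffer = b''

    def data_received(self, data):
        # data may be any slice of the stream; buffer until at least
        # one complete length-prefixed message is available.
        self._buffer += data
        while len(self._buffer) >= 4:
            (length,) = struct.unpack('!I', self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break
            message = self._buffer[4:4 + length]
            self._buffer = self._buffer[4 + length:]
            self.message_received(message)

    def message_received(self, message):
        print('got message %r' % (message,))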

By the way, there's nothing that would prevent you from defining your
own transport and protocol ABCs.

Glyph

unread,
Jan 5, 2014, 3:11:18 AM1/5/14
to Guido van Rossum, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Jan 4, 2014, at 11:46 PM, Guido van Rossum <gu...@python.org> wrote:

> This is similar to the URL-ish syntax I've seen in database drivers
> for specifying the database server.

Yeah, it's the same principle.

> I suppose we could support the
> Twisted syntax (if it's documented well enough) or make up our own,
> possibly borrowing from URLs (which are another variant of the same
> idea).

Twisted's syntax is actually pretty bad; the mere fact of the syntax is a great idea but I'd suggest only looking at the particulars of the syntax itself to learn from its mistakes :-).

The syntax is 'endpoint-type:positional1:pos2:keyword1=kwval1:keyword2=kwval2'. The problem with this is of course that colons are a pretty popular thing that you need to be able to type in e.g. IPv6 address literals and win32 paths, so lots of endpoint types require escaping; 'tcp6:\:\:1' is how you say 'localhost' for example.

>>> A concrete question: what APIs are missing in asyncio's datagram
>>> transport and protocol classes to support SCTP and WebSocket? (Apart
>
>> DatagramProtocol.datagram_received and DatagramTransport.sendto take an
>> `addr` parameter.
>>
>> However, SCTP and WebSocket are _connected_ protocols.
>
> On UNIX, UDP can also be connected, and Tulip supports this. :-)

"connected UDP" is a strange beast, and it doesn't make UDP a connected protocol; there's still no connection state machine, no way to tell if a particular session is connected or disconnected.

>> FWIW: https://twistedmatrix.com/documents/current/core/howto/udp.html#auto3
>>
>> Finally: yes, syntactically, there is nothing which would distinguish an
>> ordered, reliable datagram interface from an unordered, unreliable datagram
>> interface (at least a connected one).
>>
>> Mmh, but shouldn't the interface _identity_ also express semantics?
>
> That seems an idea from Twisted. I don't want to go that way. I prefer
> giving the transport object an inquiry method if necessary; that's how
> asyncio does it for the capability to use write_eof() (which works for
> TCP but not for SSL/TLS).

The principle that we use type-names to identify discrete sets of behavior is an ideal that Twisted strives to adhere to, but it seems like a pretty basic idea about type systems, not an idea we came up with. ("We use types for nouns" in the words of Nathaniel Manista & Augie Fackler, both of whom are very much not associated with Twisted ;-).) It's just having a consistent way to talk about sets of methods and attributes and stuff. Isn't this the whole reason that ABCMeta.register exists, so you can use ABCs in this way? Then the inquiry method is always consistent, 'isinstance'.
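
For illustration, a sketch of that idea with a plain ABC (the class
names are hypothetical, not part of asyncio):

import abc

class ReliableOrderedTransport(metaclass=abc.ABCMeta):
    """Marker ABC: registered transports promise reliable,
    ordered, message-based delivery."""

class WebSocketTransport:
    pass

# Third-party code can opt in without subclassing ...
ReliableOrderedTransport.register(WebSocketTransport)

# ... and the inquiry is always spelled the same way:
assert isinstance(WebSocketTransport(), ReliableOrderedTransport)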

>> If I get a transport, and want to adjust behavior according to whether the
>> transport is unreliable, even today I am not supposed to do that
>> via isinstance(transport, DatagramTransport), since DatagramTransport
>> doesn't imply any semantics, but only specifies a programmatic interface (on
>> a syntactical level)?
>
> You're already writing your protocol code specific to either a
> datagram or a stream transport, because the protocol API is
> (intentionally) different. So there's never a question about which
> kind of transport you have.

I think that the point that Tobias was trying to make here is that if you have some generic string-syntax way of generating a transport (like endpoints) and the user can enter any transport-type they want, how does an application say "I am only going to work over a reliable+ordered transport". In Twisted (and in Zope) there's a pattern of using a zope Interface or a python type object to identify the required semantics (as he pointed out above).

In the case of endpoints, we just don't have any interfaces for datagram endpoints yet, since those are pretty esoteric and the only other use-case besides UDP we can come up with is DTLS (which, due to session-management issues, is also esoteric enough to require other setup).

-glyph

Guido van Rossum

unread,
Jan 5, 2014, 3:20:37 AM1/5/14
to Glyph, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Sat, Jan 4, 2014 at 9:32 PM, Glyph <gl...@twistedmatrix.com> wrote:
> Endpoints are really very simple. They're two interfaces for listening and
> connecting; .listen(factory) for servers and .connect(factory) for clients.
> They each return a Deferred, which fires with an IListeningPort and a
> connected IProtocol instance, respectively.
>
> As the documentation puts it, "there's not a lot to it":
> <https://twistedmatrix.com/documents/current/core/howto/endpoints.html#auto2>

After reading Tobias' description and some example code referenced in
those docs I agree, but these docs and your first paragraph above
couch everything in too-abstract terms. (Like those consulting
companies whose website claims "we help you be more effective." :-)

> They're cool because they decouple the part of the program which figures out
> what to listen on or connect to from the part of the program that figures
> out when to listen or connect.

Basically, it's a mechanism that calls the right variant of
create_connection() or create_server() for you.

> This allows you to transparently support running a protocol over an outbound
> TCP connection, outbound TLS connection, or something fancier, like an SSH
> command tunnel.

Of course, you can already do that, just by having an if-statement
that chooses between different available ways to create connections.
What you really mean is that the mechanism is extensible so you don't
have to keep adding clauses to that if-statement in every program.

> Similarly, since 'listen()' returns a Deferred, you can do
> asynchronous set-up when your program opens a new listening port (like, for
> example, executing firewall rules or manipulating a load balancer's
> configuration).

This I don't quite understand. I suspect it addresses a specific
deficiency in the old way of setting up servers in Twisted? When would
that specific Deferred fire in relationship to the actual sequence of
socket()/bind()/listen()/accept() syscalls?

> Also, there's the thing that Tobias was referring to, the ability to parse a
> string into an endpoint. This involves a plug-in interface so that third
> party packages can add new endpoint types, so you can run things over new
> types of transports without writing any new code at all (as long as
> applications use serverFromString or clientFromString to parse their
> command-line arguments, and an increasing number of the servers built in to
> Twisted do).

It feels like the extensibility through a plug-in is the key feature
here, and the rest is just shaping the API to enable the
extensibility. Right?

> The thing that wasn't ready was
> <https://twistedmatrix.com/trac/ticket/4859>, which is an example of how to
> implement the 'happy eyeballs' RFC, in terms of the endpoint API. It's done
> in the most recent Twisted release :-).

Ah, I now know what this is. We don't quite have it in Tulip (we just
try the choices returned by getaddrinfo() in turn) but we should, and
I think we can do it transparently in create_connection(). I'm
tracking it in http://code.google.com/p/tulip/issues/detail?id=86,
feel free to add to it.

> As I recall, when I said they weren't quite ready, I was trying to suggest
> that tulip not even have an API like create_connection, and just have
> connect() that only took IP addresses, not hostnames, and put the
> name-resolution bits exclusively into the endpoint, but we hadn't proven
> that concept yet with Twisted. We still haven't quite, because the
> "low-level" API that HostnameEndpoint.connect() calls is still connectTCP(),
> which can in fact take hostnames (although it does something a lot less
> useful, TCP4 resolution only).

Oh well. Tulip is pretty fixed now that it's in the Python 3.4 beta
release, but the PEP is provisional so we can change things for Python
3.5, and of course we can certainly add new stuff then. Small tweaks
are still allowable before the first 3.4 release candidate (but this
seems to go beyond small).

> Indeed, none of this is terribly complicated. It would be useful to put
> them into Tulip proper though, even if no functionality beyond what
> create_connection offers is available, so that application code doesn't use
> low-level loop interfaces like create_connection and create_server, since
> those aren't extensible. If applications habitually use this simple little
> bit of indirection, you can get the same sort of no-new-code flexibility for
> new transports that Twisted offers, at least at the library-code level.

Unfortunately I think it's too late to add this for 3.4. But there
also are no other options yet besides create_connection() and
create_server(), so it's all mostly important for very forward-looking
code. Tulip doesn't even have UNIX domain sockets
(http://code.google.com/p/tulip/issues/detail?id=81). Most current
Tulip users seem to be creating frameworks themselves, so they can add
this at their end. :-)

Guido van Rossum

unread,
Jan 5, 2014, 3:44:23 AM1/5/14
to Glyph, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Sat, Jan 4, 2014 at 10:11 PM, Glyph <gl...@twistedmatrix.com> wrote:
> Twisted's syntax is actually pretty bad [...]

Noted.

> "connected UDP" is a strange beast, and it doesn't make UDP a connected protocol; there's still no connection state machine, no way to tell if a particular session is connected or disconnected.

Technically, in the Tulip implementation it corresponds to whether the
private variable transport._address is None or not. Of course I may
not understand what you mean by "session" in this context. :-)

>>> FWIW: https://twistedmatrix.com/documents/current/core/howto/udp.html#auto3
>>>
>>> Finally: yes, syntactically, there is nothing which would distinguish an
>>> ordered, reliable datagram interface from an unordered, unreliable datagram
>>> interface (at least a connected one).
>>>
>>> Mmh, but shouldn't the interface _identity_ also express semantics?
>>
>> That seems an idea from Twisted. I don't want to go that way. I prefer
>> giving the transport object an inquiry method if necessary; that's how
>> asyncio does it for the capability to use write_eof() (which works for
>> TCP but not for SSL/TLS).
>
> The principle that we use type-names to identify discrete sets of behavior is an ideal that Twisted strives to adhere to, but it seems like a pretty basic idea about type systems, not an idea we came up with. ("We use types for nouns" in the words of Nathaniel Manista & Augie Fackler, both of whom are very much not associated with Twisted ;-).) It's just having a consistent way to talk about sets of methods and attributes and stuff. Isn't this the whole reason that ABCMeta.register exists, so you can use ABCs in this way? Then the inquiry method is always consistent, 'isinstance'.

Hm. I really don't like using isinstance() for this particular set of
purposes (checking if a datagram transport is reliable or not, or
checking whether it is connected or not). I fear that there would be a
proliferation of little interface classes that would be more confusing
than enlightening -- in particular, whenever you mention something
starting with I-something when talking about Twisted I usually feel
more confused than enlightened.

>>> If I get a transport, and want to adjust behavior according to whether the
>>> transport is unreliable, even today I am not supposed to do that
>>> via isinstance(transport, DatagramTransport), since DatagramTransport
>>> doesn't imply any semantics, but only specifies a programmatic interface (on
>>> a syntactical level)?
>>
>> You're already writing your protocol code specific to either a
>> datagram or a stream transport, because the protocol API is
>> (intentionally) different. So there's never a question about which
>> kind of transport you have.
>
> I think that the point that Tobias was trying to make here is that if you have some generic string-syntax way of generating a transport (like endpoints) and the user can enter any transport-type they want, how does an application say "I am only going to work over a reliable+ordered transport". In Twisted (and in Zope) there's a pattern of using a zope Interface or a python type object to identify the required semantics (as he pointed out above).

Which I specifically don't like (as I pointed out above :-).

I read between the lines of the Twisted endpoints docs that they don't
support anything besides streams. I suspect that the use case for the
inquiry discussed here is purely theoretical.

> In the case of endpoints, we just don't have any interfaces for datagram endpoints yet, since those are pretty esoteric and the only other use-case besides UDP we can come up with is DTLS (which, due to session-management issues, is also esoteric enough to require other setup).

Right.

Glyph

unread,
Jan 5, 2014, 3:50:36 AM1/5/14
to Guido van Rossum, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Jan 5, 2014, at 12:20 AM, Guido van Rossum <gu...@python.org> wrote:

On Sat, Jan 4, 2014 at 9:32 PM, Glyph <gl...@twistedmatrix.com> wrote:
Endpoints are really very simple.  They're two interfaces for listening and
connecting; .listen(factory) for servers and .connect(factory) for clients.
They each return a Deferred, which fires with an IListeningPort and a
connected IProtocol instance, respectively.

As the documentation puts it, "there's not a lot to it":
<https://twistedmatrix.com/documents/current/core/howto/endpoints.html#auto2>

After reading Tobias' description and some example code referenced in
those docs I agree, but these docs and your first paragraph above
couch everything in too-abstract terms. (Like those consulting
companies whose website claims "we help you be more effective." :-)

If you have suggestions for those docs, I'd be happy to hear them.  Believe it or not that documentation was what we came up with after several drafts that were more uselessly abstract; it's a bit hard to have a comprehensible, concrete introduction when the interface is basically "a method that, when you call it, does something arbitrary".

Besides, endpoints *will* make you more effective!  They promote synergy, and dynamic creativity, and serendipity and so on.

They're cool because they decouple the part of the program which figures out
what to listen on or connect to from the part of the program that figures
out when to listen or connect.

Basically, it's a mechanism that calls the right variant of
create_connection() or create_server() for you.

This allows you to transparently support running a protocol over an outbound
TCP connection, outbound TLS connection, or something fancier, like an SSH
command tunnel.

Of course, you can already do that, just by having an if-statement
that chooses between different available ways to create connections.
What you really mean is that the mechanism is extensible so you don't
have to keep adding clauses to that if-statement in every program.

Precisely.

Similarly, since 'listen()' returns a Deferred, you can do
asynchronous set-up when your program opens a new listening port (like, for
example, executing firewall rules or manipulating a load balancer's
configuration).

This I don't quite understand. I suspect it addresses a specific
deficiency in the old way of setting up servers in Twisted?

It's nothing particularly specific to Twisted, it just enables a very flexible way of setting up listening servers automatically.

When would
that specific Deferred fire in relationship to the actual sequence of
socket()/bind()/listen()/accept() syscalls?

The basic TCP one, which does what create_server does, just fires synchronously (i.e. after "listen" returns).

The one I'm talking about with the firewall/load-balancer stuff is more complex than that and would make a whole lot more syscalls than socket()/bind()/listen()/accept() ;-).  But the idea is also pretty vague, so let me be specific:

Let's say you have a service that is behind a TCP load balancer that has a JSON API for adding new back-end servers.  You could have a LoadBalancedTCPEndpoint(loadBalancerURI, localPort) whose listen() method first listens locally, then makes the relevant API call to the load balancer to add the local host to the load balancer's pool, and only when that succeeds actually fire the returned Deferred.
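
A rough sketch of that idea (LoadBalancedTCPEndpoint and the
registration callable are hypothetical; the endpoint interface is real):

from twisted.internet import defer
from twisted.internet.endpoints import TCP4ServerEndpoint

class LoadBalancedTCPEndpoint(object):
    def __init__(self, reactor, register_with_lb, local_port):
        self._endpoint = TCP4ServerEndpoint(reactor, local_port)
        self._register_with_lb = register_with_lb  # returns a Deferred

    @defer.inlineCallbacks
    def listen(self, factory):
        port = yield self._endpoint.listen(factory)
        # Report "started" only once the load balancer has accepted
        # the new back-end.
        yield self._register_with_lb(port.getHost())
        defer.returnValue(port)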

If that endpoint also had a plugin, then you could do 'twistd -n web --port loadbalanced:http\://loadbalancer/control-plane/add-port:8080' to add an existing server to the pool rather than write any new code.  But of course you wouldn't want the server to register as "started" if the LB is down and won't recognize your new server, hence the Deferred.

Does that explanation make sense?

Also, there's the thing that Tobias was referring to, the ability to parse a
string into an endpoint.  This involves a plug-in interface so that third
party packages can add new endpoint types, so you can run things over new
types of transports without writing any new code at all (as long as
applications use serverFromString or clientFromString to parse their
command-line arguments, and an increasing number of the servers built in to
Twisted do).

It feels like the extensibility through a plug-in is the key feature
here, and the rest is just shaping the API to enable the
extensibility. Right?

Yeah, although the API shape is basically the same kind of customizability as a library rather than as a plug-in, which can be just as important.

The thing that wasn't ready was
<https://twistedmatrix.com/trac/ticket/4859>, which is an example of how to
implement the 'happy eyeballs' RFC, in terms of the endpoint API.  It's done
in the most recent Twisted release :-).

Ah, I now know what this is. We don't quite have it in Tulip (we just
try the choices returned by getaddrinfo() in turn) but we should, and
I think we can do it transparently in create_connection(). I'm
tracking it in http://code.google.com/p/tulip/issues/detail?id=86,
feel free to add to it.

I don't think I have lots to add; create_connection's signature is obvious enough that I think it's clear what to do :).

Twisted took like 9 years to add this feature so I'm pretty sure that asyncio will be just fine without it for 18 months ;-).

-glyph

Glyph

unread,
Jan 5, 2014, 4:29:09 AM1/5/14
to Guido van Rossum, Tobias Oberstein, Aymeric Augustin, Antoine Pitrou, python-tulip
On Jan 5, 2014, at 12:44 AM, Guido van Rossum <gu...@python.org> wrote:

> On Sat, Jan 4, 2014 at 10:11 PM, Glyph <gl...@twistedmatrix.com> wrote:
>> Twisted's syntax is actually pretty bad [...]
>
> Noted.
>
>> "connected UDP" is a strange beast, and it doesn't make UDP a connected protocol; there's still no connection state machine, no way to tell if a particular session is connected or disconnected.
>
> Technically, in the Tulip implementation it corresponds to whether the
> private variable transport._address is None or not. Of course I may
> not understand what you mean by "session" in this context. :-)

I mean that for a given "connected" UDP object, you don't have individual "connections" (what I meant by the word "session"). You have one always-on pipe that is talking to a given address, whether that address exists or not, as opposed to TCP or SCTP or websockets, where there are discrete connections with beginnings and ends.

>>>> FWIW: https://twistedmatrix.com/documents/current/core/howto/udp.html#auto3
>>>>
>>>> Finally: yes, syntactically, there is nothing which would distinguish an
>>>> ordered, reliable datagram interface from an unordered, unreliable datagram
>>>> interface (at least a connected one).
>>>>
>>>> Mmh, but shouldn't the interface _identity_ also express semantics?
>>>
>>> That seems an idea from Twisted. I don't want to go that way. I prefer
>>> giving the transport object an inquiry method if necessary; that's how
>>> asyncio does it for the capability to use write_eof() (which works for
>>> TCP but not for SSL/TLS).
>>
>> The principle that we use type-names to identify discrete sets of behavior is an ideal that Twisted strives to adhere to, but it seems like a pretty basic idea about type systems, not an idea we came up with. ("We use types for nouns" in the words of Nathaniel Manista & Augie Fackler, both of whom are very much not associated with Twisted ;-).) It's just having a consistent way to talk about sets of methods and attributes and stuff. Isn't this the whole reason that ABCMeta.register exists, so you can use ABCs in this way? Then the inquiry method is always consistent, 'isinstance'.
>
> Hm. I really don't like using isinstance() for this particular set of
> purposes (checking if a datagram transport is reliable or not, or
> checking whether it is connected or not). I fear that there would be a
> proliferation of little interface classes that would be more confusing
> than enlightening -- in particular, whenever you mention something
> starting with I-something when talking about Twisted I usually feel
> more confused than enlightened.

The main issue with I-somethings in Twisted is that there's a lot of them, and each one has some nuances so the underlying concepts are often confusing in themselves. Plus, the way the documentation is presented, it's often far from obvious how to get from an abstract description of some behavior to a concrete implementation of that behavior.

But these tiny interface types are a solution to a real problem. Tulip is still at a pretty minimal set of interfaces now, and perhaps it can stay there, in which case this issue is somewhat moot. But if it can't, a proliferation of tiny interface types is _way_ better than a proliferation of ad-hoc prose passages describing the relationship between required attributes and methods and their differing semantics.

For what it's worth, it's somewhat unusual in Twisted to use the equivalent of isinstance to do any runtime checking; they're mostly useful as a documentation convention; duck-typing and all. However, it is very helpful for debugging to be able to add type-checking logic for cases where an instance of a particular type is set in a constructor or setter method and then used again later, so that you can see the type error when it's initially set and the caller is still on the stack; and it's helpful to have a common language between that type of type-checking and the documentation explaining what things are.

>>>> If I get a transport, and want to adjust behavior according to whether the
>>>> transport is unreliable, even today I am not supposed to do that
>>>> via isinstance(transport, DatagramTransport), since DatagramTransport
>>>> doesn't imply any semantics, but only specifies a programmatic interface (on
>>>> a syntactical level)?
>>>
>>> You're already writing your protocol code specific to either a
>>> datagram or a stream transport, because the protocol API is
>>> (intentionally) different. So there's never a question about which
>>> kind of transport you have.
>>
>> I think that the point that Tobias was trying to make here is that if you have some generic string-syntax way of generating a transport (like endpoints) and the user can enter any transport-type they want, how does an application say "I am only going to work over a reliable+ordered transport". In Twisted (and in Zope) there's a pattern of using a zope Interface or a python type object to identify the required semantics (as he pointed out above).
>
> Which I specifically don't like (as I pointed out above :-).
>
> I read between the lines of the Twisted endpoints docs that they don't
> support anything besides streams. I suspect that the use case for the
> inquiry discussed here is purely theoretical.

I believe Tobias actually cares about the actual additional framing and typing features of WebSockets, which _do_ have different semantics from TCP. Personally I feel like these features are just dumb over-engineering, and should be ignored as much as possible. So I'm given to agree with you about their theoretical nature but I don't think he would :-).

(I also think SCTP is pretty much doomed unless someone figures out how to run it on top of UDP or TCP...)

-glyph

Tobias Oberstein

unread,
Jan 5, 2014, 4:55:59 AM1/5/14
to gu...@python.org, Aymeric Augustin, Antoine Pitrou, python-tulip
>> And a transport deriving from asyncio.transports.Transport (a stream) might
>> be in fact unreliable?
>
> I don't see any useful semantics for that.

It was more an example of why the mere API (which functions, with which
parameters) and the implied semantics are somewhat orthogonal.

The same API (asyncio.transports.DatagramTransport) can be implemented
with different semantics (reliable vs. unreliable).

>
>> What are the exact semantics implied by Transport and DatagramTransport -
>> beyond the mere existence of certain functions syntactically?
>
> You probably have to ask a more specific question.
>
> The datagram API promises that if the sender sends two packets, and
> both arrive, the receiver will see two packets, though not necessarily
> in the same order (and one or both may be lost).

So my question boils down to: how can a class deriving from
DatagramTransport _programmatically signal_ that it implements a
stricter set of semantics than what you say above: ordering + reliability?

"programatically signal":

One option would be by "interface identity" - but that you don't like. I
suspect you will have your reasons for that - I won't open that can;)

Another would be via (mandatory) class attributes:

DatagramTransport.is_reliable
DatagramTransport.is_ordered

A UDP implementation would have both False, WebSocket both True. I
don't know of protocols that would have mixed values.
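
For illustration, a minimal sketch of that attribute-based option (all
names hypothetical):

class UdpTransportMixin:
    # Plain UDP: neither reliable nor ordered.
    is_reliable = False
    is_ordered = False

class WebSocketTransportMixin:
    # WebSocket messages: reliable and ordered.
    is_reliable = True
    is_ordered = True

def require_reliable_ordered(transport):
    # A protocol like WAMP could refuse unsuitable transports up front.
    if not (getattr(transport, 'is_reliable', False)
            and getattr(transport, 'is_ordered', False)):
        raise RuntimeError('need a reliable, ordered transport')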

A third option would be a ReliableDatagramTransport, that
- either derives from DatagramTransport with no API change at all, merely
to signal its stricter semantics, or
- is a new class where we get rid of the `addr` parameters (which are
unneeded for connected transports)

> The stream API promises that the receiver sees the bytes that the
> sender sent, in the same order, until the connection is terminated or
> broken, but it makes no promises about whether you'll get a separate
> data_received() call for each byte or a single call for all bytes. In
> particular there's no promise that the framing implied by the sender's
> sequence of send() or write() calls is preserved -- the network may
> repackage the bytes in arbitrary blocks.
>
> By the way, there's nothing that would prevent you from defining your
> own transport and protocol ABCs.
>

Sure. It's all about trying to play nice and fit into the "bigger
picture". So things work together ..

Tobias Oberstein

unread,
Jan 5, 2014, 5:12:39 AM1/5/14
to Glyph, Guido van Rossum, Aymeric Augustin, Antoine Pitrou, python-tulip
>>>>> If I get a transport, and want to adjust behavior according to whether the
>>>>> transport is unreliable, even today I am not supposed to do that
>>>>> via isinstance(transport, DatagramTransport), since DatagramTransport
>>>>> doesn't imply any semantics, but only specifies a programmatic interface (on
>>>>> a syntactical level)?
>>>>
>>>> You're already writing your protocol code specific to either a
>>>> datagram or a stream transport, because the protocol API is
>>>> (intentionally) different. So there's never a question about which
>>>> kind of transport you have.
>>>
>>> I think that the point that Tobias was trying to make here is that if you have some generic string-syntax way of generating a transport (like endpoints) and the user can enter any transport-type they want, how does an application say "I am only going to work over a reliable+ordered transport". In Twisted (and in Zope) there's a pattern of using a zope Interface or a python type object to identify the required semantics (as he pointed out above).

Yes, that's the issue I was trying to raise.

How can two implementations exposing the same API ("datagram"), but with
different semantics ("unreliable" vs "reliable-ordered"), programmatically
signal that difference in semantics?

>>
>> Which I specifically don't like (as I pointed out above :-).
>>
>> I read between the lines of the Twisted endpoints docs that they don't
>> support anything besides streams. I suspect that the use case for the
>> inquiry discussed here is purely theoretical.

Here is my use case: I am working on WAMP, which is a protocol that
provides Publish & Subscribe and RPC over any transport that is:

- reliable, ordered, full-duplex, message-based

WebSocket is one transport. Shared-memory queues is another. SCTP might
be a third.

I want to decouple an implementation of WAMP from the underlying
transport using the "right abstraction".

> I believe Tobias actually cares about the actual additional framing and typing features of WebSockets, which _do_ have different semantics from TCP. Personally I feel like these features are just dumb over-engineering, and should be ignored as much as possible. So I'm given to agree with you about their theoretical nature but I don't think he would :-).

Actually, I don't care much about those;) The thing is: WebSocket is a
_pragmatic_ effort that takes into account existing Web infrastructure,
JavaScript, and the security issues of running untrusted code in browsers.

If you drop all pragmatism, SCTP (raw ... that is, as a new
transport-level protocol besides TCP/UDP, directly over IP) or plain TCP
would have cut it right away ...

>
> (I also think SCTP is pretty much doomed unless someone figures out how to run it on top of UDP or TCP...)

That's already shipping in browsers as of today (Firefox and Chrome) -
WebRTC data channels run over this stack:

+----------+
| SCTP |
+----------+
| DTLS |
+----------+
| ICE/UDP |
+----------+

http://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-06

Above is as far as I know the only stack that allows browsers to
communicate peer-to-peer. And it's not only for media, but arbitrary
data. It might get significant .. or not. We'll see;)

/Tobias


>
> -glyph
>

Antoine Pitrou

unread,
Jan 5, 2014, 8:00:44 AM1/5/14
to python...@googlegroups.com
On Sun, 5 Jan 2014 00:11:18 -0800
Glyph <gl...@twistedmatrix.com> wrote:
> On Jan 4, 2014, at 11:46 PM, Guido van Rossum <gu...@python.org> wrote:
>
> > This is similar to the URL-ish syntax I've seen in database drivers
> > for specifying the database server.
>
> Yeah, it's the same principle.
>
> > I suppose we could support the
> > Twisted syntax (if it's documented well enough) or make up our own,
> > possibly borrowing from URLs (which are another variant of the same
> > idea).
>
> Twisted's syntax is actually pretty bad; the mere fact of the syntax is a great idea but I'd suggest only looking at the particulars of the syntax itself to learn from its mistakes :-).
>
> The syntax is 'endpoint-type:positional1:pos2:keyword1=kwval1:keyword2=kwval2'. The problem with this is of course that colons are a pretty popular thing that you need to be able to type in e.g. IPv6 address literals and win32 paths, so lots of endpoint types require escaping; 'tcp6:\:\:1' is how you say 'localhost' for example.

Why not actual URIs? Your latter example could then be spelled
"tcp6:[::1]". Also, URIs allow both parameters and query arguments at
the end, and escaping is handled by %-escaping.
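
For instance, assuming descriptors were URIs with a "//" netloc part,
the standard library parser already copes with IPv6 literals and
%-escaping (a quick sketch):

from urllib.parse import urlsplit

parts = urlsplit('tcp6://[::1]:8080?backlog=100')
print(parts.scheme)    # 'tcp6'
print(parts.hostname)  # '::1' -- brackets stripped for us
print(parts.port)      # 8080
print(parts.query)     # 'backlog=100'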

Regards

Antoine.

