I am the author of Gate One (https://github.com/liftoff/GateOne/)
which makes extensive use of Tornado's asynchronous capabilities. It
also uses multiprocessing and threading to a lesser extent. The
biggest issue I've had trying to write asynchronous code for Gate One
is complexity. Complexity creates problems with expressiveness which
results in code that, to me, feels un-Pythonic. For evidence of this
I present the following example: The retrieve_log_playback()
function: http://bit.ly/W532m6 (link goes to Github)
All the function does is generate and return (to the client browser)
an HTML playback of their terminal session recording. To do it
efficiently without blocking the event loop or slowing down all other
connected clients required loads of complexity (or maybe I'm just
ignorant of "a better way"--feel free to enlighten me). In an ideal
world I could have just done something like this:
import async # The API of the future ;)
async.async_call(retrieve_log_playback, settings, tws,
mechanism=multiprocessing)
# tws == instance of tornado.web.WebSocketHandler that holds the open connection
...but instead I had to create an entirely separate function to act as
the multiprocessing.Process(), create a multiprocessing.Queue() to
shuffle data back and forth, watch a special file descriptor for
updates (so I can tell when the task is complete), and also create a
closure because the connection instance (aka 'tws') isn't pickleable.
After reading through these threads I feel much of the discussion is
over my head but as someone who will ultimately become a *user* of the
"async API of the future" I would like to share my thoughts...
My opinion is that the goal of any async module that winds up in
Python's standard library should be simplicity and portability. In
terms of features, here's my 'async wishlist':
* I should not have to worry about what is and isn't pickleable when I
decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism
that is appropriate for the task: For CPU-bound tasks I'll probably
want to use multiprocessing. For IO-bound tasks I might want to use
threading. For a multitude of tasks that "just need to be async" (by
nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at
an interval and calling functions after a timeout occurs (with the
ability to cancel).
* Asynchronous tasks should be able to access the same namespace as
everything else. Maybe wishful thinking.
* It should support publish/subscribe-style events (i.e. an event
dispatcher). For example, the ability to watch a file descriptor or
socket for changes in state and call a function when that happens.
Preferably with the flexibility to define custom events (i.e don't
have it tied to kqueue/epoll-specific events).
Thanks for your consideration; and thanks for the awesome language.
--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
904-446-8323
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas
(This is a response to GVR's Google+ post asking for ideas; I
apologize in advance if I come off as an ignorant programming newbie)
import async # The API of the future ;)
async.async_call(retrieve_log_playback, settings, tws,
mechanism=multiprocessing)
# tws == instance of tornado.web.WebSocketHandler that holds the open connection
My opinion is that the goal of any async module that winds up in
Python's standard library should be simplicity and portability. In
terms of features, here's my 'async wishlist':
* I should not have to worry about what is and isn't pickleable when I
decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism
that is appropriate for the task: For CPU-bound tasks I'll probably
want to use multiprocessing. For IO-bound tasks I might want to use
threading. For a multitude of tasks that "just need to be async" (by
nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at
an interval and calling functions after a timeout occurs (with the
ability to cancel).
* Asynchronous tasks should be able to access the same namespace as
everything else. Maybe wishful thinking.
* It should support publish/subscribe-style events (i.e. an event
dispatcher). For example, the ability to watch a file descriptor or
socket for changes in state and call a function when that happens.
Preferably with the flexibility to define custom events (i.e don't
have it tied to kqueue/epoll-specific events).
Thanks for your consideration; and thanks for the awesome language.
--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
904-446-8323
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas
It depends on the host. On embedded platforms (e.g. the BeagleBone)
it is more IO-bound than CPU bound (fast CPU but slow disk and slow
memory). On regular x86 systems it is mostly CPU-bound.
>> * I should be able to choose the type of event loop/async mechanism
>> that is appropriate for the task: For CPU-bound tasks I'll probably
>> want to use multiprocessing. For IO-bound tasks I might want to use
>> threading. For a multitude of tasks that "just need to be async" (by
>> nature) I'll want to use an event loop.
>
>
> Ehhh, maybe. This sounds like it confounds the tools for different use
> cases. You can quite easily have threads and processes on top of an event
> loop; that works out particularly nicely for processes because you still
> have to talk to your processes.
>
> Examples:
>
> twisted.internet.reactor.spawnProcess (local processes)
> twisted.internet.threads.deferToThread (local threads)
> ampoule (remote processes)
>
> It's quite easy to do blocking IO in a thread with deferToThread; in fact,
> that's how twisted's adbapi, an async wrapper to dbapi, works.
As I understand it, twisted.internet.reactor.spawnProcess is all about
spawning subprocesses akin to subprocess.Popen(). Also, it requires
writing a sophisticated ProcessProtocol. It seems to be completely
unrelated and wickedly complicated. The complete opposite of what I
would consider ideal for an asynchronous library since it is anything
but simple.
I mean, I could write a separate program to generate HTML playback
files from logs, spawn a subprocess in an asynchronous fashion, then
watch it for completion but I could do that with termio.Multiplex
(see: https://github.com/liftoff/GateOne/blob/master/gateone/termio.py)
.
deferToThread() does what one would expect but in many situations I'd
prefer something like deferToMultiprocessing().
>> * It should support publish/subscribe-style events (i.e. an event
>> dispatcher). For example, the ability to watch a file descriptor or
>> socket for changes in state and call a function when that happens.
>> Preferably with the flexibility to define custom events (i.e don't
>> have it tied to kqueue/epoll-specific events).
>
>
> Like connectionMade, connectionLost, dataReceived etc?
Oh there's a hundred different ways to fire and catch events. I'll
let the low-level async experts decide which is best. Having said
that, it would be nice if the interface didn't use such
network-specific naming conventions. I would prefer something more
generic. It is fine if it uses sockets and whatnot in the background.
--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
deferToThread() does what one would expect but in many situations I'd
prefer something like deferToMultiprocessing().
I would prefer something more generic.
--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
It depends on the host. On embedded platforms (e.g. the BeagleBone)
it is more IO-bound than CPU bound (fast CPU but slow disk and slow
memory). On regular x86 systems it is mostly CPU-bound.
>> * I should be able to choose the type of event loop/async mechanism
>> that is appropriate for the task: For CPU-bound tasks I'll probably
>> want to use multiprocessing. For IO-bound tasks I might want to use
>> threading. For a multitude of tasks that "just need to be async" (by
>> nature) I'll want to use an event loop.
>
>
> Ehhh, maybe. This sounds like it confounds the tools for different use
> cases. You can quite easily have threads and processes on top of an event
> loop; that works out particularly nicely for processes because you still
> have to talk to your processes.
>
> Examples:
>
> twisted.internet.reactor.spawnProcess (local processes)
> twisted.internet.threads.deferToThread (local threads)
> ampoule (remote processes)
>
> It's quite easy to do blocking IO in a thread with deferToThread; in fact,
> that's how twisted's adbapi, an async wrapper to dbapi, works.
As I understand it, twisted.internet.reactor.spawnProcess is all about
spawning subprocesses akin to subprocess.Popen(). Also, it requires
writing a sophisticated ProcessProtocol. It seems to be completely
unrelated and wickedly complicated. The complete opposite of what I
would consider ideal for an asynchronous library since it is anything
but simple.
I mean, I could write a separate program to generate HTML playback
files from logs, spawn a subprocess in an asynchronous fashion, then
watch it for completion but I could do that with termio.Multiplex
(see: https://github.com/liftoff/GateOne/blob/master/gateone/termio.py)
.
>