Hello,
On Fri, 25 Apr 2014 21:05:48 +0300
Paul Sokolovsky <pmi...@gmail.com> wrote:
[]
> > I suspect your expectations are tainted by the previous knowledge
> > of the threading API, which has a separate Thread.start() method. I
>
> My expectations are "tainted" by: 1) the basic programming rule of
> thumb that you first initialize things properly, and then execute them;
> 2) an intuitive feeling, and even explicit knowledge, of Python's
> "explicit is better than implicit" principle; 3) acquaintance (cursory,
> I have to admit) with the many-year history of using
> generators/coroutines for async cooperative multitasking, and a desire
> to do the same using the standardized API that asyncio promotes.
>
> > think it makes _some_ sense that Thread objects do not start the
> > actual thread automatically, since threads are preemptive and prone
> > to race conditions, and you may want to store the Thread object in
> > some data structure _before_ the thread actually begins executing.
> > With asyncio.Task, even if the task is scheduled to be executed, it
> > is guaranteed not to be executed until you reach a "yield from"
> > statement, so you have plenty of opportunity to do any setup prior to
> > the task executing.
>
> Let's sum up what you're saying here: the asyncio Task implementation,
> by relying on internal asyncio implementation details (so naive users
> who become fixated on such behavior will fail miserably in other
> contexts), violates the "Explicit is better than implicit" principle
> *just because it can*?
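For reference, the ordering guarantee described in the quoted reply can be observed directly. Below is a minimal sketch in today's asyncio spelling (`asyncio.run()` and `asyncio.ensure_future()`, the later name of `asyncio.async()`); the coroutine `worker` and the `log` list are illustrative only. A done-callback attached after the Task is created still fires, because the Task cannot begin before the creating coroutine yields control to the loop.

```python
import asyncio

async def worker(log):
    # This body runs only after the event loop regains control.
    log.append("worker started")
    return 42

async def main():
    log = []
    task = asyncio.ensure_future(worker(log))  # scheduled, not yet running
    log.append("task created")
    # Attaching a callback here is safe: main() has not yielded yet,
    # so the task cannot have started, let alone finished.
    task.add_done_callback(lambda t: log.append(f"done: {t.result()}"))
    log.append("callback added")
    await task  # main() yields here; only now can worker() run
    return log

print(asyncio.run(main()))
# ['task created', 'callback added', 'worker started', 'done: 42']
```

Note that this relies on exactly the loop behavior being debated: the safety comes from "not yet yielded", not from any explicit start call.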
OK, I did some (re)reading on the topic and had some time to think
about the arguments provided, so here are some additional thoughts and
arguments:
Point #1
First of all, I probably should have mentioned that my expectations for
a coroutine scheduler are shaped by the wonderful series on generators
and coroutines by David Beazley. This specific slide gives the essence
of it:
http://www.slideshare.net/dabeaz/a-curious-course-on-coroutines-and-concurrency-5286140/137
So, it's possible to write a *coroutine* scheduler in such a way that
coroutines do not (and, if needed, cannot) access the main loop
directly. They communicate with it using yield/yield from, which serve
the same purpose as system calls in an OS design. So, knowing that
Python offers this level of separation, it added to the cognitive
dissonance to see that asyncio not only does not separate object
access, but tightly couples even the behavior of Task to a loop.
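To make the "yield as a system call" idea concrete, here is a minimal sketch loosely modeled on the OS-style scheduler built in that course. The class names (`SystemCall`, `GetTid`, `NewTask`, `Task`, `Scheduler`) are illustrative and not asyncio API. The point is that a coroutine never holds a reference to the scheduler; it only yields request objects, and the scheduler decides what to do with them.

```python
from collections import deque

class SystemCall:
    """A request that a coroutine yields to the scheduler."""
    def handle(self, sched, task):
        raise NotImplementedError

class GetTid(SystemCall):
    # Ask the scheduler for the calling task's id.
    def handle(self, sched, task):
        task.sendval = task.tid
        sched.schedule(task)

class NewTask(SystemCall):
    # Ask the scheduler to start another coroutine; replies with its id.
    def __init__(self, target):
        self.target = target
    def handle(self, sched, task):
        task.sendval = sched.new(self.target)
        sched.schedule(task)

class Task:
    def __init__(self, tid, target):
        self.tid = tid
        self.target = target
        self.sendval = None  # value delivered on the next resume
    def run(self):
        value, self.sendval = self.sendval, None
        return self.target.send(value)

class Scheduler:
    def __init__(self):
        self.ready = deque()
        self.next_tid = 0
    def new(self, target):
        self.next_tid += 1
        task = Task(self.next_tid, target)
        self.schedule(task)
        return task.tid
    def schedule(self, task):
        self.ready.append(task)
    def mainloop(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                result = task.run()
            except StopIteration:
                continue  # coroutine finished; drop the task
            if isinstance(result, SystemCall):
                result.handle(self, task)  # the "kernel" services the request
            else:
                self.schedule(task)  # plain yield: just reschedule

log = []

def child():
    tid = yield GetTid()
    log.append(f"child {tid} running")

def parent():
    tid = yield GetTid()
    log.append(f"parent {tid} running")
    child_tid = yield NewTask(child())
    log.append(f"parent spawned {child_tid}")
    yield  # give other tasks a turn

sched = Scheduler()
sched.new(parent())
sched.mainloop()
print(log)  # ['parent 1 running', 'parent spawned 2', 'child 2 running']
```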
Point #2
The latest in David's series was presented at the recent PyCon 2014:
http://www.dabeaz.com/finalgenerator/
Starting from slide 43, he presents a step-by-step walkthrough of
building a concurrent execution framework, which (un)surprisingly shapes
up as having almost the same API and architecture as asyncio. So, it
should be fair to say that those slides are a good "asyncio design for
dummies" tutorial. His framework is very similar to asyncio: it starts
with callbacks, then switches to coroutines as a more adequate
representation, these get wrapped in Tasks for bookkeeping, results are
represented by Futures, then it's shown that Task and Future share many
traits, so it makes sense to make one a subclass of the other, etc.
They are very similar except for one implementation detail: David's
framework doesn't use cooperative multitasking for execution, but
rather a thread pool. You can easily imagine what that means: a started
Task really does start immediately, so if it were started behind the
user's back, there would be no time to add callbacks to it later. That's
why David's framework doesn't start Tasks behind the user's back, which
is the natural solution (you don't even need to notice that it doesn't
start them; it's just the default choice). During the initial stages of
the design, Tasks are kickstarted using a .step() method; later,
explicit scheduling functions are introduced: start_inline_future(),
run_inline_future().
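A simplified sketch of that design, loosely modeled on the Task-as-Future pattern from the slides (not copied from them): the Task subclasses concurrent.futures.Future, the generator yields futures produced by a thread pool, and nothing runs until step() is called explicitly. The names `add`, `do_add` and `seen` are illustrative only.

```python
from concurrent.futures import Future, ThreadPoolExecutor

class Task(Future):
    """Drives a generator that yields concurrent futures.

    Nothing runs until step() is called, so callbacks can be attached
    before execution begins.
    """
    def __init__(self, gen):
        super().__init__()
        self._gen = gen

    def step(self, value=None, exc=None):
        try:
            if exc is not None:
                fut = self._gen.throw(exc)
            else:
                fut = self._gen.send(value)
            # Resume when the yielded future completes (usually in a worker thread).
            fut.add_done_callback(self._wakeup)
        except StopIteration as e:
            self.set_result(e.value)  # generator returned: the Task is done
        except Exception as e:
            self.set_exception(e)

    def _wakeup(self, fut):
        try:
            result = fut.result()
        except Exception as e:
            self.step(None, e)
        else:
            self.step(result)

pool = ThreadPoolExecutor(max_workers=4)

def add(x, y):
    return x + y

def do_add(x, y):
    # Yield a pool future; the Task resumes us with its result.
    result = yield pool.submit(add, x, y)
    return result * 10

seen = []
t = Task(do_add(2, 3))
t.add_done_callback(lambda f: seen.append(f.result()))  # attached before start
t.step()  # explicit kick-off; nothing ran before this line
print(t.result())  # blocks until the chain completes: 50
pool.shutdown()  # waits for workers, so all callbacks have run
```

Because the work happens on real threads, the explicit step() is what makes attaching the callback race-free.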
So, let's step back and look at the overall situation.
https://docs.python.org/3.4/library/asyncio-task.html#future explicitly
says that asyncio.Future is "almost" compatible with
concurrent.futures.Future. Why "almost"? Apparently because
concurrent.futures.Future has some features that depend on the
concurrent execution model, and specifically on the underlying
thread/process implementations, which don't map well to the
cooperative/event loop execution model. PEP 3156 explicitly mentions
that it would be nice to unify both Futures in the future.
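One concrete example of that "almost": concurrent.futures.Future.result() accepts a timeout and blocks the calling thread, while asyncio.Future.result() takes no timeout and raises InvalidStateError if the future is still pending (blocking would stall the event loop). asyncio.wrap_future() bridges the two. A small sketch using current asyncio:

```python
import asyncio
import concurrent.futures

# concurrent.futures: result() blocks, optionally with a timeout.
cf = concurrent.futures.Future()
try:
    cf.result(timeout=0.01)
except concurrent.futures.TimeoutError:
    print("concurrent future: timed out waiting")

async def main():
    af = asyncio.get_running_loop().create_future()
    # asyncio: result() never blocks; a pending future raises instead.
    try:
        af.result()
    except asyncio.InvalidStateError:
        print("asyncio future: not done yet")
    # Bridge a thread-pool future into the event loop world.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        cfut = pool.submit(sum, [1, 2, 3])
        return await asyncio.wrap_future(cfut)

print(asyncio.run(main()))  # 6
```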
Surely asyncio would learn from such experience and try to provide an
API model that doesn't rely on particular underlying details, which
would hamper compatibility and reuse, yes? No, because what we are
talking about is that asyncio (ab)uses the fact that the underlying
event loop doesn't start execution immediately: it forcibly schedules a
Task and makes the user apply important changes to it (such as adding
callbacks) after it is already in the active state, which is backwards
from a general point of view.
Point #3
Yet another perspective. OK, there's nothing wrong with being able to
schedule a coroutine using a global function; after all, Point #1 above
praises complete separation between coroutines and the loop via yield.
As yield cannot be used outside a function, it's not a bad idea to
provide a global function to schedule a coroutine. One problem here is
that "Task" and "async" are not very suggestive names for a function
which performs scheduling. Actually, I have a hypothesis about why it's
hard to even imagine such a purpose for them. It's grounded in a
dichotomy in the asyncio API:
1. Some operations are expressed as methods of the event loop object,
e.g.
loop.run_forever()
loop.call_soon()
2. While others are expressed as global functions taking an optional
loop parameter:
asyncio.wait(..., loop=None, ...)
asyncio.sleep(..., loop=None)
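The two flavors can be seen side by side in a small program. This uses current asyncio; note that the loop= parameters of asyncio.sleep() and asyncio.wait() listed above were deprecated in Python 3.8 and removed in 3.10, so the module-level functions now always use the running loop.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    log = []
    # Flavor 1: an operation exposed as an event-loop method.
    loop.call_soon(log.append, "call_soon callback ran")
    # Flavor 2: an operation exposed as a module-level function,
    # with the loop left implicit.
    await asyncio.sleep(0)  # yields once, letting the callback run
    log.append("after asyncio.sleep")
    return log

print(asyncio.run(main()))
# ['call_soon callback ran', 'after asyncio.sleep']
```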
This API asymmetry is not obvious at first glance. The docs start with
a description of the loop methods, which sets the expectation that all
important operations are available as such, and that the rest are just
objects/factory functions rather than ordinary functions with side
effects. Both
asyncio.Task(..., loop=None)
asyncio.async(..., loop=None)
seem to belong to that latter category (regardless of actual
implementation details, like the fact that "Task" is implemented as a
class).
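The side effect hidden behind the constructor-like name is easy to observe: a plain coroutine object does nothing until awaited or wrapped, while merely constructing asyncio.Task enqueues it on the loop. Current asyncio docs discourage instantiating Task directly in favor of asyncio.create_task(), but the constructor still behaves this way. The `job` coroutine is illustrative only.

```python
import asyncio

async def job(log, name):
    log.append(name)

async def main():
    log = []
    coro = job(log, "plain coroutine")  # just an object; nothing is scheduled
    task = asyncio.Task(job(log, "wrapped in Task"))  # construction schedules it
    await asyncio.sleep(0)  # one loop iteration: the Task runs, coro does not
    coro.close()  # discard the unscheduled coroutine without a "never awaited" warning
    return log, task.done()

print(asyncio.run(main()))  # (['wrapped in Task'], True)
```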
How can this issue be solved (besides describing it clearly in the
docs)? Well, it would help if the module offered just one style of API.
For example, my problem is that I expected all operations to be
available as methods of the loop.
But dropping that and having stuff like:
asyncio.run_forever(loop=None)
would work just as well, and would probably even allow for a more
efficient implementation (no need for a dummy loop object when we have
an "embedded loop", for example).
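A hypothetical sketch of the "module-level functions only" variant, written as a thin facade over a real asyncio loop. None of these names (`get_loop`, `spawn`, `call_soon`, `stop`, `run_forever`, `close`) exist in asyncio; they are invented for illustration, and `spawn` is deliberately named for what it does.

```python
import asyncio

# Hypothetical facade: every operation is a module-level function with an
# optional loop argument, in the spirit of asyncio.run_forever(loop=None).
_default_loop = None

def get_loop(loop=None):
    """Return the explicit loop, or a lazily created module-wide default."""
    global _default_loop
    if loop is not None:
        return loop
    if _default_loop is None or _default_loop.is_closed():
        _default_loop = asyncio.new_event_loop()
    return _default_loop

def spawn(coro, loop=None):
    # Explicitly named scheduling function, instead of "Task"/"async".
    return get_loop(loop).create_task(coro)

def call_soon(callback, *args, loop=None):
    return get_loop(loop).call_soon(callback, *args)

def stop(loop=None):
    get_loop(loop).stop()

def run_forever(loop=None):
    get_loop(loop).run_forever()

def close(loop=None):
    get_loop(loop).close()

log = []

async def hello():
    log.append("hello")

spawn(hello())
call_soon(log.append, "soon")
call_soon(stop)  # stop the default loop after the queued callbacks
run_forever()
close()
print(log)  # ['hello', 'soon']
```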
Finally, having both models, but offering more complete coverage of
operations in both (with easy-to-understand names), would be good too.