[Python-ideas] Proposal: A simple protocol for generator tasks

Piet Delport

Oct 14, 2012, 11:36:59 PM
to python...@python.org
[This is a lengthy mail; I apologize in advance!]

Hi,

I've been following this discussion with great interest, and would like
to put forward a suggestion that might simplify some of the questions
that are up in the air.

There are several key points being considered: what exactly constitutes a
"coroutine" or "tasklet", what the precise semantics of "yield" and
"yield from" should be, how the stdlib can support different event loops
and reactors, and how exactly Futures, Deferreds, and other APIs fit
into the whole picture.

This mail is mostly about the first point: I think everyone agrees
roughly what a coroutine-style generator is, but there's enough
variation in how they are used, both historically and presently, that
the concept isn't as precise as it should be. This makes them hard to
think and reason about (failing the "BDFL gets headaches" test), and
makes it harder to define the behavior of all the parts that they
interact with, too.

This is a sketch of an attempt to define what constitutes a
generator-based task or coroutine more rigorously: I think that the
essential behavior can be captured in a small protocol, building on the
generator and iterator protocols. If anyone else thinks this is a good
idea, maybe something like this could work its way into a PEP?

(For the sake of this mail, I will use the term "generator task" or
"task" as a straw man term, but feel free to substitute "coroutine", or
whatever the preferred name ends up being.)


Definition
==========

Very informally: A "generator task" is what you get if you take a normal
Python function and replace its blocking calls with "yield from" calls
to equivalent subtasks.

More formally, a "generator task" is a generator that implements an
incremental, multi-step computation, and is intended to be externally
driven to completion by a runner, or "scheduler", until it delivers a
final result.

This driving process happens as follows:

1. A generator task is iterated by its scheduler to yield a series of
   intermediate "step" values.

2. Each value yielded as a "step" represents a scheduling instruction,
   or primitive, to be interpreted by the task's scheduler.

   This scheduling instruction can be None ("just resume this task
   later"), or a variety of other primitives, such as Futures ("resume
   this task with the result of this Future"); see below for more.

3. The scheduler is responsible for interpreting each "step" instruction
   as appropriate, and sending the instruction's result, if any, back to
   the task using send() or throw().

   A scheduler may run a single task to completion, or may multiplex
   execution between many tasks: generator tasks should assume that
   other tasks may have executed while the task was yielding.

4. The generator task completes by successfully returning (raising
   StopIteration), or by raising an exception. The task's caller
   receives this result.

(For the sake of discussion, I use "the scheduler" to refer to whoever
calls the generator task's next/send/throw methods, and "the task's
caller" to refer to whoever receives the task's final result, but this
is not important to the protocol: a task should not care who drives it
or consumes its result, just like an iterator should not.)
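
The four steps above can be sketched as a tiny single-task runner. This is
only an illustration under stated assumptions, not a proposed implementation:
the name run() is made up, concurrent.futures.Future stands in for "a
Future", and the runner simply blocks on each Future instead of multiplexing
other tasks.

```python
from concurrent.futures import Future


def run(task):
    """Drive one generator task to completion and return its final result."""
    send, arg = task.send, None
    while True:
        try:
            step = send(arg)              # step 1: iterate the task
        except StopIteration as stop:
            return stop.value             # step 4: the task's final result
        send, arg = task.send, None
        if step is None:
            continue                      # "just resume this task later"
        if isinstance(step, Future):
            try:
                arg = step.result()       # blocks; a real scheduler would not
            except Exception as exc:
                send, arg = task.throw, exc
        else:
            # Assumption: unknown instructions are rejected with TypeError.
            send, arg = task.throw, TypeError("unknown step: %r" % (step,))


def add_later(a, b):
    yield                                 # a plain None step
    fut = Future()
    fut.set_result(a + b)
    total = yield fut                     # resumed with the Future's result
    return total


result = run(add_later(2, 3))
```

An exception raised by the task itself is not caught by run(), so it reaches
the task's caller unchanged, as step 4 describes.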


Scheduling instructions / primitives
====================================

(This could probably use a better name.)

The protocol is intentionally agnostic about the implementation of
schedulers, event loops, or reactors: as long as they implement the same
set of scheduling primitives, code should work across them.

There are multiple ways to accomplish this, but one possibility is to have a
set of common, generic instructions in a standard library module such as
"tasklib" (which could also contain things like default scheduler
implementations, helper functions, and so on).

A partial list of possible primitives (the names are all made up, not
serious suggestions):

1. None: The most basic "do nothing" instruction. This just instructs
   the scheduler to resume the yielding task later.

2. Futures: Instruct the scheduler to resume with the future's result.

   Similar types in third-party libraries, such as Deferreds, could
   potentially be implemented either natively by a scheduler that
   supports them, or using a wait_for_deferred(d) helper task, or using
   the idea of an "adapter" scheduler (see below).

3. Control primitives: spawn, sleep, etc.

   - Spawn a new (independent) task: yield tasklib.spawn(task())
   - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
   - Delay execution: yield tasklib.sleep(seconds)
   - etc.

   These could be simple marker objects, leaving it up to the underlying
   scheduler to actually recognize and implement them; some could also
   be implemented in terms of simpler operations (e.g.  sleep(), in
   terms of lower-level suspend and resume operations).

4. I/O operations

   This could be anything from low-level "yield fd_readable(sock)" style
   requests to any of the higher-level APIs being discussed elsewhere.

   Whatever the exact API ends up being, the scheduler should implement
   these primitives by waiting for the I/O (or condition), and resuming
   the task with the result, if any.

5. Cooperative concurrency primitives, for working with locks, condition
   variables, and so on. (If useful?)

6. Custom, scheduler-specific instructions: Since a generator task can
   potentially yield anything as a scheduler instruction, it's not
   inconceivable for specialized schedulers to support specialized
   instructions. (Code that relies on such special instructions won't
   work on other schedulers, but that would be the point.)

A question open to debate is what a scheduler should do when faced with
an unrecognized scheduling instruction.

Raising TypeError or NotImplementedError back into the task is probably
a reasonable action, and would allow code like:

    def task():
        try:
            yield fancy_magic_instruction()
        except NotImplementedError:
            yield from boring_fallback()
        ...
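
A runnable sketch of that fallback pattern, assuming the scheduler answers an
unrecognized instruction with NotImplementedError. The names drive() and
PriorityHint are hypothetical, invented only for this illustration.

```python
class PriorityHint:
    """Hypothetical scheduler-specific instruction."""
    def __init__(self, level):
        self.level = level


def drive(task, known=(type(None),)):
    """Run a task, rejecting any instruction whose type is not in `known`."""
    send, arg = task.send, None
    while True:
        try:
            step = send(arg)
        except StopIteration as stop:
            return stop.value
        if isinstance(step, known):
            send, arg = task.send, None   # recognized: just resume the task
        else:
            send, arg = task.throw, NotImplementedError(repr(step))


def boring_fallback():
    yield
    return "fallback"


def task():
    try:
        yield PriorityHint(10)
        how = "fancy"
    except NotImplementedError:
        how = yield from boring_fallback()
    return how


plain = drive(task())
fancy = drive(task(), known=(type(None), PriorityHint))
```

The same task body runs on both schedulers; only the one that knows
PriorityHint takes the fancy path.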


Generator tasks as schedulers, and vice versa
=============================================

Note that there is a symmetry to the protocol when a generator task
calls another using "yield from":

    def task():
        spam = yield from subtask()

Here, task() is both a generator task, and the effective scheduler for
subtask(): it "implements" subtask()'s scheduling instructions by
delegating them to its own scheduler.

This is a plain observation on its own; however, it raises one or two
interesting possibilities for richer schedulers implemented as
generator tasks themselves, including:

- Specialized sub-schedulers that run as a normal task within their
  parent scheduler, but implement for example weighted or priority
  queuing of their subtasks, or similar features.

- "Adapter" schedulers that intercept special scheduler instructions
  (say, Deferreds or other library-specific objects), and implement them
  using more generic instructions to the underlying scheduler.
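
As a rough sketch of the "adapter" idea: a task that drives its subtask by
hand, translates one special instruction into generic ones, and passes
everything else upward. Ticks is a hypothetical library-specific instruction
("wait n scheduler passes") invented for this example; it is translated into
n plain None steps.

```python
class Ticks:
    """Hypothetical library-specific instruction: wait n scheduler passes."""
    def __init__(self, n):
        self.n = n


def adapter(subtask):
    """A generator task that acts as the scheduler for `subtask`."""
    send, arg = subtask.send, None
    while True:
        try:
            step = send(arg)
        except StopIteration as stop:
            return stop.value
        send, arg = subtask.send, None
        if isinstance(step, Ticks):
            for _ in range(step.n):
                yield                     # translated into generic None steps
        else:
            try:
                arg = yield step          # everything else passes through
            except Exception as exc:
                send, arg = subtask.throw, exc   # forward errors downward


def subtask():
    yield Ticks(3)
    yield
    return "done"


steps = list(adapter(subtask()))
```

The outer scheduler only ever sees None here, so it needs no knowledge of
Ticks. Exceptions thrown in while the adapter is emitting its own translated
steps are not forwarded in this sketch.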


-- 
Piet Delport

Greg Ewing

Oct 15, 2012, 5:17:17 AM
to python...@python.org
Piet Delport wrote:

> 2. Each value yielded as a "step" represents a scheduling instruction,
> or primitive, to be interpreted by the task's scheduler.

I don't think this technique should be used to communicate
with the scheduler, other than *maybe* for a *very* small
set of operations that are truly primitive -- and even then
I'm not convinced.

To begin with, there are some operations that *can't* rely
on yielded instructions as the only way of invoking them.
Spawning a task, for example -- there must be some way for
non-task code to invoke that, otherwise you wouldn't be able
to get top-level tasks into the system.

Also, consider the operation of unblocking a task that's
waiting for some event to occur. Often you will want to
invoke this using a callback from an event loop, which is
not a generator and can't yield anything to anywhere.

Given that these operations must provide a way of invoking
them using a plain function call, there is little reason
to provide a second way using a yielded instruction.

In any case, I believe that the public interface for *any*
scheduler operation should not be a yielded instruction,
but either a plain function or something called using
yield-from, for reasons I explained to Guido earlier.

> - Specialized sub-schedulers that run as a normal task within their
> parent scheduler, but implement for example weighted or priority
> queuing of their subtasks, or similar features.

There are problems with allowing multiple schedulers to
coexist within the one system, especially if yielded
instructions are the only way to communicate with them.

It might work for instructions to a task's own scheduler
concerning itself, but some operations need to operate on
a *different* task, e.g. unblocking a task when the event
it was waiting for occurs. How do you know which scheduler
is managing it? And even if you can find out, if you have
to control it using yielded instructions, you have no
way of yielding something to a different task's scheduler.

--
Greg
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Calvin Spealman

Oct 15, 2012, 6:48:20 AM
to Piet Delport, python...@python.org
On Sun, Oct 14, 2012 at 11:36 PM, Piet Delport <pjde...@gmail.com> wrote:
> [This is a lengthy mail; I apologize in advance!]

This is what I get for deciding to check up on these threads at 6AM
after a late night.

> Hi,
>
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
>
> There are several key points being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
>
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
>
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
>
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)

I like that "task" is more general and avoids complaints from some that
these are not "real" coroutines.

> Definition
> ==========
>
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.

"yield" and "yield from", although I'm really disliking the second
being included at all. More on this later.

What is the difference between the tossed around "yield from task()"
and this "yield tasklib.spawn(task())"?

And, why isn't it simply spelled "yield task()"? You have all these different
types that can be yielded to the scheduler from tasks to the scheduler. Why
isn't a task one of those possible types? If the scheduler gets an iterator, it
should schedule it automatically.

> 4. I/O operations
>
> This could be anything from low-level "yield fd_readable(sock)" style
> requests, or any of the higher-level APIs being discussed elsewhere.
>
> Whatever the exact API ends up being, the scheduler should implement
> these primitives by waiting for the I/O (or condition), and resuming
> the task with the result, if any.
>
> 5. Cooperative concurrency primitives, for working with locks, condition
> variables, and so on. (If useful?)

I am sure these will come about, but I think that is considered a library
that sits on top of whatever API comes out, not part of it.

> 6. Custom, scheduler-specific instructions: Since a generator task can
> potentially yield anything as a scheduler instruction, it's not
> inconceivable for specialized schedulers to support specialized
> instructions. (Code that relies on such special instructions won't
> work on other schedulers, but that would be the point.)
>
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
>
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
>
>     def task():
>         try:
>             yield fancy_magic_instruction()
>         except NotImplementedError:
>             yield from boring_fallback()
>         ...

Interesting. Can anyone think of an example of this?

>
> Generator tasks as schedulers, and vice versa
> =============================================
>
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
>
>     def task():
>         spam = yield from subtask()
>
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.

As raised above, why not simply "yield subtask()"?

> This is a plain observation on its own, however, it raises one or two
> interesting possibilities for more interesting schedulers implemented as
> generator tasks themselves, including:
>
> - Specialized sub-schedulers that run as a normal task within their
> parent scheduler, but implement for example weighted or priority
> queuing of their subtasks, or similar features.

I think that is too messy: you could have so many different scheduler
semantics. Maybe this sort of thing is what your scheduler-specific
instructions should be for.

Or, attributes on tasks that schedulers can be known to look for.

> - "Adapter" schedulers that intercept special scheduler instructions
> (say, Deferreds or other library-specific objects), and implement them
> using more generic instructions to the underlying scheduler.

I think we can make yielding tasks a direct operation, and still implement
sub-schedulers. They should be more opaque, I think.

> --
> Piet Delport



--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

Ronny Pfannschmidt

Oct 15, 2012, 7:39:16 AM
to Piet Delport, python...@python.org
Hi Piet,

I like that someone is finally pointing out
how to deal with the *concurrent* part.

I have some further notes:

* greenlet interaction wanted,
  since interacting with greenlets is slightly different
  from generators

  * they don’t get the function arguments at greenlet creation time,
    but on the first `switch`

    generator outer use:
        gn = f(*args, **kwargs)
        next(gn)

    greenlet outer use:
        gr = greenlet.greenlet(f)
        gr.switch(*args, **kwargs)

  * instead of send/next, they always use switch
  * `yield` is a function call
    -> there is a need for a lib to manage the local part
       of greenlet operations in any case

  (so we should just ensure that the scheduler can
  handle their way of `yield`,
  but not actually have support/compat code in
  the stdlib for their yielding)

* considering regular classes for interaction,
  since for some protocol implementations
  different means might make sense
  (this could also be used for the scheduler part of
  greenlet interaction)

result -> a protocol for cooperative concurrency

* considering the upcoming pypy transaction module/stm,
  since using that right could mean "free" parallelism in future
* alternatives for queues/channels are needed
* pools/rate-limiters and other exercises are needed as well
* some kind of default tools for servers are needed

* the stdlib could have a very simple default scheduler
  that’s just doing something basic, like running all the work it can,
  and if it can't, blocking on an I/O reactor

we just need something that can run() after all has been created

having an api like scheduler.add(gen) would be a plus
(since it would be just like pypy's transaction module)

an example i have in mind is something like

    scheduler.add(...)
    scheduler.add(...)
    scheduler.run()
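
A minimal sketch of what such an add()/run() interface might look like. The
Scheduler class here is an assumption made for illustration: it only
understands the None instruction and runs tasks round-robin, with no I/O
reactor behind it.

```python
from collections import deque


class Scheduler:
    """Hypothetical round-robin scheduler that only understands None steps."""

    def __init__(self):
        self._ready = deque()
        self.results = []                 # final results, in completion order

    def add(self, task):
        self._ready.append(task)

    def run(self):
        while self._ready:
            task = self._ready.popleft()
            try:
                step = next(task)
            except StopIteration as stop:
                self.results.append(stop.value)
                continue
            if step is not None:
                raise TypeError("unrecognized instruction: %r" % (step,))
            self._ready.append(task)      # resume this task later


def worker(name, count, log):
    for i in range(count):
        log.append((name, i))
        yield
    return name


log = []
scheduler = Scheduler()
scheduler.add(worker("a", 2, log))
scheduler.add(worker("b", 2, log))
scheduler.run()
```

After run() returns, log shows the two workers interleaved one step at a
time.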




If things go as planned on my side,
I'll try a prototype implementation starting in Jan/Feb 2013,
for further comments/actual experimentation.

-- Ronny

Piet Delport

Oct 16, 2012, 3:27:01 AM
to Greg Ewing, python...@python.org
On Mon, Oct 15, 2012 at 11:17 AM, Greg Ewing
<greg....@canterbury.ac.nz> wrote:
> Piet Delport wrote:
>
>> 2. Each value yielded as a "step" represents a scheduling instruction,
>> or primitive, to be interpreted by the task's scheduler.
>
>
> I don't think this technique should be used to communicate
> with the scheduler, other than *maybe* for a *very* small
> set of operations that are truly primitive -- and even then
> I'm not convinced.

But this is by necessity how the scheduler is *already* being
communicated with, at least for the de facto scheduler instructions like
None, Future, and the other primitives being discussed.

This concept of an "intermediate object yielded by a task to its
scheduler on each step, instructing it how to schedule" is already
unavoidably fundamental to how these tasks / coroutines work: this
proposal is just an attempt to name that concept, and define it more
clearly.


> To begin with, there are some operations that *can't* rely
> on yielded instructions as the only way of invoking them.
> Spawning a task, for example -- there must be some way for
> non-task code to invoke that, otherwise you wouldn't be able
> to get top-level tasks into the system.

I'm definitely not suggesting that this be the *only* way of invoking
operations, or that all operations should be invoked this way.

Certainly, everything that is possible inside this protocol will also be
possible outside of it by directly calling methods on some global
scheduler, but that requires knowing who and what that global scheduler
is.

It's important to note that a globally identifiable scheduler object
might not even exist: it's entirely reasonable, for example, to
implement this entire protocol in Twisted by writing a deferTask(task)
helper that handles generic scheduler instructions (None, Future-alike,
and things like spawn() and sleep()) by just arranging for the
appropriate Twisted callbacks and resumptions to happen under the hood.

(This is basically how Twisted's deferredGenerator works currently: the
main difference is that a deferTask() implementation would be able to
run any generic coroutine / generator task code that uses this protocol,
without that code having to know about Twisted.)

Regarding getting top-level tasks into the system, this can be done in a
variety of ways, depending on how particular applications are
structured. For example, if the stdlib grows a standardized default
event loop:

    tasklib.DefaultScheduler(tasks).start()

or:

    result = tasklib.run(task())

or with existing frameworks like Twisted:

    deferTask(task()).addCallback(consume)
    deferTasks(othertasks)
    reactor.run()

In other words, only the top level of an application should need to
worry about how the initial scheduler, tasks, and everything else are
started.


> Also, consider the operation of unblocking a task that's
> waiting for some event to occur. Often you will want to
> invoke this using a callback from an event loop, which is
> not a generator and can't yield anything to anywhere.

This can be done with a scheduler primitive that obtains a callable to
resume the current task, like the strawman from the other thread:

    resume = yield tasklib.get_resume()

However the exact API ends up looking, suspending and resuming tasks are
very fundamental operations, and probably the most worth having as
standardized instructions that any scheduler can implement: a variety of
more powerful abstractions can be generically built on top of them.


> Given that these operations must provide a way of invoking
> them using a plain function call, there is little reason
> to provide a second way using a yielded instruction.

I don't see the former as an argument to avoid supporting the same
operations as standard yielded instructions.

A task can arrange to wait for a Future using plain function calls, or
by yielding it as an instruction (i.e., "result = yield some_future()"):
the ability to do the former should not make the latter any less
desirable.

The advantage of treating certain primitives as yielded scheduler
instructions is that:

- It's generic and scheduler-agnostic: for example, any task can simply
yield a Future to its scheduler without caring exactly how the
scheduler arranges for add_done_callback() to resume the task.

- It requires no global coordination: every generator task already has a
direct line of communication to its immediate scheduler, without
having to identify itself using handles, task ids, or other
mechanisms.

In other words, it's the difference between saying:

    h = get_current_task_handle()
    current_scheduler.sleep(h, 10)
    yield
    current_scheduler.suspend(h)
    yield

and saying:

    yield tasklib.sleep(10)
    yield tasklib.suspend()

where sleep(n) and suspend() are simple generic objects that any
scheduler can recognize and implement, just like how yielded None and
Future values are recognized and implemented.
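
To make the "simple generic objects" idea concrete, here is a sketch of a
sleep marker and one possible scheduler that recognizes it. Everything in it
is an assumption for illustration: run_all() is a made-up name, and it uses a
virtual clock (it never actually waits) so that the behavior is
deterministic.

```python
import heapq
import itertools


class sleep:
    """Marker object: 'resume me after this many seconds'."""
    def __init__(self, seconds):
        self.seconds = seconds


def run_all(tasks):
    """Run tasks on a virtual clock; return (finish_time, result) pairs."""
    now = 0.0
    order = itertools.count()             # tie-breaker: tasks never compared
    queue = [(now, next(order), task) for task in tasks]
    heapq.heapify(queue)
    finished = []
    while queue:
        now, _, task = heapq.heappop(queue)
        try:
            step = next(task)
        except StopIteration as stop:
            finished.append((now, stop.value))
            continue
        if step is None:
            wake_at = now                 # resume as soon as possible
        elif isinstance(step, sleep):
            wake_at = now + step.seconds  # resume once the delay has passed
        else:
            raise TypeError("unrecognized instruction: %r" % (step,))
        heapq.heappush(queue, (wake_at, next(order), task))
    return finished


def napper(name, seconds):
    yield sleep(seconds)
    return name


finished = run_all([napper("slow", 10), napper("fast", 1)])
```

The task code never touches a scheduler handle; a different scheduler (for
example one backed by real timers) could interpret the same sleep objects.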


> In any case, I believe that the public interface for *any*
> scheduler operation should not be a yielded instruction,
> but either a plain function or something called using
> yield-from, for reasons I explained to Guido earlier.

In other words, limiting the allowable set of yielded scheduler
instructions to None, and doing everything else via a separate API?

This is possible, but it seems like an awful waste of the perfectly good
and dedicated communication channel that already exists between tasks
and their schedulers, in favor of something more complex and indirect.

There's certainly a motivation for global APIs too, as with the
discussion about getting standardized event loops and schedulers in the
stdlib, but I think that is solving a somewhat different problem, and I
see no reason to tie coroutines / generator tasks to those APIs
when a simpler, more generic and universal protocol could be defined.

To me, defining locally how a scheduler should behave and respond to
certain yielded types and values is a much more tractable problem than
the question of designing a good global scheduler API that exposes all
the same operations in a way that's portable and usable across many
different application architectures and lifecycles.


> There are problems with allowing multiple schedulers to
> coexist within the one system, especially if yielded
> instructions are the only way to communicate with them.
>
> It might work for instructions to a task's own scheduler
> concerning itself, but some operations need to operate on
> a *different* task, e.g. unblocking a task when the event
> it was waiting for occurs. How do you know which scheduler
> is managing it?

The point of a protocol like this is that there would be no need for
tasks to know which schedulers are managing what: they can limit
themselves to using a generic protocol.

For example, the par() implementation I gave assumes the primitive:

    resume = yield tasklib.get_resume()

to get a callable to resume itself, and can simply pass that callable to
the tasks it spawns: the last child to complete just calls resume() to
resume the parent task in its own scheduler.

In this example, the resume callable contains all the necessary state to
resume that particular task. A particular scheduler could implement this
primitive by sending back a closure like:

    lambda: current_scheduler.schedule(the_task)

In the case of something like deferTask(), there need not even be any
particular long-lived scheduler aside from the transient calls arranged
by deferTask, and all the state would live in the Twisted reactor and
its queues:

    lambda: reactor.callLater(0, _defertask_iterate, the_task)

As far as the generic protocol is concerned, it does not matter whether
there's a single global scheduler, or multiple schedulers, or no single
scheduler at all: the scheduler side of the protocol is free to be
implemented in many ways, and manage its state however it's convenient.
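
The following is a sketch of how get_resume could support something like
par(). It is not the par() implementation referred to above (that one is not
part of this message); the marker classes, run(), and the "woken" bookkeeping
are all assumptions made for this illustration.

```python
from collections import deque


class get_resume:
    """Marker: 'send me a callable that will resume me'."""


class suspend:
    """Marker: 'park me until my resume callable is called'."""


class spawn:
    """Marker: 'start this task independently of me'."""
    def __init__(self, task):
        self.task = task


def run(main):
    ready = deque([(main, None)])
    suspended, woken = set(), set()
    final = None

    def make_resume(task):
        def resume(value=None):
            if task in suspended:
                suspended.discard(task)
                ready.append((task, value))
            else:
                woken.add(task)           # resumed early: remember the wake-up
        return resume

    while ready:
        task, value = ready.popleft()
        try:
            step = task.send(value)
        except StopIteration as stop:
            if task is main:
                final = stop.value
            continue
        if step is None:
            ready.append((task, None))
        elif isinstance(step, get_resume):
            ready.append((task, make_resume(task)))
        elif isinstance(step, spawn):
            ready.append((step.task, None))
            ready.append((task, None))
        elif isinstance(step, suspend):
            if task in woken:
                woken.discard(task)
                ready.append((task, None))
            else:
                suspended.add(task)       # only resume() can wake it now
        else:
            raise TypeError("unrecognized instruction: %r" % (step,))
    return final


def par(*tasks):
    if not tasks:
        return ()
    resume = yield get_resume()
    results = [None] * len(tasks)
    pending = len(tasks)

    def child(index, task):
        nonlocal pending
        results[index] = yield from task
        pending -= 1
        if pending == 0:
            resume()                      # the last child wakes the parent

    for index, task in enumerate(tasks):
        yield spawn(child(index, task))
    yield suspend()
    return tuple(results)


def count_to(n):
    for _ in range(n):
        yield
    return n


def main():
    x, y = yield from par(count_to(3), count_to(0))
    return x + y


total = run(main())
```

Note that par() itself never refers to a scheduler object: it only yields
instructions and calls the resume callable it was handed.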


> And even if you can find out, if you have to control it using yielded
> instructions, you have no way of yielding something to a different
> task's scheduler.

Generally speaking, this should not be necessary: inter-task
communication is a different question to how tasks should communicate
with their immediate scheduler.

Generically controlling the scheduling of different tasks can be done in
many ways:

- The way par() passes its resume callable to its spawned children.

- Using synchronization primitives: for example, an alternative way to
implement something like par() without direct use of suspend/resume is
to use a cooperative condition variable or semaphore.

- Using queues, channels, or similar mechanisms to communicate
information between tasks. (The communicated values can implicitly
even be scheduler instructions themselves, like a queue of Futures.)

If something cannot be done inside this generator task protocol, you can
of course still step outside of it and use other mechanisms directly,
but that necessarily ties your code to those mechanisms, which may not
be as simple and universal as code that only relies on this protocol.

Piet Delport

Oct 16, 2012, 6:56:44 PM
to python...@python.org
On Mon, Oct 15, 2012 at 12:48 PM, Calvin Spealman <ironf...@gmail.com> wrote:
>
> What is the difference between the tossed around "yield from task()"
> and this "yield tasklib.spawn(task())"

"yield from task()" is simply the coroutine / task version of a function
call: it runs the task to completion, and returns its final result.

"yield tasklib.spawn(task())" (or however it ends up being spelled)
would be a scheduler primitive to start a task *without* waiting for its
result: in other words, it's a request that the scheduler start a new,
independent thread of control.
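
The "function call" half of this can be seen with plain generators and no
scheduler at all. In this small sketch (the step strings are placeholders,
not real scheduler instructions), the subtask's steps flow out through
task(), and its return value comes back as the value of the "yield from"
expression:

```python
def subtask():
    yield "step 1"
    yield "step 2"
    return "result"


def task():
    value = yield from subtask()          # runs subtask() to completion
    yield "got " + value


steps = list(task())
```

A spawn primitive, by contrast, would need a scheduler that keeps a separate
queue entry for the new task.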


> And, why isn't it simply spelled "yield task()"? You have all these different
> types that can be yielded to the scheduler from tasks to the scheduler. Why
> isn't a task one of those possible types? If the scheduler gets an iterator, it
> should schedule it automatically.

This is a good question: I stopped short of discussing it in the
original message only to keep it short, and in the hope that the answer
is implied.

The short answer is that "yield task()" is the old, hacky, cumbersome,
"legacy"[1] way of calling subtasks, and that "yield from" should
entirely replace the need to have to support it.

Before "yield from", "yield task()" was the only way to call subtasks, but
this approach has some major disadvantages:

1. In order for it to work, schedulers must manually implement task
trampolining, which is ugly at best, and prone to bugs if not all
edge cases are handled correctly. (IOW, it effectively places the
burden of implementing PEP 380 onto each scheduler.)

2. It obfuscates exception tracebacks by default, requiring schedulers
that want readable stack traces to take additional pains to clean up
their own non-task frames, while propagating exceptions.

3. It requires schedulers to reliably distinguish between tasks and
other primitives in the first place.

Simply treating all iterators as tasks is not sufficient: to run a
task, you need send() and throw(), at least. (Type-checking for
GeneratorType would be marginally better, but would unnecessarily
preclude for example implementing tasks as classes or C extension
types, which is otherwise entirely possible with this protocol.)
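
For comparison, this is roughly the kind of trampolining a scheduler has to
do to support "yield task()". It is a simplified sketch with a made-up name,
and it shows all three problems at once: a manual call stack, exceptions
re-raised through the scheduler's own frames, and a GeneratorType check to
tell tasks from other instructions.

```python
import types


def trampoline(task):
    """Run `task`, treating any yielded generator as a subtask call."""
    stack = [task]                        # manual call stack of nested tasks
    value, error = None, None
    while stack:
        current = stack[-1]
        try:
            if error is not None:
                step = current.throw(error)
            else:
                step = current.send(value)
        except StopIteration as stop:
            stack.pop()
            value, error = stop.value, None   # "return" into the caller
            continue
        except Exception as exc:
            stack.pop()
            if not stack:
                raise                     # nobody left to handle it
            value, error = None, exc      # propagate into the caller
            continue
        value, error = None, None
        if isinstance(step, types.GeneratorType):
            stack.append(step)            # "call" the yielded subtask
        # Any other step would be a scheduler instruction; ignored here.
    return value


def double(x):
    yield
    return 2 * x


def legacy_task():
    a = yield double(10)                  # old style: scheduler runs subtask
    b = yield double(a)
    return a + b


total = trampoline(legacy_task())
```

With "yield from double(10)" instead, none of the stack handling above would
be needed in the scheduler.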


"yield from" simplifies and solves all these problems in one elegant swoop:

1. No more manual trampolining: a scheduler can treat any task as a
single unit, and only needs to worry about the single, combined
stream of instructions coming from it.

2. Tracebacks (and return values) take care of themselves, as they
should.

3. By separating the concerns of direct scheduler communication
("yield") and subtask delegation ("yield from"), schedulers can limit
themselves to just knowing about scheduler primitives when dealing with
yielded values, which should be more easily and tightly defined than
the full spectrum of tasks in general. (The set of officially-defined
scheduler instructions could end up being as small as None and
Future, say.)


In summary, it's entirely possible for schedulers to continue supporting
the old "yield task()" way of calling subtasks (and this has no problem
fitting into the proposed protocol[2]), but there should be no reason to
do so, and several good reasons not to: hopefully, it will become a
pre-3.3 historical footnote.


[1] For the purposes of this email, interpret "legacy" to mean "older
than 17 days". :)

[2] Interpreted as a scheduler instruction, a task value would simply
mean "resume the current task with the result of completing the
yielded subtask" (modulo the practical question of reliably
type-checking tasks, as mentioned).


>> Raising TypeError or NotImplementedError back into the task is probably
>> a reasonable action, and would allow code like:
>>
>>     def task():
>>         try:
>>             yield fancy_magic_instruction()
>>         except NotImplementedError:
>>             yield from boring_fallback()
>>         ...
>
> Interesting. Can anyone think of an example of this?

I just want to note for the record that I'm not *encouraging* this kind
of thing: I'm just observing that it would be allowed by the
protocol.

(However, one imaginable use case would be for tasks to send
scheduler-specific hints, that can safely be ignored when those tasks
are running on other scheduler implementations.)


>> This is a plain observation on its own, however, it raises one or two
>> interesting possibilities for more interesting schedulers implemented as
>> generator tasks themselves, including:
>>
>> - Specialized sub-schedulers that run as a normal task within their
>> parent scheduler, but implement for example weighted or priority
>> queuing of their subtasks, or similar features.
>
> I think that is too messy, you could have so many different scheduler
> semantics. Maybe this sort of thing is what your schedule-specific
> instructions should be for.

It shouldn't get messy: the core semantics of any scheduler should
always stay within the proposed protocol.

The above is not the best example of a custom scheduler, though.
Perhaps a better example would be a generic helper function like the
following, which implements throttling of I/O requests made
through it:

    def task():
        result = yield from io_throttled(subtask(), rate=foo)

io_throttled() would end up sitting between task() and subtask() in the
hierarchy, like so:

... -> task() -> io_throttled() -> subtask() -> ...

To recap, each task is implicitly driven by the scheduler above it, and
implicitly drives the task(s) below it: The outer scheduler drives
task(), which drives io_throttled(), which drives subtask(), and so on.

In this picture: "yield from" is the "most default" scheduler: it simply
delegates all yielded instructions to the outer scheduler.

However, instead of relying on "yield from", io_throttled() can dip down
into the task protocol itself, and drive subtask() directly. This would
allow it to inspect and manipulate the underlying instructions and
responses flowing back and forth, and, assuming that
there's a recognizable standard representation for I/O primitives, it
could keep track of the rate of I/O, and insert delay instructions as
necessary (or something similar).
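
A rough sketch of what such a helper could look like. To keep it
deterministic it throttles by count rather than by a rate: the burst and
pause parameters, and the io_op and sleep marker classes, are assumptions
made for this example and stand in for whatever standard I/O and delay
instructions end up existing.

```python
class io_op:
    """Hypothetical marker for any I/O instruction."""
    def __init__(self, name):
        self.name = name


class sleep:
    """Hypothetical delay instruction."""
    def __init__(self, seconds):
        self.seconds = seconds


def io_throttled(subtask, burst, pause):
    """Drive `subtask`, allowing at most `burst` I/O steps between pauses."""
    count = 0
    send, arg = subtask.send, None
    while True:
        try:
            step = send(arg)
        except StopIteration as stop:
            return stop.value
        if isinstance(step, io_op):
            if count == burst:
                yield sleep(pause)        # inserted delay instruction
                count = 0
            count += 1
        try:
            arg = yield step              # pass the instruction upward
            send = subtask.send
        except Exception as exc:
            send, arg = subtask.throw, exc    # forward errors downward


def fetch_all():
    for name in ("a", "b", "c"):
        yield io_op(name)
    return "done"


kinds = [type(step).__name__
         for step in io_throttled(fetch_all(), burst=2, pause=0.5)]
```

Here kinds records the instruction stream as the outer scheduler would see
it, with one sleep inserted before the third I/O step.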

The key observations I want to make:

* io_throttled() is not special: it is just a normal task, as far as the
tasks above and below it are concerned, and assumes only a
recognizable representation of the fundamental I/O and delay
instructions used.

* To the extent that said underlying primitives are scheduler-agnostic,
io_throttled() can be used or inserted anywhere, without caring how
the underlying scheduler or event loop handles I/O, or how its global
API looks. It just acts locally, in terms of the task protocol.

An example where this kind of thing might actually be useful is an
application or library that wishes to throttle, say, certain HTTP
requests: it could simply internally wrap the tasks that make those
requests in io_throttled(), without any special support from the
underlying scheduler.

This is of course not the only way to solve this particular problem, but
it's an example of how thinking about generator tasks and their
schedulers as two sides of the same underlying protocol could be a
powerful abstraction, enabling a compositional approach to combining
implementations of the protocol that might not be obvious or possible
otherwise.