On 14 September 2017 at 11:44, Eric Snow <ericsnow...@gmail.com> wrote:
> send(obj):
>
>     Send the object to the receiving end of the channel. Wait until
>     the object is received. If the channel does not support the
>     object then TypeError is raised. Currently only bytes are
>     supported. If the channel has been closed then EOFError is
>     raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.

> Handling an exception
> ---------------------
>
> ::
>
>     interp = interpreters.create()
>     try:
>         interp.run("""if True:
>             raise KeyError
>             """)
>     except KeyError:
>         print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.
> Reseting __main__
> -----------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module. This means
> that data persists there between ``run()`` calls. Sometimes this isn't
> desireable and you want to execute in a fresh ``__main__``. Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially. It can wait until later
> if desirable.

I was going to note that you can already do this:

    interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.
Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.
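As a purely illustrative sketch, such a helper might look something like
the following (the reset_globals name comes from the suggestion above; the
exact behaviour is only an assumption of mine): clear the user-defined
names while leaving the __dunder__ machinery alone.

    def reset_globals(ns=None):
        """Illustrative only: clear user-defined names from a module
        namespace while preserving its __dunder__ attributes."""
        import __main__
        if ns is None:
            ns = vars(__main__)
        for name in list(ns):
            if not (name.startswith('__') and name.endswith('__')):
                del ns[name]

Shipping something along those lines would give the "fresh __main__"
behaviour via interp.run() without clobbering __name__, __builtins__,
and friends.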
It's also the case that unlike Go channels, which were designed from
scratch on the basis of implementing pure CSP, Python has an
established behavioural precedent in the APIs of queue.Queue and
collections.deque: they're unbounded by default, and you have to opt
in to making them bounded.
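For reference, that precedent is just the standard library's default
behaviour:

    import collections
    import queue

    q = queue.Queue()                    # unbounded by default: put() never blocks
    bounded = queue.Queue(maxsize=10)    # opt-in bound: put() blocks once full

    d = collections.deque()              # unbounded by default
    ring = collections.deque(maxlen=10)  # opt-in bound: old items are discarded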
>> There's a reason why sockets
>> always have bounded buffers -- it's sometimes painful, but the pain is
>> intrinsic to building distributed systems, and unbounded buffers just
>> paper over it.
>
> Papering over a problem is sometimes the right answer actually :-) For
> example, most Python programs assume memory is unbounded...
>
> If I'm using a queue or channel to push events to a logging system,
> should I really block at every send() call? Most probably I'd rather
> run ahead instead.
While the article title is clickbaity,
http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
actually has a good discussion of this point. Search for "compose" to
find the relevant section ("Channels don’t compose well with other
concurrency primitives").
The specific problem cited is that only offering unbuffered or
bounded-buffer channels means that every send call becomes a potential
deadlock scenario, as all that needs to happen is for you to be
holding a different synchronisation primitive when the send call
blocks.
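A minimal sketch of that hazard using standard library stand-ins (a bounded
queue.Queue in place of a bounded channel): whichever thread wins the lock
ends up blocked on the channel while still holding it, so the other thread
can never make progress.

    import queue
    import threading

    ch = queue.Queue(maxsize=1)    # stand-in for a bounded channel
    lock = threading.Lock()        # some other synchronisation primitive

    def producer():
        with lock:                 # holding the lock...
            ch.put("first")
            ch.put("second")       # ...this send blocks once the buffer is full

    def consumer():
        with lock:                 # ...but the consumer needs the same lock to
            print(ch.get())        # drain the channel, so neither can proceed

    t1 = threading.Thread(target=producer, daemon=True)
    t2 = threading.Thread(target=consumer, daemon=True)
    t1.start(); t2.start()
    t1.join(timeout=1); t2.join(timeout=1)
    print("deadlocked:", t1.is_alive() and t2.is_alive())   # True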
>> > Also, suddenly an interpreter's ability to exploit CPU time is
>> > dependent on another interpreter's ability to consume data in a timely
>> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
>> > IMHO it would be better not to have such coupling.
>>
>> A small buffer probably is useful in some cases, yeah -- basically
>> enough to smooth out scheduler jitter.
>
> That's not about scheduler jitter, but catering for activities which
> occur at inherently different speed or rhythms. Requiring things run
> in lockstep removes a lot of flexibility and makes it harder to exploit
> CPU resources fully.
The fact that the proposal now allows for M:N sender:receiver
relationships (just as queue.Queue does with threads) makes that
problem worse, since you may now have variability not only on the
message consumption side, but also on the message production side.
Consider this example, where we have an event processing thread pool
that we're attempting to isolate from blocking IO by using channels
rather than coroutines.
Desired flow:
1. Listener thread receives external message from socket
2. Listener thread files message for processing on receive channel
3. Listener thread returns to blocking on the receive socket
4. Processing thread picks up message from receive channel
5. Processing thread processes message
6. Processing thread puts reply on the send channel
7. Sending thread picks up message from send channel
8. Sending thread makes a blocking network send call to transmit the message
9. Sending thread returns to blocking on the send channel
When queue.Queue is used to pass the messages between threads, such an
arrangement will be effectively non-blocking as long as the send rate
is greater than or equal to the receive rate. However, the GIL means
it won't exploit all available cores, even if we create multiple
processing threads: you have to switch to multiprocessing for that,
with all the extra overhead that entails.
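For concreteness, here's a pared-down sketch of that flow using queue.Queue
and plain threads. The handle() function and the socketpair demo at the
bottom are placeholders of mine; the point is that with unbounded queues
the processing thread never stalls behind a slow network send.

    import queue
    import socket
    import threading

    recv_q = queue.Queue()   # listener -> processor (unbounded by default)
    send_q = queue.Queue()   # processor -> sender

    def handle(msg):
        return msg.upper()   # placeholder for the real (synchronous) protocol logic

    def listener(sock):
        while True:
            msg = sock.recv(4096)      # steps 1-3: block on the socket, file the
            if not msg:                # message, then go straight back to recv()
                recv_q.put(None)       # EOF: tell the processor to shut down
                break
            recv_q.put(msg)

    def processor():
        while True:
            msg = recv_q.get()         # steps 4-6: pure in-memory work; never
            if msg is None:            # blocks on the network
                send_q.put(None)
                break
            send_q.put(handle(msg))

    def sender(sock):
        while True:
            reply = send_q.get()       # steps 7-9: the only thread that blocks
            if reply is None:          # on a network send
                break
            sock.sendall(reply)

    # Tiny self-contained demo over a socketpair instead of a real server socket.
    client, server = socket.socketpair()
    threads = [threading.Thread(target=listener, args=(server,)),
               threading.Thread(target=processor),
               threading.Thread(target=sender, args=(server,))]
    for t in threads:
        t.start()
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)
    print(client.recv(4096))           # b'HELLO'
    for t in threads:
        t.join()
    client.close()
    server.close()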
So I see the essential premise of PEP 554 as being to ask the question
"If each of these threads was running its own *interpreter*, could we
use Sans IO style protocols with interpreter channels to separate
internally "synchronous" processing threads from separate IO threads
operating at system boundaries, without having to make the entire
application pervasively asynchronous?"
If channels are an unbuffered blocking primitive, then we don't get
that benefit: even when there are additional receive messages to be
processed, the processing thread will block until the previous send
has completed. Switching the listener and sender threads over to
asynchronous IO would help with that, but they'd also end up having to
implement their own message buffering to manage the lack of buffering
in the core channel primitive.
By contrast, if the core channels are designed to offer an unbounded
buffer by default, then you can get close-to-CSP semantics just by
setting the buffer size to 1 (it's still not exactly CSP, since that
has a buffer size of 0, but you at least get the semantics of having
to alternate sending and receiving of messages).
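(Roughly illustrated below with queue.Queue(maxsize=1) standing in for a
channel with a buffer of one; the PEP's channels aren't queue.Queue, but the
buffering behaviour described above is the same: after the first send, each
further send waits for a matching receive.)

    import queue
    import threading

    ch = queue.Queue(maxsize=1)     # buffer of 1, not the CSP buffer of 0

    def producer():
        for i in range(3):
            ch.put(i)               # blocks until the previous item is received
            print("sent", i)

    def consumer():
        for _ in range(3):
            print("received", ch.get())

    threading.Thread(target=producer).start()
    threading.Thread(target=consumer).start()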
>> > I expect more often than expected, in complex systems :-) For example,
>> > you could have a recv() loop that also from time to time send()s some
>> > data on another queue, depending on what is received. But if that
>> > send()'s recipient also has the same structure (a recv() loop which
>> > send()s from time to time), then it's easy to imagine to two getting in
>> > a deadlock.
>>
>> You kind of want to be able to create deadlocks, since the alternative
>> is processes that can't coordinate and end up stuck in livelocks or
>> with unbounded memory use etc.
>
> I am not advocating we make it *impossible* to create deadlocks; just
> saying we should not make them more *likely* than they need to.
Right, and I think the queue.Queue and collections.deque model works
well for that, since you can start introducing queue bounds to
propagate backpressure through a system if you're seeing undesirable
memory growth.
>> It's fairly reasonable to implement a mutex using a CSP-style
>> unbuffered channel (send = acquire, receive = release). And the same
>> trick turns a channel with a fixed-size buffer into a bounded
>> semaphore. It won't be as efficient as a modern specialized mutex
>> implementation, of course, but it's workable.
>
> We are drifting away from the point I was trying to make here. I was
> pointing out that the claim that nothing can be shared is a lie.
> If it's possible to share a small datum (a synchronized counter aka
> semaphore) between processes, certainly there's no technical reason
> that should prevent it between interpreters.
>
> By the way, I do think efficiency is a concern here. Otherwise
> subinterpreters don't even have a point (just use multiprocessing).
Agreed, and I think the interaction between the threading module and
the interpreters module is one we're going to have to explicitly call
out as being covered by the provisional status of the interpreters
module, as I think it could be incredibly valuable to be able to send
at least some threading objects through channels, and have them be an
interpreter-specific reference to a common underlying sync primitive.
>> Unfortunately while technically you can construct a buffered channel
>> out of an unbuffered channel, the construction's pretty unreasonable
>> (it needs two dedicated threads per channel).
>
> And the reverse is quite cumbersome as well. So we should favour the
> construct that's more convenient for users, or provide both.
As noted above, I think consistency with design intuitions formed
through the use of queue.Queue is also an important consideration.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
FWIW, Go's channels (and goroutines) don't implement pure CSP. They
provide a variant that the Go authors felt was more in-line with the
language's flavor. The channels in the PEP aim to support a more pure
implementation.
> Python has an
> established behavioural precedent in the APIs of queue.Queue and
> collections.deque: they're unbounded by default, and you have to opt
> in to making them bounded.
Right. That's part of why I'm leaning toward support for buffered channels.
> While the article title is clickbaity,
> http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
> actually has a good discussion of this point. Search for "compose" to
> find the relevant section ("Channels don’t compose well with other
> concurrency primitives").
>
> The specific problem cited is that only offering unbuffered or
> bounded-buffer channels means that every send call becomes a potential
> deadlock scenario, as all that needs to happen is for you to be
> holding a different synchronisation primitive when the send call
> blocks.
Yeah, that blog post was a reference for me as I was designing the
PEP's channels.
+1
> If channels are an unbuffered blocking primitive, then we don't get
> that benefit: even when there are additional receive messages to be
> processed, the processing thread will block until the previous send
> has completed. Switching the listener and sender threads over to
> asynchronous IO would help with that, but they'd also end up having to
> implement their own message buffering to manage the lack of buffering
> in the core channel primitive.
>
> By contrast, if the core channels are designed to offer an unbounded
> buffer by default, then you can get close-to-CSP semantics just by
> setting the buffer size to 1 (it's still not exactly CSP, since that
> has a buffer size of 0, but you at least get the semantics of having
> to alternate sending and receiving of messages).
Yep, I came to the same conclusion.
>> By the way, I do think efficiency is a concern here. Otherwise
>> subinterpreters don't even have a point (just use multiprocessing).
>
> Agreed, and I think the interaction between the threading module and
> the interpreters module is one we're going to have to explicitly call
> out as being covered by the provisional status of the interpreters
> module, as I think it could be incredibly valuable to be able to send
> at least some threading objects through channels, and have them be an
> interpreter-specific reference to a common underlying sync primitive.
Agreed. I'll add a note to the PEP.
-eric
On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan <ncog...@gmail.com> wrote:
> The problem relates to the fact that there aren't any memory barriers
> around CPython's INCREF operations (they're implemented as an ordinary
> C post-increment operation), so you can get the following scenario:
>
> * thread on CPU A has the sole reference (ob_refcnt=1)
> * thread on CPU B acquires a new reference, but hasn't pushed the
> updated ob_refcnt value back to the shared memory cache yet
> * original thread on CPU A drops its reference, *thinks* the refcnt is
> now zero, and deletes the object
> * bad things now happen in CPU B as the thread running there tries to
> use a deleted object :)
I'm not clear on where we'd run into this problem with channels.
Mirroring your scenario:
* interpreter A (in thread on CPU A) INCREFs the object (the GIL is still held)
* interp A sends the object to the channel
* interp B (in thread on CPU B) receives the object from the channel
* the new reference is held until interp B DECREFs the object
From what I see, at no point do we get a refcount of 0, such that
there would be a race on the object being deleted.
The only problem I'm aware of (it dawned on me last night), is in the
case that the interpreter that created the object gets deleted before
the object does. In that case we can't pass the deletion back to the
original interpreter. (I don't think this problem is necessarily
exclusive to the solution I've proposed for Bytes.)
-eric
After we move to not sharing the GIL between interpreters:
Channel.send(obj):  # in interp A
    incref(obj)
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    set_shared(obj, orig)  # add to a global table
    return obj

bytes.tp_share():
    obj = blank_bytes(len(self))
    obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
    return obj

bytes.tp_free():  # under no-shared-GIL:
    # most of this could be pulled into a macro for re-use
    orig = lookup_shared(self)
    if orig != NULL:
        current = release_LIL()
        interp = lookup_owner(orig)
        acquire_LIL(interp)
        decref(orig)
        release_LIL(interp)
        acquire_LIL(current)
        # clear shared/owner tables
        # clear/release self.ob_sval
    free(self)
> And that's the real pay-off that comes from defining this in terms of the
> memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
> a regular C struct that gets passed across the interpreter boundary (the
> reference to the original objects gets carried along passively as part of
> the CIV - it never gets *used* in the receiving interpreter).
Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
ways that are applicable to channels here. I'm simply reluctant to
lock PEP 554 into such a specific solution as the buffer-specific CIV.
I'm trying to accommodate anticipated future needs while keeping the
PEP as simple and basic as possible. It's driving me nuts! :P Things
were *much* simpler before I added Channels to the PEP. :)
> I don't think we should be touching the behaviour of core builtins solely to
> enable message passing to subinterpreters without a shared GIL.
Keep in mind that I included the above as a possible solution using
tp_share() that would work *after* we stop sharing the GIL. My point
is that with tp_share() we have a solution that works now *and* will
work later. I don't care how we use tp_share to do so. :) I long to
be able to say in the PEP that you can pass bytes through the channel
and get bytes on the other side.
My mind is drawn to the comparison between that and the question of
CIV vs. tp_share(). CIV would be more like the post-451 import world,
where I expect the CIV would take care of the data sharing operations.
That said, the situation in PEP 554 is sufficiently different that I'm
not convinced a generic CIV protocol would be better. I'm not sure
how much CIV could do for you over helpers+tp_share.
Anyway, here are the leading approaches that I'm looking at now:

* adding a tp_share slot
  + you send() the object directly and recv() the object coming out of
    tp_share() (which will probably be the same type as the original)
  + this would eventually require small changes in tp_free for
    participating types
  + we would likely provide helpers (eventually), similar to the new
    buffer protocol, to make it easier to manage sharing data
* simulating tp_share via an external global registry (or a registry
  on the Channel type)
  + it would still be hard to make work without hooking into tp_free()
* CIVs hard-coded in Channel (or BufferViewChannel, etc.) for specific
  types (e.g. buffers)
  + you send() the object like normal, but recv() the view
* a CIV protocol on Channel by which you can add support for more types
  + you send() the object like normal but recv() the view
  + could work through subclassing or a registry
  + a lot of conceptual similarity with tp_share+tp_free
* a CIV-like proxy
  + you wrap the object, send() the proxy, and recv() a proxy
  + this is entirely compatible with tp_share()
Here are what I consider the key metrics for judging the utility of a
solution (in no particular order):
* how hard to understand as a Python programmer?
* how much extra work (if any) for folks calling Channel.send()?
* how much extra work (if any) for folks calling Channel.recv()?
* how complex is the CPython implementation?
* how hard to understand as a type author (wanting to add support for
their type)?
* how hard to add support for a new type?
* what variety of types could be supported?
* what breadth of experimentation opens up?
The most important thing to me is keeping things simple for Python
programmers. After that is ease-of-use for type authors. However, I
also want to put us in a good position in 3.7 to experiment
extensively with subinterpreters, so that's a big consideration.
Consequently, for PEP 554 my goal is to find a solution for object
sharing that keeps things simple in Python while laying a basic
foundation we can build on at the C level, so we don't get locked in
but still maximize our opportunities to experiment. :)
While I'm actually trying not to say much here so that I can avoid this
discussion for now, here are just a couple of ideas and thoughts from me
at this point:

(A) Instead of sending bytes and receiving memoryviews, one could consider
sending *and* receiving memoryviews for now. That could then be extended to
more types of objects in the future without changing the basic concept of
the channel. Probably, the memoryview would need to be copied (but not the
data, of course), and I'm guessing copying a memoryview would be quite fast.

This would hopefully require fewer API changes or additions in the future.
OTOH, giving it a different name like MemChannel, or making it 3rd party,
would buy some more time to figure out the right API. But maybe that's not
needed.
(B) We would probably then like to pretend that the object coming out the
other end of a Channel *is* the original object. As long as these channels
are the only way to directly pass objects between interpreters, there are
essentially only two ways to tell the difference (AFAICT):

1. Calling id(...) and sending the result over to the other interpreter and
   checking whether it's the same there.
2. Sending the same object twice to the same interpreter, then comparing the
   two received objects with id(...) or the `is` operator.

There are solutions to both of these:

1. Send the id() from the sending interpreter along with the object, so that
   the receiving interpreter can somehow attach it to the received object and
   then return it from id(...).
2. When an object is received, look it up in an interpreter-wide cache to see
   whether an object with this id has already been received. If yes, reuse
   that one (see the sketch below).

Now it should essentially look like the received object really is "the same
one" as in the sending interpreter. This should also work with multiple
interpreters and multiple channels, as long as the id is always preserved.
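A rough sketch of that receive-side cache, with made-up names throughout
(none of this is the PEP's API; it only illustrates the bookkeeping):

    # Hypothetical per-interpreter cache keyed by the sender's id(), so that
    # receiving the "same" object twice yields the same local object.
    _received_by_sender_id = {}

    def recv_preserving_identity(channel):
        # Assumes the channel carries (sender_id, payload) pairs; both the
        # pairing and the recv() call shown here are illustrative only.
        sender_id, payload = channel.recv()
        return _received_by_sender_id.setdefault(sender_id, payload)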
(C) One further complication regarding memoryview in general is that
.release() should probably be propagated to the sending interpreter somehow.
(D) I think someone already mentioned this one, but would it not be better
to start a new interpreter in the background in a new thread by default? I
think this would make things simpler and leave more freedom regarding the
implementation in the future. If you need to run an interpreter within the
current thread, you could perhaps optionally do that too.
Examples
========

Run isolated code
-----------------

::

    interp = interpreters.create()
    print('before')
    interp.run('print("during")')
    print('after')