Add an introspection API to Executor

Ram Rachum

Aug 24, 2014, 5:41:01 PM

to python...@googlegroups.com

Sometimes I want to take a live executor, like a `ThreadPoolExecutor`, and check up on it. I want to know how many threads there are, how many are handling tasks and which tasks, how many are free, and which tasks are in the queue.

I asked on Stack Overflow: http://stackoverflow.com/questions/25474204/checking-up-on-a-concurrent-futures-threadpoolexecutor

There's an answer there, but it uses private variables and it's not part of the API.
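For illustration, here is a minimal sketch of the kind of inspection such an answer relies on. It assumes CPython's current `ThreadPoolExecutor` internals: `_threads` and `_work_queue` are private attributes, not part of the documented API, and may change or disappear between versions. The helper name `peek` is made up for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peek(executor):
    """Best-effort snapshot built on private CPython attributes (not a public API)."""
    return {
        # Worker threads started so far (private attribute, an assumption).
        "threads": len(executor._threads),
        # Approximate number of work items not yet picked up by a worker.
        "queued": executor._work_queue.qsize(),
    }


release = threading.Event()
with ThreadPoolExecutor(max_workers=2) as executor:
    try:
        # Five tasks that block until released, so the pool stays busy.
        futures = [executor.submit(release.wait) for _ in range(5)]
        snapshot = peek(executor)
    finally:
        release.set()  # let the workers finish so the pool can shut down
```

Both numbers are only a momentary view: a worker may pick up another item between the call and the moment the result is read.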

I suggest it become a part of the API. There should be an API for checking on what the executor is currently doing and answering all the questions I raised above.

Thanks,

Ram.

Guido van Rossum

Aug 24, 2014, 6:19:18 PM

to Ram Rachum, python...@googlegroups.com

I've had similar needs.

--
--Guido van Rossum (on iPad)

Ram Rachum

Aug 25, 2014, 9:22:34 AM

to python...@googlegroups.com

I don't like the fact that this API would be restricted by the API of the underlying concurrency primitives. Can't we have the executor keep tabs on its state in a way that we can access?

On Mon, Aug 25, 2014 at 4:12 PM, Dan O'Reilly <orei...@gmail.com> wrote:

Adding active/idle/total worker counts for both ThreadPoolExecutor and ProcessPoolExecutor is pretty straightforward; I threw a patch together for both in 30 minutes or so. However, I don't think it's possible to inspect the contents of a ProcessPoolExecutor's queue without actually consuming items from it. While it *is* possible with ThreadPoolExecutor, I don't think we should expose it - the queue.Queue() implementation ThreadPoolExecutor relies on doesn't have a public API for inspecting its contents, so ThreadPoolExecutor probably shouldn't expose one, either. Identifying which task each worker is processing is possible, but would perhaps require more work than it's worth, at least for ProcessPoolExecutor.

I do think adding worker count APIs is reasonable, and in line with a TODO item in the ThreadPoolExecutor source:

# TODO(bquinlan): Should avoid creating new threads if there are more
# idle threads than items in the work queue.

So, at the very least there have been plans to internally keep track of active/idle thread counts. If others agree it's a good idea, I'll open an issue on the tracker for this and include my patch (which also addresses that TODO item).

Dan O'Reilly

Aug 25, 2014, 9:45:12 AM

to Ram Rachum, Python-Ideas

Adding active/idle/total worker counts for both ThreadPoolExecutor and ProcessPoolExecutor is pretty straightforward; I just threw a patch together for both in 30 minutes or so. However, I don't think it's possible to inspect the contents of a ProcessPoolExecutor's queue without actually consuming items from it. While it *is* possible with ThreadPoolExecutor, I don't think we should expose it - the queue.Queue() implementation ThreadPoolExecutor relies on doesn't have a public API for inspecting its contents, so ThreadPoolExecutor probably shouldn't expose one, either. Identifying which task each worker is processing is possible, but would perhaps require more work than it's worth, at least for ProcessPoolExecutor.

I do think adding worker count APIs is reasonable, and in line with a TODO item in the ThreadPoolExecutor source:

# TODO(bquinlan): Should avoid creating new threads if there are more
# idle threads than items in the work queue.

So, at the very least there have been plans to internally keep track of active/idle thread counts. If others agree it's a good idea, I'll open an issue on the tracker for this and include my patch (which also addresses that TODO item).
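To make the worker-count idea concrete, here is a rough sketch and not the patch mentioned above. It assumes a hypothetical subclass, `CountingExecutor`, that wraps each submitted callable so the number of running tasks can be read through a made-up `active_count` property. The names `_count_lock` and `_active_tasks` are likewise invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingExecutor(ThreadPoolExecutor):
    """Hypothetical subclass that counts tasks currently running."""

    def __init__(self, max_workers=None):
        super().__init__(max_workers=max_workers)
        self._count_lock = threading.Lock()
        self._active_tasks = 0

    def submit(self, fn, *args, **kwargs):
        def tracked():
            # Count the task as active only while fn is actually executing.
            with self._count_lock:
                self._active_tasks += 1
            try:
                return fn(*args, **kwargs)
            finally:
                with self._count_lock:
                    self._active_tasks -= 1

        return super().submit(tracked)

    @property
    def active_count(self):
        with self._count_lock:
            return self._active_tasks


started = threading.Barrier(3)  # two tasks plus the main thread
release = threading.Event()


def task():
    started.wait(timeout=10)  # signal that this task is running
    release.wait()            # stay busy until released
    return "done"


with CountingExecutor(max_workers=4) as pool:
    futures = [pool.submit(task) for _ in range(2)]
    started.wait(timeout=10)        # both tasks are now running
    busy = pool.active_count
    release.set()
    results = [f.result() for f in futures]
    idle_after = pool.active_count
```

An idle count could be derived as the number of started threads minus `active_count`, with the same caveat that any such number can be stale by the time it is used.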

Guido van Rossum

Aug 25, 2014, 1:32:36 PM

to Dan O'Reilly, Ram Rachum, Python-Ideas

Doesn't queue.Queue also have methods qsize(), empty() and full()? We could easily wrap those. There's always the caveat that the numbers may be out of date as soon as you print them.
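A small sketch of what those three methods report, using a standalone `queue.Queue` rather than an executor's internal queue. The `stats` dict is just an illustrative way of bundling the values.

```python
import queue

q = queue.Queue(maxsize=3)
for item in ("a", "b"):
    q.put(item)

# All three values are approximate in threaded code: another thread may
# put or get an item right after they are read.
stats = {
    "size": q.qsize(),
    "empty": q.empty(),
    "full": q.full(),
}
```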

--
--Guido van Rossum (python.org/~guido)

Antoine Pitrou

Aug 25, 2014, 1:36:14 PM

to python...@python.org

On 25/08/2014 09:44, Dan O'Reilly wrote:
>
> So, at the very least there have been plans to internally keep track
> of active/idle thread counts. If others agree it's a good idea, I'll open
> an issue on the tracker for this and include my patch (which also
> addresses that TODO item).

I agree that basic executor parameters could be reflected, and I also
agree that some other pieces of runtime state cannot be reliably
computed and therefore shouldn't be exposed.

Don't hesitate to open an issue with your patch.

Regards

Antoine.

Ram Rachum

Aug 25, 2014, 1:37:46 PM

to python...@googlegroups.com, python-ideas

"some other pieces of runtime state cannot be reliably computed"

Can you please specify which ones you mean, and why not reliable?

Antoine Pitrou

Aug 25, 2014, 1:58:44 PM

to python...@python.org

On 25/08/2014 13:37, Ram Rachum wrote:
> "some other pieces of runtime state cannot be reliably computed"
>
> Can you please specify which ones you mean, and why not reliable?

I cannot say for sure without taking a more detailed look at
concurrent.futures :-) However, any runtime information such as "the
tasks currently being processed" (as opposed to, say, waiting) may not be
available to the calling thread or process, or may be unreliable once it
returns to the function's caller (since the actual state may have
changed in-between).

In the former case (information not available to the main process), we
can't expose the information at all; in the latter case, we may still
choose to expose it with the usual caveats in the documentation (exactly
like Queue.qsize()).

Ram Rachum

Aug 25, 2014, 2:17:17 PM

to python...@googlegroups.com, python-ideas

Maybe I'm missing something, but I don't think that's something that should block implementation.

Information not available? Change the executor code to make that information available. Information could have been changed? So what? That is to be expected. When I read a file in Python, by the time the line finished someone could have written something to that file so the result of the read may not be current. Even if I read just a simple variable, by the next line it might have been changed by another thread. I really don't see why any of that deserves special consideration.

Antoine Pitrou

Aug 25, 2014, 2:45:41 PM

to python...@python.org

On 25/08/2014 14:16, Ram Rachum wrote:
> Maybe I'm missing something, but I don't think that's something that
> should block implementation.
>
> Information not available? Change the executor code to make that
> information available.

Not if that would make the implementation much more complicated, or
significantly slower.

Guido van Rossum

Aug 25, 2014, 2:55:37 PM

to Antoine Pitrou, Python-Ideas

It might be worth it to make the implementation somewhat more complicated if it serves a good purpose, for example giving the user of the program insights into how well the executor is performing. Without such insight you may be attempting to tune parameters (like the pool size) without being able to evaluate their effect.

Dan O'Reilly

Aug 25, 2014, 4:04:47 PM

to Guido van Rossum, Antoine Pitrou, Python-Ideas

I'll take a look at this again tonight and see if more detailed information (e.g. which tasks are actually being processed) can be determined without too much added complexity and/or performance penalties. If I can come up with something reasonable for both ProcessPool/ThreadPool, I'll add it to the changes I've already made. Either way, I'll create an issue to track this.

Andrew Barnert

Aug 25, 2014, 4:06:02 PM

to gu...@python.org, Antoine Pitrou, Python-Ideas

I don't think there's any issue with letting people introspect the executor. The problem is that the main thing you get is a queue, and there's a limit to how introspectable a queue can be.

In particular, if you want to iterate the waiting tasks, you have to iterate the queue, and there's no safe way to do that.

Since CPython's queue.Queue happens to be just a deque and a mutex, you could make it iterable at the cost of blocking all producers and consumers (which might be fine for many uses, like debugging or exploratory programming), or provide a snapshot API to return a copy of the deque.

But do you want to make that a requirement on all subclasses of Queue, and all other implementations' queue modules? Does PriorityQueue have to nondestructively iterate a heap in order? Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib? What does multiprocessing.Queue do on each implementation?

I don't think the costs are worth the benefit. And I assume that's why the existing queue API doesn't provide an iteration or snapshot mechanism.

But there's an option that might be worth doing:

Provide a queue.IntrospectableQueue type that _is_ defined to have such a mechanism, but to otherwise work like a Queue (except maybe less efficiently). Then provide an optional parameter for the Executor that lets you specify an alternate queue constructor in place of the default. So, when exploring or debugging, you could pass queue.IntrospectableQueue (or multiprocessing.IntrospectableQueue for ProcessPoolExecutor).

Whether the interface is "lock_and_return_iterator" or "snapshot", this would be trivial to implement in CPython, and other Pythons could just copy the CPython version instead of extending their native queue types.
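A minimal sketch of the "snapshot" flavour, assuming CPython's `queue.Queue` layout (items held in `self.queue`, guarded by `self.mutex`). `IntrospectableQueue` and `snapshot()` are the hypothetical names from this proposal, not an existing stdlib API.

```python
import queue


class IntrospectableQueue(queue.Queue):
    """Hypothetical queue with a non-destructive snapshot() method."""

    def snapshot(self):
        # queue.Queue keeps its items in self.queue (a deque) guarded by
        # self.mutex; relying on that layout is a CPython-specific assumption.
        with self.mutex:
            return list(self.queue)


work = IntrospectableQueue()
for name in ("fetch", "parse", "store"):
    work.put(name)

waiting = work.snapshot()   # copy of the pending items, oldest first
first = work.get()          # normal consumption still works afterwards
```

Producers and consumers are blocked only for the duration of the copy, and the returned list can be stale as soon as the lock is released.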

Sent from a random iPhone

Ram Rachum

Aug 25, 2014, 5:07:40 PM

to python...@googlegroups.com, Python-Ideas

Sounds good to me. Having to specify `IntrospectableQueue` to the executor is a bit of a chore, but not too bad to get this functionality. I also bet that the performance difference wouldn't be an issue for most uses.

Dan O'Reilly

Aug 25, 2014, 10:52:53 PM

to Ram Rachum, Python-Ideas, python...@googlegroups.com

The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though. Adding support for identifying which tasks are active introduces some extra overhead, which I think can reasonably be made optional. If we're going to use a different Queue class to enable introspection, we might as well disable the other stuff that we're doing to make introspection work. It also makes it easier to raise an exception if an API is called that won't work without IntrospectableQueue being used.

>> Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib?

For what it's worth, Jython just uses CPython's queue.Queue implementation, as far as I can tell.

>> What does multiprocessing.Queue do on each implementation?

In addition to a multiprocessing.Queue, the ProcessPoolExecutor maintains a dict of all submitted work items, so that can be used instead of trying to inspect the queue itself.

Ethan Furman

Aug 25, 2014, 11:03:09 PM

to python...@python.org

On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
>
> The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar)
> keyword to the Executor rather than a queue class, though.

Passing the class is the better choice -- it means that future needs can be more easily met by designing the queue
variant needed and passing it in -- having a keyword to select only one option is unnecessarily limiting.

--
~Ethan~

Dan O'Reilly

Aug 25, 2014, 11:32:25 PM

to Ethan Furman, Python-Ideas

In that case, what's the best way to disallow use of APIs that require an introspectable queue implementation? Using isinstance(self._work_queue, IntrospectableQueue) would work, but seems to be nearly as limiting as using an introspectable keyword. Perhaps IntrospectableQueue could support __iter__ as a way of iterating over a snapshot of enqueued items. The Executor could try iterating over the queue when it needs to inspect its contents, raising an appropriate exception (something like "Provided queue class must be introspectable") if that fails. If people prefer that __iter__ not be used for that purpose, we could just do the same thing with whatever public method ends up being used to get the snapshot instead.
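Roughly, the duck-typed check could look like the sketch below. `SnapshotQueue` and `queued_items` are hypothetical names for illustration; the only stdlib behaviour relied on is that a plain `queue.Queue` is not iterable, and that CPython keeps the items in `self.queue` under `self.mutex`.

```python
import queue


class SnapshotQueue(queue.Queue):
    """Hypothetical queue whose __iter__ walks a snapshot of its items."""

    def __iter__(self):
        # Copy under the lock, then iterate the copy without holding it.
        with self.mutex:
            return iter(list(self.queue))


def queued_items(work_queue):
    """Return the pending items, or fail clearly if the queue can't be inspected."""
    try:
        iterator = iter(work_queue)
    except TypeError:
        raise TypeError("Provided queue class must be introspectable") from None
    return list(iterator)


inspectable = SnapshotQueue()
inspectable.put("a")
inspectable.put("b")
items = queued_items(inspectable)

try:
    queued_items(queue.Queue())   # a plain Queue defines no __iter__
except TypeError as exc:
    message = str(exc)
```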

Antoine Pitrou

Aug 25, 2014, 11:44:22 PM

to python...@python.org

On 25/08/2014 23:02, Ethan Furman wrote:
> On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
>>
>> The IntrospectableQueue idea seems reasonable to me. I think I would
>> prefer passing an introspectable (or similar)
>> keyword to the Executor rather than a queue class, though.
>
> Passing the class is the better choice -- it means that future needs can
> be more easily met by designing the queue variant needed and passing it
> in -- having a keyword to select only one option is unnecessarily limiting.

What if an implementation wants to use something other than a queue?
It seems you're breaking the abstraction here.

Regards

Antoine.

Andrew Barnert

Aug 25, 2014, 11:52:54 PM

to Dan O'Reilly, Ram Rachum, python...@googlegroups.com, Python-Ideas

On Monday, August 25, 2014 7:52 PM, Dan O'Reilly <orei...@gmail.com> wrote:

>The IntrospectableQueue idea seems reasonable to me. I think I would prefer passing an introspectable (or similar) keyword to the Executor rather than a queue class, though. Adding support for identifying which tasks are active introduces some extra overhead, which I think can reasonably be made optional. If we're going to use a different Queue class to enable introspection, we might as well disable the other stuff that we're doing to make introspection work. It also makes it easier to raise an exception if an API is called that won't work without IntrospectableQueue being used.

Even though this was my suggestion, let me play devil's advocate for a second…

The main reason to use this is for debugging or exploratory programming.

In the debugger, of course, it's not necessary, because you can just break and suspend all the threads while you do what you want. Would it be reasonable to do the same thing outside the debugger, by providing a threading.Thread.suspend API (and of course the pool and executor APIs have a suspend method that suspends all their threads) so you can safely access the queue's internals?

Obviously suspending threads in general is a bad thing to do unless you're a big fan of deadlocks, but for debugging and exploration it seems reasonable; if a program occasionally deadlocks or crashes while you're screwing with its threads to see what happens, well, you were screwing with its threads to see what happens…

That might be a horrible attractive nuisance, but if you required an extra flag to be passed in at construction time to make these methods available, and documented that it was unsafe and potentially inefficient, it might be acceptable.

On the other hand, it's hard to think of a case where this is a good answer but "just run it in the debugger" isn't a better answer…

>>> Does Jython have to use a mutex and a deque instead of a more efficient (and possibly lock-free) queue from the Java stdlib?
>
>For what it's worth, Jython just uses CPython's queue.Queue implementation, as far as I can tell.

Now that I think about it, that makes sense; if I really need a lock-free thread pool and queue in Jython, I'm probably going to use the native Java executors, not the Python ones, right?

>>> What does multiprocessing.Queue do on each implementation?
>
>In addition to a multiprocessing.Queue, the ProcessPoolExecutor maintains a dict of all submitted work items, so that can be used instead of trying to inspect the queue itself.

Interesting. This implies that supplying an inspectable queue class may not be the best answer here; instead, we could have an option for an inspectable work dict, which would just expose the existing one for ProcessPoolExecutor, while it would make ThreadPoolExecutor maintain an equivalent dict as a thread-local in the launching thread. (I'm assuming you only need to inspect the jobs from the launching process/thread here… I'm not sure if that's sufficient for the OP's intended use or not.)
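As a sketch of the work-dict idea for the thread-based case: a hypothetical `TrackingExecutor` that records every submitted item and forgets it when its future completes. It uses a lock-protected dict rather than a thread-local, which is a deliberate simplification; all names other than the `concurrent.futures` API are invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TrackingExecutor(ThreadPoolExecutor):
    """Hypothetical executor that keeps a dict of unfinished work items."""

    def __init__(self, max_workers=None):
        super().__init__(max_workers=max_workers)
        self._work_lock = threading.Lock()
        self._work_items = {}   # future -> (fn, args, kwargs)

    def submit(self, fn, *args, **kwargs):
        future = super().submit(fn, *args, **kwargs)
        with self._work_lock:
            self._work_items[future] = (fn, args, kwargs)
        # Registered after the insert, so a task that finishes immediately
        # is still removed (the callback then runs right away).
        future.add_done_callback(self._forget)
        return future

    def _forget(self, future):
        with self._work_lock:
            self._work_items.pop(future, None)

    def unfinished(self):
        """Snapshot of work items that are queued or still running."""
        with self._work_lock:
            return list(self._work_items.values())


release = threading.Event()


def hold(label):
    release.wait()
    return label


with TrackingExecutor(max_workers=1) as pool:
    for label in ("a", "b", "c"):
        pool.submit(hold, label)
    before = [args[0] for _fn, args, _kwargs in pool.unfinished()]
    release.set()

after = pool.unfinished()   # everything has completed by now
```

This does not distinguish queued items from running ones; doing so would need extra bookkeeping inside the worker.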

Andrew Barnert

Aug 25, 2014, 11:58:54 PM

to Antoine Pitrou, python...@python.org

On Monday, August 25, 2014 8:44 PM, Antoine Pitrou <ant...@python.org> wrote:

> On 25/08/2014 23:02, Ethan Furman wrote:
>> On 08/25/2014 07:51 PM, Dan O'Reilly wrote:
>>>
>>> The IntrospectableQueue idea seems reasonable to me. I think I would
>>> prefer passing an introspectable (or similar)
>>> keyword to the Executor rather than a queue class, though.
>>
>> Passing the class is the better choice -- it means that future needs can
>> be more easily met by designing the queue variant needed and passing it
>> in -- having a keyword to select only one option is unnecessarily limiting.
>
> What if an implementation wants to use something other than a queue?
> It seems you're breaking the abstraction here.

A collection of threads and a shared queue is almost the definition of a thread pool. What else would you use?

Also, this could make it a lot easier to create variations on ThreadPoolExecutor without subclassing or forking it. For example, if you want your tasks to run in priority order, just give it a priority queue keyed on task.priority. If you want a scheduled executor, just give it a priority queue whose get method blocks until the first task's task.timestamp or a new task is added ahead of the first. And so on.
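To show the effect of swapping the queue, here is a toy single-worker pool, not `ThreadPoolExecutor` itself (which takes no such parameter). The function `run_tasks` and its `queue_factory` argument are assumptions made up for this sketch, and tasks are reduced to `(priority, name)` tuples.

```python
import queue
import threading


def run_tasks(tasks, queue_factory=queue.Queue):
    """Toy single-worker pool; returns task names in the order they ran."""
    work = queue_factory()
    for task in tasks:
        work.put(task)           # each task is a (priority, name) tuple

    ran = []

    def worker():
        # Drain the queue; the queue type alone decides the order.
        while True:
            try:
                _priority, name = work.get_nowait()
            except queue.Empty:
                return
            ran.append(name)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return ran


tasks = [(2, "compress"), (1, "download"), (3, "upload")]
fifo_order = run_tasks(tasks)
priority_order = run_tasks(tasks, queue_factory=queue.PriorityQueue)
```

With the default FIFO queue the tasks run in submission order; with `queue.PriorityQueue` the lowest priority number runs first.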

I'm not sure if that's a good idea or not, but it's an interesting possibility at least…

Guido van Rossum

Aug 26, 2014, 12:09:45 AM

to Andrew Barnert, Antoine Pitrou, python...@python.org

Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case.

Antoine Pitrou

Aug 26, 2014, 12:15:13 AM

to python...@python.org

On 25/08/2014 23:56, Andrew Barnert wrote:
>>
>> What if an implementation wants to use something other than a queue?
>> It seems you're breaking the abstraction here.
>
> A collection of threads and a shared queue is almost the definition of a thread pool. What else would you use?

Definitions don't necessarily have any relationship with the way a
feature is implemented. Perhaps some version of concurrent.futures would
like to use some advanced dispatch mechanism provided by the OS (or
shared memory, or whatever).

(I'll note that such "flexibility" has been chosen for the API of
threading.Condition and it is making it difficult to write an optimized
implementation that would use OS-native facilities, such as pthread
condition variables)

We have come from a simple proposal to introspect some runtime
properties of an executor to the idea of swapping out a building block
with another. That doesn't sound reasonable.

Regards

Antoine.

Ethan Furman

Aug 26, 2014, 12:23:17 AM

to python...@python.org

On 08/25/2014 09:08 PM, Guido van Rossum wrote:
>
> Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case.

I went in search of docs to see what the API actually was, and while I know the source code is a great place to go look
for education and finer points, should we have to go looking there just to see what the __init__ parameters are?

I'm going to go out on a limb and say that ThreadPoolExecutor takes a max_workers param, but I only have that because
it's in the example.

On the up side, having a link to the source is really cool. Having clicked on that I now know that max_workers is the
only param taken. ;)

--
~Ethan~

Ethan Furman

Aug 26, 2014, 12:28:15 AM

to python...@python.org

On 08/25/2014 09:08 PM, Guido van Rossum wrote:
>
> Hm. Maybe we should not complicate the API after all. This seems a lot of theorizing without enough of a use case.

Introspection (aka debugging) is an important use case. Having looked at the code, and with Antoine's comments in mind,
I'd be happy with whatever Dan can get in there without changing the queuing implementation -- if anyone needs that much
flexibility, they can take the Python code and massage it to their own desires.

--
~Ethan~

Antoine Pitrou

Aug 26, 2014, 12:39:09 AM

to python...@python.org

On 26/08/2014 00:22, Ethan Furman wrote:
>
> I went in search of docs to see what the API actually was, and while I
> know the source code is a great place to go look for education and finer
> points, should we have to go looking there just to see what the __init__
> parameters are?

So, you didn't find the docs?

https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor

"""
class concurrent.futures.ThreadPoolExecutor(max_workers)

An Executor subclass that uses a pool of at most max_workers
threads to execute calls asynchronously.
"""

https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor

"""
class concurrent.futures.ProcessPoolExecutor(max_workers=None)

An Executor subclass that executes calls asynchronously using a
pool of at most max_workers processes. If max_workers is None or not
given, it will default to the number of processors on the machine.
"""
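For reference, a minimal usage sketch of the constructor quoted above, passing only `max_workers`. The squaring task is an arbitrary placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

# At most two worker threads; map() returns results in input order.
with ThreadPoolExecutor(max_workers=2) as executor:
    squares = list(executor.map(lambda n: n * n, [1, 2, 3]))
```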

Regards

Antoine.

Ethan Furman

Aug 26, 2014, 2:12:48 AM

to python...@python.org

On 08/25/2014 09:37 PM, Antoine Pitrou wrote:
> On 26/08/2014 00:22, Ethan Furman wrote:
>>
>> I went in search of docs to see what the API actually was, and while I
>> know the source code is a great place to go look for education and finer
>> points, should we have to go looking there just to see what the __init__
>> parameters are?
>
> So, you didn't find the docs?
>
> https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor
>
> """
> class concurrent.futures.ThreadPoolExecutor(max_workers)
>
> An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously.
> """
>
> https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor
>
> """
> class concurrent.futures.ProcessPoolExecutor(max_workers=None)
>
> An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If
> max_workers is None or not given, it will default to the number of processors on the machine.
> """

I did find the docs, and even with your plain text guide I almost didn't see them when I looked just now. Too much
fancy going on there, and all the green examples -- yeah, it's hard to read.

For comparison, here's what help(ThreadPoolExecutor) shows:

Much easier to understand.

Looking at the docs again, I think the biggest hurdle to finding that line and recognizing it for what it is is the fact
that it comes /after/ all the examples. That's backwards. Why would you need examples for something you haven't read yet?

--
~Ethan~

Nick Coghlan

Aug 26, 2014, 5:50:28 AM

to Ethan Furman, python...@python.org

On 26 Aug 2014 16:12, "Ethan Furman" <et...@stoneleaf.us> wrote:
> Looking at the docs again, I think the biggest hurdle to finding that line and recognizing it for what it is is the fact that it comes /after/ all the examples. That's backwards. Why would you need examples for something you haven't read yet?

Many of our module docs serve a dual purpose as a tutorial *and* as an API reference. That's actually a problem, and often a sign of a separate "HOWTO" guide trying to get out.

Actually doing the work to split them is rather tedious though, so it tends not to happen very often.

Cheers,
Nick.

Dan O'Reilly

Aug 26, 2014, 10:00:28 PM

to Ethan Furman, Python-Ideas

As promised, I've opened issue22281 (http://bugs.python.org/issue22281), and attached a patch that makes an attempt at implementing this. Let's continue any further discussion on this topic there.
