Adding multithreading to PythonKit


rex-remind

Apr 22, 2020, 6:54:11 PM
to Swift for TensorFlow


Hello, I'm investigating adding multithreading to PythonKit. My motivation is that I have a high-throughput data transformation engine written in Python, and I want to explore incrementally moving the majority of the code over to Swift via PythonKit, and potentially, once we're mostly ported over to Swift, having some ML system and/or TensorFlow as a backend to the system. In the current system the main unit of concurrency is a Python Thread. Given this, calls to and from Swift and Python would need to use the GIL properly and take advantage of multithreading.

Currently, PythonKit expects only Swift to Python calls, so at first I will focus my efforts there, though I will also ultimately need Python to Swift calls to work (possibly via PythonObjects wrapping closures).

From what I can tell, in order to properly use the GIL where some Outside-Lang calls Python, the Outside-Lang simply needs to lock the GIL while calling out to Python ops and then unlock it when returning to the Outside-Lang. This seems straightforward enough to add to PythonKit.
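
To sketch what I mean (not existing PythonKit API; this assumes the CPython C API is reachable from Swift via some hypothetical CPython system-library target exposing Python.h, and that the interpreter has already been initialized):

import CPython   // hypothetical system-library target wrapping Python.h; not part of PythonKit
import PythonKit

// Called from an arbitrary Swift thread: acquire the GIL before touching any
// Python state, and hand it back as soon as control returns to Swift.
func sumInPython(_ values: [Double]) -> Double {
    let gil = PyGILState_Ensure()        // blocks until this thread holds the GIL
    defer { PyGILState_Release(gil) }    // balanced release on every exit path
    let np = Python.import("numpy")      // assumes numpy is installed
    return Double(np.sum(values)) ?? 0
}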

I can see a few options on how to get this integrated:
A) Make every call from Swift to Python Lock and Release the GIL by default.
B) Have an ENV var or some configuration which determines whether or not this application is multithreaded and then do (A) accordingly.
C) Have a special wrapper function / block that institutes Locking and Releasing the GIL for every call to Python.
D) Have a special wrapper function / block that Locks and Releases the GIL for the entirety of that block (rough sketch below).
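
For (C) and (D) specifically, I'm imagining something roughly like this (the names are made up, and again the CPython import is just a stand-in for however we end up reaching the C API from Swift):

import CPython   // hypothetical system-library target for the CPython C API
import PythonKit

// (D): a block-scoped construct; the GIL is held for the entire block,
// including any Swift work interleaved between the Python calls.
func withGIL<R>(_ body: () throws -> R) rethrows -> R {
    let state = PyGILState_Ensure()      // acquire (or re-enter) the GIL
    defer { PyGILState_Release(state) }  // release when the block exits
    return try body()
}

// Usage: Swift and Python statements mixed under one GIL acquisition.
let scaled: [Double] = withGIL { () -> [Double] in
    let np = Python.import("numpy")
    let total = Double(np.sum([1.0, 2.0, 3.0])) ?? 0
    return [total * 0.5, total * 2.0]    // pure Swift, but still holding the GIL
}

// (C) would instead perform the Ensure/Release pair around each individual
// Python call inside PythonKit, so user code would look unchanged.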

(B) seems strictly better than (A), unless we determine there to be no obvious reason to have non-multithreaded PythonKit.

Even given (B), we may still want either (C) or (D) or some other mechanism to use the GIL ad-hoc.

With (D), any thread locking the GIL will hold it potentially for longer intervals than it needs to, especially if there's Swift code intermingled with Python code during that lock interval. However, the GIL does time out and is forced to be temporarily relinquished so other Python threads have a chance to execute. So this might ultimately be both a simpler model and one that matches the GIL's expected usage.

With (C), no Swift-specific code will ever be executing and holding the GIL at the same time. However, if there's a segment of code with lots of bouncing between Swift and Python, the repeated acquiring and releasing of the GIL could become a performance bottleneck.

I'm really hoping to source others' ideas and opinions here so we can land on the best possible model. Thanks!

Dave Abrahams

Apr 28, 2020, 11:20:07 AM
to rex-remind, Swift for TensorFlow
Hi there;

I've been planning to respond to this.  I'm a little underwater at the moment but will try to follow up in the next few days.  Sorry for the delay.

-Dave


rex-remind

Apr 28, 2020, 1:10:22 PM
to Swift for TensorFlow, R...@remind101.com
Thank you, appreciate it.

Dave Abrahams

Apr 30, 2020, 4:55:21 PM
to rex-remind, Swift for TensorFlow
On Wed, Apr 22, 2020 at 3:54 PM rex-remind <R...@remind101.com> wrote:


> Hello, I'm investigating adding multithreading to PythonKit. My motivation is that I have a high-throughput data transformation engine written in Python, and I want to explore incrementally moving the majority of the code over to Swift via PythonKit, and potentially, once we're mostly ported over to Swift, having some ML system and/or TensorFlow as a backend to the system. In the current system the main unit of concurrency is a Python Thread. Given this, calls to and from Swift and Python would need to use the GIL properly and take advantage of multithreading.

I don't know how much you know about this, so apologies if I'm just telling you stuff you've already got figured out…

Another option that steps around the GIL issue is to use process-based parallelism in Python (e.g. https://docs.python.org/3/library/concurrent.futures.html, https://docs.python.org/3/library/multiprocessing.shared_memory.html).  Whether it's possible to make that work for your particular code is unknown to me.

As https://opensource.com/article/17/4/grok-gil explains very well, Python multithreading will never speed up compute that is written in Python.  If Python has to wait for I/O or long-running compute written in Swift, you can gain some performance by allowing other Python threads to run.

Are you on Python 2 or 3?

> Currently, PythonKit expects only Swift to Python calls, so at first I will focus my efforts there, though I will also ultimately need Python to Swift calls to work (possibly via PythonObjects wrapping closures).

Shouldn't be too hard to add…

> From what I can tell, in order to properly use the GIL where some Outside-Lang calls Python, the Outside-Lang simply needs to lock the GIL while calling out to Python ops and then unlock it when returning to the Outside-Lang. This seems straightforward enough to add to PythonKit.

Yes. Once you have both directions going, you'll have outside-lang calling Python calling outside-lang, and you'll want a withUnlockedGIL { ... } construct to unblock Python threads during long-running Swift operations.
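
Something along these lines, purely as a sketch (the name is illustrative, and the CPython import stands in for however the C API ends up being reached from Swift):

import CPython   // assumed way of reaching the CPython C API from Swift; not PythonKit API

// Drop the GIL for the duration of `body` so other Python threads can run,
// then re-acquire it before returning to code that touches Python objects.
// Must only be called on a thread that currently holds the GIL.
func withUnlockedGIL<R>(_ body: () throws -> R) rethrows -> R {
    let saved = PyEval_SaveThread()       // releases the GIL and saves this thread's state
    defer { PyEval_RestoreThread(saved) } // re-acquires the GIL on the way out
    return try body()
}

// Example: a Swift routine invoked (eventually) from Python wraps its
// long-running, Python-free work so the interpreter isn't stalled.
func expensiveSwiftWork(_ input: [Double]) -> Double {
    withUnlockedGIL {
        input.reduce(0) { $0 + $1 * $1 }  // placeholder for heavy pure-Swift compute
    }
}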

> I can see a few options on how to get this integrated:
> A) Make every call from Swift to Python Lock and Release the GIL by default.

Is the GIL a recursive mutex?  If not, this could be a problem when your Swift code has been entered from Python and it calls back into Python.
I'm still just googling around for info; I found this: https://github.com/pybind/pybind11/issues/1276, which indicates that how to handle this may be an open question.

> B) Have an ENV var or some configuration which determines whether or not this application is multithreaded and then do (A) accordingly.
> C) Have a special wrapper function / block that institutes Locking and Releasing the GIL for every call to Python.
> D) Have a special wrapper function / block that Locks and Releases the GIL for the entirety of that block.

> (B) seems strictly better than (A), unless we determine there to be no obvious reason to have non-multithreaded PythonKit.

Well, if (A) just works and has no performance impact, I'd say it's strictly better.
> Even given (B), we may still want either (C) or (D) or some other mechanism to use the GIL ad-hoc.

You'll definitely want both withLockedGIL { ... } (at least internally) and withUnlockedGIL { ... } (public) constructs.

> With (D), any thread locking the GIL will hold it potentially for longer intervals than it needs to, especially if there's Swift code intermingled with Python code during that lock interval. However, the GIL does time out and is forced to be temporarily relinquished so other Python threads have a chance to execute.

Not unless the Python interpreter is running.  If you are in Swift called from Python, and you haven't explicitly unlocked the GIL, the interpreter is stopped waiting for you and no other Python threads will execute no matter how long you hold the GIL.

--
-Dave

rex-remind

May 1, 2020, 8:15:13 PM
to Swift for TensorFlow, R...@remind101.com
Thanks for the follow up!

> Another option that steps around the GIL issue is to use process-based parallelism in Python.
> Python multithreading will never speed up compute that is written in Python.

Our system relies heavily on a DB for any shared memory access, and pretty much all the CPU-intensive work is on the DB itself, not in Python. It just needs to pop into Python to acquire DB locks, read, transform, and write, as well as to coordinate operations from a high level. This system in general has proved to be only memory- and I/O-intensive, not CPU-intensive. Since we spend little time in Python itself, we gain horizontal scalability by just increasing thread count, up to a point, and then adding instances (which so far we've only needed for redundancy). Given how well this has worked so far, I don't think we'll diverge from this model currently.

What I value with Swift is simply better, cleaner, type-safe code, which can help us generalize the system more quickly while lowering maintenance burdens in the long run; Python has turned out to be difficult to work with at this codebase's size and complexity. Also, I can see an exciting future where we leverage Swift auto-diff / TensorFlow as we generalize what we've built.
(And possibly we'll gain on memory performance vs. Python too? Though that's not a critical factor right now.)

> Are you on Python 2 or 3?

Python 3.7.5 currently.

> Once you have both directions going, you'll have outside-lang calling python calling outside-lang, and you'll want a withUnlockedGIL { ... }

Ah, indeed!

> Is the GIL a recursive mutex?

Good callout! I'm not sure; I'll read your posted link and read more about it. Looking at this doc https://docs.python.org/3/c-api/init.html?highlight=pygilstate_check#c.PyGILState_Ensure, it seems like one can acquire the GIL recursively as long as each returned value from PyGILState_Ensure gets its own call to PyGILState_Release.
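
i.e. something like this nesting should be safe, if I'm reading the docs right (sketch only; same assumption of a hypothetical CPython import for the C API):

import CPython   // hypothetical system-library target exposing the CPython C API
import PythonKit

// Nested acquisition: each PyGILState_Ensure returns a token recording whether
// the GIL was already held, and each token gets exactly one PyGILState_Release.
func outer() {
    let outerState = PyGILState_Ensure()       // first acquisition on this thread
    defer { PyGILState_Release(outerState) }
    inner()                                    // re-enters without deadlocking
}

func inner() {
    let innerState = PyGILState_Ensure()       // GIL already held: just bumps a counter
    defer { PyGILState_Release(innerState) }
    print(Python.import("sys").version)
}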

> Well, if (A) just works and has no performance impact, I'd say it's strictly better.

Yup, you're right 👍. My only concern was whether acquiring/releasing the GIL in and of itself impacts performance in a significant way on non-multithreaded applications, but it's possibly negligible.
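
(If it turns out to matter, I could sanity-check it with a crude timing loop like the one below. It only measures the uncontended case where the GIL is already held on the current thread, and it assumes the same hypothetical CPython import.)

import CPython     // hypothetical system-library target for the CPython C API
import Foundation
import PythonKit

_ = Python.version                         // force PythonKit to initialize the interpreter

let iterations = 1_000_000
let start = Date()
for _ in 0..<iterations {
    let state = PyGILState_Ensure()        // uncontended: GIL is already held on this thread
    PyGILState_Release(state)
}
let nanosPerPair = Date().timeIntervalSince(start) / Double(iterations) * 1e9
print("≈\(nanosPerPair) ns per Ensure/Release pair (uncontended, single-threaded)")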

> You'll definitely want both withLockedGIL { ... } (at least internally) and withUnlockedGIL { ... } (public) constructs.

Agreed

> Not unless the Python interpreter is running.

Yes, right again. I hadn't thought that far ahead but this is important to consider.

I think this is enough to get me started on a first pass of an implementation. I'll follow up on anything else I discover about recursive locks, and if you find anything yourself I'd much appreciate the additional input.

Thanks!

Dave Abrahams

May 4, 2020, 12:27:46 PM
to rex-remind, Swift for TensorFlow
On Fri, May 1, 2020 at 5:15 PM rex-remind <R...@remind101.com> wrote:

> Our system relies heavily on a DB for any shared memory access, and pretty much all the CPU-intensive work is on the DB itself, not in Python. It just needs to pop into Python to acquire DB locks, read, transform, and write, as well as to coordinate operations from a high level. This system in general has proved to be only memory- and I/O-intensive, not CPU-intensive. Since we spend little time in Python itself, we gain horizontal scalability by just increasing thread count, up to a point, and then adding instances (which so far we've only needed for redundancy). Given how well this has worked so far, I don't think we'll diverge from this model currently.

Allow me to just point out that in addition to GIL performance concerns, you'll want to account for any costs to the programming model implied by your need to manage it.  I'm not trying to say you shouldn't use threads, just that you should account for all relevant factors in making a choice.

Best of luck, and LMK if there's anything else I can do to help!


--
-Dave
