---Dave
Following up from https://github.com/tensorflow/swift/issues/224

Hello, I'm investigating adding multithreading support to PythonKit. My motivation: I have a high-throughput data transformation engine written in Python, and I want to explore incrementally moving the majority of the code over to Swift via PythonKit, and potentially, eventually, putting an ML system and/or TensorFlow behind it once we're mostly ported to Swift. In the current system the main unit of concurrency is a Python Thread. Given this, calls between Swift and Python will need to use the GIL correctly and take advantage of multithreading.
Currently, PythonKit only supports Swift-to-Python calls, so I will focus my efforts there first, though I will ultimately also need Python-to-Swift calls to work (possibly via PythonObject values wrapping closures).
From what I can tell, for an outside language calling into Python to use the GIL correctly, it simply needs to acquire the GIL before calling into the Python C API and release it when returning to the outside language. This seems straightforward enough to add to PythonKit.
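A minimal sketch of that pattern, assuming a hypothetical `withGIL` helper (not PythonKit API); the CPython calls PyGILState_Ensure/PyGILState_Release are stubbed with a lock here so the example is self-contained:

```swift
import Foundation

// Stand-ins for the CPython C API so this sketch compiles without libpython;
// in PythonKit these would be the real PyGILState_Ensure/PyGILState_Release.
let gilStub = NSRecursiveLock()
func PyGILState_Ensure() -> Int32 { gilStub.lock(); return 0 }
func PyGILState_Release(_ state: Int32) { gilStub.unlock() }

// Hypothetical helper: hold the GIL for the duration of `body`,
// releasing it even if `body` throws.
func withGIL<T>(_ body: () throws -> T) rethrows -> T {
    let state = PyGILState_Ensure()
    defer { PyGILState_Release(state) }
    return try body()
}

let result = withGIL { 21 * 2 }  // any Python C API use would go here
print(result)  // prints 42
```

The `defer` is the important part: the GIL is released on every exit path, including thrown errors, which is exactly the invariant a per-call wrapper in PythonKit would need.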
I can see a few options on how to get this integrated:
A) Make every call from Swift to Python acquire and release the GIL by default.
B) Have an environment variable or other configuration flag that determines whether the application is multithreaded, and do (A) accordingly.
C) Have a special wrapper function / block that acquires and releases the GIL for every call to Python it contains.
D) Have a special wrapper function / block that acquires the GIL once for the entirety of the block and releases it at the end.
(B) seems strictly better than (A), unless we decide there is no real reason to support a non-multithreaded PythonKit.
Even given (B), we may still want (C) or (D), or some other mechanism, to use the GIL ad hoc.
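One way (B) and (C) could compose, sketched with assumed names (the PYTHONKIT_MULTITHREADED variable, the PythonThreading enum, and pythonCall are all hypothetical, not PythonKit API) and a stubbed GIL so the example runs standalone:

```swift
import Foundation

// Stub GIL so the sketch runs without libpython; real code would call
// the CPython PyGILState_Ensure/PyGILState_Release functions.
let gilStub = NSRecursiveLock()
func PyGILState_Ensure() -> Int32 { gilStub.lock(); return 0 }
func PyGILState_Release(_ state: Int32) { gilStub.unlock() }

// Option (B): an environment variable decides whether calls bracket the GIL.
// Both the variable name and this enum are assumptions.
enum PythonThreading {
    static let isMultithreaded =
        ProcessInfo.processInfo.environment["PYTHONKIT_MULTITHREADED"] == "1"
}

// Option (C): every Swift-to-Python call goes through a wrapper like this.
func pythonCall<T>(_ body: () -> T) -> T {
    guard PythonThreading.isMultithreaded else { return body() }
    let state = PyGILState_Ensure()
    defer { PyGILState_Release(state) }
    return body()
}

let sum = pythonCall { 1 + 1 }  // stands in for a Python C API call
print(sum)  // prints 2
```

In the single-threaded configuration the wrapper is a plain passthrough, so (B) keeps the zero-overhead behavior for existing users while (C) gives multithreaded users a correct default.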
With (D), any thread holding the GIL will potentially hold it for longer intervals than it needs to, especially if there's Swift code intermingled with Python code during that lock interval. Note that the interpreter's periodic GIL switch only happens while Python bytecode is executing; a GIL held around pure Swift code will not be relinquished automatically, so long Swift stretches inside a (D) block would need to drop it explicitly (CPython provides PyEval_SaveThread / PyEval_RestoreThread for exactly this).
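If (D) holds the GIL across long stretches of pure-Swift work, the CPython pair PyEval_SaveThread / PyEval_RestoreThread can hand it back temporarily. A self-contained sketch with both calls stubbed; the withoutGIL helper name is an assumption:

```swift
import Foundation

// Stubs for the CPython calls that release and re-acquire the GIL.
let gilStub = NSRecursiveLock()
final class ThreadState {}  // stand-in for PyThreadState*
func PyEval_SaveThread() -> ThreadState { gilStub.unlock(); return ThreadState() }
func PyEval_RestoreThread(_ s: ThreadState) { gilStub.lock() }

// Hypothetical helper: drop the GIL while running pure-Swift work inside
// a (D)-style block, then re-acquire it before returning to Python calls.
func withoutGIL<T>(_ body: () -> T) -> T {
    let saved = PyEval_SaveThread()
    defer { PyEval_RestoreThread(saved) }
    return body()
}

gilStub.lock()                                // pretend we're inside a (D) block
let n = withoutGIL { (1...10).reduce(0, +) }  // Swift-only work, GIL released
gilStub.unlock()
print(n)  // prints 55
```

This mirrors what Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS do in C extensions: other Python threads can run while the Swift-only section executes.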
Note that, per the CPython docs, every call to PyGILState_Ensure must be matched by its own call to PyGILState_Release.
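That pairing rule is what makes re-entrant acquisition safe: PyGILState_Ensure may be called on a thread that already holds the GIL, and each Ensure's state token is consumed by its own Release. A stubbed sketch modeling that with a recursive lock and a depth counter:

```swift
import Foundation

// PyGILState_Ensure may be called when the GIL is already held; a recursive
// lock plus a depth counter models that in this self-contained stub.
let gilStub = NSRecursiveLock()
var gilDepth = 0
func PyGILState_Ensure() -> Int32 { gilStub.lock(); gilDepth += 1; return 0 }
func PyGILState_Release(_ state: Int32) { gilDepth -= 1; gilStub.unlock() }

let outer = PyGILState_Ensure()
let inner = PyGILState_Ensure()   // nested acquisition on the same thread: fine
// ... Python C API work would happen here ...
PyGILState_Release(inner)         // each Ensure gets its own Release,
PyGILState_Release(outer)         // in reverse order
print(gilDepth)  // prints 0 -- every Ensure was balanced by a Release
```

For PythonKit this means nested wrapper blocks (a (C) call inside a (D) block, say) compose correctly as long as every Ensure/Release pair stays balanced.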
Our system relies heavily on a DB for any shared memory access, and pretty much all the CPU-intensive work happens in the DB itself, not in Python. Python just needs to acquire DB locks; read, transform, and write; and coordinate operations at a high level. In general the system has proved to be memory- and I/O-bound, not CPU-bound. Since we spend little time in Python itself, we gain horizontal scalability just by increasing thread count, up to a point, and then by adding instances (which so far we've only needed for redundancy). Given how well this has worked, I don't think we'll diverge from this model currently.