Bob and Multithreading


Manuel Günther

Oct 20, 2015, 3:22:00 PM
to bob-devel
I just stumbled over the fact that we usually cannot run most of Bob's functionality in parallel threads.
I understand that pure Python code won't run in parallel: even though you can create several threads (for example with the threading module, or more conveniently with multiprocessing.pool.ThreadPool), only one thread will be active at a time.
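A quick way to see this with only the standard library: blocking calls such as time.sleep release the GIL, so a ThreadPool can overlap them, while pure Python computation cannot overlap. A minimal sketch:

```python
import time
from multiprocessing.pool import ThreadPool

def blocking_task(seconds):
    # time.sleep releases the GIL, so several threads can wait concurrently
    time.sleep(seconds)
    return seconds

pool = ThreadPool(4)
start = time.time()
results = pool.map(blocking_task, [0.2] * 4)  # four 0.2 s waits, overlapped
elapsed = time.time() - start
pool.close()
pool.join()

print(results)        # [0.2, 0.2, 0.2, 0.2]
print(elapsed < 0.5)  # True: the four waits overlap instead of summing to 0.8 s
```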

I remember that -- back in Bob 1 times -- I talked to Andre about this.
He told me that there is something called the "Global Interpreter Lock" (GIL), which a thread needs to hold in order to execute Python code.
However, C++ code can run in parallel -- as long as it is thread safe.
Now that we are using the Python C-API to bind our C++ code, the interface for that seems to be pretty simple:
https://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock

Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS

I wonder if we should start using this mechanism to make our code multi-threadable.
However, we need to make sure that the code is actually thread safe (which is not the case for all functionality, as some classes use internally cached variables).
Maybe adding special nose tests for that would be a good idea.
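Such a test could, for example, run the same function concurrently from several threads and compare the results against a serial reference run. A sketch, with a hypothetical stand-in function (squared) in place of a real Bob binding:

```python
from multiprocessing.pool import ThreadPool

def squared(x):
    # hypothetical stand-in for a bound C++ function under test
    return x * x

def test_thread_safety():
    data = list(range(100))
    expected = [squared(x) for x in data]  # serial reference result
    pool = ThreadPool(8)
    try:
        # repeat the concurrent run a few times; a thread-unsafe function
        # (e.g. one with internal caches) would eventually diverge
        for _ in range(10):
            assert pool.map(squared, data) == expected
    finally:
        pool.close()
        pool.join()

test_thread_safety()  # passes silently for this thread-safe stand-in
```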

Please share your opinion about multi-threading in Bob.

Manuel Günther

Oct 20, 2015, 3:32:20 PM
to bob-devel
Just to show how easy multi-threading with multiprocessing.pool.ThreadPool is, here is some code that I have recently used to parallelize a list comprehension:

import multiprocessing.pool

pool = multiprocessing.pool.ThreadPool(8)  # number of parallel threads

def run_on_element(data):
    ...  # run some code
    return result

list_of_elements = [...]
processed_data = pool.map(run_on_element, list_of_elements)

# equivalent to:
processed_data = [run_on_element(element) for element in list_of_elements]

I have used that code to train a list of SVMs in scipy, and apparently scipy releases the GIL properly, as I got a speedup of approximately 8 (using 8 cores).

@Tiago: I remember that you were thinking about parallelizing the GMM training using multi-threading. Maybe you can have a look at the code above. Let me know if it helps in any way.

André Anjos

Oct 20, 2015, 4:06:11 PM
to bob-...@googlegroups.com
Hello,

Multi-threading is not an easy thing to master. Synchronisation problems are particularly painful to debug and not always reproducible.

That said, using the macros you have mentioned, one can unlock the Python GIL alright.

If your code is pure C/C++ and does not, in any way, call back the Python API, you're good.

If your code depends on the Python API or on objects generated in Python, then care must be taken to ensure that none of the objects used by your unlocked code vanish because the interpreter decided to garbage-collect them. One common way is to re-acquire the GIL for those interactions with Python objects.

Some consequences that come to mind:

1. Logging in Bob is diverted to Python logging. If I remember correctly, I re-acquire the lock before emitting any message. But note that if the code calls the logging module too often, this will in practice eat your potential speed-up to the point where multi-threading becomes useless.

2. Our blitz-numpy bridge sometimes produces blitz::Arrays which point to numpy array memory. For efficiency reasons, we avoid data copying when we can. If the array ceases to exist while the thread is running, this may lead to unpredictable crashes.

3. You may not call any Py*, PyArray*, or PyBlitzArray* function while you are in the unlocked C++ code. These all require the GIL to be held.

There you are, my 2 cents on this subject!

A

--
-- You received this message because you are subscribed to the Google Groups bob-devel group. To post to this group, send email to bob-...@googlegroups.com. To unsubscribe from this group, send email to bob-devel+...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/bob-devel or directly the project website at http://idiap.github.com/bob/
---
You received this message because you are subscribed to the Google Groups "bob-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bob-devel+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dr. André Anjos
Idiap Research Institute
Centre du Parc - rue Marconi 19
CH-1920 Martigny, Suisse
Phone: +41 27 721 7763
Fax: +41 27 721 7712
http://andreanjos.org

Tiago Freitas Pereira

Oct 22, 2015, 3:31:05 AM
to bob-...@googlegroups.com
Hi Manuel,

Actually I was training a GMM using multiple processes with mpi4py.
I implemented a satellite package for that purpose. It still needs an update for Bob 2.0, but it is here (https://github.com/tiagofrepereira2012/parallel_trainers).

Cheers





--
Tiago

Manuel Günther

Sep 30, 2016, 1:46:36 PM
to bob-devel
Hi,

sorry for digging out such an old thread, but it seems that this question has come up in a new package (again, it is bob.learn.em).

So, Andre, if I understand you correctly, pure C++ code that works on read-only blitz::Arrays that have been cast from numpy.ndarrays is thread safe -- as long as no internally cached memory is used.
Hence, when we are sure that our pure C++ code is otherwise thread safe, we might use the above-mentioned code block to allow the pure C++ part to run in a separate thread, e.g.:

PyObject* my_binding_function(PyObject* self, ...) {
BOB_TRY
  // get parameters
  ...

  Py_BEGIN_ALLOW_THREADS
  self->run_function(...);
  Py_END_ALLOW_THREADS

  // turn output into Python types
  ...

  return ...;
BOB_CATCH_MEMBER
}


This might dramatically speed up some of our processes, e.g., by handling several images in parallel threads.

So, I would definitely say that we should move forward and include this in functions that are computationally heavy.
Note that we should not add it to small functions, as the overhead of releasing and re-acquiring the GIL might take longer than the function itself runs.
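The overhead point can be illustrated with plain Python (no Bob involved): dispatching many tiny tasks through a thread pool can easily cost more than running them serially. Forcing one task per dispatch with chunksize=1 makes this visible:

```python
import time
from multiprocessing.pool import ThreadPool

def tiny(x):
    return x + 1  # far too little work to amortize the dispatch cost

data = list(range(5000))
pool = ThreadPool(8)

start = time.time()
threaded = pool.map(tiny, data, chunksize=1)  # one queue round-trip per element
threaded_time = time.time() - start

start = time.time()
serial = [tiny(x) for x in data]
serial_time = time.time() - start

pool.close()
pool.join()

print(threaded == serial)           # True: identical results
print(threaded_time > serial_time)  # True: the pool is slower for tiny tasks
```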

One option would be to introduce a boolean parameter to the Python function (something like ``allow_thread``, which is False by default) that allows users to run the function in parallel threads. This would make the code most flexible, but would require larger modifications.

What do you think?