CPU core usage


Francesco Tuveri

Sep 22, 2016, 10:23:52 AM
to bob-devel
Hi,
I'm training some GMMs (bob.learn.em.GMMMachine) using the EM trainer (bob.learn.em.ML_GMMTrainer) on a machine with 32 cores, but it looks like bob is able to use only one of them. However, if I run, for example, numpy.dot() in a script or in the interactive shell, I can see that all the available cores are used. Is this normal behaviour, or can I do something about it, like recompiling bob? Thanks.

Manuel Günther

Sep 22, 2016, 12:07:39 PM
to bob-devel
Indeed, there is currently no way of parallelizing the GMM training using multithreading, multiprocessing or OpenMP. Currently, we use GridTK (https://pypi.python.org/pypi/gridtk) for parallelization on the Python side. For example, we use GridTK in the parallel implementation inside bob.bio.gmm: https://pypi.python.org/pypi/bob.bio.gmm

However, theoretically there is nothing to stop you from using multiprocessing for the training. For example, you might want to rewrite a parallelized version of the ``bob.learn.em.train`` function: https://gitlab.idiap.ch/bob/bob.learn.em/blob/master/bob/learn/em/train.py#L12 following the idea of https://gitlab.idiap.ch/bob/bob.bio.gmm/blob/master/bob/bio/gmm/tools/gmm.py to parallelize the e-steps in both KMeans and GMM training (which is actually done using the same function).
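To make the idea concrete, here is a minimal sketch of a training loop whose e-step is mapped over chunks of the data. It is not the code behind either link: it uses 1-D K-Means in plain Python, and the names e_step, m_step, train and map_fn are hypothetical. A thread pool is used only so that the sketch runs anywhere; since e_step takes and returns plain lists, the map of a process pool could be passed as map_fn instead.

```python
from multiprocessing.pool import ThreadPool


def e_step(args):
    """Accumulate K-Means statistics for one chunk of 1-D samples."""
    chunk, means = args
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for x in chunk:
        # Assign the sample to its closest mean.
        k = min(range(len(means)), key=lambda i: (x - means[i]) ** 2)
        counts[k] += 1
        sums[k] += x
    return counts, sums


def m_step(stats, means):
    """Merge the per-chunk statistics and recompute the means."""
    new_means = []
    for k, old in enumerate(means):
        n = sum(counts[k] for counts, _ in stats)
        s = sum(sums[k] for _, sums in stats)
        # Keep the old mean if no sample was assigned to this cluster.
        new_means.append(s / n if n else old)
    return new_means


def train(chunks, means, map_fn=map, max_iterations=50, threshold=1e-9):
    """Alternate e- and m-steps until the means stop moving."""
    for _ in range(max_iterations):
        stats = list(map_fn(e_step, [(chunk, means) for chunk in chunks]))
        new_means = m_step(stats, means)
        shift = max(abs(a - b) for a, b in zip(new_means, means))
        means = new_means
        if shift < threshold:
            break
    return means


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 1.2, 0.8, 5.2, 4.8]
chunks = [data[0:5], data[5:10]]

# Same training, once with the e-steps spread over a pool, once serially.
with ThreadPool(2) as pool:
    parallel_means = train(chunks, [0.0, 6.0], map_fn=pool.map)
serial_means = train(chunks, [0.0, 6.0])
```

Only the e-step is distributed; the m-step runs once per iteration on the merged statistics, so the parallel and the serial run go through exactly the same updates.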

Hmm... maybe I will find some time to implement that for you. It should not be too complicated (at least not for me, knowing all the internals of bob).

Cheers
Manuel

Tiago Freitas Pereira

Sep 22, 2016, 2:08:23 PM
to bob-...@googlegroups.com
I've done something in the past using MPI https://github.com/tiagofrepereira2012/parallel_trainers

I developed the GMM, TV Matrix (i-vector) and the U Matrix (ISV) training.

Maybe we can revamp it.

cheers

--
-- You received this message because you are subscribed to the Google Groups bob-devel group. To post to this group, send email to bob-...@googlegroups.com. To unsubscribe from this group, send email to bob-devel+unsubscribe@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/bob-devel or directly the project website at http://idiap.github.com/bob/
---
You received this message because you are subscribed to the Google Groups "bob-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bob-devel+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Tiago

Manuel Günther

Sep 22, 2016, 5:25:50 PM
to bob-devel
Hmm... with multiprocessing it is apparently not as simple as I thought. The problem is that Python's ``multiprocessing`` module requires all objects passed between processes to be pickleable, which is not the case for Bob's classes implemented in C++. Hence, we cannot simply use the ``multiprocessing`` module here.
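The pickling problem can be reproduced without Bob. In the sketch below, NativeBackedMachine is a hypothetical stand-in for a class whose state lives in native code; a threading.Lock plays the role of the native handle, because it is implemented in C and has no pickle support. This only illustrates the failure mode and is not how Bob's classes are built.

```python
import pickle
import threading


class NativeBackedMachine:
    """Hypothetical stand-in for a class whose state lives in native code."""

    def __init__(self, means):
        self.means = list(means)
        # A lock is implemented in C and has no pickle support, much like
        # an extension type that does not define how to serialize itself.
        self._handle = threading.Lock()


def can_pickle(obj):
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (TypeError, pickle.PicklingError):
        return False


machine = NativeBackedMachine([1.0, 5.0])
machine_ok = can_pickle(machine)      # the object as a whole
data_ok = can_pickle(machine.means)   # only the plain numbers it holds
```

Here machine_ok ends up False and data_ok ends up True: the object as a whole cannot cross a process boundary, while the plain numbers it holds can.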

Tiago's version uses MPI, which might be a better option. However, it looks relatively complicated. 
Maybe getting it to run with OpenMP on the C++ side might be an easier choice. We could add a ``BOB_USE_OPENMP`` flag to compile with OpenMP. 
I have lately used OpenMP in another project. However, I would need to take a closer look at how to use it in Bob, for example, to see which memory is actually shared in the C++ functions.

Manuel

Tiago Freitas Pereira

Sep 23, 2016, 3:40:15 AM
to bob-...@googlegroups.com
Hey Manuel,

I think you don't need to go that far.
OK, our Python objects are not pickleable, but the data is (the numpy arrays that store the GMMMachine and GMMStats information).

For instance, you can do this map-reduce (EM) by transmitting numpy arrays between the processes (like I did with mpi4py).
I will try to sketch something with python multiprocessing.
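A rough illustration of that map-reduce, under simplifying assumptions: a 1-D GMM in plain Python, with lists of floats standing in for the numpy arrays. The function names (map_chunk, reduce_stats, m_step) are hypothetical and none of this is Bob's API. Each worker receives only the model parameters and a chunk of data and returns only sufficient statistics, so no machine or trainer object ever has to be pickled. A thread pool keeps the sketch easy to run; the same functions could be handed to a process pool.

```python
import math
from multiprocessing.pool import ThreadPool


def gaussian(x, mean, variance):
    """Density of a 1-D Gaussian."""
    norm = math.sqrt(2 * math.pi * variance)
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / norm


def map_chunk(args):
    """E-step on one chunk: return plain lists of sufficient statistics."""
    chunk, weights, means, variances = args
    k_range = range(len(weights))
    n = [0.0] * len(weights)
    sum_x = [0.0] * len(weights)
    sum_xx = [0.0] * len(weights)
    for x in chunk:
        joint = [weights[k] * gaussian(x, means[k], variances[k])
                 for k in k_range]
        total = sum(joint)
        for k in k_range:
            r = joint[k] / total  # responsibility of component k for x
            n[k] += r
            sum_x[k] += r * x
            sum_xx[k] += r * x * x
    return n, sum_x, sum_xx


def reduce_stats(parts):
    """Add the per-chunk statistics element-wise."""
    return tuple([sum(col) for col in zip(*field)] for field in zip(*parts))


def m_step(stats, min_variance=1e-6):
    """Maximum-likelihood update from the merged statistics."""
    n, sum_x, sum_xx = stats
    total = sum(n)
    weights = [nk / total for nk in n]
    means = [sx / nk for sx, nk in zip(sum_x, n)]
    variances = [max(sxx / nk - m * m, min_variance)
                 for sxx, nk, m in zip(sum_xx, n, means)]
    return weights, means, variances


data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
chunks = [data[:5], data[5:]]
params = ([0.5, 0.5], [0.0, 6.0], [1.0, 1.0])  # weights, means, variances

with ThreadPool(2) as pool:
    for _ in range(20):
        # Map: one e-step per chunk. Reduce: sum the statistics. Then m-step.
        parts = pool.map(map_chunk, [(chunk,) + params for chunk in chunks])
        params = m_step(reduce_stats(parts))
weights, means, variances = params
```

The sketch has no guard against empty components or underflowing likelihoods; a real implementation would need both.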



--
Tiago

Francesco Tuveri

Sep 23, 2016, 9:06:57 AM
to bob-devel
Thanks for your help, but don't worry, I can manage a map-reduce implementation of the EM steps. My main concern was to find out if there was some out-of-the-box optimization available.

Manuel Günther

Sep 23, 2016, 1:55:50 PM
to bob-devel
Dear Francesco,

I think one of the current limitations of Bob is that it cannot make use of the CPU's full capacity when several cores are available. A better implementation would be great.

@tiago: I have created a branch "multiprocessing" in bob.learn.em, where I have put together a small first trial to run in parallel, including a test case. Unfortunately, it doesn't work properly yet, because I don't handle the data correctly yet. I will leave this to you.

Cheers
Manuel

Tiago Freitas Pereira

Sep 26, 2016, 3:16:28 PM
to bob-...@googlegroups.com
Hi Manuel,

Thanks for the branch, I'll take care of it as soon as possible.

Cheers

--
Tiago

Amir Mohammadi

May 7, 2019, 4:47:28 AM
to bob-devel
Hi everyone,

We finally have a working multi-core implementation in bob.learn.em, thanks to Tiago and Manuel.
Right now, only K-Means and GMM ML and MAP training are supported.

Please go ahead and test it. Here are the instructions:

conda create -n bdt -c https://www.idiap.ch/software/bob/conda/label/beta -c https://www.idiap.ch/software/bob/conda bob.devtools
conda activate bdt
git clone https://gitlab.idiap.ch/bob/bob.learn.em.git
cd bob.learn.em
git checkout -b multiprocessing origin/multiprocessing
bdt create multicoregmm
conda activate multicoregmm
# if you want more packages in this environment just conda/pip install them.
buildout
bin/nosetests -sv # make sure tests are passing
# The way to run in multi core mode is to provide a
# ThreadsPool to the bob.learn.em.train function
# for an example of normal training and see the tests for an example of making it parallel:

If you find bugs, you can report them here or if you want to contribute, you can open a pull request on our mirror in github:
https://github.com/bioidiap/bob.learn.em/tree/multiprocessing

Bob's Gaussian Mixture Models (GMM) implementation is one of the most robust implementations of GMMs out there.
Now thanks to bob.bio.gmm (parallel in grid) and bob.learn.em (parallel in cores), you can
train a GMM using multiple machines and multiple cores.

Best,
Amir
