Multicore programming with cython


Teemu Ikonen

Jun 4, 2010, 5:16:51 PM
to cython-users
Hi list,

I have a simple, trivially parallelizable function (summing some
elements of a Numpy array) which I would like to make as fast as
possible. What is the best way to utilize multiple cores with cython
currently?

I have no experience with Python's threading or multiprocessing packages.
Would they be the way to go, or is there an option more specific to
cython? On the other hand, my function is so simple that writing it in
pure C with OpenMP (or similar low-level API) and calling it from
cython would not be a major headache.

Best,

Teemu

gabriele.lanaro

Jun 4, 2010, 6:46:14 PM
to cython-users
My two cents:

I would suggest using the multiprocessing module in Python 2.6
(not cython related), especially the Pool class:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"

Anyway, I think that some numpy operations are already parallelized (I've
just heard about this).
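
Applied to the original question (summing parts of an array), the same Pool approach might look roughly like the sketch below. It assumes Python 3, and the names split_ranges, partial_sum and parallel_sum are made up for illustration. Summing a range of integers stands in for the real per-chunk work.

```python
from multiprocessing import Pool


def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    step, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + step + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partial_sum(bounds):
    """Sum the integers in [start, stop); placeholder for real per-chunk work."""
    start, stop = bounds
    return sum(range(start, stop))


def parallel_sum(n, workers=4):
    """Sum 0..n-1 using `workers` processes, one contiguous range each."""
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, split_ranges(n, workers)))
```

Call parallel_sum from under an `if __name__ == '__main__':` guard, as in the example above. Arguments and results are pickled between processes, so passing index ranges instead of array slices keeps the traffic small, but each worker then needs its own way to reach the data.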

Lisandro Dalcin

Jun 4, 2010, 9:26:44 PM
to cython...@googlegroups.com
On 4 June 2010 18:16, Teemu Ikonen <tpik...@gmail.com> wrote:
>
> On the other hand, my function is so simple that writing it in
> pure C with OpenMP (or similar low-level API) and calling it from
> cython would not be a major headache.
>

This sounds good, and it'll likely be the fastest way.

--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

Hoyt Koepke

Jun 4, 2010, 9:25:51 PM
to cython...@googlegroups.com
> from multiprocessing import Pool
>
> def f(x):
>    return x*x
>
> if __name__ == '__main__':
>    pool = Pool(processes=4)              # start 4 worker processes
>    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
>    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow
>    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"
>

This would work, but in this case (and since you are using Cython)
Python's multiprocessing module might be more effort and more machinery
than you want. I would suggest just writing a quick C function, putting
an OpenMP directive in it, and calling that. If you mark it as static
inline, put it in a separate C header file and include that from
Cython, it would be pretty painless.

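The advantage is that OpenMP is trivial to use if you parallelize
loops in C/C++. Just add "#pragma omp parallel for" in front of
your loop, compile with OpenMP enabled (-fopenmp for gcc), and most of
the time you're done. For a sum, the accumulator also needs a reduction
clause, e.g. "#pragma omp parallel for reduction(+:total)", otherwise
the threads race on it.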

The disadvantage is that your process may well be memory bound (simply
summing stuff often is), in which case parallelizing it might not help.
Also, not every compiler supports OpenMP (gcc 4.2 and up do, but it's
not standard before that), so you might have extra issues there.

--Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoy...@gmail.com
++++++++++++++++++++++++++++++++++++++++++

sturlamolden

Jul 4, 2010, 1:50:19 PM
to cython-users

> I have a simple, trivially parallelizable function (summing some
> elements of a Numpy array) which I would like to make as fast as
> possible. What is the best way to utilize multiple cores with cython
> currently?

You have several options; which one is better I cannot say:

- You can use Python threads. In Cython you can release the GIL using
"with nogil", so that is not an issue. Python threads are native
threads.

- You can use multiprocessing, but the advantage over Python threads
is not there unless you manipulate Python objects all the time.

- You can use OpenMP in C, C++ or Fortran, and call this from Cython.
This is what I usually do. It is better to use OpenMP than to mess with
threads manually.

- You can use MPI, either directly from Cython (declare the C API you
need) or mpi4py.

- You can use OpenCL, if your processor has a driver for it.

- You can use your OS's threading facilities directly, declaring the C
API you need with cdefs.
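
As a rough illustration of the first option, the sketch below splits a sum across Python threads, each handling one contiguous slice. It assumes Python 3 (concurrent.futures is not in 2.6), and chunk_sum and threaded_sum are hypothetical names. Written in pure Python like this, the GIL still serializes the loops, so it only shows the structure; a real speedup needs chunk_sum replaced by a compiled Cython function that does its loop inside "with nogil".

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(data, start, stop):
    """Sum data[start:stop]; stands in for a Cython function that releases the GIL."""
    total = 0.0
    for i in range(start, stop):
        total += data[i]
    return total


def threaded_sum(data, workers=4):
    """Sum `data` by giving each thread one contiguous slice of indices."""
    n = len(data)
    # Integer arithmetic yields `workers` slices that cover 0..n exactly once.
    bounds = [(n * k // workers, n * (k + 1) // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(chunk_sum, data, lo, hi) for lo, hi in bounds]
        return sum(f.result() for f in futures)


print(threaded_sum([float(i) for i in range(1000)]))
```

Unlike multiprocessing, the threads share the same memory, so the data is not copied or pickled; only the slice bounds are passed around.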


Sturla