Thread local variables in prange.

Daniil Pakhomov

Jul 2, 2015, 5:10:32 AM

to cython...@googlegroups.com

Hello.

I would be very glad if someone could help me solve this problem.
I want to use thread-local variables in a prange loop.

Why is it not possible to use cython.parallel.threadlocal as mentioned here:
https://github.com/cython/cython/wiki/enhancements-prange ?

Is this the only workaround for now:
https://groups.google.com/forum/#!topic/cython-users/Ady-DdWu6rE ?

Thank you.

Jerome Kieffer

Jul 2, 2015, 7:25:06 AM

to cython...@googlegroups.com

I agree this is a limitation ...

I often use a temporary shared array of size (nthread, output_size), then perform the merge within that array.

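Here is an example, a kind of histogram.

    # Assumes numpy is imported and "parallel" is the cython.parallel module;
    # data, size1, size2 and numthreads are defined elsewhere.
    cdef int i, j, s, threadid
    # numpy.intc matches the C "int" element type of the memoryviews
    cdef int[:, :] tmp = numpy.zeros((numthreads, size2), dtype=numpy.intc)
    cdef int[:] out = numpy.zeros(size2, dtype=numpy.intc)
    # Pass 1: each thread counts into its own row of tmp
    for i in parallel.prange(size1, nogil=True):
        threadid = parallel.threadid()
        j = <int> data[i]
        tmp[threadid, j] += 1
    # Pass 2: merge the per-thread rows into out, in parallel over the bins
    for j in parallel.prange(size2, nogil=True):
        s = 0
        for i in range(numthreads):
            s = s + tmp[i, j]
        out[j] += s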

Notes:
* "s = 0" and "s = s + x" keep s thread-local (i.e. not shared); writing "s += x" would instead make Cython treat s as a reduction variable
* "out[j] += s" accumulates into out, which stays shared between threads and not local

This works well when there are a few cores: the overhead is large
with 2 cores or fewer, and it requires a lot of memory when there are
too many cores.
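
For readers who want to try the idea without compiling anything, here is a minimal pure-Python sketch of the same per-thread scratch-row pattern. It is only an illustration of the data layout, not the Cython/OpenMP code above: the function name threaded_histogram, the explicit worker_id (standing in for parallel.threadid()) and the strided split of the input are all assumptions made for this sketch, and it says nothing about performance.

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_histogram(data, size2, numthreads=4):
    """Histogram of integers in range(size2), one scratch row per worker.

    Hypothetical helper: data must be a list (or other sliceable sequence).
    """
    # tmp[w] is written only by worker w, so no locking is needed.
    tmp = [[0] * size2 for _ in range(numthreads)]

    def count(worker_id):
        row = tmp[worker_id]
        # Strided split: worker w handles items w, w + numthreads, ...
        for value in data[worker_id::numthreads]:
            row[value] += 1

    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(count, range(numthreads)))

    # Merge step: for each bin j, sum column j over all worker rows.
    return [sum(tmp[w][j] for w in range(numthreads)) for j in range(size2)]


if __name__ == "__main__":
    print(threaded_histogram([0, 1, 1, 3, 3, 3, 2, 0], size2=4))
```

The merge step is the part that costs extra with few threads: it reads numthreads * size2 counters no matter how small the input is.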

Cheers,

--
Jérôme Kieffer
tel +33 476 882 445

Daniele Nicolodi

Jul 2, 2015, 7:32:46 AM

to cython...@googlegroups.com

I may be missing something, but what is the difference in memory usage
with respect to a solution that uses thread local storage? Each thread
would have to allocate its own tmp array with shape (size2, ), resulting
in exactly the same memory consumption.
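
The arithmetic behind this can be spelled out in a few lines. This is only a back-of-the-envelope sketch: the two function names are hypothetical, and itemsize=4 assumes a 4-byte C int.

```python
def scratch_bytes_shared(numthreads, size2, itemsize=4):
    # One (numthreads, size2) array shared by all threads.
    return numthreads * size2 * itemsize


def scratch_bytes_thread_local(numthreads, size2, itemsize=4):
    # One private (size2,) array per thread, summed over all threads.
    return sum(size2 * itemsize for _ in range(numthreads))


if __name__ == "__main__":
    # Both layouts hold numthreads * size2 counters in total.
    print(scratch_bytes_shared(8, 1000), scratch_bytes_thread_local(8, 1000))
```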

Cheers,
Daniele

Jerome Kieffer

Jul 2, 2015, 11:51:42 AM

to cython...@googlegroups.com

On Thu, 02 Jul 2015 13:32:42 +0200
Daniele Nicolodi <dan...@grinta.net> wrote:

> > This works well when there are a few cores: the overhead is large
> > with 2 cores or fewer, and it requires a lot of memory when there are
> > too many cores.
>
> I may be missing something, but what is the difference in memory usage
> with respect to a solution that uses thread local storage? Each thread
> would have to allocate its own tmp array with shape (size2, ), resulting
> in exactly the same memory consumption.

You are right, it is the same.
