Thread local variables in prange.

Daniil Pakhomov

Jul 2, 2015, 5:10:32 AM
to cython...@googlegroups.com
Hello.

I would be very glad if someone could help me solve this problem.
I want to use thread-local variables in a prange loop.

Why is it not possible to use cython.parallel.threadlocal as mentioned here
https://github.com/cython/cython/wiki/enhancements-prange ?

Is this the only possible workaround for now:
https://groups.google.com/forum/#!topic/cython-users/Ady-DdWu6rE ?

Thank you.

Jerome Kieffer

Jul 2, 2015, 7:25:06 AM
to cython...@googlegroups.com
I agree this is a limitation ...

I often use a temporary shared array of size (nthread, output_size), then perform the merge within that array.

Here is an example, a kind of histogram:

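# Assumes `parallel` is the cython.parallel module and that data, size1,
# size2 and numthreads are defined elsewhere.
# numpy.intc is the NumPy dtype that matches a C int memoryview.
cdef int[:, :] tmp = numpy.zeros((numthreads, size2), numpy.intc)
cdef int[:] out = numpy.zeros(size2, numpy.intc)
# Pass 1: each thread increments only its own row of tmp.
for i in parallel.prange(size1, nogil=True):
    threadid = parallel.threadid()
    j = <int> data[i]
    tmp[threadid, j] += 1
# Pass 2: merge the per-thread rows into out.
for j in parallel.prange(size2, nogil=True):
    s = 0
    for i in range(numthreads):
        s = s + tmp[i, j]
    out[j] += s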

Notes:
* "s = 0" followed by "s = s + x" enforces thread locality, i.e. s is private and not shared (writing "s += x" would instead make s a reduction variable)
* "out[j] +=" enforces a parallel reduction, so out is shared and not local

This works well when the number of cores is moderate: the overhead is large
with 2 cores or fewer, and it requires a lot of memory when there are
many cores.
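
For readers who want to compile the idea above, here is a minimal self-contained sketch of the same two-pass scheme. The function name `histogram`, the `int[::1]` input type and the use of `openmp.omp_get_max_threads()` to size the temporary array are assumptions made for this illustration, not part of the snippet above. It also assumes the module is built with OpenMP enabled (for example `-fopenmp` with GCC) and that every value in `data` lies in `[0, size2)`.

```cython
import numpy
cimport openmp
from cython.parallel import prange, threadid


def histogram(int[::1] data, int size2):
    """Count occurrences of each value of data (hypothetical helper)."""
    cdef:
        # Upper bound on the number of threads a parallel region may use.
        int numthreads = openmp.omp_get_max_threads()
        Py_ssize_t size1 = data.shape[0]
        Py_ssize_t i, j
        int t, tid, s
        int[:, ::1] tmp = numpy.zeros((numthreads, size2), dtype=numpy.intc)
        int[::1] out = numpy.zeros(size2, dtype=numpy.intc)

    # Pass 1: each thread counts into its own row of tmp, so no two
    # threads ever write to the same element.
    for i in prange(size1, nogil=True, num_threads=numthreads):
        tid = threadid()
        tmp[tid, data[i]] += 1

    # Pass 2: sum the per-thread rows. Each j is handled by exactly one
    # thread; s is assigned inside the loop and is therefore private.
    for j in prange(size2, nogil=True, num_threads=numthreads):
        s = 0
        for t in range(numthreads):
            s = s + tmp[t, j]
        out[j] = s

    return numpy.asarray(out)
```

Called as `histogram(numpy.array([0, 1, 1, 3], dtype=numpy.intc), 4)`, the input must already be a contiguous C int array, since the memoryview type is strict about the dtype.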

Cheers,

--
Jérôme Kieffer
tel +33 476 882 445

Daniele Nicolodi

Jul 2, 2015, 7:32:46 AM
to cython...@googlegroups.com
I may be missing something, but what is the difference in memory usage
with respect to a solution that uses thread local storage? Each thread
would have to allocate its own tmp array with shape (size2, ), resulting
in exactly the same memory consumption.
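
To make the comparison concrete, here is a rough sketch of what a thread-local variant could look like, using the `with nogil, parallel():` block from `cython.parallel` with one buffer allocated per thread. The function name `histogram_local`, the `int[::1]` input type and the choice of `with gil:` as the mutual exclusion for the merge step are assumptions for this illustration; it also assumes an OpenMP-enabled build and values of `data` in `[0, size2)`. Each thread allocates `size2 * sizeof(int)` bytes, so with `numthreads` threads the total is `numthreads * size2 * sizeof(int)`, the same as for the shared `(numthreads, size2)` array.

```cython
import numpy
from cython.parallel import parallel, prange
from libc.stdlib cimport calloc, free, abort


def histogram_local(int[::1] data, int size2):
    """Histogram with one private buffer per thread (hypothetical helper)."""
    if size2 <= 0:
        raise ValueError("size2 must be positive")
    cdef:
        Py_ssize_t size1 = data.shape[0]
        Py_ssize_t i, j
        # Assigned inside the parallel block, hence private to each thread.
        int *local_counts
        int[::1] out = numpy.zeros(size2, dtype=numpy.intc)

    with nogil, parallel():
        # One zero-initialised buffer of shape (size2,) per thread.
        local_counts = <int *> calloc(size2, sizeof(int))
        if local_counts is NULL:
            abort()

        # The iterations are shared among the threads of the enclosing block;
        # each thread only touches its own buffer.
        for i in prange(size1):
            local_counts[data[i]] += 1

        # Merge step: only one thread can hold the GIL at a time, so this
        # block serialises the updates of the shared output array.
        with gil:
            for j in range(size2):
                out[j] += local_counts[j]

        free(local_counts)

    return numpy.asarray(out)
```

Taking the GIL for the merge keeps the sketch short; it serialises only `size2` additions per thread, but a second parallel pass over a shared array avoids even that.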

Cheers,
Daniele


Jerome Kieffer

Jul 2, 2015, 11:51:42 AM
to cython...@googlegroups.com
On Thu, 02 Jul 2015 13:32:42 +0200
Daniele Nicolodi <dan...@grinta.net> wrote:

> > This works well when the number of cores is moderate: the overhead is large
> > with 2 cores or fewer, and it requires a lot of memory when there are
> > many cores.
>
> I may be missing something, but what is the difference in memory usage
> with respect to a solution that uses thread local storage? Each thread
> would have to allocate its own tmp array with shape (size2, ), resulting
> in exactly the same memory consumption.

You are right, it is the same.