Unfortunately, yes. What we should have is block-local declarations, like

    for i in prange(...):
        cdef double psum = ...

to declare something truly private rather than a reduction. Currently, you
have to do something horrible like this:
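
    from cython.parallel cimport parallel, prange, threadid
    from libc.stdlib cimport malloc, free
    cimport openmp

    cdef Py_ssize_t i, j
    cdef double *psum, *sum

    # One slot per thread, each padded to 32 doubles (256 bytes)
    psum = <double *> malloc(sizeof(double) * openmp.omp_get_max_threads() * 32)
    if psum == NULL:
        raise MemoryError()

    with nogil, parallel():
        # Assigned inside the parallel block, so 'sum' is thread-private
        sum = psum + 32 * threadid()
        for i in prange(m):
            sum[0] = 0
            for j in range(n):
                sum[0] += f(j)    # writes through a pointer: not a reduction
            func(..., sum[0], ...)

    free(psum)
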
The multiplication by 32 places each thread's slot 32 doubles (256 bytes)
away from its neighbours to avoid false sharing, assuming your cache lines
aren't bigger than 256 bytes. That's another reason why block-local
declarations would be much nicer here.
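A pure-Python sketch of that padding arithmetic, assuming 8-byte doubles (true on common platforms) and taking the 256-byte bound from the text. `slot_offsets`, `share_cache_line` and `padding_is_safe` are hypothetical helpers for illustration only. Because neighbouring slots are 256 bytes apart, they can never land on the same cache line when the line size is at most 256 bytes, even if malloc returns a base address that is not line-aligned.

```python
DOUBLE_SIZE = 8   # bytes per C double on common platforms (assumption)
STRIDE = 32       # doubles between per-thread slots, as in psum + 32 * threadid()


def slot_offsets(nthreads, base=0):
    """Byte address of each thread's accumulator slot."""
    return [base + STRIDE * DOUBLE_SIZE * tid for tid in range(nthreads)]


def share_cache_line(addr_a, addr_b, line_size):
    """True if two byte addresses fall in the same cache line."""
    return addr_a // line_size == addr_b // line_size


def padding_is_safe(nthreads, line_size, base=0):
    """True if no two threads' slots share a cache line."""
    offs = slot_offsets(nthreads, base)
    return not any(
        share_cache_line(a, b, line_size)
        for i, a in enumerate(offs)
        for b in offs[i + 1:]
    )


# Slots 256 bytes apart are safe for any line size <= 256,
# whatever (unaligned) base address malloc happens to return.
for line in (64, 128, 256):
    for base in (0, 8, 200):
        assert padding_is_safe(16, line, base)

# A 512-byte cache line would break the assumption.
assert not padding_is_safe(16, 512)
```

Without the padding (a stride of one double), eight adjacent slots would sit on a single 64-byte line, so every update would bounce that line between cores.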
Another approach *may* work in your case:

    with nogil, parallel():
        for i in range(...):  # NOTE: range, not prange
            ...

Dag
You'd have to adjust the loop bounds yourself to implement work sharing in
that case, and the in-place operator would still specify a reduction. In
fact, I get the error 'Reductions not allowed for parallel blocks',
although I think they should be allowed (at some point it was
considered "too much magic" for some reason?).
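Adjusting the loop bounds means each thread computes its own slice of `range(m)` from its thread id and the thread count (in Cython, presumably `threadid()` and `openmp.omp_get_num_threads()` inside the parallel block). A pure-Python sketch of one simple contiguous split; `chunk_bounds` and `simulate` are hypothetical helpers, and the split is not necessarily the one prange/OpenMP would choose.

```python
def chunk_bounds(m, tid, nthreads):
    """Half-open range [start, stop) of iterations for thread `tid`.

    Splits range(m) into nthreads contiguous chunks whose sizes differ
    by at most one; the first m % nthreads threads get one extra item.
    """
    base, extra = divmod(m, nthreads)
    start = tid * base + min(tid, extra)
    stop = start + base + (1 if tid < extra else 0)
    return start, stop


def simulate(m, nthreads):
    """Concatenate every thread's chunk, in thread order."""
    out = []
    for tid in range(nthreads):
        start, stop = chunk_bounds(m, tid, nthreads)
        out.extend(range(start, stop))  # plain range over this thread's slice
    return out


# Every iteration is covered exactly once, even when m < nthreads.
for m in (0, 1, 7, 10, 100):
    for nthreads in (1, 3, 4, 16):
        assert simulate(m, nthreads) == list(range(m))
```

Inside the parallel block each thread would then loop with `for i in range(start, stop):`. Threads whose slice is empty simply skip the loop.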