I have written some simple code that loops over the rows of a matrix, sums the entries of each row across its columns, and returns the result. The functionality should be identical to np.sum(matrix, axis=1), except that I am trying to use OpenMP (through Cython's prange) to parallelize the outer loop over the rows. My code is included below.
The error I get is "Cannot read reduction variable in loop body". I take this to be specific to the OpenMP version: as a sanity check, I removed the call to prange, and the code then works as expected. The code is so simple that the error must be something basic, but I cannot see what causes it in the parallel version. Many thanks in advance for helping with what is probably a very silly mistake.
from cython.parallel import prange, parallel
import numpy as np
cimport numpy as cnp
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def parallel_sum_over_columns(cnp.float64_t[:, :] matrix):
    cdef int m = matrix.shape[0]
    cdef int n = matrix.shape[1]
    cdef int i, j
    cdef cnp.float64_t accumulator = 0.
    cdef cnp.float64_t[:] result = np.zeros(m, dtype='float64')

    with nogil, parallel(num_threads=8):
        for i in prange(m, schedule='dynamic'):
            accumulator = 0.
            for j in range(n):
                accumulator += matrix[i, j]
            result[i] = accumulator

    return np.array(result)
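For reference, here is a minimal plain-Python sketch of the serial behavior I expect, checked against np.sum(matrix, axis=1). The name serial_sum_over_columns and the small test matrix are only for illustration and are not part of my actual module. It mirrors the loop structure above without prange or any Cython typing.

```python
import numpy as np

def serial_sum_over_columns(matrix):
    """Serial reference: sum the entries of each row, one row at a time."""
    m, n = matrix.shape
    result = np.zeros(m, dtype='float64')
    for i in range(m):
        accumulator = 0.0  # reset for every row, as in the Cython loop
        for j in range(n):
            accumulator += matrix[i, j]
        result[i] = accumulator
    return result

# Illustrative input (hypothetical data): a 3 x 4 matrix holding 0..11
matrix = np.arange(12, dtype='float64').reshape(3, 4)

# The serial loop should agree with NumPy's row-wise sum
assert np.allclose(serial_sum_over_columns(matrix), np.sum(matrix, axis=1))
```

This is the result I would like the parallel version to reproduce, with the outer loop over i split across threads.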