Help parallelization for CPU


schr...@iwm.mw.tu-dresden.de

Apr 23, 2018, 10:03:02 AM
to Numba Public Discussion - Public
As a first step, I successfully compiled the hot code of my program with numba.
This accelerated my program significantly. Thank you, numba team!

As a second step, I tried to accelerate the code further by parallelizing it on the CPU.
The code shown below runs four times as long as the jitted single-thread version,
but I expected it to be faster.
Is it possible to successfully parallelize the code below with numba?

from numpy import inf, array, int32, float64, amin, full
import numba as nb
from numba import prange

@nb.jit(nopython=True, parallel=True)
def _iterate_parallel(
        capy_idxs, indptr, indices,  # read only, ndarray[int32]
        heat_src, capy,  # read only, ndarray[float64]
        condc,  # read only, ndarray[float64, float64]
        temp,  # read and write from all threads, ndarray[float64]
        dt):  # read only, float64
    has_forced_static = False  # should be shared by all threads
    tau = full(capy_idxs.size, inf, dtype=float64)  # should be shared by all threads
    # the idea is to run only the following loop in parallel,
    # therefore use the prange function
    for c in prange(capy_idxs.size):
        # i, sum_condc, ap, b and j should be private inside prange
        i = capy_idxs[c]
        sum_condc = 0.
        ap = heat_src[i]
        # is this loop parallelized in the context of the loop above?
        for j in range(indptr[i], indptr[i + 1]):
            # removed the += reduction operator for the two lines below
            # because these are no reductions in the context of the prange loop
            sum_condc = sum_condc + condc[i, indices[j]]
            ap = ap + condc[i, indices[j]] * temp[indices[j]]
        if sum_condc == 0:
            continue
        if capy[i] == 0.:
            b = 1.
        else:
            tau[c] = capy[i] / sum_condc
            b = dt / tau[c]
            if b > 1.:
                b = 1.
                has_forced_static = True
        temp[i] = temp[i] - b * (temp[i] - ap / sum_condc)
    # temp array is also changed, must it also be returned?
    return amin(tau), has_forced_static
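As a side note on the reduction comments above: as far as I understand, a scalar `+=` carried across the `prange` loop itself is the pattern numba recognizes as a reduction. A toy sketch of that pattern (made-up function name, with a plain-Python fallback so the sketch runs even without numba installed):

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback: run the sketch as plain Python
    def njit(**kwargs):
        def wrap(func):
            return func
        return wrap
    prange = range

@njit(parallel=True)
def parallel_sum(x):
    total = 0.0
    for c in prange(x.size):
        total += x[c]  # scalar += across the prange loop: treated as a reduction
    return total

print(parallel_sum(np.ones(1000)))  # 1000.0
```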

I ask because the current parallelization mechanisms in numba are a bit unclear to me.
I found only one explained example in the docs.

Thanks S. Schroeder

schr...@iwm.mw.tu-dresden.de

Apr 23, 2018, 12:56:18 PM
to Numba Public Discussion - Public
Here is the code with better syntax highlighting:
from numpy import inf, array, int32, float64, amin, full
import numba as nb
from numba import prange

@nb.jit(nopython=True, parallel=True)

Ehsan Totoni

Apr 23, 2018, 9:20:27 PM
to Numba Public Discussion - Public
With prange, the loop iterations are divided among threads for execution. The programmer needs to make sure the iterations don't have any parallelism "conflicts" such as race conditions (e.g. writing to the same memory location). In this code, the variable `i`, which is `capy_idxs[c]`, is used for writing to `temp`; depending on the values of `capy_idxs`, there could be conflicts.
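A toy illustration of the safe pattern (names here are made up, and the fallback lets the sketch run without numba installed): each `prange` iteration writes only its own slot `out[c]`, so no two threads touch the same memory location.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback: run the sketch as plain Python
    def njit(**kwargs):
        def wrap(func):
            return func
        return wrap
    prange = range

@njit(parallel=True)
def double_safe(x):
    out = np.empty_like(x)
    for c in prange(x.size):
        out[c] = 2.0 * x[c]  # iteration c owns out[c]: no write conflict
    return out

# By contrast, writing through an index array, e.g. out[idx[c]] = ...,
# is only race-free if idx contains no duplicate entries.
print(double_safe(np.arange(3.0)))  # [0. 2. 4.]
```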

schr...@iwm.mw.tu-dresden.de

Apr 24, 2018, 4:17:47 AM
to Numba Public Discussion - Public
I know there are conflicts regarding `temp`: the result stored in `temp` depends on the processing order of `i`.
From the perspective of the algorithm, this is correct; the `temp` array should be shared for reading and writing between threads.
Does numba produce code that does this, regardless of the conflicts?

And why is the `parallel=True` version four times slower than the jitted single-thread version with `parallel=False`?



Ehsan Totoni

Apr 24, 2018, 7:26:11 AM
to Numba Public Discussion - Public
This could be a bug. Could you please open an issue with a full program to reproduce?