Optimizing code for Numba

Juan Nunez-Iglesias

未讀,

2016年11月1日凌晨1:19:152016/11/1

收件者：numba...@continuum.io

Hi,

I'm trying to write some numba code that's a little bit like an image convolution: step over an image, look at the current pixel's neighbors, and for each distinct neighbor, add their values to some COO indices. Here's the code:

@numba.jit(nopython=True, cache=True, nogil=True)
def write_pixel_graph(image, indices, steps, distances, row, col, data):
    """Step over `image` to build a graph of nonzero pixel neighbors.

    Parameters
    ----------
    image : int array
        The input image.
    indices : int array
        The raveled indices into `image` containing nonzero entries.
    steps : int array, shape (N,)
        The raveled index steps to find a pixel's neighbors in `image`.
    distances : float array, shape (N,)
        The euclidean distance from a pixel to its corresponding
        neighbor in `steps`.
    row : int array
        Output array to be filled with the "center" pixel IDs.
    col : int array
        Output array to be filled with the "neighbor" pixel IDs.
    data : float array
        Output array to be filled with the distances from center to
        neighbor pixels.

    Notes
    -----
    No size or bounds checking is performed. Users should ensure that
    - No index in `indices` falls on any edge of `image` (or the
      neighbor computation will fail or segfault).
    - The `steps` and `distances` arrays have the same shape.
    - The `row`, `col`, `data` are long enough to hold all of the
      edges.

    """
    image = image.ravel()
    n_neighbors = steps.size
    k = 0
    for h in indices:
        i = image[h]
        if image[i] != 0:
            for j in range(n_neighbors):
                n = steps[j] + i
                if image[n] != 0:
                    row[k] = image[i]
                    col[k] = image[n]
                    data[k] = distances[j]
                    k += 1

I'm having two problems:

1) It still seems quite slow, taking ~360ms to process ~150K indices, which works out to 300ns per inner loop (for j in range(n_neighbors) — there's 8 neighbors per pixel). Are there any obvious optimization steps I'm missing?

2) The use of an `indices` array was a trick I added later, when I realized that only 5% of my pixels are nonzero. I was previously iterating over the *entire* image, and checking whether the center pixel was nonzero. Astonishingly, this earlier approach is 24x *faster* than iterating over `indices`! Here's the faster code:

@numba.jit(nopython=True, cache=True, nogil=True)
def write_pixel_graph(image, steps, distances, row, col, data):
    image = image.ravel()
    n_neighbors = steps.size
    start_idx = np.max(steps)
    end_idx = image.size + np.min(steps)
    k = 0
    for i in range(start_idx, end_idx + 1):
        if image[i] != 0:
            for j in range(n_neighbors):
                n = steps[j] + i
                if image[n] != 0:
                    row[k] = image[i]
                    col[k] = image[n]
                    data[k] = distances[j]
                    k += 1

Any help is much appreciated!

Juan.

Juan Nunez-Iglesias

未讀,

2016年11月1日凌晨1:37:312016/11/1

收件者：numba...@continuum.io

I just saw that there was a bug in my "indices" code. The new code is only 2x slower than iterating over all indices:

        for j in range(n_neighbors):
            n = steps[j] + h


            if image[n] != 0:

                row[k] = image[h]


                col[k] = image[n]
                data[k] = distances[j]
                k += 1

Juan Nunez-Iglesias

未讀,

2016年11月1日凌晨1:42:022016/11/1

收件者：numba...@continuum.io

And the new computation takes 22ns per inner loop — perhaps inside acceptable territory? I count ~15 operations, and I'm running this on a 1.1GHz Core m3. But I'm not an expert on this, so ideas as to whether I'm in "as good as it gets" territory are still appreciated! Thanks!

Juan.

William Shipman

未讀,

2016年11月6日下午3:14:252016/11/6

收件者：numba...@continuum.io

Some random thoughts that may or may not help here, perhaps you've already considered them:

Is the indices array sorted? Depending on sparsity this might help reduce cache misses.
ndarray.ravel can copy the array if it needs to to make a flat array, why not use ordinary indices?
Since this is an image, I assume you're accessing neighbours in the usual sense, like the pixels above, below, left and right. Why not write a double for loop for this instead of accessing yet more arrays?

--
You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users+unsubscribe@continuum.io.
To post to this group, send email to numba...@continuum.io.
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/CA%2BJHcKRBVCNGr1s0VjBMQc-6VtpJa1nzr8J%3Dn053kNjOLr8ziA%40mail.gmail.com.

For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Juan Nunez-Iglesias

未讀,

2016年11月6日下午6:04:532016/11/6

收件者：numba...@continuum.io

Hi William,

Thanks for your response! The answers to Qs 2 and 3 is that I want my code to be agnostic to both image dimension (I have use cases for both 2D and 3D images) and to the definition of neighborhood, ie how many diagonals to include — and 3D neighborhoods can get hairy.

As to question 1, indices are sorted but the image is very sparse — I am analyzing skeletons. So I suspect I am getting cache misses galore, but I also don't think I can do much about it.

Thanks again! After reading a bit more of the numba docs, I think I just need to start looking more carefully at the inspect_types() output, and see whether there's any unnecessary ops that can be eliminated in there.

Is there any reason why numba couldn't/shouldn't use an HTML view similar to cython -a to display this kind of function, as a clue to expensive operations? e.g. with an "output_annotations=True" to jit().

Thanks,

Juan.

To unsubscribe from this group and stop receiving emails from it, send an email to numba-users...@continuum.io.

To post to this group, send email to numba...@continuum.io.

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/CAOUkfAxhi3ACXBnGyy7r1wx93D_44KtDwbRGsqPdwoH-MMs_JQ%40mail.gmail.com.

William Shipman

未讀,

2016年11月7日下午2:59:222016/11/7

收件者：numba...@continuum.io

There's also the inspect_asm() function that could help - it gives you the final assembler code that was generated.

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/CAOUkfAxhi3ACXBnGyy7r1wx93D_44KtDwbRGsqPdwoH-MMs_JQ%40mail.gmail.com.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

--

You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users+unsubscribe@continuum.io.
To post to this group, send email to numba...@continuum.io.

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/CA%2BJHcKTk-4qVJ-sr3QyK%2Bq9hP5bCe57Gm5_jqXtO%2BLrtPBJ5tg%40mail.gmail.com.

回覆所有人

回覆作者

轉寄