Hi,
I'm trying to write some Numba code that works a bit like an image convolution: step over an image, look at each pixel's neighbors, and for each nonzero neighbor, write the (center ID, neighbor ID, distance) triple into COO-format `row`, `col`, and `data` arrays. Here's the code:
```python
import numba
import numpy as np


@numba.jit(nopython=True, cache=True, nogil=True)
def write_pixel_graph(image, indices, steps, distances, row, col, data):
    """Step over `image` to build a graph of nonzero pixel neighbors.

    Parameters
    ----------
    image : int array
        The input image.
    indices : int array
        The raveled indices into `image` containing nonzero entries.
    steps : int array, shape (N,)
        The raveled index steps to find a pixel's neighbors in `image`.
    distances : float array, shape (N,)
        The Euclidean distance from a pixel to its corresponding
        neighbor in `steps`.
    row : int array
        Output array to be filled with the "center" pixel IDs.
    col : int array
        Output array to be filled with the "neighbor" pixel IDs.
    data : float array
        Output array to be filled with the distances from center to
        neighbor pixels.

    Notes
    -----
    No size or bounds checking is performed. Users should ensure that

    - No index in `indices` falls on any edge of `image` (or the
      neighbor computation will fail or segfault).
    - The `steps` and `distances` arrays have the same shape.
    - The `row`, `col`, and `data` arrays are long enough to hold all
      of the edges.
    """
    image = image.ravel()
    n_neighbors = steps.size
    k = 0
    for i in indices:  # each entry is already a raveled position in `image`
        if image[i] != 0:
            for j in range(n_neighbors):
                n = steps[j] + i
                if image[n] != 0:
                    row[k] = image[i]     # center pixel ID
                    col[k] = image[n]     # neighbor pixel ID
                    data[k] = distances[j]
                    k += 1
```
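For reference, here is one way the `steps` and `distances` inputs described in the docstring might be built for a full neighborhood (8-connected in 2D, 26-connected in 3D). This is only a sketch: `raveled_steps` is a hypothetical helper, and it assumes the image is C-contiguous so that `image.ravel()` keeps row-major order.

```python
import itertools

import numpy as np


def raveled_steps(shape):
    """Hypothetical helper: raveled neighbor steps and their Euclidean distances.

    Assumes a C-contiguous image of the given `shape`.
    """
    ndim = len(shape)
    # Element stride of each axis in C order: product of the trailing dimensions.
    strides = np.array([int(np.prod(shape[d + 1:])) for d in range(ndim)])
    # Every offset in {-1, 0, 1}**ndim except the center pixel itself.
    offsets = np.array([o for o in itertools.product((-1, 0, 1), repeat=ndim)
                        if any(o)])
    steps = (offsets @ strides).astype(np.intp)       # raveled index step per neighbor
    distances = np.sqrt((offsets ** 2).sum(axis=1))   # 1 for edge-adjacent, sqrt(2) for diagonal
    return steps, distances


steps, distances = raveled_steps((5, 7))
print(steps.tolist())  # [-8, -7, -6, -1, 1, 6, 7, 8]
```

Computing the offsets from the shape, rather than hard-coding them, keeps `steps` and `distances` the same length and in matching order, which the Notes section requires.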
I'm having two problems:
1) It still seems quite slow: processing ~150K indices takes ~360 ms, which works out to ~2.4 µs per pixel, or ~300 ns per iteration of the inner loop (`for j in range(n_neighbors)`; there are 8 neighbors per pixel). Are there any obvious optimizations I'm missing?
2) The `indices` array was a trick I added later, when I realized that only 5% of my pixels are nonzero. Previously I iterated over the *entire* image and checked whether each center pixel was nonzero. Astonishingly, that earlier approach is 24x *faster* than iterating over `indices`! Here's the faster code:
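```python
import numba
import numpy as np


@numba.jit(nopython=True, cache=True, nogil=True)
def write_pixel_graph(image, steps, distances, row, col, data):
    image = image.ravel()
    n_neighbors = steps.size
    # Visit only positions whose neighbors all fall inside the raveled image:
    # i + min(steps) >= 0 and i + max(steps) <= image.size - 1.
    start_idx = -np.min(steps)
    end_idx = image.size - np.max(steps)
    k = 0
    for i in range(start_idx, end_idx):
        if image[i] != 0:
            for j in range(n_neighbors):
                n = steps[j] + i
                if image[n] != 0:
                    row[k] = image[i]     # center pixel ID
                    col[k] = image[n]     # neighbor pixel ID
                    data[k] = distances[j]
                    k += 1
```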
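Before comparing timings, it may be worth checking that the two loops produce identical edges on a small input. The sketch below does that under several assumptions: `edges_from_indices` and `edges_from_scan` are hypothetical names, each returns the number of edges written `k` (the functions above return nothing), the label image is zero-padded by one pixel with `np.pad` so that no nonzero pixel lies on an edge, and the outputs are preallocated to the upper bound `len(indices) * steps.size`. The `try`/`except` fallback exists only so the sketch runs without Numba installed.

```python
import numpy as np

try:
    import numba
    jit = numba.jit(nopython=True, nogil=True)
except ImportError:  # assumption: plain-Python fallback so the sketch runs anywhere
    def jit(func):
        return func


@jit
def edges_from_indices(image, indices, steps, distances, row, col, data):
    flat = image.ravel()
    k = 0
    for i in indices:  # raveled positions of nonzero pixels
        for j in range(steps.size):
            n = i + steps[j]
            if flat[n] != 0:
                row[k] = flat[i]
                col[k] = flat[n]
                data[k] = distances[j]
                k += 1
    return k  # number of edges written


@jit
def edges_from_scan(image, steps, distances, row, col, data):
    flat = image.ravel()
    # Only positions whose every neighbor lies inside the raveled array.
    start = -np.min(steps)
    end = flat.size - np.max(steps)
    k = 0
    for i in range(start, end):
        if flat[i] != 0:
            for j in range(steps.size):
                n = i + steps[j]
                if flat[n] != 0:
                    row[k] = flat[i]
                    col[k] = flat[n]
                    data[k] = distances[j]
                    k += 1
    return k


def allocate(n):
    # Preallocated COO outputs: center IDs, neighbor IDs, distances.
    return np.empty(n, np.intp), np.empty(n, np.intp), np.empty(n, np.float64)


# Toy label image, zero-padded by one pixel so no nonzero pixel touches an edge.
labels = np.pad(np.array([[1, 2, 0],
                          [0, 3, 4]]), 1)           # shape (4, 5)
steps = np.array([-6, -5, -4, -1, 1, 4, 5, 6])    # 8-connectivity, row width 5
distances = np.sqrt(np.array([2, 1, 2, 1, 1, 2, 1, 2], dtype=np.float64))
indices = np.flatnonzero(labels)

max_edges = indices.size * steps.size  # every neighbor of every nonzero pixel
row_idx, col_idx, data_idx = allocate(max_edges)
row_scan, col_scan, data_scan = allocate(max_edges)
k_idx = edges_from_indices(labels, indices, steps, distances, row_idx, col_idx, data_idx)
k_scan = edges_from_scan(labels, steps, distances, row_scan, col_scan, data_scan)
print(k_idx, k_scan)  # 10 10
```

Both loops visit nonzero pixels in increasing raveled order, so their outputs should match element for element; only the first `k` entries of each output array are meaningful.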
Any help is much appreciated!
Juan.