Dividing a large image into smaller overlapping blocks for parallel processing


Riaan van den Dool

Aug 31, 2013, 10:04:53 AM
to scikit...@googlegroups.com
Hi guys

I would like to use scikit-image to process large images, for example of shape (5696, 13500).

In the interest of speed I need to divide the image into smaller sub-images with the possibility of processing these in parallel.

If I define the sub-images so that neighbouring sub-images overlap then edge effects should not be a problem for the algorithm operating on each sub-image.
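For concreteness, the overlapping sub-images could be described by slice pairs generated with something like the following (a hypothetical helper sketched here for illustration, not an existing skimage function):

```python
import numpy as np

def overlapping_slices(shape, block_shape, margin):
    """Yield (row_slice, col_slice) pairs; neighbouring blocks share
    a `margin`-pixel border, clipped at the image edges."""
    rows, cols = shape
    br, bc = block_shape
    for r in range(0, rows, br):
        for c in range(0, cols, bc):
            yield (slice(max(r - margin, 0), min(r + br + margin, rows)),
                   slice(max(c - margin, 0), min(c + bc + margin, cols)))

slices = list(overlapping_slices((5696, 13500), (1024, 1024), margin=32))
# each slice pair can then be processed independently, e.g. image[s]
```

Because only slice definitions are produced, each worker can read its own sub-image without copying the whole array.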

This is probably a specific case of the more general border/edge-effect handling issue as addressed by the mode parameter here:

My questions:
  1. Is there already an image-division function/strategy implemented in scikit-image?
  2. Is this something that might be included in the future if an implementation is available?
  3. Please share any references to articles or code that deal with this.
Riaan

Johannes Schönberger

Aug 31, 2013, 12:49:31 PM
to scikit...@googlegroups.com
Hi Riaan,

Unfortunately we do not have (at least I do not know of) a function similar to Matlab's `blockproc`. Such a feature would be a great addition to skimage!

Regards, Johannes

Riaan van den Dool

Aug 31, 2013, 2:05:04 PM
to scikit...@googlegroups.com
The blockproc function's signature provides a useful starting point, thanks. 

I will have to think about how to do the parallel execution from the function.

Blockproc provides two 'padding' methods: replicate and symmetric. I guess what I need could be called 'margin', or perhaps 'overlap'.
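For reference, Matlab's 'replicate' and 'symmetric' padding correspond to numpy's 'edge' and 'symmetric' modes in `numpy.pad`:

```python
import numpy as np

a = np.array([1, 2, 3])
print(np.pad(a, 2, mode='edge'))       # replicate: [1 1 1 2 3 3 3]
print(np.pad(a, 2, mode='symmetric'))  # symmetric: [2 1 1 2 3 3 2]
```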

For the margin case it might make sense for such a function to merely return an array of block definitions rather than blocks of pixel data. But I think this would be less applicable to the replicate and symmetric cases.

R

Johannes Schönberger

Aug 31, 2013, 2:17:25 PM
to scikit...@googlegroups.com
Some hints:

- pad the image with skimage.util.pad, which supports a large number of padding methods
- spawn a pool of processes using Python's multiprocessing package in the standard library
- use shared memory to give the workers read access to the complete image
- define slices of image blocks and add them to a processing queue
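A rough sketch of these steps; the per-block function `process_block` is a placeholder, and for simplicity each worker receives a copy of its padded block rather than a true shared-memory view:

```python
import numpy as np
from multiprocessing import Pool

def process_block(block):
    # placeholder per-block operation
    return block.mean()

def run(image, block=1000, margin=16):
    # pad the whole image once (numpy.pad supports 'symmetric', 'edge', ...)
    padded = np.pad(image, margin, mode='symmetric')
    # define the overlapping, padded blocks
    jobs = [padded[r:r + block + 2 * margin, c:c + block + 2 * margin]
            for r in range(0, image.shape[0], block)
            for c in range(0, image.shape[1], block)]
    # spawn a pool of worker processes and queue the blocks
    with Pool() as pool:
        return pool.map(process_block, jobs)
```

For a truly shared read-only image one would instead allocate a `multiprocessing.Array` and wrap it with `np.frombuffer` in each worker.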

Riaan van den Dool

Aug 31, 2013, 3:26:51 PM
to scikit...@googlegroups.com
Thanks

Colin Lea

Aug 31, 2013, 7:01:55 PM
to scikit...@googlegroups.com
You also might want to look into joblib, which makes it very easy to run parallel computations. It is used frequently in sklearn to speed up code.
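A minimal joblib sketch, assuming joblib is installed; the block size and the per-block `np.mean` are arbitrary choices for illustration:

```python
import numpy as np
from joblib import Parallel, delayed

image = np.random.rand(1000, 1000)
blocks = [image[r:r + 250, c:c + 250]
          for r in range(0, 1000, 250)
          for c in range(0, 1000, 250)]

# n_jobs=-1 uses all available cores
means = Parallel(n_jobs=-1)(delayed(np.mean)(b) for b in blocks)
```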

Stéfan van der Walt

Sep 1, 2013, 5:34:48 AM
to scikit-image
On Sat, Aug 31, 2013 at 8:17 PM, Johannes Schönberger <js...@demuc.de> wrote:
> - pad image with skimage.util.pad, which allows a large number of padding methods
> - spawn a pool of processes using Python's multiprocessing package in the standard library
> - use shared memory to provide read access to complete image
> - define slices of image blocks and add them to a processing queue

How about we add an `overlap` parameter to
`skimage.util.view_as_windows`? That should solve this problem.
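(Editorial note: in current versions of skimage, `view_as_windows` accepts a `step` parameter, which already yields overlapping views whenever the step is smaller than the window size; an `overlap` parameter would be another way of expressing the same thing.)

```python
import numpy as np
from skimage.util import view_as_windows

image = np.arange(64).reshape(8, 8)
# 4x4 windows extracted every 2 pixels: neighbouring windows overlap by 2
windows = view_as_windows(image, window_shape=(4, 4), step=2)
print(windows.shape)  # (3, 3, 4, 4)
```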

Stéfan

Johannes Schönberger

Sep 1, 2013, 5:37:22 AM
to scikit...@googlegroups.com
Yes, that should be very useful. Nevertheless, I think a function like Matlab's blockproc would be a really good addition.

Colin Lea

Sep 1, 2013, 11:53:51 AM
to scikit...@googlegroups.com
I just made a function blockproc to do this. I'll open a pull request later today. It works in a similar way to the Matlab feature.

Johannes Schönberger

Sep 1, 2013, 1:25:46 PM
to scikit...@googlegroups.com
Great!

Riaan van den Dool

Sep 3, 2013, 3:27:51 PM
to scikit...@googlegroups.com
I have created a gist with my thoughts on what such a function could look like.


The examples shown are a bit contrived, with border_size=(0, 0) and non-overlapping 'rolling' windows, but I think the idea should be clear.

The proc_func function can either process and return a value synchronously, or create a separate job/process for each window, depending on its implementation. By keeping this logic separate from the windowing function, any multiprocessing-type solution can be used according to preference. In the case of asynchronous processing, the results tuple will be filled with None (or whatever else proc_func returns), i.e. not with the asynchronous results themselves; proc_func will have to implement its own way of returning the asynchronous results when they become available (a callback function, for example).
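A hypothetical sketch of that interface (names and signature are illustrative only, not the gist's actual code): the windowing function just iterates and collects whatever proc_func returns, so proc_func is free to compute a result directly or to submit an asynchronous job.

```python
import numpy as np

def process_windows(image, window_shape, border_size, proc_func):
    """Apply proc_func to each (bordered) window and collect the
    returned values; proc_func decides whether to work synchronously
    or to hand the window off to a separate job/process."""
    wr, wc = window_shape
    br, bc = border_size
    results = []
    for r in range(0, image.shape[0], wr):
        for c in range(0, image.shape[1], wc):
            window = image[max(r - br, 0):r + wr + br,
                           max(c - bc, 0):c + wc + bc]
            # may be a real result, or e.g. None for an async submission
            results.append(proc_func(window))
    return tuple(results)
```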

Your thoughts?

Riaan

Colin Lea

Sep 3, 2013, 9:23:15 PM
to scikit...@googlegroups.com
Thanks Riaan, I've already made a PR for this. See here: https://github.com/scikit-image/scikit-image/pull/723/

Riaan van den Dool

Sep 4, 2013, 12:06:24 AM
to scikit...@googlegroups.com
Yes Colin, I saw, and I have commented on the PR as well.

I am submitting this as an alternative approach, because I am not sure that the approach you took is 100% in line with what I need.

R

Riaan van den Dool

Sep 11, 2013, 2:20:22 AM
to scikit...@googlegroups.com
The current state of the PR looks promising. I have added a comment with some points for discussion around design choices.

