Oversplitting by watershed


Frank

Nov 12, 2012, 7:43:21 AM
to scikit...@googlegroups.com
Dear group,

I have some issues with the watershed algorithm implemented in scikit-image. I use a global threshold to segment cells from the background, but some cells touch and I want them to be split. Watershed seems the appropriate way to deal with my problem; however, my particles are split into too many pieces. Is there a way to adjust the sensitivity of the watershed method?

Many thanks for any suggestion!

The code that I use looks like below. An example image that I want to process can be downloaded here: https://dl.dropbox.com/u/10373933/test.jpg

# packages needed to perform image processing and analysis
import numpy as np
import scipy as scp
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import scipy.ndimage as nd
import skimage
from skimage import io
from skimage.morphology import watershed, is_local_maximum
from skimage.segmentation import find_boundaries, visualize_boundaries
from skimage.color import gray2rgb

# read jpeg file
image = mpimg.imread('c:\\test.jpg')
image_thresh = image > 140
labels = nd.label(image_thresh)[0]
distance = nd.distance_transform_edt(image_thresh)
local_maxi = is_local_maximum(distance, labels=labels, footprint=np.ones((9, 9)))
markers = nd.label(local_maxi)[0]
labelled_image = watershed(-distance, markers, mask=image_thresh)

#find outline of objects for plotting
boundaries = find_boundaries(labelled_image)
img_rgb = gray2rgb(image)
overlay = np.flipud(visualize_boundaries(img_rgb,boundaries))
plt.imshow(overlay)
plt.show()



Tony Yu

Nov 12, 2012, 4:56:47 PM
to scikit...@googlegroups.com
Hi Frank,

Actually, I don't think the issue is in the watershed segmentation. Instead, I think the problem is in the marker specification: Using local maxima creates too many marker points when a blob deviates greatly from a circle. (BTW, does anyone know if there are any differences between `is_local_maximum` and `peak_local_max`? Maybe the former should be deprecated.)

Using the centroids of blobs gives cleaner results. See slightly-modified example below.

Best,
-Tony

# packages needed to perform image processing and analysis
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage as nd

from skimage import io
from skimage import measure
from skimage.morphology import watershed
from skimage.segmentation import find_boundaries, visualize_boundaries
from skimage.color import gray2rgb

# read jpeg file
image = io.imread('test.jpg')

image_thresh = image > 140
labels = nd.label(image_thresh)[0]
distance = nd.distance_transform_edt(image_thresh)

props = measure.regionprops(labels, ['Centroid'])
coords = np.array([np.round(p['Centroid']) for p in props], dtype=int)
# Create a marker image where blob centroids are marked True
markers = np.zeros(image.shape, dtype=bool)
markers[tuple(np.transpose(coords))] = True
# Label the markers so each centroid seeds its own watershed basin
markers = nd.label(markers)[0]

labelled_image = watershed(-distance, markers, mask=image_thresh)

#find outline of objects for plotting
boundaries = find_boundaries(labelled_image)
img_rgb = gray2rgb(image)
overlay = visualize_boundaries(img_rgb, boundaries, color=(1, 0, 0))

plt.imshow(overlay)
plt.show()

Josh Warner

Nov 12, 2012, 6:57:19 PM
to scikit...@googlegroups.com
I've looked at these two algorithms, and the biggest difference seems to be in the output.  `is_local_maximum` returns a Boolean array, while `peak_local_max` returns the indices of points corresponding to maxima (a la `np.sort` vs. `np.argsort`, though you cannot just pass the output of `peak_local_max` as indices). 

The other differences are more subtle, but significant.

The API for `peak_local_max` could use some cleanup - the first threshold kwarg is set to 'deprecated' (!) and IMHO should be removed - but this algorithm allows finer control over thresholding and peak searching. This helps avoid finding phantom peaks in noise when large dark regions are present. One significant drawback is that the `min_distance` kwarg defaults to 10, which is rather arbitrary, and ANY input (even the minimum value of 1) excludes both neighboring pixels AND the border. See example below.

In contrast, `is_local_maximum` has a much simpler API. It doesn't have the finer thresholding / peak-searching controls, but has a unique ability to search for peaks ONLY within arbitrary, connected, labeled regions. This has some interesting potential for masking etc., though I believe within each label only one peak will be found. This algorithm also has the ability to search arbitrary local regions for peaks using something akin to a morphological structuring element, through the `footprint=` kwarg. The documentation for this could probably be clarified.
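The footprint idea can be sketched in pure SciPy: a pixel is a local maximum if it equals the maximum over its footprint-shaped neighborhood. This is only an illustrative sketch, and the helper name `local_max_with_footprint` is mine, not part of scikit-image:

```python
import numpy as np
from scipy import ndimage as nd

def local_max_with_footprint(image, footprint):
    """Boolean mask, True where a pixel equals the maximum of the
    neighborhood defined by `footprint` (cf. the `footprint=` kwarg)."""
    return image == nd.maximum_filter(image, footprint=footprint)

img = np.zeros((7, 7))
img[2, 2] = 5.0   # an isolated peak
img[5, 5] = 3.0   # a second, weaker peak
mask = local_max_with_footprint(img, np.ones((3, 3)))
# Flat zero regions trivially equal their neighborhood maximum, so
# restrict the result to foreground pixels here.
peaks = mask & (img > 0)
```

Any boolean array works as the footprint, so non-square search regions come for free.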

The way `peak_local_max` excludes borders concerns me for general use, as does its default `min_distance=10`, and personally I would prefer to work around the limitations in `is_local_maximum`.

A best-of-both-worlds combination could probably be created without overly much effort...


Snippet showing border-excluding behavior of `peak_local_max`, which will only get worse with higher values of `min_distance`:

import numpy as np
import matplotlib.pyplot as plt
from skimage.feature import peak_local_max
from skimage.morphology import is_local_maximum

# Generate standardized random data
np.random.seed(seed=1234)
testim = np.random.randint(0, 255, size=(20, 20))

# Find peaks using both methods
ismax   = is_local_maximum(testim)                  # Boolean image returned
peakmax = peak_local_max(testim, min_distance=1)    # (M, 2) indices returned

# `peakmax` not plottable - placing values in 2d array
Ipeakmax = np.zeros(testim.shape)
Ipeakmax[peakmax[:, 0], peakmax[:, 1]] = 1

# Show the results
fig, ax = plt.subplots(ncols=2, nrows=1)
ax[0].imshow(ismax, cmap='gray')
ax[0].set_title('Peaks found by `is_local_maximum`')
ax[1].imshow(Ipeakmax, cmap='gray')
ax[1].set_title('Peaks found by `peak_local_max`')

plt.show()

Tony Yu

Nov 12, 2012, 10:42:44 PM
to scikit...@googlegroups.com
Thanks for the detailed response!

On Mon, Nov 12, 2012 at 6:57 PM, Josh Warner <silvertr...@gmail.com> wrote:
> I've looked at these two algorithms, and the biggest difference seems to be in the output. `is_local_maximum` returns a Boolean array, while `peak_local_max` returns the indices of points corresponding to maxima (a la `np.sort` vs. `np.argsort`, though you cannot just pass the output of `peak_local_max` as indices).
>
> The other differences are more subtle, but significant.
>
> The API for `peak_local_max` could use some cleanup - the first threshold kwarg is set to 'deprecated' (!) and IMHO should be removed - but this algorithm allows finer control over thresholding and peak searching.

I think it's marked 'deprecated' so that it can be removed gracefully (i.e. it gives people time to change their code). That said, it was marked 'deprecated' a few releases back, so it's probably ripe for removal.

> This helps avoid finding phantom peaks in noise when large dark regions are present. One significant drawback is that the `min_distance` kwarg defaults to 10, which is rather arbitrary, and ANY input (even the minimum value of 1) excludes both neighboring pixels AND the border. See example below.
>
> In contrast, `is_local_maximum` has a much simpler API. It doesn't have the finer thresholding / peak-searching controls, but has a unique ability to search for peaks ONLY within arbitrary, connected, labeled regions. This has some interesting potential for masking etc., though I believe within each label only one peak will be found. This algorithm also has the ability to search arbitrary local regions for peaks using something akin to a morphological structuring element, through the `footprint=` kwarg. The documentation for this could probably be clarified.
>
> The way `peak_local_max` excludes borders concerns me for general use, as does its default `min_distance=10`, and personally I would prefer to work around the limitations in `is_local_maximum`.
>
> A best-of-both-worlds combination could probably be created without overly much effort...


This is a great summary of the API differences. I agree that excluding the border region is a bit of a wart. (My guess is that this could be fixed by changing the boundary condition on the maximum filter used in `peak_local_max`.) Although I agree that defaulting to `min_distance=10` is arbitrary, I'm not sure there's an obvious choice. I would normally assume that peaks separated by 1 pixel are just noise.
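The boundary-condition idea can be illustrated with plain SciPy: if the maximum filter is padded with -inf, out-of-bounds "neighbors" can never win, so border pixels stay eligible as peaks. This is only a sketch of the idea, not what `peak_local_max` actually does internally:

```python
import numpy as np
from scipy import ndimage as nd

img = np.zeros((6, 6))
img[0, 0] = 7.0   # a genuine maximum sitting on the image border
img[3, 3] = 5.0   # an interior maximum

# Maximum filter with a -inf constant boundary: the border pixel at
# (0, 0) still equals its neighborhood maximum and is kept as a peak,
# instead of being excluded outright.
mask = img == nd.maximum_filter(img, size=3, mode='constant', cval=-np.inf)
peaks = mask & (img > 0)   # ignore the flat zero background
```

With the border-exclusion behavior described above, the peak at (0, 0) would simply be dropped.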

The footprint and mask parameter would definitely be a nice addition to `peak_local_max`.

If you have time, a pull request to address some or all of these issues would be great. If not, maybe you could file this as an issue on GitHub?

Thanks,
-Tony

 

Josh Warner

Nov 13, 2012, 12:35:08 AM
to scikit...@googlegroups.com
I can probably put something together.  What should the goal be?  Expand the featureset of one algorithm, such that the other can be collapsed into a wrapper function with no loss of backwards compatibility, or expand the featureset of one and eliminate the other (carefully changing all internal references to the old function)?  

The latter might be the best/ideal world solution, but even if all of the internal references were changed appropriately it could break 3rd party code.  I would lean toward the former option, moving in the direction of `is_local_maximum`, though this does appear to be the slower algorithm at present. 

Frank Pennekamp

Nov 13, 2012, 4:54:15 AM
to scikit...@googlegroups.com
Hi Tony,

thanks for helping me out on this again. Your solution produces a nice segmentation of the image, but the particles that need to be split remain touching (the dividing cell left of the big blob in the middle; the two cells in the lower-left quarter that touch at their tips). I think it is the same result as just using the global threshold.

I agree with you that the problem seems to be the markers. I have about three times more markers than actual objects, so the marker count doesn't correspond to the actual number of objects at all. At the other extreme, replacing the regional maxima with the centroids of the thresholded blobs does not split the touching objects, because there is only one centroid per object.

I had some success in splitting objects with the watershed algorithm implemented in ImageJ; maybe there is a way of translating their approach into Python. Their description is the following:

Watershed segmentation of the Euclidean distance map (EDM) is a way of automatically separating or cutting apart particles that touch (watershed separation of a grayscale image is available via the Find Maxima... command). The Watershed command requires a binary image containing black particles on a white background. It first calculates the Euclidean distance map and finds the ultimate eroded points (UEPs). It then dilates each of the UEPs (the peaks or local maxima of the EDM) as far as possible - either until the edge of the particle is reached, or the edge of the region of another (growing) UEP. Watershed segmentation works best for smooth convex objects that don't overlap too much.

[watershed example image from the ImageJ documentation]

Ultimate points: generates the ultimate eroded points (UEPs) of the EDM. Requires a binary image as input. The UEPs represent the centers of particles that would be separated by segmentation. The UEP's gray value is equal to the radius of the inscribed circle of the corresponding particle. Use Process>Binary>Options to set the background color (black or white) and the output type.

How could I get the ultimate eroded points in scikit-image? There seems to be no function to do so at the moment, but maybe you have a suggestion for how to tackle this problem?
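For what it's worth, the UEPs described above are essentially the regional maxima of the EDM, so a rough stand-in can be sketched with SciPy alone. This is a toy example with two separate disks, not an exact port of ImageJ's algorithm:

```python
import numpy as np
from scipy import ndimage as nd

# Toy binary image: two disk-shaped "particles"
yy, xx = np.mgrid[0:40, 0:60]
blob = (((yy - 20) ** 2 + (xx - 18) ** 2) < 100) | \
       (((yy - 20) ** 2 + (xx - 38) ** 2) < 100)

# Euclidean distance map, then keep pixels equal to their local
# neighborhood maximum: a rough approximation of the UEPs
edm = nd.distance_transform_edt(blob)
ueps = (edm == nd.maximum_filter(edm, size=5)) & blob
markers, n_markers = nd.label(ueps)
# `markers` could then seed a watershed of -edm, as in the code above.
```

Each UEP value equals the distance to the particle edge, matching the ImageJ description of the inscribed-circle radius.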

Many thanks in any case for your help already!

Best,

   Frank


Frank Pennekamp

Nov 13, 2012, 6:57:33 AM
to scikit...@googlegroups.com
Hi,

just found a way to get my desired result: applying a Gaussian filter to the distance map allows me to adjust the number of local maxima found, thereby controlling the sensitivity of the subsequent watershed. Maybe not the best option, but it serves the purpose. Here is the code if you want to check it yourself.


# packages needed to perform image processing and analysis
import numpy as np
import scipy as scp
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import scipy.ndimage as nd
import skimage
from skimage import io
from skimage.morphology import watershed, is_local_maximum
from skimage.segmentation import find_boundaries, visualize_boundaries
from skimage.color import gray2rgb

# read jpeg file
image = mpimg.imread('c:\\test.jpg')
image_thresh = image > 140
labels = nd.label(image_thresh)[0]

distance = nd.distance_transform_edt(image_thresh)
#apply Gaussian filter to the distance map to merge several local maxima into one
distance = nd.gaussian_filter(distance, 3)

local_maxi = is_local_maximum(distance, labels=labels, footprint=np.ones((9, 9)))
markers = nd.label(local_maxi)[0]
labelled_image = watershed(-distance, markers, mask=image_thresh)

#find outline of objects for plotting
boundaries = find_boundaries(labelled_image)
img_rgb = gray2rgb(image)
overlay = np.flipud(visualize_boundaries(img_rgb, boundaries))
plt.imshow(overlay)
plt.show()
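The effect of the smoothing can be demonstrated in isolation with pure SciPy: blurring the distance map merges nearby maxima, so a larger sigma yields fewer markers. This is a toy sketch, and `count_maxima` is an illustrative helper, not a library function:

```python
import numpy as np
from scipy import ndimage as nd

# Two overlapping disks whose distance map has two nearby maxima
yy, xx = np.mgrid[0:30, 0:50]
blob = (((yy - 15) ** 2 + (xx - 20) ** 2) < 81) | \
       (((yy - 15) ** 2 + (xx - 30) ** 2) < 81)
edm = nd.distance_transform_edt(blob)

def count_maxima(img, sigma):
    """Count local maxima of `img` within `blob` after Gaussian smoothing."""
    smoothed = nd.gaussian_filter(img, sigma) if sigma > 0 else img
    peaks = (smoothed == nd.maximum_filter(smoothed, size=9)) & blob
    return nd.label(peaks)[1]

# A large enough sigma should merge the two maxima into one marker
print(count_maxima(edm, 0), count_maxima(edm, 5))
```

Sweeping sigma this way gives a single tuning knob for the watershed sensitivity, at the cost of losing genuinely close-together objects when sigma is too large.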

Cheers,

    Frank

Stéfan van der Walt

Nov 13, 2012, 3:58:16 PM
to scikit...@googlegroups.com
On Mon, Nov 12, 2012 at 3:57 PM, Josh Warner <silvertr...@gmail.com> wrote:
> In contrast, `is_local_maximum` has a much simpler API. It doesn't have the

Just for the record, `is_local_maximum` is mentioned in:

http://shop.oreilly.com/product/0636920020219.do

So, if we could write a unified backend but still expose this
function, that'd be good!

Stéfan

Tony Yu

Nov 13, 2012, 8:26:16 PM
to scikit...@googlegroups.com
I'm also in favor of keeping `is_local_maximum` and `peak_local_max` as separate functions, primarily because they have different return values (both of which have valid use cases). But... I'd be in favor of deprecating the current `is_local_maximum` in the `morphology` subpackage and renaming it to `is_local_max` in the `feature` subpackage. Unfortunately, removing the current function would break code (although the transition could be smooth if it's removed after a couple of releases with a deprecation warning). What do you think?


On Tue, Nov 13, 2012 at 12:35 AM, Josh Warner <silvertr...@gmail.com> wrote:
> I can probably put something together. What should the goal be? Expand the featureset of one algorithm, such that the other can be collapsed into a wrapper function with no loss of backwards compatibility, or expand the featureset of one and eliminate the other (carefully changing all internal references to the old function)?
>
> The latter might be the best/ideal-world solution, but even if all of the internal references were changed appropriately it could break 3rd-party code. I would lean toward the former option, moving in the direction of `is_local_maximum`, though this does appear to be the slower algorithm at present.

As mentioned above, I think it'd be best to have a single core function, but keep the other function (though possibly renamed and relocated) as a wrapper of the core function. If `is_local_maximum` is slower, I think it would be good to start with `peak_local_max` as the base.

As a first pass, it'd be great to fix the border issue with `peak_local_max`. (It might be nice to make this optional, since a user may want to require that the peak must be `min_distance` away from the image border---I could go either way on whether or not this is optional). Adding masking and a footprint parameter would be great, and I assume it should be straightforward (note that `scipy.ndimage.maximum_filter` has a footprint parameter).

Best,
-Tony

Josh Warner

Nov 14, 2012, 12:20:02 AM
to scikit...@googlegroups.com
Thanks for the input, Tony and Stefan.  Unless someone else wants to tackle this, I'll see what I can put together over a week or so.  

Josh