Dice loss for segmentation networks


isen...@googlemail.com

unread,
Dec 21, 2016, 8:05:04 AM12/21/16
to lasagne-users
Hi everyone,
I was trying to implement a dice loss for my segmentation network and came across some problems. The dice is a score that is often used for comparing segmentations in medical applications. It is somewhat similar to, but more forgiving than, the Jaccard index. Its formula is: 2*intersect/(num_pred + num_gt).
I know that training on the dice is probably not the best loss function out there, but at least in theory it should tackle the problem of high class imbalance often encountered in segmentation problems. It seems to have been successfully applied already (https://arxiv.org/abs/1606.04797). However, when I implement it in theano and use the negative dice as loss function, my network does not learn at all. Do you have any suggestions as to what I may have done wrong?
(The network is a standard U-Net architecture; I tried the sgd and adam optimizers with a broad range of learning rates (0.01 ... 1e-6 in the case of adam).)

My implementation of the dice calculation is added below. You can run the code as is (if you don't have medpy, comment that section out). I compare my numbers with medpy, a python package that can also compute the dice, as well as with a numpy implementation of mine. All values are the same. Just to get a feeling for the gradients (and maybe to figure out why my nets don't train), I tried to make theano calculate them wrt img_pred. When I do grad_fn = theano.function([img_pred_sym, img_gt_sym], grads), theano complains that img_gt_sym is not part of the computational graph, although it clearly is, as it is required for the computation of dc_sym_theano. Any suggestions?

Any help would be greatly appreciated!

Cheers,
Fabian

import theano.tensor as T
import numpy as np

def dice(y_pred, y_true, n_classes):
    # DICE is 2 * intersect / (num_pred + num_true)
    y_true = T.flatten(y_true)
    y_pred = T.argmax(y_pred, axis=1)

    dice = T.zeros(n_classes)

    for i in range(n_classes):
        i_val = T.constant(i)
        y_true_i = T.eq(y_true, i_val)
        y_pred_i = T.eq(y_pred, i_val)
        dice = T.set_subtensor(dice[i],
                               (T.constant(2.) * T.sum(y_true_i * y_pred_i) + T.constant(1e-7)) /
                               (T.sum(y_true_i) + T.sum(y_pred_i) + T.constant(1e-7)))
    return dice

def dice_npy(y_pred, y_true, n_classes):
    y_true = y_true.flatten()
    y_pred = np.argmax(y_pred, -1)

    dice = np.zeros(n_classes)

    for i in range(n_classes):
        y_true_i = np.equal(y_true, i)
        y_pred_i = np.equal(y_pred, i)
        dice[i] = (2. * np.sum(y_true_i * y_pred_i) + np.float32(1e-7)) / (np.sum(y_true_i) + np.sum(y_pred_i) + np.float32(1e-7))
    return dice

if __name__ == "__main__":
    # just generate 2 dummy images
    # img_gt is 0/1 encoded for foreground/background
    img_gt = np.zeros((100, 100), dtype=np.int32)
    # fill gt with some values
    img_gt[:, :65] = 1

    # img_pred is some pseudo probability map. For simplicity, we set the whole 0 channel to 0.4 and part of the 1 channel to a higher value
    img_pred = np.zeros((2, 100, 100), dtype=np.float32)
    img_pred[1, :50, :] = 0.55
    img_pred[0, :, :] = 0.4
    # this will create a (10000, 2) matrix with each pixel being represented by a row. This is what the output of a segmentation network looks like
    img_pred = np.transpose(img_pred, (1, 2, 0))
    img_pred = img_pred.reshape((10000, 2))

    # now we compute some dice coefficients with different packages
    # medpy
    import medpy.metric
    dc_medpy = np.float32(medpy.metric.dc(img_gt, img_pred.argmax(-1).reshape(100, 100)))

    # our numpy implementation; note that this one gives a dice score for the background as well
    dc_npy = np.float32(dice_npy(img_pred, img_gt, 2))

    # our theano implementation; note that this one gives a dice score for the background as well
    # symbolic variables
    img_pred_sym = T.matrix()
    img_gt_sym = T.matrix()

    # define the symbolic dice coeff
    dc_sym_theano = dice(img_pred_sym, img_gt_sym, 2)
    import theano
    # create a theano function that computes the symbolic dice
    dc_fct_theano = theano.function([img_pred_sym, img_gt_sym], dc_sym_theano)

    dc_theano = dc_fct_theano(img_pred, img_gt.astype(np.float32))

    # these three are equal, we did not make any mistake so far
    assert dc_theano[1] == dc_npy[1] == dc_medpy

    # we now want to have a look at the gradient of the dice score
    grads = T.grad(T.mean(dc_sym_theano), img_pred_sym)
    # for some reason this crashes:
    # theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 1 is not part of the computational graph needed to compute the outputs: <TensorType(float32, matrix)>.
    # however, img_gt_sym is clearly part of the computational graph since it is required for the computation of dc_sym_theano
    grad_fn = theano.function([img_pred_sym, img_gt_sym], grads)

Jan Schlüter

unread,
Jan 6, 2017, 6:14:12 PM1/6/17
to lasagne-users, isen...@googlemail.com
When I do grad_fn = theano.function([img_pred_sym, img_gt_sym], grads), theano complains that img_gt_sym is not part of the computational graph, although it clearly is as it is required for the computation of dc_sym_theano. Any suggestions?

Sure, it is required for dc_sym_theano, that's why dc_fct_theano compiles fine. But here you're only asking for the gradient -- and it turns out it's not required for the gradient:

>>> theano.printing.debugprint(grads)
Elemwise{second,no_inplace} [id A] ''   
 |<TensorType(float32, matrix)> [id B]
 |InplaceDimShuffle{x,x} [id C] ''   
   |TensorConstant{0.0} [id D]

The gradient is simply zero, independent of the input. That's why it doesn't learn.

Now the next question is how to formulate the loss such that it becomes differentiable. You could try to soften the argmax (https://bouthilx.wordpress.com/2013/04/21/a-soft-argmax/) or find a different formulation altogether (https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py#L18-L25).
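To make the first suggestion concrete, the soft argmax from the linked post can be sketched in a few lines of numpy (in Theano the same expressions work with T ops; the function name and the choice of beta are mine, not from the post):

```python
import numpy as np

def soft_argmax(x, beta=10.0):
    # differentiable approximation of argmax along the last axis:
    # sharpen the values with a temperature beta, take a (numerically
    # stable) softmax, then compute the expected index under it
    e = np.exp(beta * (x - x.max(axis=-1, keepdims=True)))
    p = e / e.sum(axis=-1, keepdims=True)
    return (p * np.arange(x.shape[-1])).sum(axis=-1)
```

As beta grows, the result approaches the hard argmax. Note that the output is a weighted average of the class indices, which is the source of the "bias towards higher class labels" discussed below: it only really makes sense when the target is ordinal.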

isen...@googlemail.com

unread,
Jan 9, 2017, 8:00:08 AM1/9/17
to lasagne-users, isen...@googlemail.com
Hi Jan,

once again thank you very much! I was not familiar with theano.printing.debugprint.
In the meantime I tried implementing the gradient by hand (as in https://arxiv.org/abs/1606.04797) but got stuck in the depths of theano ops (I used OpFromGraph as in the Saliency example, then tried to overwrite the grad method). Your suggestions seem much more elegant than that!

I am however reluctant to use the soft argmax since I think it is biased towards higher class labels. In the example you provided, the authors apply it to predicting coordinates, where I think this bias is not as much of a problem because the target variable there is quasi-continuous (ordinal), whereas class labels are nominal. If you disagree please let me know ;-)

I was aware of the link to the ultrasound segmentation example and thought my implementation would do the same thing. However, it turns out that they use the softmax probability rather than the argmax (which I had assumed for some reason).

My segmentation problem has more than 2 classes but thanks to your help I could now successfully implement a dice loss that seems to be differentiable.
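For readers finding this thread later, a differentiable multi-class soft dice along those lines could look as follows in numpy (in Theano it is the same with T in place of np; the one-hot encoding and the smoothing term are my assumptions, not necessarily Fabian's exact code):

```python
import numpy as np

def soft_dice_loss(probs, labels, n_classes, smooth=1e-7):
    # probs: (num_pixels, n_classes) softmax output of the network
    # labels: (num_pixels,) integer class labels
    onehot = np.eye(n_classes)[labels.astype(int)]          # one-hot encode the ground truth
    intersect = np.sum(probs * onehot, axis=0)              # per-class soft overlap
    denom = np.sum(probs, axis=0) + np.sum(onehot, axis=0)  # per-class predicted + true mass
    dice = (2. * intersect + smooth) / (denom + smooth)
    return -np.mean(dice)  # negative mean dice over classes: lower is better
```

Because the probabilities enter the loss directly (no argmax), every operation has a nonzero gradient.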

Thank you very much again!

Cheers,

Fabian

isen...@googlemail.com

unread,
Jan 16, 2017, 11:40:00 AM1/16/17
to lasagne-users, isen...@googlemail.com
Hi,
it's me again. I also attempted to implement the dice loss layer of the V-Net paper (https://arxiv.org/abs/1606.04797; their code is available) in theano for comparison with my other dice implementations. Since they use the argmax in the forward pass and in most of the backward pass (except in one place), I had to write my own theano op for that and redefine its gradient. This is the first time I am implementing an op myself and I just cannot get it to run correctly. Maybe I overlooked something?

My op takes 2 inputs: one is the softmax output (of shape num_pixels x num_classes) and the other is the ground truth (may be a vector or a matrix). Its output should be a vector of length num_classes containing the dice values for each class (I disregard minibatches for now).

Here is my theano op implementation:

import theano
import theano.tensor as T
from theano.gradient import grad_not_implemented

class DiceOp(theano.Op):
    __props__ = tuple("n")

    def __init__(self, n):
        self.n = n
        theano.Op.__init__(self)

    def make_node(self, softmax_output, gt_flat):
        softmax_output = theano.tensor.as_tensor_variable(softmax_output)
        gt_flat = theano.tensor.as_tensor_variable(gt_flat)
        return theano.Apply(self, [softmax_output, gt_flat], [T.vector(dtype=softmax_output.dtype)])

    def perform(self, node, inputs, output_storage, params=None):
        theano.config.exception_verbosity = 'high'
        out_pred = inputs[0].argmax(1)
        out_gt = inputs[1].flatten()
        output_storage[0][0] = T.zeros(self.n, dtype=inputs[0].dtype)
        for i in range(self.n):
            x_pred = T.eq(out_pred, i)
            x_gt = T.eq(out_gt, i)
            intersect = T.sum(x_gt * x_pred)
            denominator = T.sum(x_pred) + T.sum(x_gt)
            output_storage[0][0] = T.set_subtensor(output_storage[0][0][i],
                                                   T.constant(2) * intersect / (denominator + T.constant(1e-5)))

    def infer_shape(self, node, in_shapes):
        return [(in_shapes[0][0],)]

    def grad(self, inp, grads):
        print len(inp), len(grads)
        for i in inp:
            print i
        softmax_output = inp[0]
        gt_flat = inp[1]
        (g, ) = grads
        d_softm = T.zeros(softmax_output.shape, softmax_output.dtype)
        out_pred = T.argmax(softmax_output, 1)

        for i in range(self.n):
            x_pred = T.eq(out_pred, i)
            x_gt = T.eq(gt_flat, i)
            intersect = T.sum(x_gt * x_pred)
            denominator = T.sum(x_pred) + T.sum(x_gt)
            d_softm = T.set_subtensor(d_softm[:, i],
                                      T.constant(2) *
                                      (x_gt * denominator - T.constant(2) * intersect * softmax_output[:, i]) /
                                      (T.power(denominator, T.constant(2.))) * g[i])

        return [d_softm, grad_not_implemented(self, 1, gt_flat)]

And here is some code to test its functionality:

def generate_dummy_image_and_seg():
    img_gt = np.zeros((100, 100))
    img_gt[:50, :] = 1
    img_pred = np.zeros((100, 100, 2))
    img_pred[:, 60:, 0] = 0.52
    img_pred[:, :60, 0] = 0.3
    img_pred[:, 60:, 1] = 0.48
    img_pred[:, :60, 1] = 0.7
    img_pred_flat = img_pred.reshape((-1, 2))
    return img_gt, img_pred, img_pred_flat

if __name__ == "__main__":
    img_gt, img_pred, img_pred_flat = generate_dummy_image_and_seg()
    dc_medpy = metric.dc(img_gt, img_pred.argmax(-1))

    img_gt_sym = T.vector()
    img_pred_sym = T.matrix()

    dc_op = DiceOp(2)
    dc_theano_sym = dc_op(img_pred_sym, img_gt_sym)
    dc_theano_fct = theano.function([img_pred_sym, img_gt_sym], dc_theano_sym)
    dc = dc_theano_fct(img_pred_flat.astype(np.float32), img_gt.flatten().astype(np.float32))
    dc_grad_from_op = theano.function([img_pred_sym, img_gt_sym], T.grad(T.mean(dc_theano_sym), img_pred_sym))
    dc_grad_from_op(img_pred_flat.astype(np.float32), img_gt.flatten().astype(np.float32))


The op works so far in this very basic setting, but what is strange is that dc_theano_fct does not return a numerical value but a theano expression. If I do dc.eval() it returns what I would actually have expected. This is not how theano functions usually work, at least in my experience.
Furthermore, I am a bit confused by output_storage. Following the theano tutorial on writing ops, I should have used output_storage[0] = T.zeros(self.n, dtype=inputs[0].dtype) (instead of output_storage[0][0]), but doing this results in an empty return value (dc is just empty when running the code from above).
The gradient seems to work just fine.

When I use this op and plug it into my segmentation network as loss, all functions compile, but when calling the train function of the network I get the following TypeError, which I have a hard time interpreting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-c2e81042d903> in <module>()
----> 1 train_fn(data, seg, w)

/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    884                     node=self.fn.nodes[self.fn.position_of_error],
    885                     thunk=thunk,
--> 886                     storage_map=getattr(self.fn, 'storage_map', None))
    887             else:
    888                 # old-style linkers raise their own exceptions

/usr/local/lib/python2.7/dist-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    154             scalar_values = []
    155             for ipt in thunk.inputs:
--> 156                 if getattr(ipt[0], "size", -1) <= 5:
    157                     scalar_values.append(ipt[0])
    158                 else:

/usr/local/lib/python2.7/dist-packages/theano/tensor/var.pyc in __nonzero__(self)
     73     def __nonzero__(self):
     74         # Python 2.x
---> 75         return self.__bool__()
     76 
     77     def __bool__(self):

/usr/local/lib/python2.7/dist-packages/theano/tensor/var.pyc in __bool__(self)
     89         else:
     90             raise TypeError(
---> 91                 "Variables do not support boolean operations."
     92             )
     93 

TypeError: Variables do not support boolean operations.

Any help would be very much appreciated!
Cheers,
Fabian

Jan Schlüter

unread,
Jan 19, 2017, 1:23:39 PM1/19/17
to lasagne-users, isen...@googlemail.com
The op works so far in this very basic setting, but what is strange is that dc_theano_fct does not return a numerical value but a theano expression. If I do dc.eval() it returns what I would actually have expected. This is not how theano functions usually work, at least in my experience.

Correct. It's because of your "perform()" implementation -- this method is meant to be the pure-Python implementation of an Op, using numpy. Apparently, Theano does not check whether you actually return a numpy array, and happily passes on the Theano expression instead. But all following errors you describe stem from this misconception.

If you want to define an Op with symbolic forward and backward passes, you can use OpFromGraph as done here: https://github.com/Lasagne/Recipes/blob/master/examples/Saliency%20Maps%20and%20Guided%20Backpropagation.ipynb
Otherwise, if you want to define an Op with numpy forward and backward passes, you would modify your current Op to use numpy in perform(), and return an instance of a DiceGradOp in its grad() method. The DiceGradOp would again have a perform() method to perform the operation in numpy. This is not the most performant way, but the easiest to implement -- and since it only affects the loss function, the performance impact from doing that part of the computation on CPU and in uncompiled code is not extreme (i.e., it should still be fast enough to run some initial experiments).
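To make the DiceOp/DiceGradOp route concrete, here is a sketch of the plain numpy forward and backward computations their perform() methods could call. Note the assumptions: I use the soft (probability-based) dice rather than the thread's hard argmax so that the gradient is nonzero, and all names are mine:

```python
import numpy as np

def soft_dice_forward(probs, onehot, eps=1e-5):
    # per-class soft dice: D_i = 2 * I_i / (sum(p_i) + sum(g_i) + eps)
    intersect = np.sum(probs * onehot, axis=0)
    denom = np.sum(probs, axis=0) + np.sum(onehot, axis=0) + eps
    return 2. * intersect / denom

def soft_dice_backward(probs, onehot, eps=1e-5):
    # analytic gradient: dD_i/dp[k, i] = 2 * (g[k, i] * denom_i - I_i) / denom_i**2
    # (denom_i and I_i broadcast over the pixel axis)
    intersect = np.sum(probs * onehot, axis=0)
    denom = np.sum(probs, axis=0) + np.sum(onehot, axis=0) + eps
    return 2. * (onehot * denom - intersect) / denom ** 2
```

The DiceGradOp's perform() would multiply this by the incoming output gradient, just as the symbolic grad() above does with g[i]. A finite-difference check is an easy way to validate such a hand-derived formula before wiring it into an Op.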

Best, Jan

sajidi...@gmail.com

unread,
Nov 27, 2017, 7:13:47 PM11/27/17
to lasagne-users
Hi Fabian,
I have developed a caffe loss layer by looking at your code. Can you please confirm the correctness of this code? Best.

import sys

caffe_root = '/home/yonatan/Downloads/caffe-segnet-cudnn5-master/python'
sys.path.insert(0, caffe_root)
import caffe
import numpy as np


class DiceLossLayer(caffe.Layer):
    """
    Compute the Dice Loss in the same manner
    to demonstrate the class interface for developing layers in Python.
    """

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        nclass = 5
        if bottom[0].count != nclass * bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[1].data
        top[0].data[...] = 1 - self.dice_coef_multi_class(bottom[0], bottom[1])

    def backward(self, top, propagate_down, bottom):
        if propagate_down[1]:
            raise Exception("label not diff")
        elif propagate_down[0]:
            a = (-2. * self.diff + self.dice) / self.sum
            bottom[0].diff[...] = a
        else:
            raise Exception("no diff")

    def dice_coef_multi_class(self, y_pred, y_true):
        n_classes = 5
        y_true = y_true.data
        y_pred = y_pred.data
        y_pred = np.argmax(y_pred, 1)
        y_pred = np.expand_dims(y_pred, 1)

        y_pred = np.ndarray.flatten(y_pred)
        y_true = np.ndarray.flatten(y_true)

        dice = np.zeros(n_classes)
        self.sum = np.zeros([n_classes])

        for i in range(n_classes):
            y_true_i = np.equal(y_true, i)
            y_pred_i = np.equal(y_pred, i)
            self.sum[i] = np.sum(y_true_i) + np.sum(y_pred_i) + np.float32(1e-7)
            dice[i] = (2. * np.sum(y_true_i * y_pred_i) + np.float32(1e-7)) / self.sum[i]
        self.sum = np.sum(self.sum)
        self.dice = np.sum(dice)
        return self.dice


