Implementing Multiclass Dice Loss Function

Depo Depo

Feb 17, 2021, 9:24:31 AM
to Keras-users
I am doing multi-class segmentation using a UNet. The output from my model is:

    outputs = layers.Conv3D(n_classes, (1, 1, 1), padding="same", activation='softmax')(d4)

Using SparseCategoricalCrossentropy I can train the network fine. Now I would also like to try the Dice coefficient as the loss function. My true and predicted values look like this:

    y_true = tf.constant([0.0, 1.0, 2.0])
    y_pred = tf.constant([[0.9, 0.95, 0.90], [0.1, 0.8, 0.5], [0.1, 0.8, 0.9]])

I've implemented the Dice coefficient as follows:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    def softargmax(x, beta=1e10):
        # differentiable approximation of argmax
        x = tf.convert_to_tensor(x)
        x_range = tf.range(x.shape.as_list()[-1], dtype=x.dtype)
        return tf.reduce_sum(tf.nn.softmax(x * beta) * x_range, axis=-1)

    def dice_coef(y_true, y_pred, smooth=1e-7):
        # n_classes is defined globally elsewhere
        y_true = K.flatten(K.one_hot(K.cast(y_true, 'int32'), num_classes=n_classes))

        y_pred = softargmax(y_pred)
        y_pred = K.flatten(K.one_hot(K.cast(y_pred, 'int32'), num_classes=n_classes))

        intersect = K.sum(y_true * y_pred, axis=-1)
        denom = K.sum(y_true + y_pred, axis=-1)
        return K.mean((2. * intersect) / (denom + smooth))

    def dice_loss(y_true, y_pred):
        return 1 - dice_coef(y_true, y_pred)

I can test this in isolation:

    dice_coef(y_true, y_pred)

However, when I try to use it as the loss function, I get an error that says:

     ValueError: No gradients provided for any variable:

Googling the error, I found that it occurs when a function is not differentiable. While debugging, the problem seems to arise from the `y_pred = K.flatten...` line, but the same pattern works for `y_true`. Why does it fail for `y_pred`?
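As a side note, I can reproduce the missing gradient in isolation (assuming TF 2.x eager mode): routing a tensor through an integer cast and `one_hot`, like my `dice_coef` does for `y_pred`, leaves `GradientTape` with nothing to differentiate:

```python
import tensorflow as tf

x = tf.Variable([[0.1, 0.8, 0.5]])
with tf.GradientTape() as tape:
    # the float -> int cast severs the gradient path back to x
    idx = tf.cast(x, tf.int32)
    y = tf.reduce_sum(tf.one_hot(idx, depth=3) * 2.0)

grad = tape.gradient(y, x)
print(grad)  # None: no gradient flows through cast/one_hot
```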

Talha Bukhari

Feb 25, 2021, 8:26:49 PM
to Keras-users
Hi,

Looking at the provided code snippet, the differentiability issue arises from one-hot encoding `y_pred`: the integer cast inside `K.cast` has no gradient, so nothing can flow back to the network's output. Note that you don't need to one-hot encode the softmax output from the network. Furthermore, I'm not sure why you are using a function named `softargmax`; simply applying softmax (across the channel dimension) should do.

Let me step back a little bit. Firstly, the inputs to the Dice loss function should be:
  1. One-hot encoded ground truth tensor.
  2. Softmax-ed output tensor from the neural network.
(Note that both tensors would be of the same shape and in the format: N, H, W, ..., C.)

Then, proceed to evaluate the numerator and denominator expressions. A recommended approach here is to sum over all dimensions except the batch and channel dimensions, rather than over only the channel dimension as you are doing. Both numerator and denominator will then be tensors of shape (N, C). After this step, proceed as you already do: divide numerator by denominator element-wise and return the mean (or 1 - mean) of the result. That's it!
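Putting the steps above together, a minimal sketch could look like the following (an assumption on my part: `y_true` holds integer class labels without a channel dimension, and `y_pred` is the softmax output of the network):

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-7):
    # y_true: integer class labels, shape (N, H, W, ...)
    # y_pred: softmax probabilities, shape (N, H, W, ..., C)
    n_classes = y_pred.shape[-1]
    y_true = tf.one_hot(tf.cast(y_true, tf.int32), depth=n_classes)
    # sum over every dimension except batch (0) and channel (-1),
    # so numerator and denominator both end up with shape (N, C)
    spatial_axes = list(range(1, len(y_pred.shape) - 1))
    intersect = tf.reduce_sum(y_true * y_pred, axis=spatial_axes)
    denom = tf.reduce_sum(y_true + y_pred, axis=spatial_axes)
    dice = tf.reduce_mean((2. * intersect) / (denom + smooth))
    return 1. - dice
```

Because there is no argmax or integer cast on `y_pred`, gradients flow through the softmax output and the loss can be used for training directly.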

For an intuition behind the Dice loss function, refer to my comment (as well as others' answers) on Cross Validated [1]. I also pointed out an apparent mistake in the now-deprecated keras-contrib implementation of the Jaccard loss function [2].

Best,
Talha
