In Caffe's softmax_loss_layer.cpp, the derivative of the loss that is backpropagated to the previous layer is computed as
bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;
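For reference, here is a minimal standalone sketch of how I read the indexing in that line. The names, types, and blob layout (dim = num_classes * inner_num, inner_num = H * W) are my assumptions for illustration, not the actual Caffe source:

#include <vector>

// Illustrative only: apply the quoted "-= 1" at the entry that belongs to
// the ground-truth class of spatial position j in image i, assuming the
// diff blob is laid out as index = i * dim + c * inner_num + j.
void subtract_one_at_label(std::vector<float>& bottom_diff,
                           const std::vector<int>& label,
                           int outer_num, int num_classes, int inner_num) {
  const int dim = num_classes * inner_num;
  for (int i = 0; i < outer_num; ++i) {
    for (int j = 0; j < inner_num; ++j) {
      const int label_value = label[i * inner_num + j];
      // The quoted operation from softmax_loss_layer.cpp:
      bottom_diff[i * dim + label_value * inner_num + j] -= 1;
    }
  }
}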
In other words, the value of the derivative at the corresponding pixel is reduced by 1 during the backward pass. Given that, I don't understand how it is possible to get positive derivatives from this loss function in the last layer, yet I observe them all the time. What am I missing?