SoftmaxLoss backprop

Alex Ter-Sarkisov

Sep 3, 2017, 2:52:17 PM
to Caffe Users
From softmax_loss_layer.cpp, the derivative of the loss backpropagated to the previous layer is 

 bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;

In other words, the value of the derivative at the corresponding pixel is reduced by 1 in the backward pass. In that case I don't understand how it is possible to get positive derivatives from this loss function in the last layer. Yet I observe this all the time. What am I missing?
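For concreteness, here is a minimal standalone sketch (not Caffe's actual implementation; the function name and example values are made up) of the per-example gradient that this backward pass produces: the diff is softmax(x) with 1 subtracted at the label index, so the label entry comes out negative while every other entry stays positive.

// Minimal sketch: gradient of -log(softmax(x)[label]) w.r.t. the logits x,
// for a single example (ignores Caffe's inner_num_/spatial indexing and loss scaling).
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> softmax_loss_backward(const std::vector<double>& logits, int label) {
  double max_logit = logits[0];
  for (double v : logits) if (v > max_logit) max_logit = v;  // subtract max for stability
  std::vector<double> diff(logits.size());
  double sum = 0.0;
  for (std::size_t k = 0; k < logits.size(); ++k) {
    diff[k] = std::exp(logits[k] - max_logit);
    sum += diff[k];
  }
  for (std::size_t k = 0; k < logits.size(); ++k) diff[k] /= sum;  // diff now holds softmax(x)
  diff[label] -= 1.0;  // the "-= 1" applied at the label index, as in softmax_loss_layer.cpp
  return diff;
}

int main() {
  std::vector<double> d = softmax_loss_backward({2.0, 1.0, 0.1}, 0);
  for (double v : d) std::printf("%f ", v);  // roughly: -0.34 0.24 0.10
  std::printf("\n");
  return 0;
}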

Przemek D

Sep 7, 2017, 3:59:03 AM
to Caffe Users
If the derivative of the layer's output with respect to a weight is negative and the gradient arriving from the loss is also negative, then by the chain rule the derivative of the loss with respect to that weight will be positive (minus times minus equals plus). Andrej Karpathy's notes explain this in more detail.
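To make the sign argument concrete, a toy numeric illustration (the values are hypothetical):

// Hypothetical numbers: dL/dw = dL/dy * dy/dw (chain rule).
#include <cstdio>

int main() {
  double dy_dw = -0.5;           // derivative of the layer's output w.r.t. a weight (negative)
  double dL_dy = -0.8;           // gradient arriving from the loss (negative)
  double dL_dw = dL_dy * dy_dw;  // minus times minus
  std::printf("dL/dw = %f\n", dL_dw);  // 0.400000, i.e. a positive weight gradient
  return 0;
}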

Alex Ter-Sarkisov

Sep 7, 2017, 6:26:00 AM
to Caffe Users
Thanks Przemek, I actually sat down and derived it myself. It is indeed -1 + softmax(x) at the true class. My problem was that, due to class imbalance in the data, the weight updates went in the opposite direction, so the derivatives in the deeper upsampling layers changed sign. I switched to an Infogain loss and that fixed it.
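For reference, a quick derivation sketch (with p = softmax(x) and L = -log p_y for the true class y) that recovers exactly that expression:

\frac{\partial L}{\partial x_k}
  = -\frac{1}{p_y}\,\frac{\partial p_y}{\partial x_k}
  = -\frac{1}{p_y}\, p_y\,(\delta_{ky} - p_k)
  = p_k - \delta_{ky}

i.e. softmax(x)_k for every class k other than y (always positive), and softmax(x)_y - 1 at the true class, which is the -= 1 in the code above.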