In `SoftmaxWithLossLayer<Dtype>::Backward_cpu`, after bottom_diff is computed as `bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;`, there is another operation that scales the gradient:
Dtype loss_weight = top[0]->cpu_diff()[0] / get_normalizer(normalization_, count);
caffe_scal(prob_.count(), loss_weight, bottom_diff);
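For context, this is roughly how I read the surrounding backward pass (a paraphrased sketch, not the exact source; the ignore_label handling is left out):

```cpp
// Paraphrased sketch of the surrounding backward pass (ignore_label handling omitted)
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
const Dtype* prob_data = prob_.cpu_data();
caffe_copy(prob_.count(), prob_data, bottom_diff);   // start from the softmax output p
const Dtype* label = bottom[1]->cpu_data();
int dim = prob_.count() / outer_num_;
int count = 0;
for (int i = 0; i < outer_num_; ++i) {
  for (int j = 0; j < inner_num_; ++j) {
    const int label_value = static_cast<int>(label[i * inner_num_ + j]);
    bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;  // p - 1 at the true class
    ++count;                                                   // samples contributing to the loss
  }
}
// the scaling I am asking about:
Dtype loss_weight = top[0]->cpu_diff()[0] / get_normalizer(normalization_, count);
caffe_scal(prob_.count(), loss_weight, bottom_diff);
```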
But as far as I know, according to the gradient equation of the softmax loss, bottom_diff should be calculated like this (from the UFLDL deep learning tutorial):
$$
\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{ y^{(i)} = j \} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]
$$
In the equation above, bottom_diff is only scaled by the count (the 1/m factor), without being multiplied by the loss diff `top[0]->cpu_diff()[0]`. So I am curious why Caffe implements it this way: is there some other consideration, or is my understanding of the theory wrong?
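To make what I expect concrete, here is a minimal standalone sketch (my own toy code with made-up numbers, not Caffe; it computes the gradient with respect to the softmax input, which by the same derivation is p minus the one-hot label, scaled only by 1/m):

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int m = 2, num_classes = 3;                 // toy batch of 2 samples, 3 classes
  std::vector<std::vector<double>> prob = {         // softmax outputs p (made-up numbers)
      {0.7, 0.2, 0.1},
      {0.1, 0.3, 0.6}};
  std::vector<int> label = {0, 2};                  // ground-truth classes y

  std::vector<std::vector<double>> bottom_diff = prob;  // start from p
  for (int i = 0; i < m; ++i) {
    bottom_diff[i][label[i]] -= 1.0;                // p_j - 1{y = j}
    for (int j = 0; j < num_classes; ++j) {
      bottom_diff[i][j] /= m;                       // scale by 1/m only, no extra factor
    }
  }

  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < num_classes; ++j) {
      std::printf("%+.4f ", bottom_diff[i][j]);
    }
    std::printf("\n");
  }
  return 0;
}
```

With these numbers it prints `-0.1500 +0.1000 +0.0500` for the first sample, i.e. the count-only scaling, with no multiplication by `top[0]->cpu_diff()[0]`.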
Thanks, any reply would be greatly appreciated.