Help to understand CPU implementation of euclidean loss layer

Yoann

May 12, 2015, 6:22:05 AM5/12/15
to caffe...@googlegroups.com
Hi all,

I'm not used to developing code with CPU optimization, which is why I'm a bit lost when I try to understand the code in euclidean_loss_layer.cpp. I need help understanding what is done in this file so that I can write my own loss layer :).

I understand that the goal is to compute diff_, which contains the loss_weight.

To compute diff_, the forward pass in the Forward_cpu function computes the table of differences between the predicted values and the ground truths (how many there are depends on the batch size). Right? :)
This table is stored in diff_. Why?
Then, the dot product and the loss value are computed, and the loss is stored in top[0]->mutable_cpu_data()[0].
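For reference, here is my condensed reading of the forward pass (a sketch paraphrased from euclidean_loss_layer.cpp as I understand it, not a verbatim copy):

    template <typename Dtype>
    void EuclideanLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top) {
      int count = bottom[0]->count();
      // diff_ = bottom[0] - bottom[1], i.e. prediction minus ground truth,
      // one entry per element of the batch
      caffe_sub(count, bottom[0]->cpu_data(), bottom[1]->cpu_data(),
                diff_.mutable_cpu_data());
      // dot = sum of squared differences
      Dtype dot = caffe_cpu_dot(count, diff_.cpu_data(), diff_.cpu_data());
      // average over the batch and divide by 2
      Dtype loss = dot / bottom[0]->num() / Dtype(2);
      top[0]->mutable_cpu_data()[0] = loss;
    }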

For the backward pass, I don't really understand what is happening.
  • const Dtype sign = (i == 0) ? 1 : -1;
    • Why is the sign negative when computing gradients with respect to the label inputs?
  • const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
  • Finally, the loss weight is computed with: bottom[i]->mutable_cpu_diff() = alpha * diff_.cpu_data()
    • diff_.cpu_data() is the table of the difference computed in the forward pass, right?
As you can see, the backward pass is still obscure to me.
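For context, here is my condensed reading of the whole backward pass (again a paraphrased sketch, so the exact code in the file may differ slightly):

    template <typename Dtype>
    void EuclideanLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
        const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
      for (int i = 0; i < 2; ++i) {          // i == 0: predictions, i == 1: labels
        if (propagate_down[i]) {
          const Dtype sign = (i == 0) ? 1 : -1;
          const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
          // bottom[i] diff = alpha * diff_ + 0 * (previous bottom[i] diff)
          caffe_cpu_axpby(bottom[i]->count(), alpha, diff_.cpu_data(),
                          Dtype(0), bottom[i]->mutable_cpu_diff());
        }
      }
    }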

Thanks in advance for your help!
Best,
Yoann

Prem kumar

Mar 17, 2016, 2:42:17 AM3/17/16
to Caffe Users
Hi Yoann,

Did you ever figure these out? I'm looking for exactly the same answers that you were. Any input would be appreciated.

The only thing I could understand is const Dtype sign = (i == 0) ? 1 : -1;. Since we are taking the partial derivative w.r.t. the actual labels, it comes out negative because of the minus sign inside 1/(2N) * Σ ||yhat - y||².
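Writing it out in my own notation (yhat is the prediction from bottom[0], y is the label from bottom[1], N is the batch size):

    L = \frac{1}{2N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2

    \frac{\partial L}{\partial \hat{y}_n} = \frac{1}{N} (\hat{y}_n - y_n)    % bottom[0], sign = +1
    \frac{\partial L}{\partial y_n} = -\frac{1}{N} (\hat{y}_n - y_n)         % bottom[1], sign = -1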

Thank you,
Prem

Christopher Reale

Mar 17, 2016, 6:41:42 PM3/17/16
to Caffe Users
1. For one sample, the loss function is L = |x - y|^2. The gradient with respect to x is proportional to +(x - y); the gradient with respect to y is proportional to -(x - y), hence the negative sign.

2. top[0]->cpu_diff()[0] is normally the gradient coming back from the next layer through the output. Since this is a loss function, it is normally the last layer in the network, so that value is just a constant (the loss weight).
    bottom[i]->num() is the number of samples coming in at a time from input i. Normally this is the same as your batch size. Dividing by it keeps the backward pass consistent with the Forward function (which calculates the average Euclidean loss rather than the sum).
    data is used to store the signal that propagates forward to the next layer(s) of the network.
    diff is used to store the gradients that propagate backwards (see the small sketch after these answers).

3. Yes.
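If it helps with the data vs. diff point, every Blob carries both arrays side by side. A minimal standalone sketch (a hypothetical snippet of my own, assuming a Caffe build with its headers on the include path):

    #include <caffe/blob.hpp>

    int main() {
      caffe::Blob<float> b(2, 3, 1, 1);      // num=2, channels=3, height=1, width=1
      float* act  = b.mutable_cpu_data();    // forward signal lives here
      float* grad = b.mutable_cpu_diff();    // backward gradients live here
      act[0]  = 1.5f;                        // e.g. an activation
      grad[0] = 0.1f;                        // e.g. d(loss)/d(act[0])
      return 0;
    }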

Chris