I have a question about how Caffe performs the backward pass of the Convolution layer. Mathematically, convolution provides a local connection of the form y = Wx
(the forward pass), where x is the input, y is the output, and W is the weight.
In the backward pass, we compute the partial derivatives of the loss function E with respect to W and x. Given the gradient w.r.t. the output, backward aims to compute the gradients w.r.t. the input and the internal parameters (weight and bias). We can express this with the formulas:
//Gradient w.r.t. bottom data x
dE/dx = dE/dy . dy/dx = W^T . dE/dy (Eq.1)
//Gradient of E w.r.t. W
dE/dW = dE/dy . dy/dW = dE/dy . x^T (Eq.2)
Now let's move to the implementation in Caffe. First, we need to get the data of the top and bottom blobs:
input (data, diff) --> Convolution (data, diff) --> output (data, diff)
In Caffe, we can get the weight W and the bottom data x used in (Eq.1) and (Eq.2) as follows:
const Dtype* weight = this->blobs_[0]->cpu_data();
const Dtype* bottom_data = bottom[i]->cpu_data();
Let's compute the gradient w.r.t. the weight (Eq.2) as an example:
this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_, top_diff + n * this->top_dim_, weight_diff);
where the function weight_cpu_gemm is declared as
weight_cpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights)
My question is: in the definition above, the output argument of weight_cpu_gemm receives top_diff. Why is it not weight_diff? As I understand it, top_diff corresponds to dE/dy and weight_diff corresponds to dE/dW in Eq.2. Is that right? If not, could you please explain the relationship between the mathematical notation and Caffe's variables? Thanks.