From mathematical equations to the implementation of Convolution in Caffe


john1...@gmail.com

Feb 2, 2017, 6:00:27 AM
to Caffe Users
I am looking into how Caffe performs the backward pass of the Convolution layer. In mathematics, a convolution provides a local connection of the form y = Wx (the forward pass), where x is the input, y is the output, and W is the weight.

In the backward pass, we compute the partial derivatives of the loss function E with respect to W and x. Given the gradient w.r.t. the output, the backward pass computes the gradient w.r.t. the input and the internal parameters (weight and bias). We can express this with the formulas

    // Gradient of E w.r.t. bottom data x
    dE/dx = dE/dy . dy/dx = dE/dy . W      (Eq.1)

    // Gradient of E w.r.t. weight W
    dE/dW = dE/dy . dy/dW = dE/dy . x      (Eq.2)
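
To make the two formulas concrete, here is a minimal numeric sketch in plain C++ (illustrative only, not Caffe code), assuming a scalar weight and the squared-error loss E = 0.5 * (y - t)^2:

#include <cstdio>

int main() {
  float w = 2.0f, x = 3.0f, t = 5.0f;  // illustrative values

  float y = w * x;          // forward:  y = Wx            -> 6
  float dE_dy = y - t;      // gradient w.r.t. the output  -> 1
  float dE_dx = dE_dy * w;  // Eq.1: dE/dx = dE/dy . W     -> 2
  float dE_dw = dE_dy * x;  // Eq.2: dE/dW = dE/dy . x     -> 3

  std::printf("dE/dx = %.1f, dE/dW = %.1f\n", dE_dx, dE_dw);
  return 0;
}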

Now we move to the implementation in Caffe. First, we need to get the data of the top and bottom blobs:

input (data, diff) --> Convolution (data, diff) --> output (data, diff)

In Caffe, we can get the weight W and the bottom data x of (Eq.1) and (Eq.2) as follows:

const Dtype* weight = this->blobs_[0]->cpu_data();
const Dtype* bottom_data = bottom[i]->cpu_data();
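
The matching gradient buffers come from the diff halves of the same blobs. As a sketch of what ConvolutionLayer<Dtype>::Backward_cpu does (from memory; see conv_layer.cpp for the exact code):

Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();  // dE/dW, written to
const Dtype* top_diff = top[i]->cpu_diff();                // dE/dy, read from
Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();        // dE/dx, written to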

As an example, let's compute the gradient w.r.t. the weight (Eq.2):

this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_, top_diff + n * this->top_dim_, weight_diff);

where the function weight_cpu_gemm is declared as

void weight_cpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights);
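
For reference, here is an abridged sketch of that function's body (adapted from memory of base_conv_layer.cpp; consult the source for the exact group/offset bookkeeping). Note that weights is the buffer the GEMM accumulates into, while input and output are only read:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::weight_cpu_gemm(const Dtype* input,
    const Dtype* output, Dtype* weights) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    // lay the input patches out as columns (im2col) so that
    // the convolution becomes a plain matrix multiplication
    conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    // weights += output * col_buff^T, i.e. dE/dW += dE/dy . x^T (Eq.2)
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans,
        conv_out_channels_ / group_, kernel_dim_, conv_out_spatial_dim_,
        (Dtype)1., output + output_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)1., weights + weight_offset_ * g);
  }
}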

My question: in the definition above, the parameter named output receives top_diff. Why is it not weight_diff? As I understand it, top_diff corresponds to dE/dy and weight_diff to dE/dW in Eq.2. Is that right? If not, could you please explain the relationship between the mathematical notation and Caffe's variables? Thanks.