Can I implement Caffe's 3D convolution as a sum of 2D convolution results, and how do I add the bias terms?

Gil Levi

Aug 26, 2014, 3:32:58 PM
to caffe...@googlegroups.com
Hi,

As an exercise, I'm implementing the convolution layers from scratch and I wanted to consult regarding some issue. 

I'm only referring to the imagenet model. 

I noticed that Caffe's 3D filters always have the same depth (or number of channels) as their input. For example, the input to the first conv layer is a 3x227x227 array. Now, the first layer contains 96 filters of size 3x11x11. That means that in the depth dimension there is no sliding, only simple multiplication (there is no "room" to slide, as the filter depth is equal to the input depth).

So what I'm asking is this: 

1. For a single filter (of the 96), can I apply a regular 2D convolution to each of the channels to get three 2D convolution results and then just sum them up (or apply some other function to them)? The output is of size 96x55x55, so the depth dimension somehow disappears (I would expect the result to be 96x3x55x55, since the result of a 3D convolution is 3D).


2. Another, simpler issue - The bias term (the second blob of the layer) is a vector of size 96. I assume each bias element corresponds to one filter. Now, do I just need to add the bias element to each element of the filter's convolution result (which is of size 55x55)?


Thanks,
Gil

Cliff Woolley

Aug 26, 2014, 6:35:44 PM
to caffe...@googlegroups.com
On Tue, Aug 26, 2014 at 3:32 PM, Gil Levi <gil.l...@gmail.com> wrote:
I noticed that Caffe's 3D filters always have the same depth (or number of channels) as their input. For example, the input to the first conv layer is a 3x227x227 array. Now, the first layer contains 96 filters of size 3x11x11. That means that in the depth dimension there is no sliding, only simple multiplication (there is no "room" to slide, as the filter depth is equal to the input depth).
 
That's right.  These are two-dimensional convolutions.  The third dimension is some number of values ("colors") per pixel.  The convolution takes an input of N images with C colors sized H*W and produces an output of N images with K colors sized P*Q.  But it is spatially convolutional only in the H*W -> P*Q dimensions.  Every one of the K filters looks at all C colors.  [Modulo the "groups" concept, where if groups==2, then K/2 of the filters look at C/2 of the inputs, and this happens twice.]
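To make that concrete for question 1, here is a minimal NumPy/SciPy sketch (not Caffe's actual implementation, which lowers the input with im2col and calls a matrix multiply): a single 3x11x11 filter applied to a 3x227x227 input is just three per-channel 2D cross-correlations summed into one 2D map. Padding is ignored for clarity, and conv1's stride of 4 is applied by subsampling at the end.

import numpy as np
from scipy.signal import correlate2d

C, H, W = 3, 227, 227            # input channels, height, width (conv1 input)
kH, kW = 11, 11                  # kernel height/width (conv1 filter size)

x = np.random.randn(C, H, W)     # one input image
w = np.random.randn(C, kH, kW)   # one filter; its depth equals the input depth

# Sum the per-channel 2D results -> a single 2D feature map (the depth "disappears").
# correlate2d (no kernel flip) matches the usual CNN definition of "convolution".
y = sum(correlate2d(x[c], w[c], mode='valid') for c in range(C))
print(y.shape)                   # (217, 217) at stride 1

y4 = y[::4, ::4]                 # conv1 uses stride 4
print(y4.shape)                  # (55, 55), one of the 96 output maps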
2. Another, simpler issue - The bias term (the second blob of the layer) is a vector of size 96. I assume each bias element corresponds to one filter. Now, do I just need to add the bias element to each element of the filter's convolution result (which is of size 55x55)?
 
Each bias value is for a single filter, right (i.e., there are K of them).  You add the bias element to the corresponding single output color of all P*Q pixels of all N images.
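And for question 2, a small sketch of the bias step under the same toy setup (plain NumPy broadcasting; the names are just illustrative): the K-element bias vector is broadcast so that bias[k] is added to every one of the P*Q values of feature map k.

import numpy as np

K, P, Q = 96, 55, 55                      # filters, output height, output width
feature_maps = np.random.randn(K, P, Q)   # stand-in for the 96 convolution results
bias = np.random.randn(K)                 # one bias value per filter

out = feature_maps + bias[:, None, None]  # bias[k] is added to all P*Q elements of map k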
 
--Cliff
 
PS: the N and K used above are different from the ones used in Caffe's convolution implementation, where M, N, K refer to matrix multiplication dimensions.
 
 
 

Gil Levi

Aug 31, 2014, 12:15:10 PM
to caffe...@googlegroups.com
Thanks for your help, Cliff.

Gil

On Wednesday, August 27, 2014 at 01:35:44 UTC+3, Cliff Woolley wrote:

npit

Jun 5, 2015, 10:03:37 AM
to caffe...@googlegroups.com
That's right.  These are two-dimensional convolutions.  The third dimension is some number of values ("colors") per pixel.  The convolution takes an input of N images with C colors sized H*W and produces an output of N images with K colors sized P*Q.  But it is spatially convolutional only in the H*W -> P*Q dimensions.  Every one of the K filters looks at all C colors. 

If every kernel looks at each slice in the C dimension, wouldn't that make the network colorblind (since the kernels for each color channel of the input image would be the same)?

汤旭

Dec 8, 2015, 3:08:11 AM
to Caffe Users
Hi, do you know how to do it now? I have the same problem as you. Could I get your email address so that I can email you?

On Wednesday, August 27, 2014 at 3:32:58 AM UTC+8, Gil Levi wrote:

汤旭

Dec 8, 2015, 3:19:31 AM
to Caffe Users
My email address is tan...@shanghaitech.edu.cn

On Tuesday, December 8, 2015 at 4:08:11 PM UTC+8, 汤旭 wrote:

Soumen Pramanik

May 22, 2017, 11:48:08 AM
to Caffe Users
Hello,
Can you please let me know how the output is computed so that its depth equals the number of kernels, regardless of the depth of the input?