How does a 3D filter result in a 2D response?

askari...@gmail.com

Jan 1, 2017, 4:14:24 PM
to Caffe Users
Hello.
I have a question about convolution layers. Consider this tutorial. The first convolutional layer takes a 3x227x227 image as input and applies 96 filters of size 3x11x11 to it. As I understand it, in this layer we separate the channels of the input and of the filter and apply each filter channel to its respective image channel: the first channel of the filter to the first channel of the image, the second to the second, and the third to the third. With a stride of 4, the result should be a 55x55 image with three channels. But in the activation we have a 55x55 image with only one channel. How do we get from a three-channel response to a one-channel response? What is the exact procedure?

Thank you for your time.

Przemek D

Jan 2, 2017, 5:47:56 AM
to Caffe Users
Your intuition is right that we "apply" the i-th channel of a filter to the i-th channel of the input blob. However, you need to understand what "applying" a filter really means.
The procedure can be imagined as cutting out a part of the input blob (the receptive field of a neuron). In your example, that is a 3x11x11 patch containing, for example, pixels 0 through 10 along the width and height axes. We flatten that patch into a vector, do the same with the 3x11x11 filter, and compute the dot product of the two. Because the dot product sums over all three channels as well as over the 11x11 window, it collapses the three per-channel responses into one number. This way we obtain a single value per filter per spatial location, so each filter yields one 55x55 map and the 96 filters together yield a 96x55x55 output.
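To make the flatten-and-dot-product step concrete, here is a minimal NumPy sketch using the shapes from this thread. It is only an illustration, not how Caffe implements convolution internally: the names (`response_at`, `response_per_channel`) are made up for this example, the data is random, the bias term is left out, and no padding is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes from the thread: a 3x227x227 input and one 3x11x11 filter, stride 4.
image = rng.standard_normal((3, 227, 227))
filt = rng.standard_normal((3, 11, 11))
stride, k = 4, 11

# Number of filter positions along each spatial axis (no padding assumed).
out_size = (image.shape[1] - k) // stride + 1


def response_at(row, col):
    """Flatten the 3x11x11 patch and the filter, then take their dot product."""
    r, c = row * stride, col * stride
    patch = image[:, r:r + k, c:c + k]  # receptive field, shape (3, 11, 11)
    return np.dot(patch.ravel(), filt.ravel())


def response_per_channel(row, col):
    """Same value, computed channel by channel and then summed over channels."""
    r, c = row * stride, col * stride
    patch = image[:, r:r + k, c:c + k]
    per_channel = (patch * filt).sum(axis=(1, 2))  # three values, one per channel
    return per_channel.sum()


# One filter produces a single-channel activation map.
activation = np.array(
    [[response_at(i, j) for j in range(out_size)] for i in range(out_size)]
)
print(out_size, activation.shape)
```

The two helper functions return the same number for any position, which shows where the three channels go: they are summed into one value. Repeating this for each of the 96 filters would stack 96 such maps.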
For more detailed reading, I encourage you to check this article (link is to the section on conv layers).