What is the meaning of 1x1 conv layers?

Views: 1,067

Alex Orloff

Jan 28, 2016, 5:39:45 PM
to Caffe Users
If I make a convolution with a 1x1 kernel, it simply means multiplying the whole image by the same coefficient.
I really don't understand how that can help in the recognition process.
Sorry for such a dumb question; maybe there are some articles about it?
Thanks

Oleg Klimov

Jan 29, 2016, 6:17:38 AM
to Caffe Users
On Friday, January 29, 2016 at 1:39:45 AM UTC+3, Alex Orloff wrote:
If I make a convolution with a 1x1 kernel, it simply means multiplying the whole image by the same coefficient.

Right, but if it's followed by a ReLU or another nonlinearity, it makes sense. Think of applying the rectifier, flipping the function upside down (multiplying by -1), and applying it again.

Oleg.

Youssef Kashef

Jan 29, 2016, 10:41:08 AM
to Caffe Users
A 1x1 conv. filter would not be able to learn any spatial features, but it would still be capable of learning any position-invariant linear combination of its inputs (e.g. a combination of colors in early layers), if at all relevant to the task at hand.

ath...@ualberta.ca

Jan 29, 2016, 2:27:49 PM
to Caffe Users
If the previous layer has 128 feature maps (say), then "1x1 convolutions" are convolutions across all these feature maps with filters each of size 1x1x128. Say one chooses to have 64 of these 1x1x128-dim filters; then the result will be 64 feature maps, each the same spatial size as before. View each output feature map as a "per-pixel" projection (dot product) onto a lower-dimensional space using a single learned filter (weights tied) across all feature maps. Basically, they just crush 128 feature maps (representational responses to 128 learned filters) into 64 feature maps, ignoring the spatial dimension.
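The per-pixel projection above is easy to check numerically. Here is a minimal NumPy sketch (not Caffe code; sizes shrunk for brevity) showing that a 1x1 convolution from 128 maps down to 64 is exactly a matrix multiply applied at every pixel:

```python
import numpy as np

# 1x1 conv as a per-pixel projection: 128 input maps -> 64 output maps.
rng = np.random.default_rng(0)
x = rng.standard_normal((128, 8, 8))   # 128 feature maps, 8x8 spatial
w = rng.standard_normal((64, 128))     # 64 filters, each of size 1x1x128

# Flatten the spatial dims, project every pixel at once, reshape back.
y = (w @ x.reshape(128, -1)).reshape(64, 8, 8)

# The same thing computed pixel by pixel: one dot product per location.
y_loop = np.empty((64, 8, 8))
for i in range(8):
    for j in range(8):
        y_loop[:, i, j] = w @ x[:, i, j]

assert y.shape == (64, 8, 8)
assert np.allclose(y, y_loop)
```

The spatial size is untouched; only the depth (number of feature maps) changes.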

Remember that larger filters, like a 3x3x128 filter, would also learn to summarize feature responses across all feature maps, so in this sense filters of all sizes do the same thing. The only difference is that 1x1 (learned) filters ONLY do this across feature maps, whereas 3x3 filters (say) also consider local spatial correlations.

So, they are used for two reasons:

1. Dimensionality reduction: when performing larger convolutions (spatial 3x3 or 5x5...) over a large number of feature maps, bringing down the depth dimension (# feature maps) first reduces computation dramatically. This is done in the GoogLeNet Inception modules (2).

2. Since a ReLU will be applied again afterwards, it adds yet another nonlinearity, which can be helpful.

See: 1. Network in Network paper: http://arxiv.org/abs/1312.4400
        2. Going Deeper with Convolutions paper: http://arxiv.org/abs/1409.4842

Hope this helps

Cheers,
Andy

Alex Orloff

Jan 29, 2016, 3:33:23 PM
to Caffe Users
Thank you, Andy.
If you don't object, one more question.
Can I use pooling layers with stride=2 and kernel=2 on blobs with odd dimensions?
What will the output blob be in such a case?

On Friday, January 29, 2016 at 22:27:49 UTC+3, ath...@ualberta.ca wrote:

ath...@ualberta.ca

Feb 1, 2016, 1:03:45 PM
to Caffe Users
(n - f)/s + 1, where n is the input size (width or height), f is the filter size, and s is the stride. For pooling layers, Caffe rounds the division up (ceil), so odd input sizes are fine; the last pooling window is simply clipped at the border.
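A quick sketch of that formula, under the assumption (from Caffe's pooling behavior) that the division is rounded up for pooling layers:

```python
import math

def pool_output_size(n, f, s):
    # (n - f)/s + 1, rounding the division up as Caffe's pooling does,
    # so an odd input still yields a valid output (last window clipped).
    return math.ceil((n - f) / s) + 1

print(pool_output_size(224, 2, 2))  # even input: 112
print(pool_output_size(7, 2, 2))    # odd input: 4
```

So a 7-wide blob pooled with kernel=2, stride=2 gives a 4-wide output rather than failing.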

zzz

Feb 2, 2016, 3:01:48 PM
to Caffe Users
I think we can understand the 1x1 convolution as a pixel-wise linear classifier.
For example, suppose the input feature map size is (128, 500, 500).
If the output is (1, 500, 500), the convolution kernel size would be (1, 128, 1, 1). This 128-dimensional weight vector is effectively a linear classifier applied at every pixel.
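That reading checks out numerically. A small NumPy sketch (tiny spatial size instead of 500x500, and a made-up bias for illustration): the (1, 128, 1, 1) kernel scores every pixel with the same weight vector, exactly like a linear classifier.

```python
import numpy as np

# Pixel-wise linear classifier view of a (1, C, 1, 1) kernel.
rng = np.random.default_rng(1)
C, H, W = 128, 4, 4
x = rng.standard_normal((C, H, W))
w = rng.standard_normal(C)   # the 1x1xC kernel, flattened
b = 0.1                      # illustrative bias term

# Contract over the channel axis: one score per pixel, shape (H, W).
scores = np.tensordot(w, x, axes=([0], [0])) + b
mask = scores > 0            # per-pixel binary decision

assert scores.shape == (H, W)
# Any single pixel's score is just w . x[:, i, j] + b:
assert np.isclose(scores[2, 3], w @ x[:, 2, 3] + b)
```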

Please correct me if I am wrong.