Hi everybody,
I'd like to describe, in a Caffe network definition, a deep learning network that has Maxout/Feature-pool layers (I've seen both names used; I'll explain in detail what this layer does).
A Maxout/Feature-pool layer looks at each pixel across 2 channels (usually after a convolution) and combines them into 1 channel by taking the MAX value. The spatial size doesn't change, but the number of channels is halved.
For example, the dimensions could be:
Input of maxout layer: (1, 10, 100, 100)
Maxout combines channels 2 by 2
Output of maxout layer: (1, 5, 100, 100)
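To make the computation concrete, here is a small numpy sketch of the operation described above (this is my own illustration, not Caffe code; it assumes the 2-by-2 pairing groups adjacent channels):

```python
import numpy as np

# Hypothetical input matching the example above: batch 1, 10 channels, 100x100
x = np.random.randn(1, 10, 100, 100)

# Pair channels 2 by 2 and keep the element-wise max of each pair
n, c, h, w = x.shape
y = x.reshape(n, c // 2, 2, h, w).max(axis=2)

print(y.shape)  # (1, 5, 100, 100): same spatial size, half the channels
```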
As far as I know, this layer doesn't exist in Caffe (at least not among the ones listed at
http://caffe.berkeleyvision.org), but there is surely a way to describe it.
As I'm a new Caffe user, I presume I'm not the first to ask, but I still can't find a clear answer about it.
I've read that it can be expressed as a SLICE layer followed by an ELTWISE layer, but it is not clear to me how they compute the data.
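For reference, here is my best guess at what that Slice + Eltwise combination would look like in a prototxt (layer and blob names are hypothetical; it assumes a preceding blob "conv1" with 10 channels). Note that slicing at the midpoint pairs channel i with channel i + 5 rather than adjacent channels, which should be equivalent when the preceding convolution's filters are learned:

```
layer {
  name: "slice1"
  type: "Slice"
  bottom: "conv1"
  top: "slice1a"
  top: "slice1b"
  slice_param {
    axis: 1        # slice along the channel axis
    slice_point: 5 # split 10 channels into 5 + 5
  }
}
layer {
  name: "maxout1"
  type: "Eltwise"
  bottom: "slice1a"
  bottom: "slice1b"
  top: "maxout1"
  eltwise_param {
    operation: MAX # element-wise max of the two 5-channel blobs
  }
}
```

If this is right, the Eltwise output would have shape (1, 5, 100, 100), matching the maxout example above.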
Maybe someone here has an answer?
Thanks