How to calculate layer output size?


Ilya Zhenin

unread,
Jun 15, 2015, 9:49:11 AM6/15/15
to caffe...@googlegroups.com
For example, I have an image of size 1x34x34. Then, in the training phase, there is a convolution on this image.

num_output: 128
kernel_size: 5
stride: 2

As I understand it, after the convolution there will be 128 feature maps, with 1 channel for a grayscale input and 3 for RGB. Is that right? Further, how do I compute the size of these maps? Would it be 15x15? And what if kernel_size = 4?

Also, suppose the next layer is a pooling layer

    pool: MAX
    kernel_size: 2
    stride: 2

that takes the previous convolution as input. How do I calculate the output size of the pooling layer?


Philip H

unread,
Jun 19, 2015, 11:19:49 AM6/19/15
to caffe...@googlegroups.com
You will get 128 activation maps for the 128 kernels you're training.

Each activation map will have width (width - kernel_size + 2*pad)/stride + 1, and likewise for the height, as documented here: http://caffe.berkeleyvision.org/tutorial/layers.html.
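That formula is easy to check by hand. Here is a minimal Python sketch of just the arithmetic (not Caffe code; the helper name is mine), applied to the original question's 34x34 input with stride 2 and no padding:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution output size: floor((size - kernel + 2*pad) / stride) + 1
    return (size - kernel + 2 * pad) // stride + 1

print(conv_out(34, 5, stride=2))  # kernel_size 5 -> 15, so 15x15 maps
print(conv_out(34, 4, stride=2))  # kernel_size 4 -> 16, so 16x16 maps
```

So the 15x15 guess for kernel_size 5 is right, and kernel_size 4 would give 16x16.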

The same thing happens in your pooling layer.

Cheers,
Phil

Jarvis Du

unread,
Aug 20, 2015, 3:18:30 PM8/20/15
to Caffe Users
Hi Phil,

Can you tell me whether the division (/) takes the floor or the ceiling? I would guess floor, but it seems to be ceil for max pooling in Caffe.

Jarvis Du

unread,
Aug 21, 2015, 10:02:00 AM8/21/15
to Caffe Users
A small example is the following prototxt file.

name: "net"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 231
input_dim: 231
state {
  phase: TEST
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 48
    pad: 4
    kernel_h: 9
    kernel_w: 9
    stride_h: 4
    stride_w: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  convolution_param {
    num_output: 64
    pad: 0
    kernel_h: 5
    kernel_w: 5
    stride_h: 1
    stride_w: 1
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    pad: 0
    kernel_h: 2
    kernel_w: 2
    stride_h: 2
    stride_w: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "pool2"
  top: "pool2"
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 0
    kernel_h: 3
    kernel_w: 3
    stride_h: 1
    stride_w: 1
  }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    pad: 0
    kernel_h: 2
    kernel_w: 2
    stride_h: 2
    stride_w: 2
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "pool3"
  top: "pool3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4"
  convolution_param {
    num_output: 64
    pad: 0
    kernel_h: 3
    kernel_w: 3
    stride_h: 1
    stride_w: 1
  }
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4"
  top: "pool4"
  pooling_param {
    pool: MAX
    pad: 0
    kernel_h: 2
    kernel_w: 2
    stride_h: 2
    stride_w: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "pool4"
  top: "pool4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5"
  convolution_param {
    num_output: 32
    pad: 0
    kernel_h: 3
    kernel_w: 3
    stride_h: 1
    stride_w: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}

For this architecture, I expected the final output to be 32*3*3 = 288, but Caffe gives 32*4*4 = 512. Scrutinizing every layer, the discrepancy comes from the pooling layers.

For example, take a max pooling layer with a 25x25 input, kernel size 2, and stride 2. I would expect an output of size 12x12, but Caffe produces 13x13.
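The 12 vs. 13 discrepancy is exactly the floor-vs-ceil choice in the output-size formula. A small Python sketch of both conventions (the helper names are mine, not Caffe's) for this 25x25 case:

```python
import math

def pool_out_floor(size, kernel, stride, pad=0):
    # Round the division down before adding 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out_ceil(size, kernel, stride, pad=0):
    # Round the division up before adding 1 (Caffe's pooling behavior)
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

print(pool_out_floor(25, 2, 2))  # -> 12
print(pool_out_ceil(25, 2, 2))   # -> 13
```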



Sergii Bondariev

unread,
Jan 11, 2017, 6:30:45 PM1/11/17
to Caffe Users
It is indeed rounded up in Caffe — see pooling_layer.cpp, note the "ceil":

  pooled_height_ = static_cast<int>(ceil(static_cast<float>(
      height_ + 2 * pad_h_ - kernel_h_) / stride_h_)) + 1;
  pooled_width_ = static_cast<int>(ceil(static_cast<float>(
      width_ + 2 * pad_w_ - kernel_w_) / stride_w_)) + 1;
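Tracing Jarvis's prototxt with these two rules — floor for convolutions, ceil for pooling — reproduces the 4x4 result Caffe reports. A Python sketch of that trace (helper names are mine, not Caffe's):

```python
import math

def conv(size, kernel, stride, pad=0):
    # Convolution output size rounds the division down
    return (size + 2 * pad - kernel) // stride + 1

def pool(size, kernel, stride, pad=0):
    # Caffe's pooling output size rounds the division up (pooling_layer.cpp)
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

s = 231
s = conv(s, 9, 4, pad=4)  # conv1 -> 58
s = conv(s, 5, 1)         # conv2 -> 54
s = pool(s, 2, 2)         # pool2 -> 27
s = conv(s, 3, 1)         # conv3 -> 25
s = pool(s, 2, 2)         # pool3 -> 13 (floor would give 12)
s = conv(s, 3, 1)         # conv4 -> 11
s = pool(s, 2, 2)         # pool4 -> 6
s = conv(s, 3, 1)         # conv5 -> 4
print(s, 32 * s * s)      # -> 4 512
```

With floor pooling, pool3 would give 12 instead of 13 and the chain would end at 3x3, i.e. the expected 288.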
