Why in FCN 8 for in first convolutional layers used padding = 100? What's the point of it?

Ilya Zhenin

Aug 15, 2016, 8:04:51 AM
to Caffe Users
FCN for semantic segmentation, taken from here: https://github.com/shelhamer/fcn.berkeleyvision.org
Here is the layer:

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
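For context, Caffe computes a convolution's output size as out = (in + 2*pad - kernel) / stride + 1, so a pad of 100 with a 3x3 kernel grows each spatial dimension by 198 pixels. A quick sanity check (my own sketch, not from the repo):

```python
def conv_out(n, kernel, pad, stride=1):
    """Caffe convolution output size (floor division)."""
    return (n + 2 * pad - kernel) // stride + 1

# conv1_1 as defined above: pad=100, kernel=3, stride=1
print(conv_out(224, 3, 100))  # 422: each dimension grows by 198 pixels
print(conv_out(224, 3, 1))    # 224: the usual "same" padding for a 3x3 kernel
```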

whjxnyzh

Sep 1, 2016, 5:17:46 AM
to Caffe Users
I also want to know why.
up

On Monday, August 15, 2016 at 8:04:51 PM UTC+8, Ilya Zhenin wrote:

xdtl

Sep 1, 2016, 10:25:47 AM
to Caffe Users
My guess is that for a segmentation problem, an output mask with exactly the same size as the input image is always desired. Without padding, the output of each layer gets smaller and smaller, and it becomes impossible for the following deconvolution layers to recover the original dimensions.
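To make the shrinking concrete, here is a quick sketch (my own, not from the FCN repo) of the spatial sizes through the VGG-16 trunk of FCN, with and without the extra pad of 100. Note that fc6 in FCN is a 7x7 convolution with no padding, so without the extra pad a 224x224 input leaves only a single spatial position by fc7, and anything smaller than 224 would produce an empty blob:

```python
import math

def conv(n, kernel, pad=0, stride=1):
    # Caffe convolution output size (floor division)
    return (n + 2 * pad - kernel) // stride + 1

def pool(n, kernel=2, stride=2):
    # Caffe pooling uses ceiling mode
    return math.ceil((n - kernel) / stride) + 1

def vgg_fc7_size(n, first_pad):
    """Spatial size after the VGG-16 trunk of FCN (fc6 is a 7x7 conv, pad 0)."""
    n = conv(n, 3, first_pad)      # conv1_1 (pad 100 in FCN)
    n = conv(n, 3, 1)              # conv1_2
    n = pool(n)
    for convs in (2, 3, 3, 3):     # conv2_* .. conv5_*, all pad 1
        for _ in range(convs):
            n = conv(n, 3, 1)
        n = pool(n)
    n = conv(n, 7, 0)              # fc6: 7x7 convolution, no padding
    return n                       # fc7 is 1x1, so the size is unchanged

print(vgg_fc7_size(224, first_pad=100))  # 8  -> room for the deconv to work with
print(vgg_fc7_size(224, first_pad=1))    # 1  -> a single spatial position
print(vgg_fc7_size(192, first_pad=1))    # 0  -> empty blob, the net breaks
```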

whjxnyzh

Sep 2, 2016, 3:28:30 AM
to Caffe Users
But why use a padding of 100? That seems abnormal.
Thanks.

On Thursday, September 1, 2016 at 10:25:47 PM UTC+8, xdtl wrote:

Ilya Zhenin

Sep 2, 2016, 5:21:33 AM
to Caffe Users
The size shrinks anyway at the pooling stages (and the zero padding shrinks through all those layers from 100 pixels down to something like 3-5 pixels). The network produces nothing at the boundaries added by the padding, so I really can't see why it is added at the first layer (later layers are padded too, but only by 1-2 pixels, to compensate for the convolution kernel size). Upsampling here happens through a deconvolution layer.

My current suggestion is the following: the network's output from intermediate layers gets concatenated at later stages with the deconvolution layers' output, so the padding is added to make the sizes match. But I'm not sure about that, since nowhere before the deconvolution layers are the blobs cropped to a fixed size; the size depends on the input. So there would be other ways to do it: add padding only where needed instead of wasting memory from the start, or calculate the strides more precisely to match the sizes. But it's just a suggestion; I'm not sure whether it's true.

On Thursday, September 1, 2016 at 5:25:47 PM UTC+3, xdtl wrote:

Sankar Pira

Sep 4, 2016, 7:33:53 PM
to Caffe Users
As per my understanding, you don't need to pad. But if you remove the padding (100), you need to adjust the padding of the other layers, especially at the end of the network, to make sure the output matches the label/input size.

If you look at the MatConvNet implementation of FCN-8, you will see that they removed the padding and adjusted the other layers' parameters.
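For what it's worth, the extra padding is eventually discarded by the Crop layer at the end of the net. A rough sketch of the size arithmetic (my own reconstruction, for a 224x224 input; the offset value of 19 is what I believe the released FCN-32s prototxt uses):

```python
def deconv_out(n, kernel, stride):
    # Caffe deconvolution ("backward convolution") output size, no padding:
    # out = stride * (n - 1) + kernel
    return stride * (n - 1) + kernel

# FCN-32s with a 224x224 input: fc7 comes out at 8x8 (thanks to pad=100),
# and the 64-wide, stride-32 deconvolution upsamples it back past input size.
fc7 = 8
up = deconv_out(fc7, 64, 32)
print(up)  # 288 = 224 + 64 extra pixels left over from the padding halo

# The Crop layer then takes a 224-wide window at a fixed offset
# (19 in the released FCN-32s prototxt, if I read it correctly),
# throwing the halo away and matching the input size exactly:
offset = 19
assert offset + 224 <= up  # the crop window fits inside the upsampled map
```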