I found something interesting while computing the number of parameters in a network. Take the following example with a Deconvolution layer:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "./list.txt"
    batch_size: 1
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 8
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
  }
}
layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "conv1"
  top: "deconv1"
  convolution_param {
    num_output: 2
    bias_term: false
    pad: 1
    kernel_size: 4
    group: 2
    stride: 2
    weight_filler {
      type: "bilinear"
    }
  }
}
Assume my data size is 1x256x256 (a single-channel input, which is what the printed shapes below imply). Then the number of learned weights for the above network, with `group: 2`, is
1x3x3x8 + 8x1x4x4 = 72 + 128 = 200 learned parameters (counting weights only; conv1 also has 8 bias terms)
However, if I remove `group: 2` from the deconvolution layer (i.e. use the default `group: 1`), the count becomes
1x3x3x8 + 8x2x4x4 = 72 + 256 = 328 learned parameters
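The arithmetic above can be sketched in plain Python (function names are mine, not Caffe's). It assumes Caffe's weight-blob layouts — `(num_output, C_in / group, kH, kW)` for Convolution and `(C_in, num_output / group, kH, kW)` for Deconvolution — and counts weights only, excluding conv1's 8 bias terms:

```python
def conv_weights(c_in, num_output, k, group=1):
    """Weights in a Convolution layer: (num_output, c_in // group, k, k)."""
    return num_output * (c_in // group) * k * k

def deconv_weights(c_in, num_output, k, group=1):
    """Weights in a Deconvolution layer: (c_in, num_output // group, k, k)."""
    return c_in * (num_output // group) * k * k

conv1 = conv_weights(1, 8, 3)                       # 8 * 1 * 3 * 3 = 72
deconv1_grouped = deconv_weights(8, 2, 4, group=2)  # 8 * 1 * 4 * 4 = 128
deconv1_plain = deconv_weights(8, 2, 4, group=1)    # 8 * 2 * 4 * 4 = 256

print(conv1 + deconv1_grouped)  # 200 (with group: 2)
print(conv1 + deconv1_plain)    # 328 (without group)
```

Note that `group` divides the second weight dimension, so enabling it shrinks the parameter count rather than growing it.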
In Caffe, with `group: 2`, the layer-wise weight shapes print as
Layer-wise parameters:
[('conv1', (8, 1, 3, 3)), ('deconv1', (8, 1, 4, 4))]
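Multiplying out the printed shapes confirms the weights-only total for the `group: 2` case:

```python
# Sanity check: multiply out the weight shapes Caffe printed above
# (weights only; conv1's bias blob is not shown in that listing).
shapes = {'conv1': (8, 1, 3, 3), 'deconv1': (8, 1, 4, 4)}
counts = {name: s[0] * s[1] * s[2] * s[3] for name, s in shapes.items()}

print(counts)                # {'conv1': 72, 'deconv1': 128}
print(sum(counts.values()))  # 200
```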
It does not seem consistent to count the learned parameters the same way in both cases (with and without `group`). My question is: which is the correct number of parameters for the above network architecture?