I found something interesting while computing the number of parameters in a network. Take the following example with a Deconvolution layer:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "./list.txt"
    batch_size: 1
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 8
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
  }
}
layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "conv1"
  top: "deconv1"
  convolution_param {
    num_output: 2
    bias_term: false
    pad: 1
    kernel_size: 4
    group: 2
    stride: 2
    weight_filler {
      type: "bilinear"
    }
  }
}
Assume my data size is 1x256x256 (a single-channel input, which is what the printed shapes below imply). Then the number of learned weights for the above network, with `group: 2`, is
1x3x3x8 + 8x1x4x4 = 72 + 128 = 200 learned parameters (counting weights only; conv1 also has 8 bias terms)
However, if I remove `group: 2` from the deconvolution layer (i.e. use the default `group: 1`), the count becomes
1x3x3x8 + 8x2x4x4 = 72 + 256 = 328 learned parameters
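The arithmetic above can be sketched in plain Python (function names are mine, not Caffe's). It assumes Caffe's weight-blob layouts — `(num_output, C_in / group, kH, kW)` for Convolution and `(C_in, num_output / group, kH, kW)` for Deconvolution — and counts weights only, excluding conv1's 8 bias terms:

```python
def conv_weights(c_in, num_output, k, group=1):
    """Weights in a Convolution layer: (num_output, c_in // group, k, k)."""
    return num_output * (c_in // group) * k * k

def deconv_weights(c_in, num_output, k, group=1):
    """Weights in a Deconvolution layer: (c_in, num_output // group, k, k)."""
    return c_in * (num_output // group) * k * k

conv1 = conv_weights(1, 8, 3)                       # 8 * 1 * 3 * 3 = 72
deconv1_grouped = deconv_weights(8, 2, 4, group=2)  # 8 * 1 * 4 * 4 = 128
deconv1_plain = deconv_weights(8, 2, 4, group=1)    # 8 * 2 * 4 * 4 = 256

print(conv1 + deconv1_grouped)  # 200 (with group: 2)
print(conv1 + deconv1_plain)    # 328 (without group)
```

Note that `group` divides the second weight dimension, so enabling it shrinks the parameter count rather than growing it.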
In Caffe, with `group: 2`, the layer-wise weight shapes print as
Layer-wise parameters:
[('conv1', (8, 1, 3, 3)), ('deconv1', (8, 1, 4, 4))]
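Multiplying out the printed shapes confirms the weights-only total for the `group: 2` case:

```python
# Sanity check: multiply out the weight shapes Caffe printed above
# (weights only; conv1's bias blob is not shown in that listing).
shapes = {'conv1': (8, 1, 3, 3), 'deconv1': (8, 1, 4, 4)}
counts = {name: s[0] * s[1] * s[2] * s[3] for name, s in shapes.items()}

print(counts)                # {'conv1': 72, 'deconv1': 128}
print(sum(counts.values()))  # 200
```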
It does not seem consistent to count the learned parameters the same way in both cases (with and without `group`). My question is: which is the correct number of parameters for the above network architecture?