How to use randomly initialized weights for FCN-32s?

john1...@gmail.com
Feb 7, 2017, 1:43:31 AM
to Caffe Users
Hello all, in FCN-32s, the author loads pre-trained weights into his model as follows:

weights = './fcn/voc-fcn16s/fcn16s-heavy-pascal.caffemodel'
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)

It worked well on the PASCAL dataset. However, in my work I use a different kind of data, namely medical images, so the pre-trained weights do not seem suitable for my task. How can we use randomly initialized weights instead of the pre-trained ones? Thanks all.

Przemek D
Feb 7, 2017, 2:27:48 AM
to Caffe Users
Simply don't supply any weights and they will get randomly initialized for you. However, you will most likely fail at training, with everything in the network going to zero. It's rarely true that pretrained weights are not suitable for your task: for data of the same modality (images, audio, text), fine-tuning almost always works better than training from random weights. Recommended reading: first the indispensable notes from CS231n, then Torrey and Shavlik, "Transfer Learning".

john1...@gmail.com
Feb 7, 2017, 3:16:45 AM
to Caffe Users
I tried that and it fails, whether I test with PASCAL VOC (21 classes) or my own database (8 classes). (Note that I cannot use pre-trained FCN weights such as fcn16s-heavy-pascal.caffemodel because the number of classes is different.)

This is the code that I modified:

#weights = './fcn/voc-fcn16s/fcn16s-heavy-pascal.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()

solver = caffe.SGDSolver('solver.prototxt')
#solver.net.copy_from(weights)

The final result is all zeros when I test with an input image.


On Tuesday, February 7, 2017 at 15:43:31 UTC+9, john1...@gmail.com wrote:

Przemek D
Feb 7, 2017, 7:24:39 AM
to Caffe Users
Please keep our conversation here instead of using private messages, for clarity and so others might learn something from it.
 
I cannot use pre-trained FCN weights such as fcn16s-heavy-pascal.caffemodel because the number of classes is different
This is wrong. You can and you should use the pretrained weights - for everything except the last layer, because only the last layer is modified. The rest of the network stays the same.

As to everything being zero: the FCN readme explicitly tells you why that happens ("This is almost universally due to not initializing the weights as needed."). If you really, really wanted to train from scratch, you'd have to initialize the weights some other way. I checked the FCN prototxts - they do not include any weight_fillers. This means you created a layer and didn't fill it with anything (no pretrained weights, no random initialization). Inference on any image through such a network will result in zeros everywhere. Consult the Caffe reference AlexNet model to see how weight_filler and bias_filler work.
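
For example (a sketch modeled on the reference model's first convolution layer; the exact values there may differ), fillers are declared inside convolution_param:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"  # draw initial weights from a zero-mean gaussian
      std: 0.01
    }
    bias_filler {
      type: "constant"  # start all biases at zero
      value: 0
    }
  }
}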

What you want to do is rename the classifier layer (the one whose num_output equals the number of classes) so that its weights are not loaded. But as mentioned above - you must still initialize it by adding fillers. Otherwise you've successfully loaded all the weights for feature extraction, but left the classifier at zero - so no matter what your convolutional layers detect, the classifier outputs zero all the way.
 

john1...@gmail.com
Feb 8, 2017, 6:13:00 AM
to Caffe Users

Thank you so much for your valuable comment. Let me go through it point by point.

You can and you should use the pretrained weights - for everything except the last layer, because only the last layer is modified. The rest of the network stays the same.
 
In my case I have 3 classes, so I changed the number of classes in `score_fr` and `upscore` from 21 to 3. Because the `upscore` layer cannot reuse the pre-trained weights (the number of classes differs), it needs initial weights, which I set with `weight_filler: { type: "bilinear" }`. So, following the comment above, we first modify the last two layers of FCN-32s as follows (note that I did not rename the layers yet; they are renamed in the next step):

layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 3
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 3
    bias_term: false
    kernel_size: 64
    stride: 32
    group: 2
    weight_filler: { type: "bilinear" }
  }
}

What you want to do is rename the classifier layer (the one whose num_output equals the number of classes) so that its weights are not loaded. But as mentioned above - you must still initialize it by adding fillers. Otherwise you've successfully loaded all the weights for feature extraction, but left the classifier at zero - so no matter what your convolutional layers detect, the classifier outputs zero all the way.


For the second point, we need to rename the layers whose number of classes changed. As in the first point, we modified the number of classes in the `score_fr` and `upscore` layers, so we rename these layers so that Caffe does not load the pre-trained weights for them: `score_fr` becomes `score_fr_3classes` and `upscore` becomes `upscore_3classes`. Finally, we obtain:

layer {
  name: "score_fr_3classes"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr_3classes"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 3
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "upscore_3classes"
  type: "Deconvolution"
  bottom: "score_fr_3classes"
  top: "upscore_3classes"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 3
    bias_term: false
    kernel_size: 64
    stride: 32
    group: 2
    weight_filler: { type: "bilinear" } 
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore_3classes"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 19
  }
}

Am I correct? In addition, why do we need to set `group: 2` in `upscore_3classes`? Without it, the training loss does not decrease. Thank you.


On Tuesday, February 7, 2017 at 21:24:39 UTC+9, Przemek D wrote:

Przemek D
Feb 13, 2017, 4:09:16 AM
to Caffe Users
The group param is usually used together with the bilinear upsampling filler, so that each channel is convolved individually (this has been discussed on this group before). However, I doubt the bilinear filler is a good choice for you - try using a gaussian initializer, with no group set:
weight_filler {
  type: "gaussian"
  std: 0.01
}
Other than that, I think your modifications are correct.
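
(For reference: if you do keep the bilinear upsampling, the FCN reference code initializes such deconvolution layers by net surgery in Python rather than through a prototxt filler. Below is a minimal sketch along the lines of its surgery.interp helper, reusing the solver.prototxt and the upscore_3classes layer name from earlier in this thread; it assumes no group is set, so the weight blob has equal input and output channel counts.)

import numpy as np
import caffe

def upsample_filt(size):
    # 2D bilinear interpolation kernel of side `size`
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

solver = caffe.SGDSolver('solver.prototxt')
# copy_from matches layers by name, so the renamed
# score_fr_3classes/upscore_3classes layers are simply skipped
solver.net.copy_from('./fcn/voc-fcn16s/fcn16s-heavy-pascal.caffemodel')

# fill the deconvolution weights so that each class channel
# is upsampled independently with a bilinear kernel
blob = solver.net.params['upscore_3classes'][0].data
m, k, h, w = blob.shape
blob[...] = 0
blob[range(m), range(k), :, :] = upsample_filt(h)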