Fine-tuning an FCN for interactive object segmentation

Ali

Oct 26, 2016, 5:59:14 AM
to Caffe Users
Hi all,

I'm trying to implement the model proposed in a CVPR paper (Deep Interactive Object Selection), in which each input sample of the data set has 5 channels:

1. Red
2. Blue
3. Green
4. Euclidean distance map associated with positive clicks
5. Euclidean distance map associated with negative clicks (a sketch of how such a map can be computed follows this list)
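
For concreteness, here is a minimal NumPy/SciPy sketch of one way to build such a click distance map (the function name is mine, and I'm assuming, as in the paper, that distances are truncated at 255 so each map fits in an 8-bit image channel):

import numpy as np
from scipy.ndimage import distance_transform_edt

def click_distance_map(clicks, height, width, cap=255):
    """Euclidean distance from every pixel to its nearest click,
    truncated at `cap` so the map fits an 8-bit image channel."""
    if not clicks:  # no clicks of this kind yet: constant map at the cap
        return np.full((height, width), cap, dtype=np.uint8)
    mask = np.ones((height, width), dtype=bool)
    for row, col in clicks:
        mask[row, col] = False  # distance is 0 at the click itself
    dist = distance_transform_edt(mask)  # distance to the nearest click
    return np.minimum(dist, cap).astype(np.uint8)

One such map is built from the positive clicks (channel 4) and one from the negative clicks (channel 5).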


To do so, I should fine-tune the FCN-32s network using object binary masks as labels.


As you can see, the first conv layer has 2 extra input channels, so I did net surgery to reuse the pretrained parameters for the first 3 channels and Xavier initialization for the 2 extra ones (a sketch of the surgery is below).
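
The surgery went along these lines (a sketch rather than my exact script; the file names are placeholders, and I renamed the first layer to "My_conv1_1" so Caffe doesn't try to copy the mismatched 3-channel blob into it):

import numpy as np
import caffe

# source: stock 3-channel FCN-32s; destination: the 5-channel variant
src = caffe.Net('fcn32s.prototxt', 'fcn32s-heavy-pascal.caffemodel', caffe.TEST)
dst = caffe.Net('fcn32s-5ch.prototxt', 'fcn32s-heavy-pascal.caffemodel', caffe.TEST)

w_src = src.params['conv1_1'][0].data     # shape (64, 3, 3, 3)
w_dst = dst.params['My_conv1_1'][0].data  # shape (64, 5, 3, 3)

w_dst[:, :3] = w_src                      # reuse the pretrained RGB filters
n = w_dst[0, 3:].size                     # fan-in of the new slice (2*3*3)
w_dst[:, 3:] = np.random.uniform(-np.sqrt(3.0 / n), np.sqrt(3.0 / n),
                                 w_dst[:, 3:].shape)  # Xavier-style init
dst.params['My_conv1_1'][1].data[...] = src.params['conv1_1'][1].data  # biases
dst.save('fcn32s-5ch-init.caffemodel')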

For the rest of the FCN architecture, I have these questions:

1. Should I freeze all the layers before "fc6" (except the first conv layer)? If so, how will the extra channels of the first conv layer be learned? Are the gradients strong enough to reach the first conv layer during training?

2. What should the kernel size of "fc6" be? Should I keep 7? I saw in the Caffe net_surgery notebook that it depends on the output size of the last pooling layer ("pool5").

3. The main problem is the number of outputs of the "score_fr" and "upscore" layers. Since I'm not doing multi-class segmentation (where 21 stands for 20 classes plus background), how should I change it? What about 2 (one for the object area and the other for the non-object/background area)?

4. Should I change the "crop" layer's "offset" to 32 to get center crops?

5. For each layer I change, what is the best initialization strategy? Bilinear for "upscore" and Xavier for the rest?

Any useful ideas will be appreciated.

PS:
I'm using Euclidean loss.

Jonathan R. Williford

Oct 27, 2016, 8:50:26 AM
to Caffe Users
1. I would guess that it would be better not to freeze the layers, so that the other layers can learn how to make optimal use of the additional information. You can always try both approaches (in Caffe, freezing a layer just means setting its lr_mult values to 0).

2. I think either way would be fine. If pool5 increases in size but you keep the kernel size, you may need to add a pooling operation later on, as in Hong et al. 2015, DecoupledNet (https://arxiv.org/abs/1506.04924).

3. Yes, you would use two output channels and apply Softmax (or SoftmaxWithLoss) over them, so that the two channels sum to 1 at each pixel (see the sketch after this list).

4. Usually you train with random cropping and then use center crops for testing.

5. I'm not familiar enough with the paper to know what "upscore" is, but Xavier is usually a good bet.
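
Regarding 3, to make it concrete, here is a plain NumPy sketch (the function name is mine) of what the softmax over the two score channels gives you per pixel:

import numpy as np

def foreground_prob(score):
    """score: (2, H, W) output of the 2-channel score layer.
    Returns P(object) per pixel, i.e. the softmax of channel 1."""
    shifted = score - score.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e[1] / e.sum(axis=0)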

Best,
Jonathan

Ali

Oct 28, 2016, 4:23:27 AM
to Caffe Users
Thank you, Jonathan.

Ali

Oct 28, 2016, 9:12:56 AM
to Caffe Users
At the moment, I'm trying to over-fit my net on just one input image (dropout is removed) to make sure it is able to learn. So far:

1. I used "FCN-32s PASCAL" as the starting point for fine-tuning.
2. I changed the number of outputs of the "score_fr" and "upscore" layers to 2, so I have to learn them from scratch:
 
layer {
  name: "My_score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2 # instead of 21, because we only have object and non-object areas
    pad: 0
    kernel_size: 1
    ##########
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    ##########
  }
}
layer {
  name: "My_upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 2 # instead of 21, because we only have object and non-object areas
    group: 2
    bias_term: false
    kernel_size: 64
    stride: 32
    ##########
    weight_filler {
      type: "bilinear"
    }
    ##########
  }
}
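
As far as I understand, the "bilinear" filler initializes each deconvolution kernel with bilinear-interpolation coefficients; the FCN reference code builds that filter in NumPy roughly like this (a sketch):

import numpy as np

def upsample_filt(size):
    """Bilinear upsampling kernel of shape (size, size); for the
    upscore layer above, size = 64 (with stride 32)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)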

In this artificial over-fitting setup, the convergence rate is low.
Is that normal? Can I trust my net and start training it on my real training data?
Do you see any problem in these two layers?
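
For reference, the single-image over-fit check is scripted with pycaffe roughly as follows (the solver and weight file names are placeholders):

import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')         # placeholder path
solver.net.copy_from('fcn32s-5ch-init.caffemodel')  # surgically initialized weights
for it in range(500):
    solver.step(1)
    if it % 20 == 0:
        # on a single image the loss should approach 0 if the net can learn
        print(it, float(solver.net.blobs['loss'].data))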

Jonathan R. Williford

Oct 28, 2016, 9:51:26 AM
to Ali, Caffe Users
Are you modifying an existing prototxt? What is your complete prototxt definition?

Cheers,
Jonathan

Ali

Oct 28, 2016, 10:20:29 AM
to Caffe Users, ash...@gmail.com
name: "Interactive-Segmentation-FCN-32s"
#################################################
layer {
  name: "Train-RGB"
  type: "Data"
  top: "RGB"
  top: "FakeLabel1"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 101.709287 #B
    mean_value: 110.372298 #G
    mean_value: 115.832027 #R
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Train_lmdb_files_Resized/Train_RGB_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Val-RGB"
  type: "Data"
  top: "RGB"
  top: "FakeLabel2"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 102.216802 #B
    mean_value: 110.865131 #G
    mean_value: 116.176926 #R
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Val_lmdb_files_Resized/Val_RGB_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Train-POS"
  type: "Data"
  top: "POS"
  top: "FakeLabel3"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 111.063883
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Train_lmdb_files_Resized/Train_POS_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Val-POS"
  type: "Data"
  top: "POS"
  top: "FakeLabel4"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 110.994911
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Val_lmdb_files_Resized/Val_POS_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Train-NEG"
  type: "Data"
  top: "NEG"
  top: "FakeLabel5"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 98.090021
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Train_lmdb_files_Resized/Train_NEG_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Val-NEG"
  type: "Data"
  top: "NEG"
  top: "FakeLabel6"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 98.150967
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Val_lmdb_files_Resized/Val_NEG_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Train-LAB"
  type: "Data"
  top: "LAB"
  top: "FakeLabel7"
  include {
    phase: TRAIN
  }
  transform_param {
    #mean_value: 24.934351
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Train_lmdb_files_Resized/Train_LAB_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
layer {
  name: "Val-LAB"
  type: "Data"
  top: "LAB"
  top: "FakeLabel8"
  include {
    phase: TEST
  }
  transform_param {
    #mean_value: 24.778228
    mirror: false
    #crop_size:128 
    #crop_size:227
  }
  data_param {
    source: "/Val_lmdb_files_Resized/Val_LAB_lmdb"
    
    batch_size: 8
    backend: LMDB
  }
}
#################################################
# scale the label by 1/255 so the white mask pixels become 1 instead of 255
layer {
  name: "NormLAB"
  type: "Power"
  bottom: "LAB"
  top: "LAB"
  power_param {
    power: 1
    scale: 0.003921568627 # = 1/255
    shift: 0
  }
}
#################################################
layer {
  name: "concat1"
  type: "Concat"
  bottom: "RGB"
  bottom: "POS"
  bottom: "NEG"
  top: "RGBPOSNEG"
  concat_param {
    axis: 1
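    # concatenation along axis 1 (channels): 3 (RGB) + 1 (POS) + 1 (NEG) = 5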
  }
}
#################################################
layer {
  name: "My_conv1_1"
  type: "Convolution"
  bottom: "RGBPOSNEG"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
#################################################
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5_3"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 7 
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "Convolution"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 1
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "My_score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2 # instead of 21, because we have only object and background areas (binary labels)
    pad: 0
    kernel_size: 1
    ##########
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    ##########
  }
}
layer {
  name: "My_upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 2 # instead of 21, because we have only object and background areas (binary labels)
    group: 2
    bias_term: false
    kernel_size: 64
    stride: 32
    ##########
    weight_filler {
      type: "bilinear"
    }
    ##########
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "LAB"
  top: "score"
  crop_param {
    axis: 2
    offset: 19 # or should it be 32, to get center crops?
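    # note: Crop trims "upscore" to the spatial size of "LAB" (the second
    # bottom), discarding 'offset' pixels from the start of each cropped axis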
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "LAB"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalize: false
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "score"
  bottom: "LAB"
  top: "accuracy"
  include {
    phase: TEST
  }
}
