Hello everyone,
I am trying to fine-tune DeepLab on my data, which has just two classes: hands and background. I edited the deeplabLargeFOV prototxt file and solver.prototxt. My input image size is 224x126, with same-sized segmentation masks: binary images whose values are either 0 or 1 (1 indicates hands). There are a number of issues I am facing while fine-tuning the network:
1) If I use a batch size greater than 3, even for such small images, I get either a 'Check failed: error == cudaSuccess (2 vs. 0)' CUDA error (code 2 means out of memory) or an 'unexpected label 255' error, where the reported label value appears to be a random number.
2) If I keep the batch size at 1, training starts, but the loss drops very quickly to a much smaller value, and after 500 iterations the accuracy reaches 1 (which is not correct behavior during training).
3) I tried testing the trained network to see the results, even though it was overfitted as described in point 2, and I noticed that the network produces all-black images: fc8 outputs all zeros.
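In case it is relevant, here is the quick sanity check I would run on the ground-truth masks to confirm they really contain only 0 and 1 (a sketch using numpy with hand-made arrays; masks accidentally saved as 0/255 PNGs would explain the 'unexpected label 255' error):

```python
import numpy as np

def mask_values(mask):
    """Return the set of distinct label values in a mask array."""
    return set(np.unique(mask).tolist())

# A correct binary mask contains only 0 and 1:
good = np.array([[0, 0, 1], [1, 0, 1]], dtype=np.uint8)
print(mask_values(good))  # {0, 1}

# A mask accidentally saved with 0/255 values would trip the
# label check for any class index other than ignore_label:
bad = good * 255
print(mask_values(bad))  # {0, 255}
```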
Can anybody tell me where I am going wrong?
Here are my GPU-related details:
NVIDIA-SMI 361.77 Driver Version: 361.77 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:01:00.0 On | N/A |
| 22% 42C P8 18W / 250W | 449MiB / 12211MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The only changes I made are in my input data layers and the fc8 layer, as shown below:
name: "deeplab_largeFOV"
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/train.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 321
    mirror: true
  }
  include: {
    phase: TRAIN
  }
}
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/train.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 513
    mirror: true
  }
  include: {
    phase: TEST
    stage: "test-on-train"
  }
}
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/val.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 513
    mirror: true
  }
  include: {
    phase: TEST
    stage: "test-on-test"
  }
}
----------------------
Also here is my fc8 layer:
layer {
  bottom: "fc7"
  top: "fc8_EgoHands"
  name: "fc8_EgoHands"
  type: "Convolution"
  # strict_dim: false
  param {
    name: "fc8_w"
    lr_mult: 10
    decay_mult: 1
  }
  param {
    name: "fc8_b"
    lr_mult: 20
    decay_mult: 0
  }
  # blobs_lr: 10
  # blobs_lr: 20
  # weight_decay: 1
  # weight_decay: 0
  convolution_param {
    num_output: 2  # only 2 classes: 'hands' and 'background'
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
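For completeness, this is roughly how I turn fc8's two-channel score map into a predicted mask at test time (an illustrative sketch with a tiny hand-made score array, not my actual test code):

```python
import numpy as np

# Illustrative fc8-style output: (num_classes=2, height, width)
scores = np.zeros((2, 4, 4), dtype=np.float32)
scores[1, 1:3, 1:3] = 5.0  # pretend 'hands' wins in a 2x2 region

# Per-pixel class via argmax over the channel axis:
# 0 = background, 1 = hands (ties go to background here)
pred = np.argmax(scores, axis=0)
print(pred.sum())  # 4 pixels predicted as hands
```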
and here are the label-shrink, loss, and accuracy layers at the end of the network:
layer {
  bottom: "label"
  top: "label_shrink"
  name: "label_shrink"
  type: "Interp"
  interp_param {
    shrink_factor: 8
    pad_beg: 0
    pad_end: 0
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8_EgoHands"
  bottom: "label_shrink"
  loss_param {
    ignore_label: 255
  }
  include: { phase: TRAIN }
}
layer {
  name: "accuracy"
  type: "SegAccuracy"
  bottom: "fc8_EgoHands"
  bottom: "label_shrink"
  top: "accuracy"
  seg_accuracy_param {
    ignore_label: 255
  }
  include: { phase: TEST }
}
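For reference, if I read the Interp layer's shrink mode correctly, each side of the label is downsampled to (size - 1) // shrink_factor + 1, so label_shrink lines up with fc8's output resolution like this (a sketch of the formula, not DeepLab code):

```python
def shrink_size(size, factor=8, pad_beg=0, pad_end=0):
    """DeepLab Interp shrink mode: (effective_size - 1) // factor + 1."""
    eff = size + pad_beg + pad_end
    return (eff - 1) // factor + 1

print(shrink_size(321))  # 41 for the TRAIN crop
print(shrink_size(513))  # 65 for the TEST crop
```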
Also, my solver is as follows:
net: "/media/aisha/Drive2/datasets/egohands_data/deeplabTrainonEgoHands.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 16
test_state: { stage: 'test-on-test' }
test_iter: 16
test_interval: 500
test_compute_loss: true
lr_policy: "step"
gamma: 0.1
stepsize: 2000
base_lr: 0.001
display: 10
max_iter: 8000
momentum: 0.9
weight_decay: 0.0005
snapshot: 2000
snapshot_prefix: "/media/aisha/Drive2/datasets/egohands_data/EgoHandsSnapshots/deeplab_ego_finetune"
solver_mode: GPU
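For reference, my understanding of the 'step' policy above is lr = base_lr * gamma^(iter // stepsize), i.e. the learning rate drops by 10x every 2000 iterations:

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=2000):
    """Caffe 'step' lr_policy: base_lr * gamma^(iter // stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Learning rate at a few milestones of the 8000-iteration run:
for it in (0, 2000, 4000, 6000):
    print(it, step_lr(it))
```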
I have tried changing the ignore_label field to 0 in my loss and accuracy layers, but it kept giving me that 'unexpected label' error, so I left these fields as they were.
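If my masks did turn out to contain 0/255 instead of 0/1, this is the conversion I would apply before training (an assumed preprocessing sketch, not part of my current pipeline):

```python
import numpy as np

def binarize(mask, threshold=128):
    """Map a 0/255 (or grayscale) mask to {0, 1} labels."""
    return (mask >= threshold).astype(np.uint8)

raw = np.array([[0, 255, 255], [0, 0, 255]], dtype=np.uint8)
print(binarize(raw).tolist())  # [[0, 1, 1], [0, 0, 1]]
```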
I am not sure where I am making a mistake. Any help will be highly appreciated.
Thanks!