Hello everyone,
I am trying to fine-tune DeepLab on my data, which has just two classes: hands and background. I edited the deeplabLargeFOV prototxt file and solver.prototxt. My input image size is 224x126, with same-sized segmentation masks: binary images whose values are either 0 or 1 (1 indicates hands). There are a number of issues I am facing while fine-tuning the network:
1) If I use a batch size greater than 3, even for such small images, I get either a 'Check failed: error == cudaSuccess (2 vs. 0)' CUDA error (code 2 means out of memory) or an 'unexpected label 255' error, where the reported label value appears to be a random number.
2) If I keep the batch size at 1, training starts, but the loss drops very quickly to a much smaller value, and after 500 iterations the accuracy reaches 1 (which is not correct behavior during training).
3) I tried testing the trained network to see the results, even though it was overfitted as described in point 2, and I noticed that the network produces all-black images: fc8 outputs all zeros.
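In case it is relevant, here is the quick sanity check I would run on the ground-truth masks to confirm they really contain only 0 and 1 (a sketch using numpy with hand-made arrays; masks accidentally saved as 0/255 PNGs would explain the 'unexpected label 255' error):

```python
import numpy as np

def mask_values(mask):
    """Return the set of distinct label values in a mask array."""
    return set(np.unique(mask).tolist())

# A correct binary mask contains only 0 and 1:
good = np.array([[0, 0, 1], [1, 0, 1]], dtype=np.uint8)
print(mask_values(good))  # {0, 1}

# A mask accidentally saved with 0/255 values would trip the
# label check for any class index other than ignore_label:
bad = good * 255
print(mask_values(bad))  # {0, 255}
```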
Can anybody tell me where I am going wrong?
Here are my GPU-related details:
NVIDIA-SMI 361.77 Driver Version: 361.77 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:01:00.0 On | N/A |
| 22% 42C P8 18W / 250W | 449MiB / 12211MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The only changes I made are in my input data layers and the fc8 layer, as shown below:
name: "deeplab_largeFOV"
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/train.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 321
    mirror: true
  }
  include: {
    phase: TRAIN
  }
}
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/train.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 513
    mirror: true
  }
  include: {
    phase: TEST
    stage: "test-on-train"
  }
}
layer {
  name: "data"
  type: "ImageSegData"
  top: "data"
  top: "label"
  image_data_param {
    root_folder: ""
    source: "/media/aisha/Drive2/Research_Project/datasets/egohands_data/list/val.txt"
    label_type: PIXEL
    batch_size: 5
    shuffle: true
  }
  transform_param {
    mean_value: 104.008
    mean_value: 116.669
    mean_value: 122.675
    crop_size: 513
    mirror: true
  }
  include: {
    phase: TEST
    stage: "test-on-test"
  }
}
----------------------
Also here is my fc8 layer:
layer {
  bottom: "fc7"
  top: "fc8_EgoHands"
  name: "fc8_EgoHands"
  type: "Convolution"
  # strict_dim: false
  param {
    name: "fc8_w"
    lr_mult: 10
    decay_mult: 1
  }
  param {
    name: "fc8_b"
    lr_mult: 20
    decay_mult: 0
  }
  # blobs_lr: 10
  # blobs_lr: 20
  # weight_decay: 1
  # weight_decay: 0
  convolution_param {
    num_output: 2  # only 2 classes: 'hands' and 'background'
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
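For completeness, this is roughly how I turn fc8's two-channel score map into a predicted mask at test time (an illustrative sketch with a tiny hand-made score array, not my actual test code):

```python
import numpy as np

# Illustrative fc8-style output: (num_classes=2, height, width)
scores = np.zeros((2, 4, 4), dtype=np.float32)
scores[1, 1:3, 1:3] = 5.0  # pretend 'hands' wins in a 2x2 region

# Per-pixel class via argmax over the channel axis:
# 0 = background, 1 = hands (ties go to background here)
pred = np.argmax(scores, axis=0)
print(pred.sum())  # 4 pixels predicted as hands
```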
and here are the label-shrink, loss, and accuracy layers at the end of the network:
layer {
  bottom: "label"
  top: "label_shrink"
  name: "label_shrink"
  type: "Interp"
  interp_param {
    shrink_factor: 8
    pad_beg: 0
    pad_end: 0
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8_EgoHands"
  bottom: "label_shrink"
  loss_param {
    ignore_label: 255
  }
  include: { phase: TRAIN }
}
layer {
  name: "accuracy"
  type: "SegAccuracy"
  bottom: "fc8_EgoHands"
  bottom: "label_shrink"
  top: "accuracy"
  seg_accuracy_param {
    ignore_label: 255
  }
  include: { phase: TEST }
}
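For reference, if I read the Interp layer's shrink mode correctly, each side of the label is downsampled to (size - 1) // shrink_factor + 1, so label_shrink lines up with fc8's output resolution like this (a sketch of the formula, not DeepLab code):

```python
def shrink_size(size, factor=8, pad_beg=0, pad_end=0):
    """DeepLab Interp shrink mode: (effective_size - 1) // factor + 1."""
    eff = size + pad_beg + pad_end
    return (eff - 1) // factor + 1

print(shrink_size(321))  # 41 for the TRAIN crop
print(shrink_size(513))  # 65 for the TEST crop
```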
Also, my solver is as follows:
net: "/media/aisha/Drive2/datasets/egohands_data/deeplabTrainonEgoHands.prototxt"
test_state: { stage: 'test-on-train' }
test_iter: 16
test_state: { stage: 'test-on-test' }
test_iter: 16
test_interval: 500
test_compute_loss: true
lr_policy: "step"
gamma: 0.1
stepsize: 2000
base_lr: 0.001
display: 10
max_iter: 8000
momentum: 0.9
weight_decay: 0.0005
snapshot: 2000
snapshot_prefix: "/media/aisha/Drive2/datasets/egohands_data/EgoHandsSnapshots/deeplab_ego_finetune"
solver_mode: GPU
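For reference, my understanding of the 'step' policy above is lr = base_lr * gamma^(iter // stepsize), i.e. the learning rate drops by 10x every 2000 iterations:

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=2000):
    """Caffe 'step' lr_policy: base_lr * gamma^(iter // stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Learning rate at a few milestones of the 8000-iteration run:
for it in (0, 2000, 4000, 6000):
    print(it, step_lr(it))
```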
I have tried changing the ignore_label field to 0 in my loss and accuracy layers, but it kept giving me that 'unexpected label' error, so I left these fields as they were.
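If my masks did turn out to contain 0/255 instead of 0/1, this is the conversion I would apply before training (an assumed preprocessing sketch, not part of my current pipeline):

```python
import numpy as np

def binarize(mask, threshold=128):
    """Map a 0/255 (or grayscale) mask to {0, 1} labels."""
    return (mask >= threshold).astype(np.uint8)

raw = np.array([[0, 255, 255], [0, 0, 255]], dtype=np.uint8)
print(binarize(raw).tolist())  # [[0, 1, 1], [0, 0, 1]]
```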
I am not sure where I am making a mistake. Any help will be highly appreciated.
Thanks!