I'm doing joint-position regression with Caffe: given a depth image, I want to predict the positions of the hand joints.
I modified the image_data_layer to accept multiple labels, one heatmap per joint, and also changed the EuclideanLoss layer so that the loss is divided by the total number of elements in the blob rather than only by the batch size.
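The normalization change can be sketched as follows (a NumPy toy with hypothetical shapes, not the actual C++ layer code):

```python
import numpy as np

def euclidean_loss(pred, label, per_element=True):
    # Stock Caffe EuclideanLoss divides by 2 * batch_size;
    # the modified layer divides by 2 * (total element count).
    diff = pred - label
    denom = diff.size if per_element else pred.shape[0]
    return np.sum(diff ** 2) / (2.0 * denom)

# Hypothetical batch of 2 flattened 40x40 heatmaps.
pred = np.zeros((2, 1600))
label = np.ones((2, 1600))
print(euclidean_loss(pred, label))         # per-element normalization -> 0.5
print(euclidean_loss(pred, label, False))  # per-batch normalization   -> 800.0
```

Note that per-element normalization makes the reported loss comparable across heatmap resolutions, but it also scales the gradients down by the same factor, so the effective learning rate shrinks accordingly.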
Here is my train.prototxt:
name: "HeatmapNet"
layer {
  name: "data"
  type: "HeatmapImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    is_rotate: true
    angle: 20
  }
  heatmap_image_data_param {
    source: "/home/zhi/caffe-cz/data/hands/data/80/result.txt"
    root_folder: "/home/zhi/caffe-cz/data/hands/data/80/image/"
    batch_size: 200
    label_height: 40
    label_width: 40
    shuffle: true
  }
}
layer {
  name: "data"
  type: "HeatmapImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  heatmap_image_data_param {
    source: "/home/zhi/caffe-cz/data/hands/test_data/80/result.txt"
    root_folder: "/home/zhi/caffe-cz/data/hands/test_data/80/image/"
    batch_size: 20
    label_height: 40
    label_width: 40
    shuffle: true
  }
}
#########################################################
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 16
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 4
    stride: 4
  }
}
#########################################################
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 4
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
#########################################################
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 1600
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
#########################################################
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 1600
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
#########################################################
#########################################################
layer {
  name: "loss_heatmap"
  type: "EuclideanLossHeatmap"
  bottom: "ip2"
  bottom: "label"
  top: "loss_heatmap"
  loss_weight: 1
}
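As a sanity check on the geometry (assuming 80x80 input crops, which is only a guess based on the "80" in the data paths), the spatial sizes can be traced through the net with Caffe's output-size formulas; the ip2 output of 1600 does match the flattened 40x40 label heatmap:

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride, pad=0):
    # Caffe pooling output size uses ceil instead of floor
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

size = 80                      # assumed input resolution (a guess)
size = conv_out(size, 5)       # conv1: 5x5, stride 1 -> 76
size = pool_out(size, 4, 4)    # pool1: 4x4, stride 4 -> 19
size = conv_out(size, 4)       # conv2: 4x4, stride 1 -> 16
size = pool_out(size, 2, 2)    # pool2: 2x2, stride 2 -> 8
print(size)                    # spatial size entering ip1 -> 8
print(32 * size * size)        # flattened features into ip1 -> 2048
```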
I tested my data layer and it looks good. However, after training on 30k images I always get the same predicted position for every joint, even on the training images. The training loss decreases very quickly and then stays stuck at a particular value. I then tried randomly rotating the images before training; now the training loss converges to a lower value, but the test loss still drops to a value quickly and then remains unchanged.
So I think my network didn't learn anything useful. I suspected it might be a problem that each iteration only feeds in a small subset of the images, so that when I randomly rotate the training set the training loss decreases but the test loss doesn't. However, I checked my data layer: it does read in every image, and this functionality is provided by Caffe itself.
I tried learning rates from 0.0002 to 0.2, and I even tried adding dropout; it seems the problem is not overfitting.
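One way to see why the loss can drop fast and then plateau: with sparse heatmap targets, a degenerate network that always outputs the same (near-mean) heatmap already achieves a very low per-element Euclidean loss, because almost every target pixel is zero. A toy NumPy illustration with hypothetical single-peak 40x40 labels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the labels: 100 sparse 40x40 heatmaps,
# each with a single unit peak at a random location.
labels = np.zeros((100, 40, 40))
rows = rng.integers(0, 40, size=100)
cols = rng.integers(0, 40, size=100)
labels[np.arange(100), rows, cols] = 1.0

# A degenerate predictor that always emits the mean heatmap:
mean_pred = labels.mean(axis=0, keepdims=True)
loss_mean = np.mean((labels - mean_pred) ** 2) / 2.0

# Even an all-zero prediction is already close to optimal.
loss_zero = np.mean(labels ** 2) / 2.0

print(f"constant-mean prediction loss: {loss_mean:.7f}")
print(f"all-zero prediction loss:      {loss_zero:.7f}")
```

Both losses come out tiny (on the order of 1e-4 here), so a network that collapses to a constant output can look like it has "converged".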