I've trained a conv net on multiple tasks by adding an inner-product layer + loss layer for each task. I don't think Caffe supports constructing the label vector manually and setting multiple entries to 1 so that two classes can coexist, and I'm not sure that would be any better than the former setup with an ip + loss layer per task anyway.
I'm concerned about how the gradients are computed when a weight is influenced by two loss functions. The weights of the ip layer for each task are only influenced by that task's loss. Once you go further backwards and arrive at a layer that is shared between both branches, the update of that layer is the summation of the partial derivatives: tracing the backward pass, the gradient for a shared weight is first computed for loss1, then loss2 is propagated back to the same shared layer, basically adding another term to the weight update.
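In prototxt terms, the setup I mean looks roughly like this (layer names and sizes are made up). Each head is only touched by its own loss, while the shared conv1 sits below both branches:

layer {
  name: "conv1"  # shared trunk; its diff accumulates gradients from both losses
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 32 kernel_size: 3 }
}
layer {
  name: "ip_task1"  # task-1 head, only updated by loss_task1
  type: "InnerProduct"
  bottom: "conv1"
  top: "ip_task1"
  inner_product_param { num_output: 2 }
}
layer {
  name: "loss_task1"
  type: "SoftmaxWithLoss"
  bottom: "ip_task1"
  bottom: "label_task1"
  top: "loss_task1"
}
layer {
  name: "ip_task2"  # task-2 head, only updated by loss_task2
  type: "InnerProduct"
  bottom: "conv1"
  top: "ip_task2"
  inner_product_param { num_output: 2 }
}
layer {
  name: "loss_task2"
  type: "SoftmaxWithLoss"
  bottom: "ip_task2"
  bottom: "label_task2"
  top: "loss_task2"
}

Both losses get loss_weight: 1 by default, so conv1's weights receive the sum of the two branch gradients during the update.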
Hi Vimal,
I'm not sure I understand your point correctly, but isn't shuffling set to false by default in caffe.proto (https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L1216)? I think for two sources (one for labels, the other for images), as long as you align them well during data preparation and set the batch sizes to be the same, they should be fine to use with the defaults; see the sketch below.
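For example, a pair of data layers like this (the source paths are placeholders) will stay in sync, as long as the two LMDBs were written with identical keys and the batch sizes match:

layer {
  name: "data"
  type: "Data"
  top: "data"
  data_param { source: "train_images_lmdb" batch_size: 32 backend: LMDB }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  data_param { source: "train_labels_lmdb" batch_size: 32 backend: LMDB }
}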
<snip>
import fileinput
import math
import re
import sys

import caffe
import lmdb
import numpy as np

lmdb_data_name = 'test_data_lmdb'
lmdb_label_name = 'test_score_lmdb'

# Placeholder path: listing file with one "<image_path> <label> [<label> ...]" per line.
data = 'train_listing.txt'

Inputs = []
Labels = []
for line in fileinput.input(data):
    entries = re.split(' ', line.strip())
    Inputs.append(entries[0])
    Labels.append(entries[1:])  # keep all label columns, not just the first

b_size = 4

print('Writing labels')
# Write labels in chunks of b_size, one LMDB transaction per chunk.
for idx in range(int(math.ceil(len(Labels) / (1.0 * b_size)))):
    in_db_label = lmdb.open(lmdb_label_name, map_size=int(1e12))
    with in_db_label.begin(write=True) as in_txn:
        for label_idx, label_ in enumerate(Labels[(b_size * idx):(b_size * (idx + 1))]):
            # Store the label vector as an L x 1 x 1 float datum.
            im_dat = caffe.io.array_to_datum(np.array(label_).astype(float).reshape(len(label_), 1, 1))
            in_txn.put('{:0>10d}'.format(b_size * idx + label_idx), im_dat.SerializeToString())
            string_ = str(b_size * idx + label_idx + 1) + ' / ' + str(len(Labels))
            sys.stdout.write("\r%s" % string_)
            sys.stdout.flush()
    in_db_label.close()
print('')

print('Writing image data')
for idx in range(int(math.ceil(len(Inputs) / (1.0 * b_size)))):
    in_db_data = lmdb.open(lmdb_data_name, map_size=int(1e12))
    with in_db_data.begin(write=True) as in_txn:
        for in_idx, in_ in enumerate(Inputs[(b_size * idx):(b_size * (idx + 1))]):
            im = caffe.io.load_image(in_)
            # HWC float image -> CHW datum, matching Caffe's expected layout.
            im_dat = caffe.io.array_to_datum(im.astype(float).transpose((2, 0, 1)))
            in_txn.put('{:0>10d}'.format(b_size * idx + in_idx), im_dat.SerializeToString())
            string_ = str(b_size * idx + in_idx + 1) + ' / ' + str(len(Inputs))
            sys.stdout.write("\r%s" % string_)
            sys.stdout.flush()
    in_db_data.close()
print('')
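To sanity-check the alignment afterwards, something like this (my own sketch, assuming the two LMDBs written above) walks both databases in parallel and confirms the keys line up:

import lmdb
import caffe

# The i-th key in both LMDBs should be identical, so image i and label i
# line up when Caffe reads the two databases in order.
env_data = lmdb.open(lmdb_data_name, readonly=True)
env_label = lmdb.open(lmdb_label_name, readonly=True)
with env_data.begin() as txn_d, env_label.begin() as txn_l:
    for (key_d, _), (key_l, val_l) in zip(txn_d.cursor(), txn_l.cursor()):
        assert key_d == key_l, 'key mismatch: %s vs %s' % (key_d, key_l)
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(val_l)
        # datum.float_data holds the label vector written by array_to_datum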
I1128 17:21:14.422446 25826 solver.cpp:236] Iteration 0, loss = 4557.88
I1128 17:21:14.422489 25826 solver.cpp:252] Train net output #0: loss = 4557.88 (* 1 = 4557.88 loss)
I1128 17:21:14.422513 25826 sgd_solver.cpp:106] Iteration 0, lr = 1e-05
I1128 17:21:16.656311 25826 solver.cpp:236] Iteration 20, loss = 3805.06
I1128 17:21:16.656376 25826 solver.cpp:252] Train net output #0: loss = 3805.06 (* 1 = 3805.06 loss)
I1128 17:21:16.656393 25826 sgd_solver.cpp:106] Iteration 20, lr = 1e-05
I1128 17:21:18.886127 25826 solver.cpp:236] Iteration 40, loss = 346.539
I1128 17:21:18.886193 25826 solver.cpp:252] Train net output #0: loss = 346.539 (* 1 = 346.539 loss)
I1128 17:21:18.886209 25826 sgd_solver.cpp:106] Iteration 40, lr = 1e-05
I1128 17:21:21.115128 25826 solver.cpp:236] Iteration 60, loss = 290.139
I1128 17:21:21.115190 25826 solver.cpp:252] Train net output #0: loss = 290.139 (* 1 = 290.139 loss)
And the loss decreases at about the same rate when I do finetuning:
I1128 17:12:29.984871 25734 solver.cpp:288] Learning Rate Policy: step
I1128 17:12:30.073063 25734 solver.cpp:236] Iteration 0, loss = 5421.64
I1128 17:12:30.073132 25734 solver.cpp:252] Train net output #0: loss = 5421.64 (* 1 = 5421.64 loss)
I1128 17:12:30.073166 25734 sgd_solver.cpp:106] Iteration 0, lr = 1e-05
I1128 17:12:32.307703 25734 solver.cpp:236] Iteration 20, loss = 3074.11
I1128 17:12:32.307770 25734 solver.cpp:252] Train net output #0: loss = 3074.11 (* 1 = 3074.11 loss)
I1128 17:12:32.307796 25734 sgd_solver.cpp:106] Iteration 20, lr = 1e-05
I1128 17:12:34.540082 25734 solver.cpp:236] Iteration 40, loss = 305.52
I1128 17:12:34.540153 25734 solver.cpp:252] Train net output #0: loss = 305.52 (* 1 = 305.52 loss)
I1128 17:12:34.540177 25734 sgd_solver.cpp:106] Iteration 40, lr = 1e-05
I1128 17:12:36.772572 25734 solver.cpp:236] Iteration 60, loss = 293.924
...
I0618 07:43:18.221647 21547 net.cpp:141] Setting up data
I0618 07:43:18.221740 21547 net.cpp:148] Top shape: 1 3 227 227 (154587)
I0618 07:43:18.221766 21547 net.cpp:148] Top shape: 1 21 (21)
...
I0618 07:43:19.005381 21547 net.cpp:141] Setting up myfc8
I0618 07:43:19.005405 21547 net.cpp:148] Top shape: 1 21 (21)
...
I0618 07:43:19.005576 21547 layer_factory.hpp:77] Creating layer loss
F0618 07:43:19.005725 21547 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (1 vs. 21) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
I0618 07:48:56.920414 21684 net.cpp:141] Setting up data
I0618 07:48:56.920501 21684 net.cpp:148] Top shape: 1 3 227 227 (154587)
I0618 07:48:56.920518 21684 net.cpp:148] Top shape: 1 21 1 1 (21)
...
I0618 07:48:57.701524 21684 net.cpp:141] Setting up myfc8
I0618 07:48:57.701550 21684 net.cpp:148] Top shape: 1 21 (21)
...
I0618 07:48:57.701715 21684 layer_factory.hpp:77] Creating layer loss
F0618 07:48:57.701869 21684 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (1 vs. 21) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
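The check fails because SoftmaxWithLoss expects a single integer class index per sample (label count = N*H*W, here 1), while the label blob carries a 21-dimensional vector, so reshaping the label to 1 21 1 1 doesn't help. For multi-label targets, a sigmoid cross-entropy loss accepts a label vector of the same shape as the predictions: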
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "myfc8"
  bottom: "label"
  top: "loss"
}
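For intuition, here is a rough numpy version of what that loss computes for a single sample (my own sketch, not Caffe's code; Caffe additionally normalizes by the batch size):

import numpy as np

def sigmoid_cross_entropy(scores, targets):
    # Element-wise binary cross-entropy between sigmoid(scores) and targets.
    p = 1.0 / (1.0 + np.exp(-scores))
    return -np.sum(targets * np.log(p) + (1 - targets) * np.log(1 - p))

scores = np.random.randn(21)                         # myfc8 outputs for one image
targets = (np.random.rand(21) > 0.5).astype(float)   # 21 binary labels
print(sigmoid_cross_entropy(scores, targets))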