Dear colleagues,
I have a question about why the gradient of the entropy loss that I calculate by hand differs from the implementation in softmax_loss_layer.cpp.
From Calculation:
Entropy = sum(-logPi(x=li));
deltaEntropy = -1 + Pi(x=li);
where 'i' is the pixel index and li is the ground-truth label of the i-th pixel.
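To make the calculation concrete, here is a minimal sketch of the gradient I would expect for a single pixel, assuming Pi is the softmax of the layer's input scores. The function names `softmax` and `expected_gradient` are my own, for illustration only, and are not part of Caffe.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over one pixel's class scores (hypothetical helper, not Caffe code).
std::vector<double> softmax(const std::vector<double>& scores) {
  // Subtract the maximum score for numerical stability.
  const double max_score = *std::max_element(scores.begin(), scores.end());
  std::vector<double> prob(scores.size());
  double sum = 0.0;
  for (std::size_t k = 0; k < scores.size(); ++k) {
    prob[k] = std::exp(scores[k] - max_score);
    sum += prob[k];
  }
  for (double& p : prob) p /= sum;
  return prob;
}

// Gradient of -log P(x = label) with respect to the scores (hypothetical helper):
// P(x = k) for every class k, with 1 subtracted at the ground-truth class.
std::vector<double> expected_gradient(const std::vector<double>& scores,
                                      int label) {
  std::vector<double> grad = softmax(scores);  // start from the probabilities
  grad[label] -= 1.0;                          // then subtract 1 at the label
  return grad;
}
```

For example, `expected_gradient({0.0, 0.0}, 0)` gives -0.5 for the label class (which is -1 + Pi(x=li)) and 0.5 for the other class.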
From Implementation (in "softmax_loss_layer"):
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
for (int i = 0; i < num; ++i) {
  for (int j = 0; j < spatial_dim; ++j) {
    bottom_diff[i * dim + static_cast<int>(label[i * spatial_dim + j]) * spatial_dim + j] -= 1;
  }
}
However, I cannot find where the Pi(x=li) term appears in the implementation.
Any hint would be appreciated!