In Caffe's softmax_loss_layer.cpp, the derivative of the loss that is backpropagated to the previous layer is computed as
bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;
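For reference, here is a minimal standalone sketch of how I read the indexing in that line. The names, types, and blob layout (dim = num_classes * inner_num, inner_num = H * W) are my assumptions for illustration, not the actual Caffe source:

#include <vector>

// Illustrative only: apply the quoted "-= 1" at the entry that belongs to
// the ground-truth class of spatial position j in image i, assuming the
// diff blob is laid out as index = i * dim + c * inner_num + j.
void subtract_one_at_label(std::vector<float>& bottom_diff,
                           const std::vector<int>& label,
                           int outer_num, int num_classes, int inner_num) {
  const int dim = num_classes * inner_num;
  for (int i = 0; i < outer_num; ++i) {
    for (int j = 0; j < inner_num; ++j) {
      const int label_value = label[i * inner_num + j];
      // The quoted operation from softmax_loss_layer.cpp:
      bottom_diff[i * dim + label_value * inner_num + j] -= 1;
    }
  }
}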
In other words, the value of the derivative at the corresponding pixel is reduced by 1 during the backward pass. Given that, I don't understand how it is possible to get positive derivatives from this loss function in the last layer, yet I observe them all the time. What am I missing?