Hi all,
I have a problem with the SparseCategoricalCrossEntropy loss function and am looking for a good way to overcome it.
For a single input example, the output of my model is a matrix, say of shape 2x3, where each row is a probability vector produced by the softmax layer:
z =
z11 z12 z13
z21 z22 z23
The target is a vector of length 2 containing two integer indices that give the correct class labels (values in {0, 1, 2}):
y=
y1
y2
Since the targets are integer indices, I use the SparseCategoricalCrossEntropy loss. It computes correctly when used on a single example:
loss.forward(z, y)
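For concreteness, this is the quantity I expect the loss to return for one example. The snippet below is a plain NumPy sketch of the math (not BigDL code), assuming the loss averages the negative log-likelihood of the correct class over the two rows:

import numpy as np

# Softmax output for one example: two independent probability distributions over 3 classes.
z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
# Integer class indices (0, 1, or 2), one per row of z.
y = np.array([0, 2])

# Negative log-likelihood of the correct class per row, averaged over the two rows.
loss = -np.mean(np.log(z[np.arange(len(y)), y]))
print(loss)  # -(log(0.7) + log(0.6)) / 2, about 0.434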
However, when the mini-batch dimension is included, say with a batch size of 16, the input to the loss becomes a 3-d tensor Z of shape 16x2x3 and the target Y has shape 16x2. In that case
loss.forward(Z, Y)
raises the following error:
===
Caused by: java.lang.IllegalArgumentException: ClassNLLCriterion:
The input to the layer needs to be a vector(or a mini-batch of vectors);
===
I understand that this is because the current ClassNLLCriterion does not support 3-d input (including the batch dimension).
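What I would like the loss to compute in the batched case is, in effect, the following (again a NumPy sketch with made-up data, not the BigDL API):

import numpy as np

batch, steps, classes = 16, 2, 3
rng = np.random.default_rng(0)

# Fake softmax outputs of shape 16x2x3: each row sums to 1.
Z = rng.random((batch, steps, classes))
Z /= Z.sum(axis=-1, keepdims=True)
# Fake integer targets of shape 16x2 with values in {0, 1, 2}.
Y = rng.integers(0, classes, size=(batch, steps))

# Negative log-likelihood of the correct class, averaged over both the batch
# and the step dimensions.
b = np.arange(batch)[:, None]
s = np.arange(steps)[None, :]
loss = -np.mean(np.log(Z[b, s, Y]))
print(loss)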
However, I have not found a way to work around this. A Reshape layer that flattens z into one dimension cannot work, because each row must remain an independent probability distribution, and we cannot change the batch dimension.
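To illustrate that point with a toy example (NumPy only): flattening the two rows of z merges the two distributions into a single length-6 vector that no longer sums to 1, and the two target indices no longer line up with it:

import numpy as np

z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
flat = z.reshape(-1)   # one length-6 vector instead of two length-3 distributions
print(flat.sum())      # 2.0, so it is not a single probability distribution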
Using SplitTensor to split z into two vectors makes the output a Table, which is not compatible with the target y, so the loss again fails to compute.
Can anyone suggest a way to solve this problem?
Thank you very much,
Phuong