MultinomialLogisticLoss with softmax activations


Raúl Gombru

May 16, 2018, 7:35:13 AM
to Caffe Users
Hi,

I want to use MultinomialLogisticLoss (cross-entropy) with softmax activations (which is usually called softmax loss) for multi-label classification.
I know that intuitively it doesn't make sense to use softmax activations for a multi-label problem, but Facebook stated in their paper "Exploring the Limits of Weakly Supervised Pretraining" that they work better than sigmoid activations.

But I have found that both the MultinomialLogisticLoss and the SoftMaxWithLoss layers require integers (class label indices) as targets, while I need to use real-valued targets. The only cross-entropy loss layer that accepts real-valued targets is SigmoidCrossEntropyLoss, but that uses sigmoid activations, and I want softmax.
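
Concretely, what I want is cross-entropy computed over softmax probabilities, but with a real-valued target distribution per sample instead of a single class index. A rough numpy sketch of that loss (outside Caffe, just to be explicit; the function name is mine):

import numpy as np

def softmax_cross_entropy(scores, targets):
    # scores, targets: (batch, num_classes); targets are real-valued
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # cross-entropy with soft targets, averaged over the batch
    return -(targets * np.log(probs + 1e-12)).sum(axis=1).mean()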

Is there any solution to do this with default Caffe?

In this Stack Overflow question someone proposes modifying the MultinomialLogisticLoss layer, but I'd like to avoid that: https://stackoverflow.com/questions/38070104/how-do-i-go-about-having-a-cross-entropy-layer-in-caffe

Thanks

Przemek D

Aug 20, 2018, 10:15:54 AM
to Caffe Users
Since this seems to be experiment-level work, you might try creating a custom Python layer for it. Make sure to use numpy whenever possible and it shouldn't be terribly slow. This gives you the most flexibility, which you may need.
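
A minimal sketch of what such a layer could look like, assuming bottom[0] holds the raw scores (N x C), bottom[1] holds real-valued targets that sum to 1 per sample, and the class name is a placeholder you choose yourself (this is not an existing Caffe layer):

import caffe
import numpy as np

class SoftmaxCrossEntropyLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two bottoms: scores and targets.")

    def reshape(self, bottom, top):
        if bottom[0].data.shape != bottom[1].data.shape:
            raise Exception("Scores and targets must have the same shape.")
        top[0].reshape(1)  # the loss is a scalar

    def forward(self, bottom, top):
        scores = bottom[0].data
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        self.probs = np.exp(scores)
        self.probs /= self.probs.sum(axis=1, keepdims=True)
        loss = -(bottom[1].data * np.log(self.probs + 1e-12)).sum(axis=1)
        top[0].data[...] = loss.mean()

    def backward(self, top, propagate_down, bottom):
        # With targets summing to 1, d(loss)/d(scores) = probs - targets
        if propagate_down[0]:
            num = bottom[0].data.shape[0]
            bottom[0].diff[...] = (self.probs - bottom[1].data) / num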

Raúl Gombru

Aug 20, 2018, 10:29:17 AM
to Caffe Users
I implemented it, and I link it here in case anyone is interested: https://gist.github.com/gombru/53f02ae717cb1dd2525be090f2d41055
In this blog post I explain how it works and why I wanted to test it, as Facebook did: https://gombru.github.io/2018/05/23/cross_entropy_loss/
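
For anyone wiring up a Python layer like this: it goes into the prototxt as a layer of type "Python", something along these lines (the blob names and the module/layer strings below are placeholders; module must match your .py filename and layer your class name, and Caffe has to be built with WITH_PYTHON_LAYER := 1):

layer {
  name: "loss"
  type: "Python"
  bottom: "fc8"
  bottom: "labels"
  top: "loss"
  loss_weight: 1
  python_param {
    module: "softmax_cross_entropy_loss"
    layer: "SoftmaxCrossEntropyLossLayer"
  }
}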

Thanks,
Closing

Tung Hoang

Aug 20, 2018, 3:07:15 PM
to Caffe Users
Should we use the SoftmaxWithLoss layer?

/T

Raúl Gombru

Aug 21, 2018, 4:08:32 AM
to Caffe Users
The SoftMaxWithLoss layer requires integers (class label indices) as targets, while I need to use real-valued targets. SoftMaxWithLoss is only valid for multi-class classification, not for multi-label (see https://gombru.github.io/2018/05/23/cross_entropy_loss/).
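
To make the difference concrete (illustrative values only):

# SoftmaxWithLoss expects one class index per sample (multi-class):
label = 2                      # "this sample belongs to class 2"
# My multi-label case needs a real-valued vector per sample:
target = [0.0, 0.5, 0.0, 0.5]  # e.g. two active labels, equally weighted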