Cross-entropy loss with masked output


Raghav Goyal

Mar 15, 2016, 12:22:48 PM
to lasagne-users
Hi,

I've used LSTM to decode an output using a mask.
I need to calculate the cross-entropy loss between output and target distribution. 

Should I proceed normally with the following?

loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)

But wouldn't it also add the loss for the repeated indices (where mask == 0)?

I need to avoid that.

Can you help with a principled approach?

Thanks in advance,
Raghav

goo...@jan-schlueter.de

Mar 15, 2016, 1:58:56 PM
to lasagne-users
> But wouldn't it also add the loss for the repeated indices (where mask == 0)?
>
> I need to avoid that.
>
> Can you help with a principled approach?

The solution is simple: you just need to apply the same mask to the loss before summing/averaging it to a scalar. You can write it out as (loss * mask_var).mean(), or use lasagne.objectives.aggregate(loss, mask_var): http://lasagne.readthedocs.org/en/latest/modules/objectives.html#aggregation-functions
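To illustrate the arithmetic, here is a minimal NumPy sketch of what masking the loss does (the shapes and the example values are made up for illustration; in the real graph, loss and mask_var would be symbolic Theano variables):

```python
import numpy as np

# Per-step cross-entropy losses for a batch of 2 sequences, max length 4.
# The second sequence is only 2 steps long; its last two steps are padding
# and hold garbage loss values that must not enter the average.
loss = np.array([[0.5, 0.2, 0.1, 0.4],
                 [0.3, 0.6, 9.9, 9.9]])
mask = np.array([[1., 1., 1., 1.],
                 [1., 1., 0., 0.]])

unmasked = loss.mean()                        # padding leaks into the average
masked   = (loss * mask).mean()               # what (loss * mask_var).mean() computes
per_step = (loss * mask).sum() / mask.sum()   # average over valid steps only

print(unmasked, masked, per_step)
```

Note that (loss * mask).mean() still divides by the total number of entries, padding included; dividing by mask.sum() instead gives the mean over valid steps only (aggregate supports this via mode='normalized_sum'). Which one you want depends on how you want short sequences weighted.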

Best, Jan

Raghav Goyal

Mar 15, 2016, 5:28:54 PM
to lasagne-users, goo...@jan-schlueter.de
Hi Jan,

Thanks for the answer. It works.
Just a random thought: how do you check the correctness of a model, or of Theano code in general?

It might be the case that a bug is silently doing things you don't want.

Visualising the computational graph won't help because it gets too complicated.

Thanks again,
Raghav

goo...@jan-schlueter.de

Mar 16, 2016, 5:53:49 AM
to lasagne-users
> Just a random thought: how do you check the correctness of a model, or of Theano code in general?
>
> It might be the case that a bug is silently doing things you don't want.
>
> Visualising the computational graph won't help because it gets too complicated.

If visualising the graph is too complicated, you can still try returning as many intermediate results as possible (e.g., the hidden representations of each layer before and after the nonlinearity, the network output and targets, the loss, the gradients and updates). Just add expressions for what you need to the list of outputs computed by the training function (compiled with a theano.function() call).
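As a plain-NumPy analogue of that debugging approach (the layer sizes, values, and the function name forward_debug here are made up for illustration, not part of any library): return every intermediate result, not just the final output, so each stage can be sanity-checked separately:

```python
import numpy as np

def forward_debug(x, W, b):
    """Tiny dense layer that returns all intermediates alongside the output,
    mirroring the idea of adding extra expressions to the output list of a
    compiled theano.function for inspection."""
    pre = x.dot(W) + b      # hidden representation before the nonlinearity
    post = np.tanh(pre)     # hidden representation after the nonlinearity
    return {"pre": pre, "post": post}

rng = np.random.RandomState(0)
x = rng.randn(3, 5)         # batch of 3 inputs, 5 features
W = rng.randn(5, 4)         # weights for a 4-unit layer
b = np.zeros(4)

out = forward_debug(x, W, b)
# Cheap sanity checks on the intermediates:
assert out["pre"].shape == (3, 4)
assert np.all(np.abs(out["post"]) <= 1.0)   # tanh output is bounded in [-1, 1]
```

In Theano the same idea amounts to passing a longer outputs list to theano.function and checking the returned arrays against expectations like these.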

Best, Jan