How are sample weights handled in Keras?


suni...@gmail.com

May 15, 2017, 12:27:57 PM
to Keras-users
The Keras loss functions (or objectives from 1.x) return the mean loss for the mini-batch.
For example:

from keras import backend as K

def hinge(y_true, y_pred):
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

In Model.compile(..., loss_fn, ...), Keras wraps the loss with `_weighted_masked_objective(loss_fn)`, which turns an unweighted loss function into a weighted one. The wrapper does the following:

score_array = fn(y_true, y_pred)
...
score_array *= weights
score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
weighted_loss = K.mean(score_array)
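A minimal NumPy sketch of the weighting step quoted above, assuming `fn` already returned one loss per sample (shape `(batch,)`) and ignoring the mask handling elided by `...`. `weighted_mean_loss` is a hypothetical name, and NumPy stands in for the Keras backend `K`:

```python
import numpy as np

def weighted_mean_loss(score_array, weights):
    """Hypothetical NumPy version of the sample-weighting step (no masking).

    score_array: per-sample losses, shape (batch,)
    weights: per-sample weights, shape (batch,)
    """
    # Scale each sample's loss by its weight.
    score_array = score_array * weights
    # Divide by the fraction of samples with a non-zero weight, so that
    # zero-weighted samples do not dilute the batch mean.
    nonzero_fraction = np.mean((weights != 0).astype(np.float32))
    score_array = score_array / nonzero_fraction
    # Average over the batch.
    return np.mean(score_array)

x = np.array([1.0, 3.0])  # per-sample losses
w = np.array([2.0, 4.0])  # sample weights
print(weighted_mean_loss(x, w))  # 7.0
```

With a zero weight, e.g. `w = [0, 4]`, the division by the non-zero fraction (0.5) makes the result 12.0, i.e. the weighted loss averaged over the single sample that counts.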

Take a mini-batch of 2 samples, with per-sample losses [x1, x2] and non-zero sample weights [w1, w2].
The loss function will return: loss = mean([x1, x2]) = (x1 + x2)/2
This will be converted into the following 'weighted loss function':

weighted_loss = mean(loss * [w1, w2] / 1.0) = (w1*loss + w2*loss)/2
= ( w1*(x1 + x2)/2 + w2*(x1 + x2)/2 ) / 2
= x1*(w1 + w2)/4 + x2*(w1 + w2)/4

This is not quite the same as the mean weighted loss I expected:

weighted_loss = mean([w1*x1, w2*x2]) = (w1*x1 + w2*x2)/2
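A quick numerical check of the two formulas, using arbitrary illustrative values x = [1, 3] and w = [2, 4], with plain NumPy standing in for the Keras backend:

```python
import numpy as np

x = np.array([1.0, 3.0])  # per-sample losses x1, x2
w = np.array([2.0, 4.0])  # sample weights w1, w2 (all non-zero)

# Reading from the question: the loss function has already averaged over
# the batch, so a scalar is broadcast against the weight vector.
scalar_loss = np.mean(x)               # (x1 + x2) / 2
if_scalar = np.mean(scalar_loss * w)   # (x1 + x2) * (w1 + w2) / 4

# Expected: the loss function returns one value per sample.
if_per_sample = np.mean(x * w)         # (w1*x1 + w2*x2) / 2

print(if_scalar, if_per_sample)  # 6.0 7.0
```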

Did I misunderstand something, or is there a different interpretation of sample weights here?


Tomasz Melcer

May 15, 2017, 1:11:12 PM
to Keras-users
On 05/15/2017 06:27 PM, suni...@gmail.com wrote:
> The Keras loss functions (or objectives from 1.x) return the mean loss
> for the mini-batch.

Actually, it returns an array of per-sample losses.


> For example:
>
> def hinge(y_true, y_pred):
>     return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

I suspect you did not notice the `axis=-1` here, but I might be wrong.
It makes the mean run over the features of the last layer; it does not
compute the mean across samples.
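To see what `axis=-1` does, here is a NumPy stand-in for the hinge loss (not the Keras backend itself) applied to a batch of 2 samples with 3 output features each; the values are arbitrary:

```python
import numpy as np

def hinge(y_true, y_pred):
    # NumPy stand-in for the Keras hinge loss: average over the last
    # axis (the output features), not over the batch.
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

# Batch of 2 samples, 3 output features each: shape (2, 3).
y_true = np.array([[1.0, -1.0, 1.0],
                   [-1.0, 1.0, 1.0]])
y_pred = np.array([[0.5, -2.0, 0.0],
                   [1.0, 0.2, 1.5]])

per_sample = hinge(y_true, y_pred)
print(per_sample.shape)  # (2,)
```

The result holds one loss per sample, which is what the weighting code then multiplies by the sample weights.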


--
Tomasz Melcer

suni...@gmail.com

May 15, 2017, 1:56:16 PM
to Keras-users
Thanks, Tomasz.
It turns out that even though y_true is passed in as a vector, Keras promotes it to a matrix (ndim = 2) with a single column. Similarly, the model returns y_pred with ndim = 2 as well. My misunderstanding was thinking that y_true and y_pred had ndim = 1.
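The difference the shape makes can be reproduced with a NumPy stand-in for `hinge`. Here `np.newaxis` only mimics the single-column promotion described above; it is an assumption for illustration, not how Keras implements it internally, and the values are arbitrary:

```python
import numpy as np

def hinge(y_true, y_pred):
    # NumPy stand-in for the Keras hinge loss (mean over the last axis).
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

y_true_1d = np.array([1.0, -1.0])  # targets as a plain vector, shape (2,)
y_pred_1d = np.array([0.5, 0.5])

# With ndim = 1, axis=-1 is the batch axis, so the batch collapses to a scalar.
print(hinge(y_true_1d, y_pred_1d).shape)  # ()

# With a trailing single-column axis, shape (2, 1), axis=-1 is the
# feature axis, so one loss per sample survives.
y_true_2d = y_true_1d[:, np.newaxis]
y_pred_2d = y_pred_1d[:, np.newaxis]
print(hinge(y_true_2d, y_pred_2d))  # [0.5 1.5]
```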

--
Sunil