How are sample weights handled in Keras?

suni...@gmail.com

May 15, 2017, 12:27:57 PM

to Keras-users

The Keras loss functions (or objectives from 1.x) return the mean loss for the mini-batch.

For example:

def hinge(y_true, y_pred):
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

Model.compile(..., loss_fn, ...) calls `_weighted_masked_objective(loss_fn)` to wrap the unweighted loss function in a weighted one, which does the following:

score_array = fn(y_true, y_pred)
...
score_array *= weights
score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
weighted_loss = K.mean(score_array)

Take a mini-batch of 2, with sample losses [x1, x2] and non-zero sample weights [w1, w2].

The loss function will return: loss = mean([x1, x2]) = (x1 + x2)/2

This will be converted into the following 'weighted loss function':

weighted_loss = mean(loss * [w1, w2] / 1.0)
              = (w1*loss + w2*loss)/2
              = ( w1*(x1 + x2)/2 + w2*(x1 + x2)/2 ) / 2
              = x1*(w1 + w2)/4 + x2*(w1 + w2)/4

This is not quite the same as the mean weighted loss I expected:

weighted_loss = mean([w1*x1, w2*x2]) = (w1*x1 + w2*x2)/2
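To make the gap between the two expressions concrete, here is a quick numeric check in plain Python. The values of x1, x2, w1, w2 are arbitrary and chosen only for illustration, and the first formula follows the assumption stated above that the loss function returns a single batch mean.

```python
# Arbitrary illustrative values (not from any real model).
x1, x2 = 1.0, 3.0   # per-sample losses
w1, w2 = 1.0, 0.5   # non-zero sample weights

# Assumption from the post: the loss function returns one batch mean.
loss = (x1 + x2) / 2

# Weighted loss as derived under that assumption.
derived = (w1 * loss + w2 * loss) / 2

# Mean of the per-sample weighted losses, which is what was expected.
expected = (w1 * x1 + w2 * x2) / 2

print(derived, expected)
```

With these values the two results differ, so the question is not just a matter of notation.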

Did I misunderstand something, or is there a different interpretation of sample weights here?

Tomasz Melcer

May 15, 2017, 1:11:12 PM

to Keras-users

On 05/15/2017 06:27 PM, suni...@gmail.com wrote:
> The Keras loss functions (or objectives from 1.x) return the mean loss
> for the mini-batch.

Actually, they return an array of per-sample losses.

> For example:
>
> def hinge(y_true, y_pred):
>     return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

I suspect you did not notice the `axis=-1` here, but I might be
wrong. It makes the mean run over the features of the last layer;
it does not compute the mean across samples.
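
The effect of `axis=-1` can be sketched with NumPy standing in for the Keras backend. This is only an illustration: `hinge_np` is a hypothetical helper that mirrors the quoted `hinge`, and the arrays and weights are made-up values.

```python
import numpy as np

def hinge_np(y_true, y_pred):
    # Hypothetical NumPy stand-in for the quoted Keras hinge loss.
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

# Made-up batch: 2 samples, 3 output features each.
y_true = np.array([[1.0, -1.0, 1.0],
                   [-1.0, 1.0, 1.0]])
y_pred = np.array([[0.5, 0.5, 2.0],
                   [0.0, 0.0, 0.0]])

# The mean runs over the last axis only, so one loss per sample remains.
per_sample = hinge_np(y_true, y_pred)
print(per_sample.shape)  # (2,)

# Same steps as the weighting code quoted in the question.
weights = np.array([2.0, 0.5])
score = per_sample * weights
score = score / np.mean((weights != 0).astype(float))
weighted_loss = np.mean(score)
```

Because `per_sample` holds one entry per sample, the result is (w1*x1 + w2*x2)/2 for non-zero weights.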

--
Tomasz Melcer

suni...@gmail.com

May 15, 2017, 1:56:16 PM

to Keras-users

Thanks, Tomasz.

It turns out that even though y_true is passed in as a vector, Keras promotes it to a matrix (ndim = 2) with a single column. Similarly, the model returns y_pred with ndim = 2. My mistake was assuming that y_true and y_pred had ndim = 1.
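
For anyone hitting the same confusion, the difference between the two shapes can be reproduced with NumPy as a rough stand-in for the backend. `hinge_np` is a hypothetical helper mirroring `hinge`, and the values are arbitrary.

```python
import numpy as np

def hinge_np(y_true, y_pred):
    # Hypothetical NumPy stand-in for the Keras hinge loss.
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

y_true = np.array([1.0, -1.0])   # ndim = 1, what I had assumed
y_pred = np.array([0.25, 0.5])

# With ndim = 1, axis=-1 is the sample axis, so the batch collapses to a scalar.
loss_1d = hinge_np(y_true, y_pred)
print(np.ndim(loss_1d))  # 0

# With a single column (ndim = 2), axis=-1 has length 1, so each sample keeps its own loss.
loss_2d = hinge_np(y_true.reshape(-1, 1), y_pred.reshape(-1, 1))
print(loss_2d.shape)  # (2,)
```

Only the second case leaves a per-sample array for the sample weights to multiply.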