Siamese network with L1 distance and log loss

zajac....@gmail.com

Jun 18, 2016, 12:16:59 PM
to Keras-users
Hi,

I would like to modify the MNIST Siamese example (mnist_siamese_graph.py) to use L1 distance + sigmoid + log loss, as in this paper:
Siamese Neural Networks for One-shot Image Recognition (http://www.cs.toronto.edu/~rsalakhu/papers/oneshot1.pdf)

Here are the relevant parts of my inept attempt:

def flattened_l1_distance(vects):
    x, y = vects
    return K.sigmoid(K.sum(K.abs(x - y), axis=1, keepdims=True))

distance = Lambda(flattened_l1_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])

model = Model(input=[input_a, input_b], output=distance)

# train
rms = RMSprop()
model.compile(loss='binary_crossentropy', optimizer=rms)

It doesn't work - the objective stays at 0.6931. Please explain how to do it right.

Mikael Rousson

Jun 21, 2016, 5:22:22 PM
to Keras-users, zajac....@gmail.com
It looks like you are missing the alpha_i in the L1 sum. The simplest way to implement that is probably to keep only K.abs(x - y) in the L1 function and add a fully connected layer afterwards.

Note that there is also a regularization term in the cross-entropy loss in the paper.

Mikael

zajac....@gmail.com

Jun 23, 2016, 3:06:39 PM
to Keras-users, mikael....@gmail.com
Thank you, Mikael. I'd like further help if possible. Here's the snippet updated for weighting distance components:

def get_abs_diff(vects):
    x, y = vects
    return K.abs(x - y)

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

abs_diff = Lambda(get_abs_diff, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
flattened_weighted_distance = Dense(1, activation='sigmoid')(abs_diff)

model = Model(input=[input_a, input_b], output=flattened_weighted_distance)


Unfortunately, something's not right with the sizes:

ValueError: ('shapes (128,128) and (1,1) not aligned: 128 (dim 1) != 1 (dim 0)', (128L, 128L), (1L, 1L))
Apply node that caused the error: Dot22(Elemwise{Abs}[(0, 0)].0, dense_29_W)
Toposort index: 79
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(128L, 128L), (1L, 1L)]
Inputs strides: [(512L, 4L), (4L, 4L)]
Inputs values: ['not shown', array([[ 1.61876023]], dtype=float32)]
Outputs clients: [[Elemwise{Composite{scalar_sigmoid((i0 + i1))}}[(0, 0)](Dot22.0, InplaceDimShuffle{x,0}.0)]]

How to fix it?

As for the regularized cross-entropy loss, Keras doesn't have it built in, does it?

Zygmunt

Mikael Rousson

Jun 24, 2016, 6:02:00 AM
to Keras-users, mikael....@gmail.com, zajac....@gmail.com
I guess you need to change the output shape of the lambda layer. It's not doing a sum across the channels anymore.
It should probably look something like:

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return shape1
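
With the Lambda reporting a (batch_size, 128) output, the Dense(1) layer gets built with a 128x1 weight matrix instead of the 1x1 one that the ValueError complains about. The rest of your snippet can then stay as it is, roughly (an untested sketch, reusing your variable names):

abs_diff = Lambda(get_abs_diff, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
flattened_weighted_distance = Dense(1, activation='sigmoid')(abs_diff)  # learns the alpha_i weighting

model = Model(input=[input_a, input_b], output=flattened_weighted_distance)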

Regarding weight regularization, you can add that directly to the layer's definition: http://keras.io/regularizers/
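
For example, an L2 penalty on that final Dense layer would look something like this in Keras 1.x (just a sketch; the 0.01 strength is a placeholder, not the value from the paper):

from keras.regularizers import l2

# add an L2 penalty on the weights of the final sigmoid layer
flattened_weighted_distance = Dense(1, activation='sigmoid',
                                    W_regularizer=l2(0.01))(abs_diff)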

zajac....@gmail.com

Jun 26, 2016, 10:55:01 AM
to Keras-users, mikael....@gmail.com
It seems to work. Thank you!

zajac....@gmail.com

Jun 28, 2016, 11:16:12 AM
to Keras-users, mikael....@gmail.com, zajac....@gmail.com
One more thing. The training loss is going down nicely (although not as nicely as with the original example), but there's a surprise at the end:

In [1]: run mnist_siamese_graph.py
Using Theano backend.
Train on 108400 samples, validate on 17820 samples
Epoch 1/20
108400/108400 [==============================] - 34s - loss: 0.3370 - val_loss: 0.2466
Epoch 2/20
108400/108400 [==============================] - 39s - loss: 0.1671 - val_loss: 0.1641
Epoch 3/20
108400/108400 [==============================] - 41s - loss: 0.1125 - val_loss: 0.1328
Epoch 4/20
108400/108400 [==============================] - 41s - loss: 0.0858 - val_loss: 0.0947
Epoch 5/20
108400/108400 [==============================] - 46s - loss: 0.0695 - val_loss: 0.0982
Epoch 6/20
108400/108400 [==============================] - 38s - loss: 0.0589 - val_loss: 0.0911
Epoch 7/20
108400/108400 [==============================] - 35s - loss: 0.0508 - val_loss: 0.0811
Epoch 8/20
108400/108400 [==============================] - 39s - loss: 0.0450 - val_loss: 0.0828
Epoch 9/20
108400/108400 [==============================] - 35s - loss: 0.0390 - val_loss: 0.0873
Epoch 10/20
108400/108400 [==============================] - 43s - loss: 0.0345 - val_loss: 0.0778
Epoch 11/20
108400/108400 [==============================] - 46s - loss: 0.0305 - val_loss: 0.0796
Epoch 12/20
108400/108400 [==============================] - 45s - loss: 0.0283 - val_loss: 0.0835
Epoch 13/20
108400/108400 [==============================] - 42s - loss: 0.0264 - val_loss: 0.0783
Epoch 14/20
108400/108400 [==============================] - 46s - loss: 0.0243 - val_loss: 0.0820
Epoch 15/20
108400/108400 [==============================] - 33s - loss: 0.0228 - val_loss: 0.0821
Epoch 16/20
108400/108400 [==============================] - 33s - loss: 0.0212 - val_loss: 0.0879
Epoch 17/20
108400/108400 [==============================] - 34s - loss: 0.0204 - val_loss: 0.0790
Epoch 18/20
108400/108400 [==============================] - 40s - loss: 0.0201 - val_loss: 0.0819
Epoch 19/20
108400/108400 [==============================] - 36s - loss: 0.0186 - val_loss: 0.0811
Epoch 20/20
108400/108400 [==============================] - 34s - loss: 0.0164 - val_loss: 0.0792
* Accuracy on training set: 0.32%
* Accuracy on test set: 3.02%

A histogram of the predictions shows that most are close to zero or one, so I suspect it's just a matter of flipping the probability. I wonder why it turns out that way, though, and how to correct it?

Here's the complete file:
https://gist.github.com/zygmuntz/30e6a72e13ecf9b26fddf7cc10204847


Mikael Rousson

Jul 1, 2016, 4:43:28 PM
to Keras-users, mikael....@gmail.com, zajac....@gmail.com
That looks good!

Regarding the accuracy, you shouldn't use the implementation from the example; just use the one in keras.metrics.
You can set it directly at compile time and it will show up during fitting:

# train
rms = RMSprop()
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=rms)


You can then remove all the lines after the call to fit.
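
If you still want to check it by hand after training, something like this should match what keras.metrics reports (a sketch, assuming the te_pairs / te_y variables from the MNIST example and thresholding the sigmoid output at 0.5):

import numpy as np

# predictions close to 1 mean "same class" with this setup
pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
test_acc = np.mean((pred.ravel() > 0.5) == te_y)
print('* Accuracy on test set: %0.2f%%' % (100 * test_acc))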

BTW, the accuracy in the original siamese example looks buggy. I will try to make a PR to fix that.

Mikael

mannasi...@gmail.com

Jun 6, 2018, 11:29:03 AM
to Keras-users
I am trying to implement the same network using TensorFlow and I am running into the same problem: the cost stays at 0.6932 in my case. I can't figure out what's wrong. I also added a fully connected layer after the L1 distance calculation, but I'm still getting the same behavior.