Database for siamese network


Paul

Feb 17, 2016, 3:23:44 AM
to Caffe Users
Hello,

I want to create a siamese network for face recognition.
At the moment, I have a database of many faces, where each face is labeled 0 if it belongs to someone I am not looking for and 1 if it belongs to someone I am looking for.

It is similar to this:

"/path/image1.JPEG 0
...
/path/imageN.JPEG 1"

I know I have to create a pair_data database with two images and the label 1 if they show the same person and the label 0 if they show different people (I am not sure about the labels; is it 1 and 0 or something else?). As a first step, I want to apply this newly created database to the siamese example given for the MNIST database.

I have two questions :

1) I have a database of around 50,000 faces. How should I build my pair_data database? Should I create pairs comparing each image with every other one? The problem is that this would generate a database of around 50000*49999/2 = 1,249,975,000 pairs, which is far too big. What strategy should I apply for making pairs?

2) I guess (?) I could keep the train.txt and test.txt files for training if I specify "ImageData" in the data layer of my *train_test.prototxt file. Here I have a problem: supposing I specify "ImageData" so that I don't build any leveldb database, how should I write my train.txt and test.txt files? Something like:

"/path/image1.JPEG /path/image2.JPEG 0

/path/imageK.JPEG /path/imageM.JPEG 1"

Would that do the job?

Ideally, I would like to build a leveldb or lmdb database with these pairs. I looked at the siamese network in Caffe's documentation, but in that example the leveldb files are generated from ubyte files, which I cannot figure out how to generate myself. I would like to know how to generate a leveldb database with pair_data as discussed above, in a way that stays compatible with the given siamese network example.

Thanks a lot.

Muneeb Shahid

Feb 17, 2016, 11:37:01 AM
to Caffe Users
Hey Paul,

I had the very same questions when I started using siamese networks. I am sharing what I did, but if someone else has a better suggestion then please do share.

1- I would recommend you read the paper "Learning a Similarity Metric Discriminatively, with Application to Face Verification". Essentially, you generate the same number of negative pairs as there are positive pairs. To get better coverage of the negative pairs, I used bootstrapping: I trained with one set of randomly generated negative pairs, then after some fixed number of iterations generated them again and fine-tuned the already trained network. (There might be better approaches; I would highly appreciate any suggestions.) P.S. You might want to have a look at the triplet loss.
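As a rough sketch of the balanced-pair idea above (the `make_pairs` name and the `images_by_person` dict are my own assumptions, not something from this thread), pair generation with randomly sampled negatives might look like:

```python
import random

def make_pairs(images_by_person, seed=0):
    """Generate balanced positive/negative pairs from a dict
    mapping person id -> list of image paths."""
    rng = random.Random(seed)
    people = list(images_by_person)
    # All within-person combinations become positive pairs (label 1).
    positives = []
    for imgs in images_by_person.values():
        for i in range(len(imgs)):
            for j in range(i + 1, len(imgs)):
                positives.append((imgs[i], imgs[j], 1))
    # Sample as many random cross-person negatives (label 0).
    negatives = []
    while len(negatives) < len(positives):
        a, b = rng.sample(people, 2)  # two different people
        negatives.append((rng.choice(images_by_person[a]),
                          rng.choice(images_by_person[b]), 0))
    pairs = positives + negatives
    rng.shuffle(pairs)
    return pairs
```

For the bootstrapping step described above, you would re-run this with a new seed after some fixed number of iterations to draw a fresh set of negatives, then fine-tune the already trained network on them.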

2- I use four data layers: two for training and two for testing. Each of the two source files contains one image of each pair, i.e.:
train1.txt
pos_pair_image1 1
neg_pair_image1 0

train2.txt
pos_pair_image2 1
neg_pair_image2 0

You only need the labels from one of the files, so you can discard the other copy later on using a Silence layer.
To create LMDBs, generate the txt files as I just suggested and then use the convert_imageset tool that comes with Caffe to build one LMDB per file. If you want to use just one lmdb or leveldb source for training instead of two, you will need to write your own conversion code; there are some examples out there.
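A minimal sketch of writing the two list files described above (the `write_pair_lists` name and the `pairs` tuples are hypothetical; adapt the paths and labels to your data):

```python
def write_pair_lists(pairs, list1="train1.txt", list2="train2.txt"):
    """Split (img1, img2, label) triples into the two list files that
    the two ImageData layers read in lockstep. The label is written to
    both files, though only one copy is actually used by the net."""
    with open(list1, "w") as f1, open(list2, "w") as f2:
        for img1, img2, label in pairs:
            f1.write("%s %d\n" % (img1, label))
            f2.write("%s %d\n" % (img2, label))
```

One caveat: the line order must stay identical between the two files so the pairs remain aligned, so shuffle the pair list before writing and do not enable any shuffle option in the ImageData layers.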

layer {
  name: "data_1"
  type: "ImageData"
  top: "data_1"
  top: "sim_labels"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "mean.binaryproto"
  }

  image_data_param {
    source: "train1.txt"
    batch_size: 32 
  }
}

layer {
  name: "data_2"
  type: "ImageData"
  top: "data_2"
  top: "dummy_labels"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "mean.binaryproto"
  }
  image_data_param {
    source: "train2.txt"
    batch_size: 32 
  }
}

layer {
  name: "data_1"
  type: "ImageData"
  top: "data_1"
  top: "sim_labels"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "mean.binaryproto"
  }
  image_data_param {
    source: "test1.txt"
    batch_size: 320 
  }
}

layer {
  name: "data_2"
  type: "ImageData"
  top: "data_2"
  top: "dummy_labels"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "mean.binaryproto"
  }
  image_data_param {
    source: "test2.txt"
    batch_size: 320
  }
}

Paul

Feb 17, 2016, 2:22:31 PM
to Caffe Users
That is an extremely interesting answer; thank you very much, Muneeb. I will try it as soon as possible and let you know.

Paul

Feb 24, 2016, 5:48:47 AM
to Caffe Users
Your idea works perfectly fine. Thank you again Muneeb.

Muneeb Shahid

Feb 25, 2016, 5:04:19 AM
to Caffe Users
Great, and you're welcome!

Christopher Turnbull

Feb 25, 2016, 11:33:30 AM
to Caffe Users
Muneeb, what you said sounds very interesting, but could you help me understand it by defining "negative vs. positive pairs", "bootstrapping", and "triplet loss"?

Paul

Feb 26, 2016, 3:00:33 AM
to Caffe Users
Hi Muneeb,
Thanks to your help, I could finish the training part of my neural network.
I am now trying to define a threshold on the energy, above which I should consider two pictures as different and below which I should consider them as representing the same person. My goal is to test the accuracy of the network similarly to the article you shared, ultimately determining the rate of false positives and so on.
For this purpose I am writing a Python script.
However, this work is pretty laborious. The Python documentation for Caffe is almost non-existent (or else I would love to see it!), and after many unexpected and not-so-understandable errors I managed to create a script which feeds my test set to the half-network (i.e. the "left arm" of the siamese network, similar to the examples/siamese/mnist_siamese.prototxt file). Unfortunately, the "left arm"'s output is always (0,0). So far I cannot figure out why (did my training fail? is my script broken?).
Do you have any solution for this siamese test/accuracy determination issue? Maybe you could share a script you previously used that I could take inspiration from?

Thanks again

Muneeb Shahid

Feb 26, 2016, 5:39:26 AM
to Caffe Users

Each negative pair corresponds to two dissimilar images, i.e. images of two different people.

Each positive pair corresponds to two similar images, i.e. images of the same person taken under different conditions, e.g. lighting or angle.

Bootstrapping is a technique where you sample instances (with replacement, in my case) from a large dataset; google it for some examples.

For the triplet loss, have a look at this paper; the code hasn't been merged into BVLC Caffe yet. Here is the PR. If you want to use it anyway, you will have to merge it into your code yourself. This triplet loss PR doesn't use the siamese structure and thus uses much less memory and is more scalable. Go through the PR thread to understand how to feed it your data.

Muneeb Shahid

Feb 26, 2016, 6:06:14 AM
to Caffe Users
Hi,

The accuracy part is a bit more tricky, and yes, I used Python for it.
To be clear, I had two types of test error:
  • One was a small subset of the total test set; it had all positive pairs and the same number of randomly generated negative pairs. This was fed to the network just like the training data, at fixed intervals, just to get an idea of whether the loss does indeed go down (no accuracy or anything, just the contrastive loss).
  • After I was done training, or wanted the full accuracy on a snapshot, I used Python. To be precise, I plotted precision-recall curves. These are the steps:
    • Convert your train_val.prototxt to a deploy.prototxt, i.e. drop the loss layer and data layers.
    • Use Python to feed in your test data.
      • Get the activations of whichever layer you want; in your case that would be the last layer. Dump all the activations.
      • Create a similarity matrix from the dumped activations, comparing each image against all other images.
        • You may want to normalize the similarity matrix in some way; for me, normalizing each row individually by its norm seemed to work best.
      • Now plot the PR curves. As for the threshold, start with a threshold that gives you a precision of 1.0, then slowly lower it until you reach a recall of 1.0.
For learning how to use Python with Caffe, go through the ipynb examples on the Caffe website, especially this one.
I have created a gist for feature extraction that you may want to look at here.
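A rough NumPy sketch of the similarity-matrix and threshold-sweep steps above (the `pr_curve` name and the `features` array of dumped activations are assumptions on my part, not code from this thread):

```python
import numpy as np

def pr_curve(features, labels, thresholds):
    """Row-normalize the dumped activations, build a cosine similarity
    matrix, and compute precision/recall at each threshold, treating
    pairs (i, j) with labels[i] == labels[j] as positives."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    iu = np.triu_indices(len(labels), k=1)  # each unordered pair once
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :])[iu]
    scores = sim[iu]
    curve = []
    for t in thresholds:
        pred = scores >= t                  # predicted "same person"
        tp = np.sum(pred & same)
        prec = tp / max(np.sum(pred), 1)
        rec = tp / max(np.sum(same), 1)
        curve.append((t, float(prec), float(rec)))
    return curve
```

Plotting the returned (precision, recall) points over a dense range of thresholds gives the PR curve; the threshold sweep from precision 1.0 down to recall 1.0 then falls out directly.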

Paul

Apr 7, 2016, 11:05:16 AM
to Caffe Users
Hi Muneeb,
Your advice was extremely useful! I do not know how to thank you, but thanks a lot! I hope I can help you back in the future!
I believe I am almost done with this experiment with the siamese network. I wrote a Python script which I hope is working, but I still have a problem: my network always gives (0,0) as an output for each image, after training. I do not understand why, but I believe it is something simple I am not getting. I will mark this topic as finished, but I opened another one asking about this new issue in particular. If you have any idea why this happens, it would be amazing if you could tell me!

Thank you again!

baby tang

Jun 10, 2016, 9:56:39 PM
to Caffe Users
Hi Paul,
I have done the same thing as you, but I have a really big problem writing the deploy.prototxt. Could you give me an example?

On Thursday, April 7, 2016 at 11:05:16 PM UTC+8, Paul wrote:

changlin xiao

Oct 27, 2016, 9:11:40 PM
to Caffe Users
Hi tang,

Have you figured out how to write the deploy.prototxt file, and how to input the images?