Initialize a Siamese network with pre-trained weights


Swami

Jun 18, 2015, 3:29:05 PM
to caffe...@googlegroups.com
I am creating a Siamese network and I want to initialize the two halves with weights from another pre-trained network. How do I do this in Caffe?


Floris Gaisser

Jun 18, 2015, 9:52:11 PM
to caffe...@googlegroups.com
Loading a network with the same structure as a pre-trained network and continuing training from its weights is called fine-tuning. Here you can find a tutorial on this.

Summarizing:
by using -weights [filename.caffemodel] you can start training afresh, using only the weights from the model
by using -snapshot [filename.solverstate] you can continue training from a previous state, using both the weights and the learning parameters from that solverstate (the solver has to be the same!)
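
For example, with hypothetical file names (the flags are exactly the ones above):

caffe train -solver solver.prototxt -weights pretrained.caffemodel
caffe train -solver solver.prototxt -snapshot train_iter_10000.solverstate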

P.S. A search in this group would have turned up the answer too.

Swami

Jun 18, 2015, 10:31:41 PM
to caffe...@googlegroups.com
Hi Floris,

I understand the concept of fine-tuning and have trained several networks that way. My question was a bit more specific than that: I have a siamese network where each half of the network has a similar structure - call them A1 and A2.

Now, I have a pre-trained model A whose structure is the same as A1/A2. I want to initialize A1 and A2 using the weights from A and start training the siamese network from that point. 

I hope this clarifies things.

Floris Gaisser

Jun 18, 2015, 10:51:01 PM
to caffe...@googlegroups.com
Hi Swami,

Ah, that makes it much clearer. I'm curious what you are using such an architecture for.

I don't know if this would be possible though, because as far as I know it is not possible to load more than one weights file without changing some of the code of Caffe itself.
You do have to make sure that while training A1 and A2 you use different names for the weights, like:
A1:
a1_conv1_w
a1_conv1_b
...

A2:
a2_conv1_w
a2_conv1_b
...


You could try opening both (binary) *.caffemodel files in an editor that can display binary data, copying the data from one file, appending it to the other, and saving the result as a new file. But do watch out for headers.

Gavin Hackeling

Jun 19, 2015, 12:16:15 AM
to Floris Gaisser, caffe...@googlegroups.com

I understood your problem differently. The layers of A1 and A2 should have different layer names, but the same names for their weights. These weight names should be the same as the names of the layers of the network you are fine-tuning. That is, you only need to load one model, the weights of which will be shared by both sides of the Siamese network.


swami

Jun 19, 2015, 12:58:24 AM
to Gavin Hackeling, Floris Gaisser, caffe...@googlegroups.com
My situation is almost exactly what Gavin describes. Right now, the Siamese network architecture shares the weights between the two parts. Hence, assuming I have a pre-trained network A whose weights have the same names as the ones in A1/A2, how do I use the weights in A to initialize the layers in A1/A2 and start training from that point?

I think the easier (or maybe harder) way is what Floris has suggested, i.e. to perform hard-coded network surgery by editing the binary files, but I was looking for a nicer solution. Here is what I want to achieve: some parameter in the solver file, say 'preload_net', where I would specify the pre-trained network file, which would then initialize the Siamese network with those weights. This begs the question: can we achieve this kind of parameter specification for the normal case, i.e. assuming the pre-trained network and the new network have the same structure? I ask because if we could achieve that, we could then create an ad-hoc hack to perform it for specific Siamese-type architectures.
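
For what it's worth, here is a sketch of how such a preload could be done with the Python wrapper instead of binary surgery (the file names and the '_p' suffix convention are assumptions; only the standard caffe.Net / net.params / net.save API is used):

import caffe

# Hypothetical file names.
pretrained = caffe.Net('A_deploy.prototxt', 'A.caffemodel', caffe.TEST)
siamese = caffe.Net('siamese_train_val.prototxt', caffe.TRAIN)

# Copy each pre-trained layer's blobs into both halves of the Siamese net,
# assuming the second stream's layers share the name plus a '_p' suffix.
for name, blobs in pretrained.params.items():
    for target in (name, name + '_p'):
        if target in siamese.params:
            for i, blob in enumerate(blobs):
                siamese.params[target][i].data[...] = blob.data

siamese.save('siamese_init.caffemodel')
# ...then train with: caffe train -solver solver.prototxt -weights siamese_init.caffemodel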




--
--Swami

Floris Gaisser

Jun 19, 2015, 1:13:28 AM
to caffe...@googlegroups.com, f.ga...@gmail.com, gavinha...@gmail.com
Not sure if I understand it correctly then.
Let's see:

You have now a network trained called PT:
PT_Layer1:
PT_L1_w

PT_Layer2:
PT_L2_w

You want to start training a new network FT with the same architecture (number of layers and filter sizes) but 'siamese' (two weight-sharing streams in parallel), using the weights from PT:
FT_Layer1:
PT_L1_w

FT_Layer2:
PT_L2_w

FT_Layer1_p:
PT_L1_w

FT_Layer2_p:
PT_L2_w

I think the only thing you have to do is load the file PT.caffemodel with the -weights option and make sure the weight names of the layers are the same.

swami

Jun 19, 2015, 10:18:49 AM
to Floris Gaisser, caffe...@googlegroups.com, Gavin Hackeling

Yes, you got the idea right. I will try it out; I thought the preloaded network needed to have the exact same architecture as the new one, but what you say makes more natural sense. I will post back after my trials.

-Swami

Steve Schmugge

Jul 6, 2015, 6:53:55 PM
to caffe...@googlegroups.com, f.ga...@gmail.com, gavinha...@gmail.com
I'm a little unsure how to initialize weights from a pre-trained net (bvlc_reference_model) into a Siamese net.
I thought a layer had to have the same name in order for its weights to be transferred over from the pre-trained net. If I want the weights to be initialized the same for conv1 and conv1_p in the new Siamese network, it seems only one of them would get the pre-trained weights, because only conv1 is defined in the pre-trained net. Is there a way to specify that both conv1 and conv1_p in the new Siamese network get the same pre-trained weights?

-Steve

Floris Gaisser

Jul 6, 2015, 9:34:13 PM
to Steve Schmugge, caffe...@googlegroups.com, gavinha...@gmail.com
I'm not 100% sure either, but I have tried this:
name the weights the same for both conv1 and conv1_p, i.e. conv1_w and conv1_b.
But I hadn't named the layers the same as in the model.

Hopefully you get it working.

Diwakar Ganesan

Jul 10, 2015, 11:00:07 PM
to caffe...@googlegroups.com
Hello,

I am also looking to train a Siamese network for face recognition. So far, I've been able to use the tutorial given in http://caffe.berkeleyvision.org/gathered/examples/siamese.html on my own dataset. I made a few changes to the train/val prototxt, whereby I specified two ImageData layers for input into the network, rather than by using a LevelDB database. This configuration seemed to train well, as the loss looked like it was going down. However, I am having some trouble extracting features from this network, as I'm not able to figure out how to get the python wrapper to accept two images as input into a network. Can you give me some insight into this?

swami

Jul 11, 2015, 12:05:34 AM
to Diwakar Ganesan, caffe...@googlegroups.com
You need not pair the images for extracting features: since the two halves of the network are identical, you can pass each image through just one half, extract the features, and compute the loss using them.
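
A minimal pycaffe sketch of that idea (the one-stream deploy prototxt, blob names, and file names here are assumptions; mean subtraction and channel swapping are omitted for brevity):

import caffe
import numpy as np

net = caffe.Net('one_stream_deploy.prototxt', 'siamese.caffemodel', caffe.TEST)

img = caffe.io.load_image('face.jpg')              # H x W x 3, RGB in [0, 1]
img = caffe.io.resize_image(img, (227, 227))
blob = img.transpose(2, 0, 1)[np.newaxis, :]       # reorder to 1 x 3 x H x W

net.blobs['data'].reshape(*blob.shape)
net.blobs['data'].data[...] = blob
net.forward()
feat = net.blobs['feat'].data[0].copy()            # the embedding for this image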

--
--Swami

林可昀

Sep 9, 2015, 2:36:59 AM
to Caffe Users
Hi all,

I also have the same question.
I want to initialize the two streams with another pre-trained model weights.
Could anyone please guide me on this?
Can we use "shareweights" to tackle this problem?

Many thanks,
Kevin

On Friday, June 19, 2015 at 3:29:05 AM UTC+8, Swami wrote:

Tambet Matiisen

Sep 9, 2015, 4:18:04 AM
to Caffe Users
My current understanding is that your pre-trained layers must have the same names as one of the branches in the Siamese network. NB! Layer names, not param names or top names! I would expect that those weights are automatically shared with the other branch, but I haven't really tested it.

  Tambet

PS. You can use pre-trained weights in two ways: -snapshot takes a .solverstate file as its argument and allows you to resume training with the same model (iterations and learning rate schedule continue where they left off); -weights takes a .caffemodel file as its argument and allows you to continue training with a different model (iterations and learning rate schedule start from the beginning). You probably want the latter in this case.

PPS. Uncomment DEBUG := 1 in Makefile.config to see diagnostic information about which layers were copied and which were ignored. If you have trouble with that, just change DLOG to LOG in this function: https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp#L834

Evan Shelhamer

Sep 9, 2015, 4:11:57 PM
to Tambet Matiisen, Caffe Users
Right, for fine-tuning the parameters are resolved by the layer names (and NOT param names or top names). Since the streams of a Siamese net have tied weights, you need to define the net (1) with one stream having the layer names for fine-tuning and (2) with weight sharing between the paired layers of each stream. That is, define one stream with layer names equal to the original net but shared param names between the streams.

Then fine-tune as usual through the command line or pycaffe.
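
For example, a sketch of one convolution pair (layer, blob, and param names follow the MNIST siamese example; adapt them to the net being fine-tuned):

layer {
  name: "conv1"        # same layer name as the original net, so -weights loads it
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { name: "conv1_w" lr_mult: 1 }
  param { name: "conv1_b" lr_mult: 2 }
  convolution_param { num_output: 20 kernel_size: 5 stride: 1 }
}
layer {
  name: "conv1_p"      # any other layer name for the second stream
  type: "Convolution"
  bottom: "data_p"
  top: "conv1_p"
  param { name: "conv1_w" lr_mult: 1 }   # identical param names tie the weights
  param { name: "conv1_b" lr_mult: 2 }
  convolution_param { num_output: 20 kernel_size: 5 stride: 1 }
}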



Evan Shelhamer




林可昀

Sep 10, 2015, 12:45:25 AM
to Caffe Users, tambet....@gmail.com
Hi Evan,

Thanks a million!! It is working now.

I attached my networks below; hope they help others too.
The network consists of two AlexNets; the tops of the two streams are connected with a "euclidean loss" objective function.

Best,
Kevin 


On Thursday, September 10, 2015 at 4:11:57 AM UTC+8, Evan Shelhamer wrote:
kevin_finetune_alexnets.sh
two_alexnets_finetune.prototxt

dmytro....@buddyguard.io

Dec 6, 2015, 7:59:07 AM
to Caffe Users, tambet....@gmail.com

Hi Kevin,

I am wondering how you used your architecture for Siamese training. I can see that you use a Euclidean loss layer, so you must feed the network with pictures of the same person only; such a loss does not take a similarity label into account.

The second question: is it necessary to name and share the parameters of the ReLU and pooling layers?

Henry Gao

Jan 14, 2016, 8:30:05 PM
to Caffe Users, tambet....@gmail.com
Hi Kevin,

In your *.prototxt, I noticed that the parameters are shared. I still have a question:
are the parameters shared only at initialization, or throughout the full training stage?

Best, 
Junyu Gao.

On Thursday, September 10, 2015 at 12:45:25 PM UTC+8, 林可昀 wrote:

Muneeb Shahid

Jan 25, 2016, 1:11:16 PM
to Caffe Users, tambet....@gmail.com
For future reference: in the case of Siamese networks, using Euclidean distance (or any other distance, for that matter) as the loss is just plain wrong unless the loss also includes a contrastive term, i.e. a contrastive loss.


On Thursday, September 10, 2015 at 6:45:25 AM UTC+2, 林可昀 wrote:

Muneeb Shahid

Jan 25, 2016, 1:15:52 PM
to Caffe Users, tambet....@gmail.com
Sharing parameters for layers that do not learn anything, such as ReLU, is pointless, so you do not need to share their params. And only use a loss with a contrastive term (i.e. a contrastive loss) for training a Siamese network, because any loss without a contrastive term can be satisfied by learning just a constant mapping.
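
For reference, the loss layer then looks roughly like this in prototxt (blob names follow the Caffe MNIST siamese example; "sim" is the binary similarity label, 1 for matching pairs and 0 otherwise):

layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param { margin: 1.0 }
}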

Muneeb Shahid

Jan 25, 2016, 1:17:06 PM
to Caffe Users, tambet....@gmail.com
Can you elaborate on what issue you are facing? You only need to share the params in the prototxt; Caffe takes care of the rest.

元玉书淋风

Apr 12, 2016, 5:54:06 AM
to Caffe Users


On Friday, June 19, 2015 at 3:29:05 AM UTC+8, Swami wrote:
I am creating a Siamese network and I want to initialize the two halves with weights from another pre-trained network. How do I do this in Caffe?

I am wondering whether you solved the initialization problem for the Siamese network? I also have the same doubt, as the example in Caffe just initializes the weights of the two sub-networks separately!

Xianzhi Du

Apr 21, 2016, 5:13:23 PM
to Caffe Users
I want to use two trained networks to initialize the weights of the two channels of a Siamese network separately. In this case, the weights of the two channels are not shared. Does anyone know how to do this?

Shahnawaz Grewal

Feb 15, 2017, 11:56:11 AM
to Caffe Users, tambet....@gmail.com
Can we change the euclidean_loss to ContrastiveLoss? I am also trying to use AlexNet for a Siamese network.

曹浩宇

Apr 10, 2017, 9:47:37 PM
to Caffe Users
Did you solve it? Please help!

On Friday, April 22, 2016 at 5:13:23 AM UTC+8, Xianzhi Du wrote: