Triplet Net, ParallelTable and weight sharing


Bartosz Ludwiczuk

Oct 12, 2015, 12:10:30 PM
to torch7
Hi,
I just started working with torch and I would like to train a metric network. As an example I took a Triplet Network (three copies of the same network in parallel).
I found a repository where everything works, but the implementation looked strange to me (it does not use ParallelTable, which is highly recommended for Siamese nets). So I implemented the same thing with:
CreateTriplet = function(Net)
  local prl = nn.ParallelTable()
  -- clones that share parameters and gradient buffers with the original Net
  local convNetPos = Net:clone('weight', 'bias', 'gradWeight', 'gradBias')
  local convNetNeg = Net:clone('weight', 'bias', 'gradWeight', 'gradBias')

  -- Parallel container: anchor, positive and negative branches
  prl:add(Net)
  prl:add(convNetPos)
  prl:add(convNetNeg)
  return prl
end
But the results are not the same. I checked that the weights and biases are not shared between the parallel nets.

After a little debugging, I found that after calling:
local Weights, Gradients = TripletNet:getParameters()
I get the parameters of each parallel net independently (same with the gradients). This should not happen if I want to train a metric network.

Could you explain if I am doing something wrong? Or should I just follow the implementation from Elad Hoffer? (If so, is ParallelTable suitable for Siamese networks only?)

Regards,
Bartosz

Bartosz Ludwiczuk

Oct 12, 2015, 1:01:19 PM
to torch7
1. The code for creating the TripletNet is the following:
local EmbeddingNet = require(opt.network)
local TripletNet = CreateTriplet(EmbeddingNet)
local Loss = nn.TripletEmbeddingCriterion()
TripletNet:cuda()
Loss:cuda()

local Weights, Gradients = TripletNet:getParameters()
So, the parameters are taken after creating the TripletNet.

2. For optimization I am using Elad's tools (which are great for a beginner like me). There we have:
self.Model:zeroGradParameters()
So everything is OK on that side.

I think the problem is with "getParameters()". It returns the weights of each parallel net independently. But that does not explain why so many people use "ParallelTable" for Siamese nets, where it would then not work at all.

alban desmaison

Oct 12, 2015, 1:32:03 PM
to torch7
Concerning Elad's tools: they do not use "ParallelTable" because he builds something similar using nngraph's "gmodule".
You can use "ParallelTable" to create Siamese nets without nngraph, but you need to be careful since you have shared weights (and gradients).

In Torch, when you share (or clone) some properties of a network, they actually point to the exact same memory. The problem is that if you use a method that changes where your shared elements live in memory, you break the sharing. Unfortunately, "getParameters" is one of these methods. You should also make sure you have the latest version of torch, because until recently ":float()" and ":cuda()" were among them too.
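
For example, a quick way to check whether sharing is still intact (a minimal sketch, with a small nn.Linear standing in for the real embedding net):

require 'nn'

local a = nn.Linear(3, 2)
local b = a:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- if the clone really shares storage, a write through one module
-- is visible through the other
a.weight:fill(1)
print(b.weight:sum())  -- 6 while the sharing is intact

Re-running such a check after any call you are unsure about (getParameters, float, cuda, ...) tells you immediately whether it relocated the shared tensors.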

So you will need to call "getParameters" before sharing the weights (and grads), and also send the network to the GPU beforehand just to be sure (don't forget to put your "ParallelTable" on the GPU too):

local EmbeddingNet = require(opt.network)
-- flatten the parameters BEFORE cloning/sharing
local Weights, Gradients = EmbeddingNet:getParameters()
-- move to the GPU before cloning as well
EmbeddingNet:cuda()
local TripletNet = CreateTriplet(EmbeddingNet)
TripletNet:cuda()  -- put the ParallelTable container on the GPU too
local Loss = nn.TripletEmbeddingCriterion()
Loss:cuda()

Bartosz Ludwiczuk

Oct 12, 2015, 2:50:07 PM
to torch7
Thanks, it works now. 
I should read more about breaking the sharing between weights.

soumith

Oct 12, 2015, 2:54:27 PM
to torch7 on behalf of Bartosz Ludwiczuk
getParameters should preserve sharing after a recent set of PRs by Adam Lerer. Just make sure you get the latest version of nn.
If there's a test case that unties sharing, I'm happy to investigate.
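
For reference, a minimal reproducer skeleton for such a bug report might look like this (a hypothetical sketch, not an actual failing case):

require 'nn'

local net = nn.Linear(5, 3)
local prl = nn.ParallelTable()
prl:add(net)
prl:add(net:clone('weight', 'bias', 'gradWeight', 'gradBias'))

local flat = prl:getParameters()
-- one nn.Linear(5, 3) has 5*3 + 3 = 18 parameters; if getParameters untied
-- the sharing, the flattened vector would count them twice (36)
assert(flat:nElement() == 18, 'getParameters broke the parameter sharing')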

On Mon, Oct 12, 2015 at 2:50 PM, Bartosz Ludwiczuk via torch7 <torch7+APn2wQdITeLgHiXSUo-nrkMlo...@googlegroups.com> wrote:
> Thanks, it works now.
> I should read more about breaking the sharing between weights.



Bartosz Ludwiczuk

Oct 14, 2015, 10:53:49 AM
to torch7
Today I updated torch and nn.
Now my first version of the code works fine, so for a parallel container
local Weights, Gradients = TripletNet:getParameters()
can be used.
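
For completeness, a rough sketch of how the flattened Weights/Gradients pair then typically feeds into optim (TripletNet, Loss, Weights and Gradients as in the first post; aImgs, pImgs, nImgs and the learning rate are placeholders, not something from this thread):

require 'optim'

local optimState = {learningRate = 1e-3}

-- one SGD step on a single mini-batch of triplets
local function trainStep(aImgs, pImgs, nImgs)
  local feval = function()
    TripletNet:zeroGradParameters()
    local embeddings = TripletNet:forward({aImgs, pImgs, nImgs})
    local loss = Loss:forward(embeddings)
    local dloss = Loss:backward(embeddings)
    TripletNet:backward({aImgs, pImgs, nImgs}, dloss)
    return loss, Gradients
  end
  local _, fs = optim.sgd(feval, Weights, optimState)
  return fs[1]
end

Because the three branches share gradWeight/gradBias, the single Gradients tensor already contains the contributions of all three branches after the backward pass.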



Badri Narayana Patro

Feb 27, 2017, 2:28:43 PM
to torch7

Hello,
I have designed a triplet network as follows and trained it with SGD with a batch size of 200 and a learning rate of 0.0001. I have also tried different batch sizes and learning rates, but I keep getting the same loss curve as above.

Network:

-------------------------------------------------------------------------------
-- Network definition
-------------------------------------------------------------------------------
convNet = nn.Sequential()
convNet:add(nn.Linear(4096, 2000))
convNet:add(cudnn.BatchNormalization(2000))
convNet:add(cudnn.ReLU())
convNet:add(nn.Dropout(0.25))
convNet:add(nn.Linear(2000, 1000))

-- Cost function
criterion = nn.TripletEmbeddingCriterion()

-------------------------------------------------------------------------------
-- Get parameters (flattened weights and gradients of the base network)
-------------------------------------------------------------------------------
params, grad_params = convNet:getParameters()

-------------------------------------------------------------------------------
-- Create triplet
-------------------------------------------------------------------------------
CreateTriplet = function(Net)
  local prl = nn.ParallelTable()
  local convNetPos = Net:clone('weight', 'bias', 'gradWeight', 'gradBias')
  local convNetNeg = Net:clone('weight', 'bias', 'gradWeight', 'gradBias')

  -- Parallel container: anchor, positive and negative branches
  prl:add(Net)
  prl:add(convNetPos)
  prl:add(convNetNeg)
  print(b('Fresh-embeddings-computation network:')); print(prl)
  return prl
end

local triplet_model = CreateTriplet(convNet)

-------------------------------------------------------------------------------
-- (some code is missing here)
-------------------------------------------------------------------------------

-- Forward pass: each branch receives a 200 x 4096 batch
local prediction = triplet_model:forward({aImgs, pImgs, nImgs})

-- Criterion: triplet loss
local loss = criterion:forward(prediction)

-------------------------------------------------------------------------------
-- Backward pass
-------------------------------------------------------------------------------
local dloss = criterion:backward(prediction)  -- gradient of the loss w.r.t. the embeddings
local dprediction = triplet_model:backward({aImgs, pImgs, nImgs}, dloss)
-------------------------------------------------------------------------------
I have referred to the following link for the triplet net:
https://github.com/Atcold/torch-TripletEmbedding


Please let me know what I have to change in order to make the network converge.

Francesco App

Oct 8, 2017, 6:14:29 AM
to torch7
Thank you so much for all the replies. Can I ask something more "philosophical"? What is the correct procedure to update the weights of a triplet network? Let me explain: at the end of training we want only one net, so we share the weights, making the three networks hold the same numbers (technically we do this by cloning and exposing the parameters, as you have described flawlessly). But how do we deal with the different dE_dX_i? Do we sum the updates of all the weights and average them? Do we make 3 iterations of weight updates, each using the most recent weights as a starting point? This is the point that is not clear to me. Can you suggest documentation or a paper? All I found is "weight shared" or a reply on Stack Overflow saying "Just share the weights and TensorFlow will take care of it".

The second question is more technical: I'm using the approach described here with the ParallelTable and DistanceRatioCriterion, which is even shown on the DistanceRatioCriterion page of nn. Can you confirm that this approach is correct for training an embedding in one net?
Thank you so much
Francesco

Bartosz Ludwiczuk

Oct 12, 2017, 12:52:49 AM
to torch7
> Thank you so much for all the replies. Can I ask something more "philosophical"? What is the correct procedure to update the weights of a triplet network? Let me explain: at the end of training we want only one net, so we share the weights, making the three networks hold the same numbers (technically we do this by cloning and exposing the parameters, as you have described flawlessly). But how do we deal with the different dE_dX_i? Do we sum the updates of all the weights and average them? Do we make 3 iterations of weight updates, each using the most recent weights as a starting point? This is the point that is not clear to me. Can you suggest documentation or a paper? All I found is "weight shared" or a reply on Stack Overflow saying "Just share the weights and TensorFlow will take care of it".
About the weight update: there is no paper about it, it is more of a technical detail. When you use weight sharing, there is just a single gradient accumulator for each variable (so if your base, single network has 1M parameters, the accumulator has exactly 1M entries as well). During the backward pass, the gradients of all branches are added into that accumulator. Before the chosen optimizer updates the weights, the accumulator is divided by the batch size (as in stochastic gradient descent). So there is just a single update step, using the averaged gradient. Note that the final gradient in the accumulator is 3x larger than for a single network, because we have 3 branches but do not divide by 3. We could divide by 3, or apply a 3x smaller learning rate; if the network does not diverge, it is fine to leave it as is.
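
A tiny sketch of this behaviour (hypothetical sizes, with a small nn.Linear standing in for the embedding net):

require 'nn'

local base = nn.Linear(4, 2)
-- flatten before cloning, so w and g are single views over the shared storage
local w, g = base:getParameters()

local prl = nn.ParallelTable()
prl:add(base)
prl:add(base:clone('weight', 'bias', 'gradWeight', 'gradBias'))
prl:add(base:clone('weight', 'bias', 'gradWeight', 'gradBias'))

local input   = {torch.randn(4), torch.randn(4), torch.randn(4)}
local gradOut = {torch.randn(2), torch.randn(2), torch.randn(2)}

prl:zeroGradParameters()
prl:forward(input)
prl:backward(input, gradOut)
-- g now holds the SUM of the three branch gradients; divide it (or lower
-- the learning rate) if you want the per-branch average
print(g:norm())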


> The second question is more technical: I'm using the approach described here with the ParallelTable and DistanceRatioCriterion, which is even shown on the DistanceRatioCriterion page of nn. Can you confirm that this approach is correct for training an embedding in one net?
What do you mean by 'in one net'? It is certainly a good approach for training a TripletNet, provided it is coded correctly.

But I prefer another approach, the one I used in OpenFace, described here: http://bamos.github.io/2016/01/19/openface-0.2.0/
There, only a single network is used.
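
Roughly, the idea (a hedged sketch under my own assumptions, not the actual OpenFace code; anchors, positives and negatives are hypothetical 4096-d feature batches, and EmbeddingNet/Loss are as earlier in this thread) is to run one network over the stacked batch and split the embeddings afterwards:

-- N triplets packed as one batch of 3*N rows: anchors, then positives, then negatives
local N = anchors:size(1)
local batch = torch.cat({anchors, positives, negatives}, 1)   -- (3*N) x 4096

local emb = EmbeddingNet:forward(batch)                       -- (3*N) x embeddingDim
local a = emb:narrow(1, 1, N)
local p = emb:narrow(1, N + 1, N)
local n = emb:narrow(1, 2 * N + 1, N)

local loss  = Loss:forward({a, p, n})
local dLoss = Loss:backward({a, p, n})

-- stack the three gradient blocks back in the same order as the batch
-- and backprop through the single network once
local dEmb = torch.cat({dLoss[1], dLoss[2], dLoss[3]}, 1)
EmbeddingNet:backward(batch, dEmb)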

Bartosz Ludwiczuk

Oct 12, 2017, 12:59:57 AM
to torch7
Most of the code looks good, but there is a Dropout layer, which should not be used here. Why?
Because you compare features, not probabilities as usual, so dropping some features can increase the Euclidean distance between the feature vectors significantly.
So as a first step, I would remove the Dropout and see if it works better. If it does, you can try adding Dropout back to prevent overfitting, but with much higher values like 0.7-0.9.
If that does not work either, check the data preparation, try adding BatchNorm (it helps when the weight initialization is not good), or check the gradient values.
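
If that diagnosis is right, the first experiment would simply be (a sketch of the modified definition from the post above):

convNet = nn.Sequential()
convNet:add(nn.Linear(4096, 2000))
convNet:add(cudnn.BatchNormalization(2000))
convNet:add(cudnn.ReLU())
-- Dropout removed for the first run; if the net then overfits, re-add it
-- and tune its probability instead of the original 0.25
convNet:add(nn.Linear(2000, 1000))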