Can optimize model manually, but optim gives a size mismatch error


rpraw

unread,
Oct 4, 2015, 1:06:55 PM10/4/15
to torch7
Hi All,

I've got a model that I can optimize manually, like this:

function train()
    for r = 1,numRuns do
        model:zeroGradParameters()

        local pred = model:forward(inputs)

        local err = crit:forward(pred, outputs)
        local grad = crit:backward(pred, outputs)

        model:backward(inputs, grad)
        model:updateParameters(0.01)
    end
end

But when I try to use optim, like this,

function optTrain()
    local params, paramGrads = model:getParameters()

    local feval = function(x)
        if x ~= params then params:copy(x) end

        paramGrads:zero()

        local pred = model:forward(inputs)
        local err = crit:forward(pred, outputs)

        local grad = crit:backward(pred, outputs)
        model:backward(inputs, grad)

        return err, paramGrads
    end

    optimMethod(feval, params, optimState)
end
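
For completeness, optimMethod and optimState aren't shown above; a minimal sketch of how they might be set up, assuming plain SGD from the optim package:

```lua
require 'optim'

-- Minimal sketch: the optimMethod/optimState assumed by optTrain() above.
local optimState = {
    learningRate = 0.01  -- matches the 0.01 passed to updateParameters()
}
local optimMethod = optim.sgd  -- called as optimMethod(feval, params, optimState)
```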

I get this error:

inconsistent tensor size at /torch/pkg/torch/lib/TH/generic/THTensorMath.c:424

Any ideas for what I should fix? Do I need to resize the param vector for optim somehow?

Thanks!

Francisco Vitor Suzano Massa

unread,
Oct 4, 2015, 1:57:53 PM10/4/15
to torch7
Do you share the weights/bias of your model?
If so, you should share the gradWeight/gradBias as well.
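
For example, with two nn.Linear layers, tying both the parameters and their gradient buffers would look like this (a sketch; share() takes the names of the tensors to tie):

```lua
require 'nn'

local top = nn.Linear(10, 10)
local bottom = nn.Linear(10, 10)

-- Share the weights AND their gradient buffers. Sharing only
-- 'weight'/'bias' leaves gradWeight/gradBias as separate tensors,
-- which makes the flat vectors from getParameters() inconsistent.
bottom:share(top, 'weight', 'bias', 'gradWeight', 'gradBias')
```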

rpraw

unread,
Oct 4, 2015, 2:03:18 PM10/4/15
to torch7
Hi Francisco,

How do you mean share here - between layers?

Francisco Vitor Suzano Massa

unread,
Oct 4, 2015, 2:15:32 PM10/4/15
to torch7
Yes, like in a siamese network.
The problem you mentioned usually happens when weights are shared between layers but the gradWeight and gradBias buffers are not.
For example, see
https://groups.google.com/forum/#!topic/torch7/f2k_DQA8ZWk
https://github.com/torch/DEPRECEATED-torch7-distro/issues/156

rpraw

unread,
Oct 4, 2015, 2:23:24 PM10/4/15
to torch7
Ah, got it. I do have some shared weights, but I share the gradients too:

bottom.weight:set(top.weight)
bottom.gradWeight:set(top.gradWeight)

I don't share biases, since one layer (for embeddings) is a LookupTable, while the other (for mapping out of embedding space) is a linear layer.

What's confusing is that I don't get any error in the manual training - I would have thought that any size mismatch would also cause problems in the updateParameters() step in the manual version.

Francisco Vitor Suzano Massa

unread,
Oct 4, 2015, 2:42:22 PM10/4/15
to torch7
Could you verify the dimensions of params and gradParams ?
also, check as well the sizes of model:parameters()

rpraw

unread,
Oct 4, 2015, 3:02:31 PM10/4/15
to torch7
It looks like:

params:         269025 [torch.LongStorage of size 1]
gradParams: 125424 [torch.LongStorage of size 1]

(So different by a factor of 2.14).

And then model:parameters() returns a pair of big tables, each with 24 entries. Are these the parameters and their gradients? The sizes of the tensors in the tables are identical.

Thanks for your help!

Francisco Vitor Suzano Massa

unread,
Oct 4, 2015, 3:06:06 PM10/4/15
to torch7
So, that shows that you forgot to share some gradWeights/gradBias.

And yes, the tables correspond to the parameters and gradParameters.
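
A quick way to check for this (a sketch) is to compare the lengths of the two flat vectors:

```lua
-- Consistency check: the flattened parameter and gradient vectors
-- returned by getParameters() must have the same number of elements;
-- a mismatch means some gradWeight/gradBias buffers are not shared.
local params, gradParams = model:getParameters()
assert(params:nElement() == gradParams:nElement(),
       'some gradWeight/gradBias buffers are not shared')
```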

rpraw

unread,
Oct 4, 2015, 6:20:34 PM10/4/15
to torch7
Yah, I had a clone('weights', ...) rather than clone('weight', ...)    :/
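
For anyone hitting the same thing: clone() with tensor names creates a clone that shares those tensors, and a misspelled name is silently ignored (nothing gets shared for it), so the mistake produces no error at clone time. The corrected call looks like:

```lua
-- clone() with names returns a copy whose listed tensors are SHARED
-- with the original; 'weights' (plural) matches no field on the
-- module and is skipped silently, which caused the size mismatch.
local bottom = top:clone('weight', 'bias', 'gradWeight', 'gradBias')
```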

Thanks for your help!