Weights are not being updated

AHMED IMAM SHAH

Dec 20, 2021, 8:27:32 AM
to COMP541
Dear all,

I am having a problem training my model: my error metric remains the same throughout training. I used @gcheck to confirm whether the gradient calculation is correct, but the test gives me false, and it indicates that the problem is at the very first layer. I am not sure what to do next.

I am using the following command to train the model:
adam!(loss, [(SlotAttentionModel, batch)], params=params(SlotAttentionModel), lr=0.0004)

And here is the @gcheck:
macro gcheck1(ex); esc(:(@gcheck $ex (delta=0.000001, nsample=2, rtol=0.05, atol=0.001, verbose=2))); end
@gcheck1 loss(SlotAttentionModel, first(clevrDataset))

Any suggestions?

Best Regards,

Ahmed Imam Shah
MS. Computer Science and Engineering
Koç University, Istanbul, Turkey

ILKER KESEN

Dec 20, 2021, 10:21:29 AM
to AHMED IMAM SHAH, COMP541
Ahmet hi,

You can use the built-in AutoGrad.gcheck function/macro; there is no need to define your own. What's your first layer? Could you please paste the error backtrace of the gradient check? Additionally, could you debug your model's gradient tape by following these instructions?
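For reference, calling the built-in macro directly looks like this, reusing the loss/SlotAttentionModel/clevrDataset names and the same tolerances from your message:

using AutoGrad   # provides gcheck and @gcheck
@gcheck loss(SlotAttentionModel, first(clevrDataset)) (delta=1e-6, nsample=2, rtol=0.05, atol=0.001, verbose=2)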

Best.
- ilker

PS: Side effects can corrupt the gradient calculation; check whether your forward computation includes a side effect (e.g. initializing an empty array and then setting its values).
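For illustration, a hypothetical sketch of that kind of side effect and a side-effect-free rewrite (not your model's code, just the pattern):

# Problematic: writing into a pre-allocated plain array inside the forward pass.
function forward_bad(w, xs)
    ys = zeros(Float32, length(xs))
    for i in 1:length(xs)
        ys[i] = sum(w .* xs[i])   # setindex! is a side effect AutoGrad does not record;
    end                           # under @diff this errors or gives wrong/zero gradients
    return sum(ys)
end

# Side-effect-free alternative: build the result from pure operations only.
forward_ok(w, xs) = sum(sum(w .* x) for x in xs)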

AHMED IMAM SHAH

Dec 20, 2021, 2:22:17 PM
to ILKER KESEN, COMP541
Hello Ilker,

Thank you for your reply. I have reduced the problem to a single layer, the first layer of the encoder: just one convolutional layer, defined as:

# Conv layer: w = filter weights, b = bias, pad = padding, f = activation, p = dropout probability
struct Conv; w; b; pad; f; p; end
# Forward pass: dropout on the input, convolve, add bias, apply the activation elementwise
(c::Conv)(x) = c.f.(conv4(c.w, dropout(x,c.p), padding=(c.pad,c.pad), stride=1) .+ c.b)
Conv(w1::Int,w2::Int,cx::Int,cy::Int,pad::Int, f=relu; pdrop=0) = Conv(param(w1,w2,cx,cy), param0(1,1,cy,1), pad, f, pdrop)

Here is my smallest working example. The forward pass works but the gradient check fails.
macro gcheck1(ex); esc(:(@gcheck $ex (delta=0.000001, nsample=2, rtol=0.05, atol=0.001, verbose=2))); end
function newloss(model, input_batch)
    recon = model(input_batch)
    loss = sum(recon)
    return loss
end
checklayer = Conv(5,5,3,32,2)   # 5x5 filters, 3 input channels, 32 output channels, padding 2
checklayer(first(clevrDataset))
@gcheck1 newloss(checklayer, first(clevrDataset))

The output of the check is:
(pa, xi, f0, nd, ad) = ("16×16×3×64 Param{KnetArray{Float32,4}}", -0.96862745f0, 88469.3f0, 5.409804f8, -1.5159675f0)
false

As for the debugging link you sent, I was not able to follow the steps described there.
And I am not initializing anything with zeros or nulls and then changing it later; I have double-checked that.

Thank you again for your help!

Best,
Ahmed

ILKER KESEN

Dec 21, 2021, 7:04:50 AM
to AHMED IMAM SHAH, COMP541
Ahmed hi,

This snippet produces the error for me as well. If you change your loss function (e.g. replace sum with mean) or lower your dimensions (e.g. the filter size, or output channels = 1), then it passes the gradient check. I suspect this issue is related to floating-point precision. Could you check whether the gradients are zero or not? Just use the @diff macro (J = @diff loss_function(args...); grad(J, model.some_layer.w)). By the way, do you know how the original implementation initializes its weights?
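For example, something like the following, reusing the checklayer/newloss/clevrDataset names from your message (assuming KnetArray weights):

x = first(clevrDataset)
J = @diff newloss(checklayer, x)    # record the forward pass on the gradient tape
gw = grad(J, checklayer.w)          # gradient of the loss w.r.t. the conv filters
println(extrema(Array(gw)))         # all-zero extrema would point to a broken tape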

Best.
- ilker

AHMED IMAM SHAH

Dec 21, 2021, 8:46:39 AM
to ILKER KESEN, COMP541
Hello again, 

Yes, it works with output channels = 1, but that doesn't really solve my problem: for a channel size of 2 or more, @gcheck still fails. The gradients are not zeros; I checked using the macro you gave in the previous email.
And lastly, the original paper initializes the weights with Xavier initialization.
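(For reference, a sketch of requesting that explicitly in the Conv constructor from my earlier message, assuming Knet's param accepts an init keyword and provides xavier:)

Conv(w1::Int, w2::Int, cx::Int, cy::Int, pad::Int, f=relu; pdrop=0) =
    Conv(param(w1, w2, cx, cy; init=xavier), param0(1,1,cy,1), pad, f, pdrop)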

Also, I started to get CUDNN errors when I decrease the filter size and then call the sgd! or adam! functions. I don't know if all these problems are related.

Best,
Ahmed

ILKER KESEN

Dec 21, 2021, 9:03:26 AM
to AHMED IMAM SHAH, COMP541
Ahmed hi,

It seems that your model uses the reparameterization trick [1] (learned_mu .+ sampled_noise .* exp.(learned_log_sigma)) and I suspect this might be the cause. Could you please set the seed just before the sampling operation in your model? I think this will fix the gradient check problem.
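For illustration, a minimal sketch of what I mean (hypothetical names, shown with plain CPU arrays; the noise array type should match your slot parameters):

using Random

# Fixing the RNG right before drawing the noise makes the forward pass deterministic,
# so gcheck's finite-difference and recorded-gradient evaluations see the same sample.
function sample_slots(mu, log_sigma)
    Random.seed!(1)
    eps = randn(Float32, size(mu))
    return mu .+ eps .* exp.(log_sigma)
end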

Best.
- ilker

AHMED IMAM SHAH

Dec 21, 2021, 9:22:47 AM
to ILKER KESEN, COMP541
For my current implementation, I am not using the reparametrization trick; I am just initializing the slots with random normal weights. But the problem is that the check fails at the first layer, well before the slot attention module. The check even fails with just a convolution layer with more than one filter. Moreover, I have already set the random seed just before the sampling, and I still get the same results and the same problem. Even with wrong weights my model should at least change its output; the results may be bad, but they should change (they are not changing). Any suggestions?

I am sorry for sending so many emails and thank you so much for helping.

Best,
Ahmed

ILKER KESEN

Dec 21, 2021, 9:25:23 AM
to AHMED IMAM SHAH, COMP541
Ahmed hi,

Could you please also check your implementation by using double precision (Float64) instead of single precision (Float32)?
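(A sketch of one way to do that for the single-layer example above, reusing the Conv/newloss/clevrDataset names; I'm assuming the atype keyword of param/param0 to get Float64 weights:)

x64 = KnetArray(Float64.(Array(first(clevrDataset))))
checklayer64 = Conv(param(5,5,3,32; atype=KnetArray{Float64}),
                    param0(1,1,32,1; atype=KnetArray{Float64}), 2, relu, 0)
@gcheck newloss(checklayer64, x64) (delta=1e-6, nsample=2, rtol=0.05, atol=0.001, verbose=2)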

Best.
- ilker

AHMED IMAM SHAH

Dec 21, 2021, 11:05:33 AM
to ILKER KESEN, COMP541
Hello again!

So I changed it to Float64, and now the checks pass. I will now try to train the model and will update you if anything comes up. Thank you so much for your quick responses.

Best,
Ahmed