The problem is not in the gradient computation but in the update step when using Adam, or possibly any other momentum-based scheme. The following loop with the SGD defaults works:
for epoch in 1:epochs
    for (x, y) in dtrn
        fval = @diff loss(θ, ϕ, x)
        for param in params(fval)
            ∇param = grad(fval, param)
            update!(param, ∇param)                   # SGD default - OK
            # update!(param, ∇param, SGD(lr=0.001))  # also works
        end
    end
end
I could not get Adam updates to work. In fact, I wonder how a momentum-based scheme, used inside the loops above, would remember (keep track of) its previous updates: passing a fresh optimizer each iteration, as in update!(param, ∇param, Adam(lr=0.001)), would presumably reset the moment estimates at every step. I can write the update equations explicitly in the inner loop (see the sketches below), but I am curious whether the update! function or adam() can be used directly.
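For concreteness, here is a minimal sketch of what I mean by writing the updates explicitly, with the moment estimates kept outside the loop so they persist across iterations. The dictionaries m and v, the step counter t, and the hyperparameter names are my own bookkeeping, not Knet API:

# Hand-rolled Adam: keep per-parameter first/second moment estimates
# outside the training loop so they survive across iterations.
β1, β2, lr, ϵ = 0.9, 0.999, 0.001, 1e-8
m, v = Dict(), Dict()   # per-parameter moment estimates
t = 0                   # global step counter for bias correction

for epoch in 1:epochs
    for (x, y) in dtrn
        fval = @diff loss(θ, ϕ, x)
        t += 1
        for param in params(fval)
            ∇param = grad(fval, param)
            ∇param === nothing && continue       # skip unused params
            mp = get!(m, param, zero(value(param)))
            vp = get!(v, param, zero(value(param)))
            @. mp = β1 * mp + (1 - β1) * ∇param
            @. vp = β2 * vp + (1 - β2) * ∇param^2
            m̂ = mp ./ (1 - β1^t)   # bias-corrected first moment
            v̂ = vp ./ (1 - β2^t)   # bias-corrected second moment
            value(param) .-= lr .* m̂ ./ (sqrt.(v̂) .+ ϵ)
        end
    end
end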
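Alternatively, is the intended use to construct the optimizer objects once and reuse them across iterations, so that update! can accumulate the moment estimates inside them? Something like the following, where I am only guessing; in particular I am not sure whether update! accepts a Param together with an explicit optimizer, so I pass value(param) as the weights:

# Guess: one persistent Adam instance per parameter, created once,
# so its internal state carries over from one iteration to the next.
opts = Dict()
for epoch in 1:epochs
    for (x, y) in dtrn
        fval = @diff loss(θ, ϕ, x)
        for param in params(fval)
            ∇param = grad(fval, param)
            opt = get!(opts, param, Adam(lr=0.001))
            update!(value(param), ∇param, opt)
        end
    end
end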
burak