I've been looking at how Caffe implements its Stochastic Gradient Descent optimizer and don't fully understand what it does.
In void SGDSolver<Dtype>::ApplyUpdate(), before the call to ComputeUpdateValue (which seems to calculate the update value V_(t+1) as described in http://caffe.berkeleyvision.org/tutorial/solver.html), there are calls to Normalize and Regularize. I don't understand what these two functions do.
Normalize has a comment saying that it "Scales gradient to counterbalance accumulation", but I don't know what that means. Likewise, Regularize seems to perform some sort of localized weight decay, but I can't tell for certain.
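To make the call order I'm describing concrete, here is a simplified paraphrase of the control flow as I read it. This is not the actual Caffe source: SolverSketch and its log member are hypothetical names, the per-parameter loop is my reading of the code, and the three stubs only record that they were called, since what Normalize and Regularize really do is exactly what I'm asking about.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the solver; NOT the real SGDSolver<Dtype>.
struct SolverSketch {
  std::vector<std::string> log;  // records the order in which the steps run

  // Stubs: they only log their own name. The real bodies are the open question.
  void Normalize(int param_id) {
    log.push_back("Normalize(" + std::to_string(param_id) + ")");
  }
  void Regularize(int param_id) {
    log.push_back("Regularize(" + std::to_string(param_id) + ")");
  }
  void ComputeUpdateValue(int param_id) {
    log.push_back("ComputeUpdateValue(" + std::to_string(param_id) + ")");
  }

  // Assumed shape of ApplyUpdate: for each parameter, the two
  // steps in question run before the update value is computed.
  void ApplyUpdate(int num_params) {
    for (int param_id = 0; param_id < num_params; ++param_id) {
      Normalize(param_id);
      Regularize(param_id);
      ComputeUpdateValue(param_id);
    }
  }
};
```

Calling ApplyUpdate(2) on a SolverSketch fills log with the three steps for parameter 0 followed by the same three for parameter 1, which is the ordering my question is about.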
Does anyone know what these functions do and why they were implemented?
Thanks!