caffe python manual sgd

AV

Apr 6, 2016, 2:25:43 PM

to Caffe Users

Hello everybody,

I am trying to implement the SGD weight update manually in Caffe's Python interface (pycaffe) instead of using the solver.step() function. The goal is to reproduce the weights that solver.step() produces by updating the weights manually.

The setup is as follows: I use the MNIST data and set the random seed in solver.prototxt (random_seed: 52). I also make sure that momentum: 0.0, base_lr: 0.01 and lr_policy: "fixed". This lets me implement the plain SGD update equation (without momentum, regularization, etc.), which is simply: W_{t+1} = W_t - lr * dW_t, where lr = base_lr = 0.01 and dW_t is the gradient stored in the blob's diff.
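For reference, the plain SGD rule above (no momentum, no weight decay, fixed learning rate) can be sketched in pure Python on flat lists of floats. This is only an illustration; `sgd_update` is a hypothetical helper, not part of pycaffe, and the weights and gradients are made-up numbers.

```python
def sgd_update(weights, grads, lr):
    """Return W_{t+1} = W_t - lr * dW_t for flat lists of floats."""
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    return [w - lr * g for w, g in zip(weights, grads)]


# One step with lr = 0.01 (the base_lr used above); values are made up
w_next = sgd_update([0.5, -0.25], [2.0, -1.0], lr=0.01)
print(w_next)
```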

Following are the two tests:

Test1: I use Caffe's forward() and backward() to compute the forward and backward passes. For each layer that contains weights, I do:

    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr # weights
        solver.net.layers[k].blobs[1].diff[...] *= lr # biases

Next, I update the weights/biases as:

        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I run this for 5 iterations.

Test2: Run caffe's solver.step(5).

I expect the two tests to yield exactly the same weights after the five iterations.

I save the weights after each of the tests and compute the norm of the difference between the two weight vectors, and they are not bit-exact. Can someone spot what I might be missing?

Following is the entire code for reference:

from __future__ import print_function
import numpy as np
import caffe

niter = 5  # number of iterations for both tests

# Automatic SGD: TEST2
solver = caffe.SGDSolver('solver.prototxt')
solver.step(niter)
# save the weights to compare later (astype returns a new float64 copy)
w_solver_step = solver.net.layers[1].blobs[0].data.astype('float64')
b_solver_step = solver.net.layers[1].blobs[1].data.astype('float64')

# Manual SGD: TEST1 (re-create the solver so it starts from the same seeded initialization)
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.

# Get layer types
layer_types = [ll.type for ll in solver.net.layers]

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx, l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter + 1):
    solver.net.forward()   # forward propagation
    solver.net.backward()  # backward propagation
    for k in weight_layer_idx:
        # scale the gradients by the learning rate in place
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        # W_{t+1} = W_t - lr * dW_t
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = solver.net.layers[1].blobs[0].data.astype('float64')
b_fwdbwd_update = solver.net.layers[1].blobs[1].data.astype('float64')

# Compare
print("after iter", niter, ": weight diff:", np.linalg.norm(w_solver_step - w_fwdbwd_update),
      "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update))

The last line, which compares the weights from the two tests, prints:

after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

whereas I expect both differences to be 0.0.

Any ideas?

Jan

Apr 15, 2016, 9:57:50 AM

to Caffe Users

Is weight_decay zero as well?

You won't be able to achieve bit-exactness, since you are working with floating-point numbers: rounding errors do occur and cause some divergence. But the divergence should not be as large as the values you gave.
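A quick illustration of the rounding point: floating-point addition is not associative, so summing the same numbers in a different order can change the last bits of the result. pycaffe exposes blobs as single-precision (float32) arrays, where such effects are larger; this sketch uses Python's double-precision floats.

```python
# Same three numbers, two summation orders
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)
print("absolute difference:", abs(left - right))
```

Differences of this size are expected. Differences far above this level usually mean the two computations are not actually the same.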

Jan

AV

Apr 15, 2016, 12:53:35 PM

to Caffe Users

Thanks for the reply.

Yes, weight_decay was set to zero as well. You are right, I should not have said bit-exact. What I meant was consistent results, without errors accumulating over the iterations, which is what was happening.

I figured out why this happens by digging a little further into the C++ code.

In the step function (https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp#L194), there is a call to ClearParamDiffs() (https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp#L203), before backward-propagation.

This was what I was missing: once I set the diffs to zero before calling backward() in Python, I get the expected result.
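The effect can be reproduced with a toy model. Caffe's parameter layers add their gradients into the parameter diff rather than overwriting it (which is why Step() clears the diffs first), so a manual loop that never zeroes the diffs adds the previous step's already lr-scaled diff to the new gradient. The sketch below is a hypothetical pure-Python stand-in (`ToyBlob`, `toy_backward`, `manual_sgd` and `reference_sgd` are made up for illustration) using the per-element loss 0.5 * (w - target)^2, whose gradient is w - target. In pycaffe itself, the equivalent fix is presumably to set `blob.diff[...] = 0` for every parameter blob listed in `solver.net.params` before calling `backward()`.

```python
class ToyBlob:
    """Minimal stand-in for a Caffe parameter blob: data plus an accumulating diff."""
    def __init__(self, values):
        self.data = list(values)
        self.diff = [0.0] * len(values)


def toy_backward(blob, target):
    # ADD the new gradient into diff instead of overwriting it.
    # Per-element loss 0.5 * (w - t)**2, so dL/dw = w - t.
    for i, (w, t) in enumerate(zip(blob.data, target)):
        blob.diff[i] += w - t


def manual_sgd(w0, target, lr, niter, clear_diffs):
    blob = ToyBlob(w0)
    for _ in range(niter):
        if clear_diffs:
            # counterpart of clearing the parameter diffs before the backward pass
            blob.diff[:] = [0.0] * len(blob.diff)
        toy_backward(blob, target)
        # mirror the original loop: scale diff in place by lr, then subtract it
        blob.diff[:] = [lr * d for d in blob.diff]
        blob.data[:] = [w - d for w, d in zip(blob.data, blob.diff)]
    return blob.data


def reference_sgd(w0, target, lr, niter):
    # plain SGD with the gradient recomputed from scratch each step
    w = list(w0)
    for _ in range(niter):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


w0, target, lr, niter = [1.0, -2.0], [0.0, 0.0], 0.01, 5
ref = reference_sgd(w0, target, lr, niter)
cleared = manual_sgd(w0, target, lr, niter, clear_diffs=True)
not_cleared = manual_sgd(w0, target, lr, niter, clear_diffs=False)
print("max diff with clearing:   ", max(abs(a - b) for a, b in zip(ref, cleared)))
print("max diff without clearing:", max(abs(a - b) for a, b in zip(ref, not_cleared)))
```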

As a next step, I want to add the momentum term as well, which is where I am now. I implement it as follows:

    # weights
    prev_updates = momentum*prev_updates - lr*solver.net.layers[1].blobs[0].diff
    solver.net.layers[1].blobs[0].data[...] += prev_updates
    # biases
    prev_updatesB = momentum*prev_updatesB - lr*solver.net.layers[1].blobs[1].diff
    solver.net.layers[1].blobs[1].data[...] += prev_updatesB
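For comparison, here is a sketch of how Caffe's SGDSolver applies momentum, written in pure Python on flat lists (based on my reading of `SGDSolver::ComputeUpdateValue` in Caffe's sgd_solver.cpp; the helper names are hypothetical). Caffe keeps a history buffer h, computes h = momentum * h + local_rate * diff, and then subtracts h from the data, where local_rate is base_lr times the parameter's lr_mult. The form used above, v = momentum * v - lr * diff followed by data += v, is the same recurrence with v = -h, so the two should agree when both start from zero history and use the same rate. The sketch assumes lr_mult = 1 and no weight decay, and feeds both versions a fixed, made-up gradient sequence; since the weights stay identical at every step, real gradients computed from the weights would not change the conclusion.

```python
def caffe_style_momentum(w0, grads, lr, momentum):
    """h = momentum * h + lr * g; w = w - h (history starts at zero)."""
    w = list(w0)
    h = [0.0] * len(w)
    for g in grads:  # one gradient vector per iteration
        h = [momentum * hi + lr * gi for hi, gi in zip(h, g)]
        w = [wi - hi for wi, hi in zip(w, h)]
    return w


def forum_style_momentum(w0, grads, lr, momentum):
    """v = momentum * v - lr * g; w = w + v (history starts at zero)."""
    w = list(w0)
    v = [0.0] * len(w)
    for g in grads:
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


# Fixed, made-up gradient sequence so both versions see identical inputs
grads = [[1.0, -0.5], [0.8, -0.2], [0.3, 0.1], [-0.4, 0.6], [0.2, -0.9]]
a = caffe_style_momentum([0.5, 0.5], grads, lr=0.01, momentum=0.9)
b = forum_style_momentum([0.5, 0.5], grads, lr=0.01, momentum=0.9)
print(a)
print(b)
```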

To confirm that the code above behaves as expected without momentum, I first set momentum = 0.0. In this case the weights from the two tests match.

However, when I set momentum to, say, 0.9 in both cases, I again see non-trivial discrepancies between the weights from the two tests.

I suspect that something similar to the diff-clearing issue above is happening with momentum, but as I am new to Caffe, I have not been able to pin it down yet.

If someone more experienced with the Caffe code could point me in the right direction, that would be awesome.