pycaffe backward() update


Swami

Jul 19, 2016, 11:49:23 AM
to Caffe Users
Does the pycaffe solver's backward() method apply the computed gradients to update the weights? If not, is there another way to do this automatically?

Something like calling solver.step() with a fixed input and output?

Swami

Jul 20, 2016, 4:43:59 PM
to Caffe Users
To add to this: there are forward and backward methods, solver.net.forward() and solver.net.backward(), but the backward method only computes the diff values; it does not update the weights.

I do know that solver.step() does a forward pass, a backward pass, and then updates the weights, but I want to decouple the parameter update step from the forward pass.

I think this should be possible; I'm just wondering whether it's already implemented, or whether there is another way to do it.
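
Concretely, here is a sketch of what I'd like to be able to write (lr is a fixed learning rate, whereas the solver would apply its lr policy, momentum, and weight decay; clear_param_diffs() may not be exposed in older builds, in which case the diffs can be zeroed by hand):

lr = 0.01                       # assumed fixed learning rate

solver.net.forward()            # compute the loss
solver.net.clear_param_diffs()  # zero any accumulated gradients
solver.net.backward()           # fill the parameter diffs
for name, blobs in solver.net.params.items():
    for b in blobs:             # weight and bias blobs
        b.data[...] -= lr * b.diff  # plain SGD update, applied by hand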

Daivik Swarup

Nov 6, 2016, 10:30:22 AM
to Caffe Users
Hi!
I've run into the same issue. Did you find a way to update the parameters in pycaffe?

Nathan Ing

Apr 26, 2017, 8:23:40 PM
to Caffe Users
Hi, I think I want to do something similar to what you and @Swami have described.

Have you found a solution yet? I'm thinking of trying the approach described here:

https://github.com/BVLC/caffe/issues/3959

so far with no success.

Any progress on this issue?

Thanks!

Nathan Ing

Apr 26, 2017, 11:19:33 PM
to Caffe Users
Not to spam this list, but here is a more complete description of what I hope to do, with reference to https://github.com/BVLC/caffe/issues/3959 

I have the net defined as in the LeNet example notebook:

import caffe
from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    n = caffe.NetSpec()

    # data layer reading from LMDB, with pixels scaled to [0, 1]
    n.data, n.labels = L.Data(batch_size=batch_size, backend=P.Data.LMDB,
                              source=lmdb, transform_param=dict(scale=1./255), ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1   = L.InnerProduct(n.pool2, num_output=20, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.fc2   = L.InnerProduct(n.fc1, num_output=10, weight_filler=dict(type='xavier'))
    n.relu2 = L.ReLU(n.fc2, in_place=True)
    n.score = L.InnerProduct(n.relu2, num_output=10, weight_filler=dict(type='xavier'))
    n.loss  = L.SoftmaxWithLoss(n.score, n.labels)
    return n.to_proto()
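
As in that notebook, the generated net is written to disk so the solver prototxt can reference it (paths here are assumed from the LeNet example):

with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))
with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))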

I define a solver:
caffe.set_mode_cpu()
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')

And now we enter training.
There are two options:
1) use solver.step()
2) split the forward(), backward(), and update operations and perform each explicitly.

We define weight_layer_idx to be the indices of layers that have weights and biases, i.e. convolutions and inner product layers.
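One way to build that list (a sketch, not from the original post) is to keep the indices of layers that own parameter blobs:

weight_layer_idx = [i for i, layer in enumerate(solver.net.layers)
                    if len(layer.blobs) > 0]
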
for it in range(niter):
    # Option 1) -- let the solver do forward, backward, and update:
    solver.step(1)

    # Option 2) -- do each stage explicitly (use instead of option 1;
    # lr is a fixed learning rate defined elsewhere):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        # scale the gradients by the learning rate
        solver.net.layers[k].blobs[0].diff[...] *= lr  # weights
        solver.net.layers[k].blobs[1].diff[...] *= lr  # biases
        # vanilla SGD step: w <- w - lr * dw
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff
        # zero the diffs so they don't accumulate into the next iteration
        solver.net.layers[k].blobs[0].diff[...] *= 0
        solver.net.layers[k].blobs[1].diff[...] *= 0

Method 1) works. The weights train and the test accuracy rapidly approaches 95%.

Method 2) does not work. As I understand it, this is a simplified version of SGDSolver's ApplyUpdate() function, omitting momentum, weight decay, batch accumulation, and the like. The weights either do not update at all, or they update erratically, so the accuracy never rises above chance.
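
As a sanity check (a sketch using the names above), printing the loss and the mean gradient magnitude shows whether backward() populates the diffs at all:

import numpy as np

loss = solver.net.forward()['loss']
solver.net.clear_param_diffs()
solver.net.backward()
for k in weight_layer_idx:
    w = solver.net.layers[k].blobs[0]
    print(k, 'loss=%.4f' % float(loss),
          '|dw|=%.3e' % np.abs(w.diff).mean(),
          '|w|=%.3e' % np.abs(w.data).mean())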

I've tried to use the Python SoftmaxLossLayer from here: https://github.com/BVLC/caffe/issues/4023 but there's an error during backward():
ValueError: could not broadcast input array from shape (64,10) into shape (64)
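
That broadcast error usually means the layer's backward() writes the (batch, classes) gradient into the (batch,)-shaped label blob. A minimal Python softmax loss layer that only writes a gradient for the score bottom (my own sketch, not the exact code from issue #4023) would look like:

import caffe
import numpy as np

class SoftmaxLossLayer(caffe.Layer):
    # bottom[0]: (N, C) scores; bottom[1]: (N,) integer labels

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two bottoms: scores and labels.")

    def reshape(self, bottom, top):
        top[0].reshape(1)  # the loss is a scalar

    def forward(self, bottom, top):
        scores = bottom[0].data
        e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
        self.prob = e / e.sum(axis=1, keepdims=True)
        labels = bottom[1].data.astype(int).ravel()
        n = scores.shape[0]
        top[0].data[0] = -np.log(self.prob[np.arange(n), labels] + 1e-12).mean()

    def backward(self, top, propagate_down, bottom):
        # Gradient only w.r.t. the scores; writing the (N, C) gradient into
        # bottom[1].diff (shape (N,)) is exactly what raises the ValueError.
        if propagate_down[0]:
            labels = bottom[1].data.astype(int).ravel()
            n = bottom[0].data.shape[0]
            diff = self.prob.copy()
            diff[np.arange(n), labels] -= 1
            bottom[0].diff[...] = diff / n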


If any experienced user can help me get this toy example to work, that would be super cool :)

Nathan Ing

Apr 27, 2017, 1:14:16 AM
to Caffe Users
Update # 2:

Thanks to a nice suggestion by @Swami, I think I've linked the ApplyUpdate() function from sgd_solver.cpp to the Python Solver class. But a second issue has come up: the training iteration counter won't update. I see there's an internal attribute (iter_) that keeps track of this... I want to do
solver.iter += 1

Of course it's not so easy.

Calling solver.step() ends with a ++iter_ in the C++ code. I wonder if there's a way to have this happen automatically when training through Python this way as well.
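
I believe newer BVLC/caffe master exposes solver.apply_update() to pycaffe, and there ApplyUpdate() increments iter_ itself; if your build has it (worth checking, since this is version-dependent), the decoupled loop reduces to:

for it in range(niter):
    solver.net.clear_param_diffs()  # otherwise gradients accumulate
    solver.net.forward()
    solver.net.backward()
    solver.apply_update()  # lr policy, momentum, weight decay, and ++iter_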

Thanks for any advice, I'm learning a lot by struggling through this problem.