pycaffe loss layer using mask of fully connected layer outputs

Tobias Stahl

Mar 3, 2017, 9:16:02 AM
to Caffe Users
I want to create a Caffe Python layer that takes the outputs of a fully connected layer, the labels and an arithmetic function as input, and then computes the loss using only those outputs of the fully connected layer that the function refers to.
A function consists of math symbols and indices and looks like this: '+' 3 '+' 7 '+' 14.
The loss is then (y - (x[3] + x[7] + x[14]))**2, where x[i] is the i-th output of the fully connected layer.
My problem is that I need the inputs of the previous layer, the fully connected layer (FC), at those indices in order to compute the gradients.
My gradient should be 2 * (y - (x[3] + x[7] + x[14])) * (FC.bottom.data[3] + FC.bottom.data[7] + FC.bottom.data[14]).
Is that possible somehow, or does the fully connected layer even take care of that?
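
To make the example concrete, this is roughly the computation I mean in plain NumPy (fc_top, y and idx are just placeholder names for the FC outputs, the label and the indices parsed from the function):

    import numpy as np

    fc_top = np.random.randn(20)   # placeholder for the outputs of the fully connected layer
    y = 1.5                        # placeholder label
    idx = [3, 7, 14]               # indices parsed from '+' 3 '+' 7 '+' 14

    s = fc_top[idx].sum()          # x[3] + x[7] + x[14]
    loss = (y - s) ** 2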

Tobias Stahl

Mar 3, 2017, 1:29:19 PM
to Caffe Users


On Friday, March 3, 2017 at 18:22:21 UTC, Tobias Stahl wrote:
EDIT:
So I thought of something like this:

    def forward(self, bottom, top):
        # bottom[0]: outputs of the fully connected layer (the predictions)
        # bottom[1]: the encoded functions, each term a (symbol, index) pair
        # bottom[2]: the labels, shape 20 x 1
        predictions = bottom[0].data
        level_functions = bottom[1].data
        labels = bottom[2].data
        score = np.zeros(20)
        for level_function in level_functions:
            level_score = np.zeros(20)
            for term in level_function:
                if term[0] == '+':
                    level_score += predictions[term[1]]
                else:
                    level_score -= predictions[term[1]]
            score += level_score
        top[0].data[...] = np.mean((score - labels)**2)


    def backward(self, top, propagate_down, bottom):
        # This method implements the backpropagation: it propagates the
        # gradients from top to bottom. propagate_down is a Boolean vector
        # of len(bottom) indicating to which of the bottoms the gradient
        # should be propagated.
        level_functions = bottom[1].data   # same encoding as in forward()
        # feature_size: dimensionality of the FC layer's input features
        diff = np.zeros(feature_size)
        for level_function in level_functions:
            level_diff = np.zeros(feature_size)
            for term in level_function:
                if term[0] == '+':
                    level_diff += bottom[0].data[term[1]]
                else:
                    level_diff -= bottom[0].data[term[1]]
            diff += 2 * top[0].diff * level_diff
        bottom[0].diff[...] = np.mean(diff)

Here level_functions contains the functions as explained above, and predictions are the inputs coming from the fully connected layer. So my forward pass is clear, but I don't know how to backpropagate the gradient, which needs access to the features, i.e. the inputs of the fully connected layer. Is that possible, or do I need to create another layer that computes w'x myself instead of using the fully connected layer?
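
For comparison, this is a rough sketch of what the backward pass could look like if the loss layer only filled bottom[0].diff, i.e. the gradient with respect to the predictions (the FC outputs), and left the weight gradients to the fully connected layer's own backward pass (if I understand correctly, Caffe's InnerProduct layer computes its weight gradients from the top diff it receives and the bottom data it already stores, so the loss layer would never have to see the FC inputs):

    def backward(self, top, propagate_down, bottom):
        # Sketch only: propagate the gradient to the FC outputs (bottom[0]) and
        # nothing else; the fully connected layer's own backward pass would then
        # combine this diff with the features it stores to get the weight gradients.
        if not propagate_down[0]:
            return
        predictions = bottom[0].data
        level_functions = bottom[1].data        # same encoding as in forward()
        labels = bottom[2].data.reshape(-1)     # flatten the 20 x 1 labels

        # Recompute the score exactly as in forward().
        score = np.zeros(20)
        for level_function in level_functions:
            for term in level_function:
                if term[0] == '+':
                    score += predictions[term[1]]
                else:
                    score -= predictions[term[1]]

        # For loss = mean((score - labels)**2):
        # d loss / d score = 2 * (score - labels) / N, scaled by the loss
        # weight that Caffe stores in top[0].diff.
        grad = top[0].diff * 2.0 * (score - labels) / labels.size

        # Scatter the gradient back onto every prediction index that appears
        # in a term, with the sign of that term.
        bottom[0].diff[...] = 0
        for level_function in level_functions:
            for term in level_function:
                if term[0] == '+':
                    bottom[0].diff[term[1]] += grad
                else:
                    bottom[0].diff[term[1]] -= grad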