Building a net with fc layers between labels and head

Raúl Gombru

May 19, 2017, 6:07:31 AM

to Caffe Users

I'm trying to build a net that has fc layers between labels and the head. It's a multilabel regression net.
Something like:

image  --> CNN -->
                   <-- loss
labels --> FC  -->

To do this in the prototxt, I define a fully connected layer that takes "label" as its bottom, and then define the loss (SigmoidCrossEntropyLoss) with the last layer of the CNN and the FC layer as its bottoms. The idea is to train the CNN and the fully connected layers together.

When I start training, I get this error:

F0519 12:05:26.682916 29287 sigmoid_cross_entropy_loss_layer.cu:13] SigmoidCrossEntropyLoss Layer cannot backpropagate to label inputs.

What am I doing wrong?

Przemek D

May 19, 2017, 6:45:02 AM

to Caffe Users

It's hard for me to understand what you are trying to do. What are you putting in, and what do you expect your network to learn?

The idea of supervised training is that you have some form of supervision, a ground truth you want your model to learn from. You put in data X (image or not) and corresponding label Y (what would be a correct answer to this input), that is assumed to be absolutely true and hence constant. The network's answer to input X (Y' for example) is off from the truth by some notion of error e=E(Y',Y) where E is your error function. A key notion is that the loss is calculated with respect to something - it assumes Y is constant and true, and outputs error by which Y' was different from it.
Now, the diagram you drew looks like you consider both images and labels as equal inputs that your network processes. The question is: what should the loss function consider the truth? The error you get tells you exactly that: "I cannot backpropagate (errors) to label inputs because I assumed they are absolutely true (free of any errors)". (Actually I think the direct reason for this error is the fact that FC expects the layer above to produce gradients or error, but the main thing is that SigmoidCrossEntropyLoss assumes its first input is the network output and it only propagates to that.)
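
To make the asymmetry between the two inputs concrete, here is a minimal pure-Python sketch of a sigmoid cross-entropy loss. It is not Caffe's implementation: the function names are made up for illustration, and averaging over the elements is a simplifying assumption about normalization. The point is only that a gradient is produced for the first input (the network output), while the targets are treated as constants.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(logits, targets):
    """Return (loss, gradient w.r.t. logits).

    Targets are treated as fixed ground truth, so no gradient
    is computed for them. Averaging over elements is an
    assumption of this sketch.
    """
    n = len(logits)
    loss = 0.0
    grad = []
    for x, t in zip(logits, targets):
        p = sigmoid(x)
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
        # d(loss)/d(logit) = (sigmoid(x) - t) / n
        grad.append((p - t) / n)
    return loss / n, grad

# Hypothetical network outputs (logits) and ground-truth targets.
logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 1.0]
loss, grad = sigmoid_cross_entropy(logits, targets)
print(loss, grad)
```

Putting a trainable FC layer under the second input asks for a gradient with respect to the targets, which a loss written this way never computes.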

Raúl Gombru

May 19, 2017, 7:00:52 AM

to Caffe Users

Thanks for your answer.

It is a regression problem. When I had no fc layers after the labels, the net learnt to regress the label values from images. Now, by putting these fc layers after the labels, I want the net to learn to map the images and the labels to the same space, and to learn that the mapping of an image and the mapping of its associated labels should be the same. So the error should be computed in that space and propagated through both the CNN and the FC layers, changing the weights so that the two mappings become equal.

Maybe the mistake is calling the values to regress "labels". Maybe I should consider a net with two inputs, siamese style, and then compute the loss using the mappings of those two inputs.

Przemek D

May 19, 2017, 8:50:26 AM

to Caffe Users

You want to learn mappings X->S and Y->S, where S is an unknown space that you want your network to invent, correct?

If this is the case, then I see at least one problem with it (beyond the fact that I don't understand of what use would those mappings be). Even if caffe loss layers allowed that, your network lacks the notion of reference of what the S is. From what you say, there is no direct correspondence between X and Y in your scenario, so why shouldn't the network learn the simplest possible answer in which S is a degenerate space containing only 1 point [0,0,0,...,0], then map all images and all labels to that point?
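
The degenerate solution described above can be sketched in a few lines of pure Python. Everything here is hypothetical toy data and naming, not anyone's actual network: two bias-free linear maps with all-zero weights send every image feature vector and every label vector to the origin of S, so a distance-based loss in S is zero without anything useful being learned.

```python
def linear(weights, x):
    """Apply a bias-free fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Hypothetical toy data: 3-dim "image features" and 2-dim "label vectors".
images = [[0.2, 1.5, -0.7], [3.0, 0.1, 0.4]]
labels = [[1.0, 0.0], [0.3, 0.9]]

# All-zero weights map every input to the origin of a 2-dim space S.
w_image = [[0.0] * 3 for _ in range(2)]
w_label = [[0.0] * 2 for _ in range(2)]

# Loss over matching (image, label) pairs, measured in S.
total = sum(
    squared_distance(linear(w_image, x), linear(w_label, y))
    for x, y in zip(images, labels)
)
print(total)  # 0.0

# A mismatched pair is just as "close", so S carries no information.
mismatch = squared_distance(linear(w_image, images[0]), linear(w_label, labels[1]))
print(mismatch)  # 0.0
```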

As a follow up question: what is the expected output of your network? Maybe give us some toy example, without details of your work, to better explain what you're trying to accomplish?

PDV

May 20, 2017, 1:04:56 AM

to Caffe Users

For a simple regression, you put an FC layer after the CNN, and the FC layer outputs the values you want to estimate. You can use EuclideanLoss as your loss function, which takes your label as one of its bottom inputs.
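
As a rough illustration of what such a loss computes, here is a pure-Python sketch following the formula in the Caffe documentation as I understand it, E = 1/(2N) * sum ||pred_n - target_n||^2. The function name and the toy values are made up for this example.

```python
def euclidean_loss(predictions, targets):
    """Sum of squared differences over the batch, divided by 2N."""
    n = len(predictions)
    total = 0.0
    for pred, target in zip(predictions, targets):
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / (2.0 * n)

# Hypothetical batch of two 2-dim predictions and their target values.
preds = [[1.0, 2.0], [0.0, 0.0]]
targets = [[1.0, 0.0], [3.0, 4.0]]
print(euclidean_loss(preds, targets))  # 7.25
```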

Raúl Gombru

May 22, 2017, 11:59:26 AM

to Caffe Users

Yes, correct.

A toy example would be:
X is an image containing plants.
Y is a vector of 100 reals containing climatological data about the places where the images were taken.
I want to infer climatological conditions from the plants.
I could do a multi-label classification, or I could directly regress the 100 reals. But I want to map both the images and the 100-dim vector to the same space S, because that way the relations between the 100 numbers of climatological data are also learnt.

I don't understand what you say about mapping all images to the same point. The idea is to back-propagate the difference between the image and the other data in S through both networks, back to X and Y, so the net learns to map the images and their associated labels to the same point in the space S.

The expected output of the network is a vector of dimension dim(S). So I would be able to input an image and get a vector in S, and I could also input a vector of dimension dim(Y) and get a vector in S. In the toy example, both vectors would be similar if we input an image and its associated climatological conditions and the net has learnt correctly.

Thank you

Raúl Gombru

May 22, 2017, 12:03:39 PM

to Caffe Users

Yes I know, but I don't want a simple regression. I want to insert some fully connected layers that learn relations between the associated data.

Przemek D

May 23, 2017, 3:22:56 AM

to Caffe Users

"the net learns to map the images and its associated labels to the same point in the space S"

Do you know what S is? Is this space in any way interpretable? Can you provide a label s in space S for each pair (x,y) belonging to X and Y? My point is that S is meaningless to work with - as an output it provides no information (so why would you want to obtain it), and it's not possible to come up with a loss function relating (X,Y) and S.

I have a feeling that you want to do two things here. First is regress Y from X - this should be easily solved by a network of a structure similar to:
X->(conv)->(fc)->Y
Notice that this network does learn correspondences between elements of Y - why shouldn't it? After all, you're backpropagating through a dense layer, transforming the entire vector.

Another thing you seem to be doing is similarity learning. This sounds like a variation on siamese networks, where you input two images and expect the net to tell you, for example, whether they show the same person. Only in your case you would show a plant image and a climate vector and ask whether they "match". Networks like that can be trained using ContrastiveLoss (also see this example on image-image nets). Your image X and vector Y become input data, but you still need to provide an appropriate label, that is, a similarity metric for each (X,Y) pair. In the Caffe implementation it is a scalar in the 0-1 range, 1 meaning the same thing and 0 total dissimilarity.
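
For reference, a contrastive loss of this kind can be sketched in pure Python as below. It follows the formula from the Caffe documentation as I understand it, E = 1/(2N) * sum [ y * d^2 + (1 - y) * max(margin - d, 0)^2 ] with d the Euclidean distance between the two embeddings. The function name, the toy embeddings and the choice to treat the similarity label as strictly binary are assumptions of this sketch.

```python
import math

def contrastive_loss(a_batch, b_batch, similar, margin=1.0):
    """Contrastive loss over pairs of embeddings.

    similar[i] is 1 if pair i should match and 0 otherwise
    (treated as binary here, which is an assumption).
    """
    n = len(a_batch)
    total = 0.0
    for a, b, y in zip(a_batch, b_batch, similar):
        d = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        if y:
            # Matching pairs are pulled together.
            total += d ** 2
        else:
            # Non-matching pairs are pushed apart until they exceed the margin.
            total += max(margin - d, 0.0) ** 2
    return total / (2.0 * n)

# Hypothetical embeddings of images (a) and climate vectors (b) in S.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[3.0, 4.0], [0.0, 1.0]]
print(contrastive_loss(a, b, similar=[1, 0], margin=2.0))  # 6.5
```

Because non-matching pairs are penalized for being closer than the margin, mapping everything to a single point is no longer a zero-loss solution.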

I don't know whether those tasks can be trained jointly, but in my opinion neither of them requires you to use space S directly.

Raúl Gombru

May 23, 2017, 4:13:59 AM

to Caffe Users

Thanks for your help Przemek.

I have already solved the regression task X->(conv)->(fc)->Y, and it works well.
As you say, I think that what I need to relate X and Y in the way I want is something like a siamese network, but relating a vector and an image instead of two images. I don't know exactly how they work, so I'll read more about them first.
I will post here when I have any conclusion.

Przemek D

May 23, 2017, 5:22:06 AM

to Caffe Users

I would imagine something like:
image->(conv)->(fc)
vector->(fc)
and both outputs meeting in a contrastive loss layer, with the addition of a similarity label.
But yes, do research that first. I won't be able to help much, since I've never used contrastive siamese networks myself.

Raúl Gombru

May 23, 2017, 6:47:43 AM

to Caffe Users

Me neither, but I will research it. Sounds good!
