Thanks to Evan and Kai for their replies. I must have missed something obvious. The "Fully Convolutional..." paper states: "We append a 1 x 1 convolution with channel dimension 21 to predict scores for each of the PASCAL classes (including background) at each of the coarse output locations, followed by a deconvolution layer to bilinearly upsample the coarse outputs to pixel-dense outputs". That seems to say the score blob fed into SoftmaxWithLoss should be BatchSize x 21 (channels) x H x W.
I also checked the code in 'voc-fcn32s/net.py' and found:
n.score_fr = L.Convolution(n.drop7, num_output=21, kernel_size=1, pad=0,
    param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore = L.Deconvolution(n.score_fr,
    convolution_param=dict(num_output=21, kernel_size=64, stride=32,
        bias_term=False),
    param=[dict(lr_mult=0)])
n.score = crop(n.upscore, n.data)
n.loss = L.SoftmaxWithLoss(n.score, n.label,
    loss_param=dict(normalize=False, ignore_label=255))
which also seems to say the same thing. But if the label blob has shape batchSize x 1 x H x W, then the label shape does not match the score shape.
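For what it's worth, here is a minimal NumPy sketch (toy sizes, not Caffe code) of how a per-pixel softmax loss consumes those two shapes: the 21 score channels are softmaxed per pixel, and the single-channel label is used as an integer index into them, so the shapes are not expected to match channel-for-channel:

```python
import numpy as np

N, C, H, W = 2, 21, 8, 8  # toy sizes; a real FCN uses the full image H x W

rng = np.random.default_rng(0)
score = rng.standard_normal((N, C, H, W))      # like n.score: N x 21 x H x W
label = rng.integers(0, C, size=(N, 1, H, W))  # like n.label: N x 1 x H x W, class ids

# softmax over the channel axis, independently at every pixel
e = np.exp(score - score.max(axis=1, keepdims=True))
prob = e / e.sum(axis=1, keepdims=True)

# gather the probability of the true class at every pixel
n_idx, h_idx, w_idx = np.meshgrid(
    np.arange(N), np.arange(H), np.arange(W), indexing='ij')
p_true = prob[n_idx, label[:, 0], h_idx, w_idx]  # shape N x H x W

loss = -np.log(p_true).sum()  # normalize=False sums over all pixels
print(loss)
```

So (again, as I understand Caffe's convention) batchSize x 1 x H x W integer labels paired with batchSize x 21 x H x W scores is the intended combination.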
Actually, I tried both layouts for the label blob: 1) batchSize x 1 x H x W (each entry is an integer class id), and 2) batchSize x numClasses x H x W (binary, one-hot entries). Both give me a huge number of "test net output #" lines in the log, followed by out-of-memory errors.
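As a side note on memory, the one-hot layout by itself makes the label blob far larger than the integer layout; a quick back-of-the-envelope comparison (assuming a hypothetical 500 x 375 PASCAL-sized image, uint8 integer labels vs. float32 one-hot):

```python
import numpy as np

N, C, H, W = 1, 21, 500, 375  # hypothetical PASCAL-ish image size

int_labels = np.zeros((N, 1, H, W), dtype=np.uint8)    # case 1: integer class ids
onehot     = np.zeros((N, C, H, W), dtype=np.float32)  # case 2: one-hot, float

print(int_labels.nbytes)  # 187500 bytes (~0.18 MB)
print(onehot.nbytes)      # 15750000 bytes (~15.75 MB), 84x larger
```

That alone would not explain the out-of-memory errors, but it does make the one-hot variant strictly worse on that front.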
Any insights? (I guess I can change the logging verbosity somehow, but that's a separate question.)