CUDA errors with Softmax loss in segmentation task, what is the problem?


Ilya Zhenin

Jan 20, 2017, 6:00:26 AM
to Caffe Users
After a few iterations I get CUBLAS_STATUS_MAPPING_ERROR (11 vs. 0) at math_functions.cu:121; last time I got "4 vs. 0 unspecified launch failure" at syncedmem.cpp:56.

I checked the dimensions of the data blob and the label blob that I feed into the network; they match each other and are correct (but they differ between iterations, since it is a fully convolutional network).

As I said, the first few iterations run fine. I don't know whether it is a coincidence, but the error occurs on a blob whose dimensions are larger than any in the previous iterations (it is not a memory problem; there is still a lot of free memory).

And if I replace the Softmax loss with SigmoidCrossEntropy, there are no errors.

Oh, and in case it matters: it is a binary problem. As the label I use a blob of shape (1, 1, ROWS, 224), where ROWS is in the interval [400, 1100]. The only values in this blob are 0s and 1s.


Filip K

Jan 20, 2017, 8:03:12 AM
to Caffe Users
I had a similar error, but managed to resolve it by using the number of labels + 1.

Moreover, that also seems to be the case for Pascal Context, where there are 59 labels, but some layers have num_output set to 60.

Ilya Zhenin

Jan 20, 2017, 9:44:13 AM
to Caffe Users
Do you shift your label classes, i.e. 1 instead of 0 and 2 instead of 1?
Or do you use 3 output channels instead?

On Friday, January 20, 2017 at 16:03:12 UTC+3, Filip K wrote:

Filip K

Jan 20, 2017, 10:01:39 AM
to Caffe Users
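Well, in the attachments you can see my sample labels. And in the code I do:

    import numpy as np
    import scipy.io

    # Load the 400-class label map for image idx (class indices are 1-based).
    label_400 = scipy.io.loadmat('{}/trainval/{}.mat'.format(self.context_dir, idx))['LabelMap']
    # Pixels whose class is not in labels_21 keep the initial value 0.
    label = np.zeros_like(label_400, dtype=np.uint8)
    for i, l in enumerate(self.labels_21):  # i avoids shadowing the image idx
        idx_400 = self.labels_400.index(l) + 1
        label[label_400 == idx_400] = i
    label = label[np.newaxis, ...]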
labels.txt
21_labels.txt

Evan Shelhamer

Jan 20, 2017, 12:32:01 PM
to Ilya Zhenin, Caffe Users
If you compile in debug mode then Caffe will check the label range for you, or you can check it yourself. I seem to remember seeing this error reported when there are labels out of range of the net (like a label 100 for a 10 class net).
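
For checking the range yourself, a minimal sketch in NumPy could look like the following. It assumes the label blob is available as an array before it is fed to the net; the function name find_out_of_range_labels and its parameters are hypothetical and not part of Caffe. It reports every label value that is neither inside [0, num_classes) nor equal to the optional ignore label.

```python
import numpy as np

def find_out_of_range_labels(label, num_classes, ignore_label=None):
    """Return the sorted unique label values outside [0, num_classes)."""
    values = np.unique(np.asarray(label))
    if ignore_label is not None:
        # The ignore label is skipped by the loss, so it is not an error.
        values = values[values != ignore_label]
    bad = values[(values < 0) | (values >= num_classes)]
    return bad.tolist()

# Example: a 10-class net that is given a stray label of 100.
label = np.zeros((1, 1, 4, 4), dtype=np.uint8)
label[0, 0, 1, 2] = 100
print(find_out_of_range_labels(label, num_classes=10))
```

Running a check like this on every batch before the forward pass should point to the offending image, if out-of-range labels are indeed the cause.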

Evan Shelhamer





--
You received this message because you are subscribed to the Google Groups "Caffe Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to caffe-users+unsubscribe@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/ac4028dc-3157-4545-a72f-c2b969930ed2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Filip K

Jan 20, 2017, 1:17:04 PM
to Caffe Users, inf.s...@gmail.com
@Evan
That is actually interesting, because I have just looked at my 21_labels. I was trying to fine-tune pascalContext-fcn8s, so I changed every place where you used num_output 60 to num_output 21, and I was getting exactly the same error as @Ilya. However, when I increased the number to 22, it no longer produced any errors (but then I have a loss issue: https://groups.google.com/forum/#!topic/caffe-users/nx50WaN21TI).

I have also tested it with a smaller number (16), and it crashed with the same error, so your argument might be correct. However, given the code I presented in my previous post, I always map everything to the range 0-20, which is just 21 labels (yet it doesn't work with 21).

PS. I just realized that in my code, whenever there is a label that is not in my list, it gets labelled as ground (0), which is incorrect (the value is 0 because of the array initialization). So maybe I should use a different label. (Is it allowed to use 255 if it is set as the ignore label in my SoftmaxWithLoss?)
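
The idea in the PS can be sketched as below: initialize the target array with an ignore value instead of 0, so that unlisted classes are not counted as ground. This is only an illustration under assumptions: the function remap_labels, the constant IGNORE_LABEL, and the toy class names are hypothetical, and the value 255 is assumed to match an ignore_label: 255 setting in the loss_param of the SoftmaxWithLoss layer.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed to equal ignore_label in the loss layer

def remap_labels(label_400, labels_400, labels_21, ignore_label=IGNORE_LABEL):
    """Map 1-based source indices to 0-based target indices.

    Pixels whose class is not in labels_21 are set to ignore_label.
    """
    label = np.full(label_400.shape, ignore_label, dtype=np.uint8)
    for target_idx, name in enumerate(labels_21):
        source_idx = labels_400.index(name) + 1  # source map is 1-based
        label[label_400 == source_idx] = target_idx
    return label[np.newaxis, ...]

# Toy example with made-up class lists (4 source classes, 2 kept).
toy_labels_400 = ['sky', 'cat', 'ground', 'dog']
toy_labels_21 = ['ground', 'cat']
toy_map = np.array([[1, 2], [3, 4]])
out = remap_labels(toy_map, toy_labels_400, toy_labels_21)
print(out.shape)
print(out)
```

With this initialization, 'sky' and 'dog' in the toy example end up as 255, while 'ground' and 'cat' become 0 and 1, so the mapped labels stay within the range of a net whose num_output equals the number of kept classes.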
 
