Test loss becomes small, but accuracy does not change?

Antony Zebraski

May 21, 2015, 4:42:31 PM
to caffe...@googlegroups.com
I wanted to try my hand at a binary classification problem.
I used the MNIST dataset as a starting point and, just for the heck of it,
labeled all the "5"s as class '1' and the rest of the digits as class '0'.
That gives me 54579 0's and 5421 1's, so guessing '0' all the time would
already yield an accuracy of 0.90965.

I'm using the AdaGrad solver, to save myself the effort of fine-tuning the learning rate.
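
For context, my solver definition looks roughly like this (the path and values
are placeholders, not my exact settings):

    # Rough sketch of my solver.prototxt; values are illustrative placeholders
    net: "train_val.prototxt"   # placeholder path to the net definition
    test_iter: 100              # test batches per evaluation pass
    test_interval: 1000         # matches the 1000-iteration test logs below
    base_lr: 0.01               # starting rate; AdaGrad adapts it per parameter
    lr_policy: "fixed"
    max_iter: 10000
    solver_type: ADAGRAD        # select the AdaGrad solver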

Anyway: the network's test loss quickly becomes very low, but the accuracy
stays essentially flat. How is that possible?
Here are the logs of the results on the validation set:

Iteration 0, Testing net (#0)
    Test net output #0: accuracy = 0.910508
    Test net output #1: loss = 0.697884 (* 1 = 0.697884 loss)
Iteration 1000, Testing net (#0)
    Test net output #0: accuracy = 0.911094
    Test net output #1: loss = 0.0205829 (* 1 = 0.0205829 loss)
Iteration 2000, Testing net (#0)
    Test net output #0: accuracy = 0.910586
    Test net output #1: loss = 0.0139908 (* 1 = 0.0139908 loss)
Iteration 3000, Testing net (#0)
    Test net output #0: accuracy = 0.911016
    Test net output #1: loss = 0.0100374 (* 1 = 0.0100374 loss)
Iteration 4000, Testing net (#0)
    Test net output #0: accuracy = 0.91043
    Test net output #1: loss = 0.00888367 (* 1 = 0.00888367 loss)

Unsurprisingly, the training loss goes down even more quickly. I'm using
SigmoidCrossEntropyLoss as the loss function.
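
The tail of my net looks roughly like this (layer names are mine, and details
may differ from the actual file):

    # Final fully-connected layer: a single output, i.e. one logit for the binary label
    layer {
      name: "ip_final"
      type: "InnerProduct"
      bottom: "ip1"
      top: "ip_final"
      inner_product_param { num_output: 1 }
    }
    # Sigmoid cross-entropy loss on that single logit
    layer {
      name: "loss"
      type: "SigmoidCrossEntropyLoss"
      bottom: "ip_final"
      bottom: "label"
      top: "loss"
    }
    # Accuracy reported over the same single-channel prediction
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip_final"
      bottom: "label"
      top: "accuracy"
      include { phase: TEST }
    }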

Any ideas what could be going on here?

Antony Zebraski

May 21, 2015, 7:08:18 PM
to caffe...@googlegroups.com
Argh! I think I found the problem. I had been using SigmoidCrossEntropyLoss, since I
figured I had a binary classification problem.

I changed it to SoftmaxWithLoss and (more importantly) set the last layer to have
*2* outputs (one per class, I suppose?). And guess what? The network reached 99.7%
test accuracy in no time!
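
For reference, the working tail now looks roughly like this (same placeholder
names as the sketch above):

    # Final fully-connected layer now produces one score per class
    layer {
      name: "ip_final"
      type: "InnerProduct"
      bottom: "ip1"
      top: "ip_final"
      inner_product_param { num_output: 2 }
    }
    # Softmax + multinomial logistic loss over the two class scores
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip_final"
      bottom: "label"
      top: "loss"
    }
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip_final"
      bottom: "label"
      top: "accuracy"
      include { phase: TEST }
    }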

Why doesn't the SigmoidCrossEntropyLoss function work here? What does one use for a
binary classification problem? And how come Softmax will take as input a layer with
2 outputs as well as one with just 1? Ugh, too many questions. :-(