I wanted to try my hand at a binary classification problem.
I used the MNIST dataset as a starting point, and just for the heck of it
I relabeled all the "5"s as class '1' and the rest of the digits as class '0'.
That leaves me with 54579 0's and 5421 1's, so guessing '0' all the time would
already give an accuracy of 0.90965.
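To double-check that baseline, here's a quick sketch (the counts are from the 60,000-image MNIST training set, relabeled as above):

```python
# Class counts after relabeling: digit 5 -> class 1, everything else -> class 0.
n_zeros = 54579
n_ones = 5421
total = n_zeros + n_ones  # 60000

# Accuracy of a trivial classifier that always predicts '0'.
baseline_accuracy = n_zeros / total
print(baseline_accuracy)  # 0.90965
```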
I'm using the AdaGrad solver to save myself the effort of fine-tuning the learning rate.
Anyway, here's the problem: the network's test loss quickly becomes very low, but the
accuracy stays about the same. How is that possible?
Here are the logs from the validation set:
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.910508
Test net output #1: loss = 0.697884 (* 1 = 0.697884 loss)
Iteration 1000, Testing net (#0)
Test net output #0: accuracy = 0.911094
Test net output #1: loss = 0.0205829 (* 1 = 0.0205829 loss)
Iteration 2000, Testing net (#0)
Test net output #0: accuracy = 0.910586
Test net output #1: loss = 0.0139908 (* 1 = 0.0139908 loss)
Iteration 3000, Testing net (#0)
Test net output #0: accuracy = 0.911016
Test net output #1: loss = 0.0100374 (* 1 = 0.0100374 loss)
Iteration 4000, Testing net (#0)
Test net output #0: accuracy = 0.91043
Test net output #1: loss = 0.00888367 (* 1 = 0.00888367 loss)
Obviously, the training loss goes down even faster. I'm using SigmoidCrossEntropyLoss
as the loss function.
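For reference, here's a minimal sketch of the per-example sigmoid cross-entropy I believe this layer computes (my own reimplementation for sanity-checking, not Caffe's actual code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_cross_entropy(logit, label):
    """Per-example loss: -[y*log(p) + (1-y)*log(1-p)], with p = sigmoid(logit)."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# A network that confidently predicts '0' everywhere gets near-zero loss
# on the 0's but pays heavily on each of the rare 1's:
print(sigmoid_cross_entropy(-5.0, 0))  # ~0.0067
print(sigmoid_cross_entropy(-5.0, 1))  # ~5.0067
```

With 9% positives, a predict-everything-as-'0' model would average a much higher loss than what I'm seeing, which is part of why these numbers confuse me.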
Any ideas what could be going on here?