A net with one output behaves more like a regression, presumably with a 0-1 range. As I understand it, you then make a binary decision based on that regression output, picking one of two options.
Unfortunately, this is not how the Accuracy layer works, nor anything else in Caffe when it comes to classification. If you're classifying between N classes, the net has to have N outputs. The Accuracy layer basically checks whether the index of the most active of those N outputs is the same as the label.
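To make the consequence concrete, here is a rough NumPy sketch of that check. It is not Caffe's actual code: the function name `top1_accuracy` and the example scores are made up for illustration, and it only mimics the "index of the largest output equals the label" comparison described above.

```python
import numpy as np

def top1_accuracy(scores, labels):
    """Hypothetical stand-in for the Accuracy check.

    scores: array of shape (batch, n_outputs)
    labels: array of shape (batch,) holding integer class indices
    """
    predicted = np.argmax(scores, axis=1)  # index of the most active output
    return float(np.mean(predicted == labels))

labels = np.array([0, 1, 1, 0])

# One output per sample: argmax over a single column is always index 0,
# so every sample is "predicted" as class 0 regardless of the value.
single = np.array([[0.1], [0.9], [0.8], [0.2]])
acc_single = top1_accuracy(single, labels)

# Two outputs per sample: the argmax can now actually pick class 0 or 1.
two = np.array([[0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.8, 0.2]])
acc_two = top1_accuracy(two, labels)

print(acc_single, acc_two)
```

With a single output, the reported accuracy would just be the fraction of samples labelled 0, no matter what the net has learned.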
Could you please tell us how you make the binary decision from the single output? It's rather difficult to give you any concrete hints without knowing this. I'm tempted to say "change your net to 2 outputs and a softmax", but I assume there is a strong reason not to do it this way.