That's indeed very strange. Can you please give an example of how it is expected to behave?
Sadly, I'm not able to give you deep insights into model optimization, as I haven't worked with Caffe for that long myself. So I can only offer the following advice:
1) I can't say for sure, of course, but I think the problem might actually be in the model architecture itself (my second guess was the dataset; please take a closer look at it to confirm everything is fine). Generally, it is best to use recent models rather than old ones, which may not have been maintained over time and typically aren't popular because they don't work well. Choose newer models with active community support.
2) The only project I've tackled with Caffe so far is object detection with MobileNetSSD. You can take a closer look at these repos (the first one is the original, the second is just an adaptation for v2 of the model):
There, it is clearly stated that the goal is to achieve a loss of around 1.5-2.5 within 100,000 iterations (I managed to get around 3.0). I trained the model on my own dataset, and while the reported accuracy was very bad (around 2%), it detected just fine on real data. Try to evaluate your model on data that isn't in your dataset.
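As a rough sketch of what I mean by a held-out evaluation split (the file names and the 80/20 ratio here are just assumptions for illustration, not something from the repos above):

```python
import random

def split_dataset(samples, eval_fraction=0.2, seed=42):
    """Shuffle and split samples into disjoint train/eval sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical image file names -- replace with your own annotations
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, held_out = split_dataset(images)
# Only ever evaluate on `held_out`; the model must not see it during training
assert not set(train) & set(held_out)
```

The point is simply that accuracy measured on images the network was trained on tells you little; only the held-out portion shows how it behaves on real data.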
3) Also, the reason could be overfitting: the network may achieve the lowest loss by simply sorting everything into 2 categories (that's why I suggested you take a look at your data).
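A quick way to check your data for this is to look at the label distribution before training; a heavily skewed dataset lets the network score a low loss by mostly predicting the dominant class. A minimal sketch (the class names are made up for illustration):

```python
from collections import Counter

def label_distribution(labels):
    """Return per-class fractions to spot imbalance before training."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical annotation labels -- substitute your own dataset's labels
labels = ["car"] * 900 + ["person"] * 80 + ["dog"] * 20
for cls, frac in sorted(label_distribution(labels).items(),
                        key=lambda kv: -kv[1]):
    print(f"{cls}: {frac:.1%}")
```

If one or two classes dominate like this, that matches the "sorting into 2 categories" behaviour described above, and rebalancing or augmenting the rare classes is worth trying.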
4) In the links I provided, you'll find all the necessary files to look at; those are solutions that really work. Other than that, I don't know what kind of source files you're interested in.
Best of luck!
Tamas