Looking over your network is difficult as it is pasted right into the post - next time please attach the prototxt as a file; it's much easier to navigate that way.
Now for some hints: I would be careful about the loss function advice. The choice of loss function is dictated by the task you're trying to learn, not by the architecture of the network you're training. Contrastive loss is used to train Siamese nets, yes, but that's for the task of identification - a network trained this way learns how similar two images are (e.g. "do these two pictures show the same person?"). I successfully trained a similar architecture using SoftmaxWithLoss for classification of several images at once.
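For reference, switching to classification mostly just means swapping the loss layer at the end of the net. A minimal sketch - the blob names `fc8` and `label` are placeholders, not taken from your prototxt:

```protobuf
# Hypothetical tail of a classification net: feed the final score
# layer and the ground-truth labels into SoftmaxWithLoss.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"    # final fully connected score layer -- placeholder name
  bottom: "label"
  top: "loss"
}
```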
I agree with Akash's advice to up the learning rate though. Usually it's a good idea to increase it a lot and try the highest setting at which the network doesn't diverge. Looks like you're using AlexNet, so lr=0.01 would be a good starting point.
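In solver terms, that's just raising `base_lr` and keeping a decay schedule so training doesn't blow up later. A sketch of the relevant solver lines - the step schedule values are illustrative, not tuned for your data:

```protobuf
# Illustrative solver fragment -- only the LR-related settings shown.
base_lr: 0.01       # the AlexNet-style starting point suggested above
lr_policy: "step"   # periodically drop the learning rate
gamma: 0.1          # multiply the LR by this at each drop
stepsize: 100000    # iterations between drops -- tune for your dataset
```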
In general, there can be dozens of reasons why your network fails to learn anything. Maybe your data is bad, or maybe you're loading it wrong? If this is a classification task and you're using 4 views of a single object, it is crucial that all 4 columns receive images of the same object - otherwise the task doesn't make sense, so do check that. An interesting check would be to take just one column and see if it can learn anything on its own. Maybe the task is just too hard or not learnable at all? Or maybe you don't have enough examples?
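A quick way to check that alignment before training: verify that row i of every view's image list refers to the same object. A sketch, assuming your lists are (filename, label) pairs - the function name and data layout are made up, so adapt it to however your lists are actually stored:

```python
# Hypothetical sanity check for a 4-view classification setup:
# given parallel lists of (filename, label) pairs, one per column,
# report the row indices where the views disagree on the object.

def check_view_alignment(view_lists):
    """Return indices where the views carry different labels."""
    mismatches = []
    for i, rows in enumerate(zip(*view_lists)):
        labels = {label for _, label in rows}
        if len(labels) != 1:  # all views should agree at every row
            mismatches.append(i)
    return mismatches

# Made-up example data: 3 objects x 4 views, with view 2
# deliberately misaligned at row 1 to show what a failure looks like.
views = [
    [("a_v%d.jpg" % v, 0), ("b_v%d.jpg" % v, 1), ("c_v%d.jpg" % v, 2)]
    for v in range(4)
]
views[2][1] = ("c_v2.jpg", 2)  # simulate a shifted row

print(check_view_alignment(views))  # -> [1]
```

If this prints anything but an empty list, fix the data before touching the architecture.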
It might also help if you posted your output log (as above, please attach it rather than pasting it into the post) - maybe there's something obvious in it.