It should be possible. I think you need to tell Caffe explicitly that the top of your Python layer is a loss and not just an output blob, since by convention only layers whose type ends in "Loss" get a loss weight by default. That's easy: just add the loss_weight line to your layer definition:
layer {
  name: "pyloss"
  type: "Python"
  ...
  loss_weight: 1
}
You will notice that this runs slower than the built-in, CUDA-accelerated layers, though.
Also, many things can be accomplished by smart combinations of existing layers (see this for example), so you don't always have to write an entirely new layer.
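For reference, here is a minimal sketch of what the Python side of such a loss layer could look like. It assumes a plain Euclidean loss over two bottoms; the class name EuclideanLossLayer is only illustrative, and the fallback base class is there purely so the file can be imported on a machine without Caffe. Your own loss would replace the bodies of forward and backward.

```python
import numpy as np

try:
    import caffe
    BaseLayer = caffe.Layer
except ImportError:
    # Assumption for illustration: fall back to a plain class so the
    # sketch can be imported and unit-tested without Caffe installed.
    BaseLayer = object


class EuclideanLossLayer(BaseLayer):
    """Illustrative loss: 1/(2N) * sum((bottom[0] - bottom[1])^2)."""

    def setup(self, bottom, top):
        # A distance needs exactly two inputs (e.g. prediction and target).
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # Buffer for the element-wise difference, reused in backward().
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # The loss itself is a single scalar.
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.0

    def backward(self, top, propagate_down, bottom):
        # Gradient is +diff/N for the first input and -diff/N for the second.
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
```

If you go this route, the layer definition would typically point at the class through python_param (its module and layer fields), with the module file somewhere on your PYTHONPATH. Everything here runs in NumPy on the CPU.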