I'm using the U-Net network architecture. The paper mentions that it was trained on a 6 GB Titan X, but on my GPU it takes around 12 GB and barely fits. The moment I start to tweak the network parameters (e.g., add a few layers), I run out of memory. A Theano implementation of the same network uses less than 6 GB of GPU memory.

I'm wondering what could be wrong with my setup. Could it be float precision?
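One thing I tried to sanity-check the precision theory is a back-of-the-envelope estimate of how much memory a single feature map costs at different dtypes. This is just illustrative arithmetic (the tensor shape below is an assumption based on the first U-Net encoder stage, not my exact configuration), but it shows that going from float32 to float64 alone would double the footprint:

```python
def feature_map_bytes(batch, channels, height, width, bytes_per_value):
    """Memory for one dense feature-map tensor, in bytes."""
    return batch * channels * height * width * bytes_per_value

# Hypothetical example: a 1 x 64 x 568 x 568 feature map
# (roughly the first conv output size in the original U-Net).
fp32 = feature_map_bytes(1, 64, 568, 568, 4)  # float32: 4 bytes per value
fp64 = feature_map_bytes(1, 64, 568, 568, 8)  # float64: 8 bytes per value

print(fp32 / 2**20, "MiB at float32")
print(fp64 / fp32, "x larger at float64")
```

So if my framework were silently defaulting to float64 somewhere (while the Theano implementation pins everything to float32 via `floatX`), that could plausibly explain a ~2x difference, though it wouldn't explain more than that on its own.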