I'm trying to train (even overfitting would do) an FCN with the FCN-8 architecture on VOC2012. The output is always zero, so the loss oscillates around a large value. I inspected the outputs of all convolutional and deconvolutional layers one by one: through layers conv1_1 to conv3_2 (about 6 convolutional layers) the net learns some sensible filters and their outputs look reasonable, though in the deeper layers more and more filters produce zero or near-zero output.
Starting from layer conv3_3 the network, I think, saturates (the architecture uses ReLU activations, so strictly speaking the units die rather than saturate): all convolutions output zero, and beyond that layer no learning happens at all.
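To quantify "saturates" instead of eyeballing the feature maps, I count how many channels of a ReLU layer are zero for an entire batch. This is a minimal NumPy sketch, assuming the activations have been dumped from the net (e.g. a copy of `net.blobs['conv3_3'].data` in Caffe's pycaffe) as an `(N, C, H, W)` array; the function name is mine:

```python
import numpy as np

def dead_unit_fraction(activations, eps=0.0):
    # activations: (N, C, H, W) post-ReLU outputs collected over a batch.
    # A channel counts as "dead" if it never exceeds eps for any input
    # position in the batch.
    per_channel_max = activations.max(axis=(0, 2, 3))
    return float((per_channel_max <= eps).mean())

# Toy example: 4 channels, the last 2 are all-zero -> fraction 0.5.
acts = np.zeros((2, 4, 3, 3), dtype=np.float32)
acts[:, :2] = 1.0
print(dead_unit_fraction(acts))  # -> 0.5
```

If this fraction is close to 1.0 from conv3_3 onward, that confirms the dying-ReLU picture: no gradient flows back through those units.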
I've initialized all convolutional layers from a Gaussian with std 0.01. The deconvolutional layers are initialized via the surgery step in the solve.py script, with all weights set to 1.0.
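One thing worth noting: the reference FCN surgery (`surgery.interp` in the FCN repo's solve.py) does not set the deconvolution weights to 1.0; it fills them with a fixed bilinear upsampling kernel, one filter per matching channel pair and zeros elsewhere. A sketch of that initialization (the function names here are mine, not the repo's):

```python
import numpy as np

def bilinear_kernel(size):
    # 2-D bilinear interpolation filter of shape (size, size), the
    # standard initialization for FCN upsampling deconv layers.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def fill_upsampling_weights(shape):
    # shape: (out_channels, in_channels, k, k) of the deconv blob.
    # Place one bilinear kernel on each (i, i) channel pair; all
    # cross-channel filters stay zero.
    weights = np.zeros(shape, dtype=np.float32)
    k = bilinear_kernel(shape[2])
    for i in range(min(shape[0], shape[1])):
        weights[i, i] = k
    return weights
```

With all-ones deconv weights every output pixel sums over the whole receptive field across all channels, which blows up the logits and the loss; bilinear filters instead start the layer as a plain upsampler.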
So... is there a way to overcome this saturation without initializing the weights from another trained model?