Hello,
I searched for an earlier answer to my question but couldn't find one that matches exactly. Please point me to it if this has already been answered.
I'm new to neural networks and Caffe, so I apologize if my questions sound trivial.
I'm trying to understand the FCN implementation together with the paper, so my questions are mainly about the paper itself. I hope this is the right place to ask.
In the FCN paper, there is a line that says:
"Training from scratch is not feasible considering the time required to learn the base classification nets. (Note that the VGG is trained in stages, while we initialize from the full 16-layer version)."
- Does "training from scratch" mean initializing all the weights in every layer randomly and running back-propagation and optimization until the weights converge? And do they instead initialize all the weights from the trained VGG network, so that back-propagation only has to update the last layer, and the optimization takes less time because the weights start from a good point? Am I understanding this correctly?
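To make that question concrete, here is a toy NumPy sketch of what I mean by "fine-tuning only the last layer" (the weight values and shapes are made up by me, this is not the actual FCN/Caffe code):

```python
import numpy as np

# Fixed "pretrained" weights standing in for the VGG initialization
# (hypothetical toy values, not real VGG filters).
W1 = np.array([[0.5, -0.2,  0.1],
               [0.3,  0.4, -0.1],
               [-0.2, 0.1,  0.3],
               [0.1,  0.2,  0.2]])   # earlier layer: kept frozen
W2 = np.zeros((1, 4))                # last layer: the only part re-trained

x = np.array([[1.0], [2.0], [3.0]])  # one input sample
y = np.array([[1.0]])                # its target

lr = 0.1
for _ in range(100):
    h = W1 @ x                # forward through the frozen layer
    pred = W2 @ h             # forward through the trainable last layer
    err = pred - y
    W2 -= lr * (err @ h.T)    # back-propagation stops here: only W2 updates,
                              # W1 stays at its "pretrained" values

final_loss = float((W2 @ (W1 @ x) - y) ** 2)
print(final_loss)  # close to 0: updating only the last layer fit the sample
```

Is that roughly the idea, except that in the FCN code more (or all) layers may still receive gradient updates, just starting from the VGG weights instead of random ones?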
- What does "trained in stages" mean, and where is this "training in stages" strategy implemented in the code?
- VGG is trained on RGB images, so are those weights still a good initialization for RGB-D data? For instance, to train the network on the RGB-D and HHA data of NYUDv2 in the paper, did they use the same trained VGG network's weights?
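Part of what confuses me here: first-layer filters pretrained on 3-channel RGB input don't directly fit a 4-channel RGB-D input. I could imagine adapting them like this (a toy NumPy sketch; initializing the depth channel from the mean of the RGB channels is purely my own assumption, not something the paper states):

```python
import numpy as np

# Pretend first-layer filters pretrained on RGB: (num_filters, 3, kh, kw).
# These are dummy values, not real VGG weights.
w_rgb = np.arange(2 * 3 * 3 * 3, dtype=float).reshape(2, 3, 3, 3)

# One conceivable way to reuse them for 4-channel RGB-D input: append a
# depth channel initialized to the mean of the RGB channels (my assumption).
depth = w_rgb.mean(axis=1, keepdims=True)        # (2, 1, 3, 3)
w_rgbd = np.concatenate([w_rgb, depth], axis=1)  # (2, 4, 3, 3)

print(w_rgbd.shape)  # (2, 4, 3, 3)
```

Is something like this done in the FCN code, or do they instead train separate networks for the RGB and HHA inputs and keep each first layer at 3 channels?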