The notion of resuming is clear enough: you save a snapshot, stop training for some reason, and then some time later you resume training from that snapshot.
Fine-tuning, however, means you take a pretrained model, make your changes to the architecture (the network architecture, with or without changes to the optimization parameters), and start training.
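To make the terminology concrete, here is roughly how I would express the two workflows through Caffe's Python interface (pycaffe) rather than the command-line switches; all file paths are placeholders:

```python
import caffe

# Resuming: restore weights, iteration count and optimizer state
# from a .solverstate file and keep going.
solver = caffe.get_solver('solver.prototxt')               # placeholder path
solver.restore('snapshot_iter_10000.solverstate')          # placeholder path
solver.solve()                                             # continues from the saved iteration

# Fine-tuning: start a fresh solver and copy only the weights of layers
# whose names match the pretrained net; everything else is freshly initialized.
solver = caffe.get_solver('finetune_solver.prototxt')      # placeholder path
solver.net.copy_from('pretrained.caffemodel')              # placeholder path
solver.solve()
```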
I noticed that when someone uses a snapshot for resuming and also changes the network architecture, the new layers are initialized and training continues from where it was stopped. The user cannot change the optimization parameters, though. For example, if you previously trained your model with RMSProp, saved a snapshot, and then tried to resume with a new optimization configuration, say AdaDelta instead of RMSProp, it crashes. Introducing new layers, on the other hand, is fine.
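If I understand my own setup correctly, the failing case corresponds to something like the following (assuming the snapshot switch does the equivalent of this restore call; paths and names are again placeholders):

```python
import caffe

# solver_adadelta.prototxt uses type: "AdaDelta", while the snapshot below
# was written by an earlier run whose solver used type: "RMSProp".
solver = caffe.get_solver('solver_adadelta.prototxt')            # placeholder

# Restoring an RMSProp-era solverstate into an AdaDelta solver is the step
# that fails in my tests; restoring into an RMSProp solver works fine.
solver.restore('snapshot_rmsprop_iter_10000.solverstate')        # placeholder
```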
This problem does not happen when you use the model switch with either a caffemodel or a solverstate file. When you use model, training starts from the very beginning (iteration 0), and adding/removing layers and changing solver settings are all fine.
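In pycaffe terms, I assume this is the copy_from path, where the new architecture and the new solver settings are tolerated and the iteration counter starts at 0 (a sketch, with placeholder paths):

```python
import caffe

# New prototxt with added/removed layers and a different solver type.
solver = caffe.get_solver('new_solver_adadelta.prototxt')   # placeholder

# Only layers whose names match the old net get their weights copied;
# everything else keeps its fresh initialization.
solver.net.copy_from('old_training.caffemodel')             # placeholder
print(solver.iter)   # 0 -- training starts from the very beginning
solver.step(1)
```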
I guess the weights switch behaves exactly like the model switch.
So the question is: if model seems to act exactly like weights, and snapshot behaves much the same way (except that solver settings cannot be changed), what is the difference between all of these switches, and what is the significance of each one compared to the others?