Hello everyone
I'm trying to fine-tune the reference model on my data.
My data consists of 6000 images (train: 1800, test: 4200), each assigned a soft label (a value between 0 and 1) reflecting its degree of membership in a single class (total number of classes = 1).
I ran fine-tuning and was surprised that the loss became NaN after iteration #20. To fix this, I reduced the learning rate of fc8 to a much smaller value. Fortunately this worked and the loss is no longer NaN.
However, I'm getting some strange loss values during testing and I'd like to know what they mean. See below please:
(a) Testing started with a small loss value?!
Iteration #0:
Test net output: loss = 0.0963507
QUESTION: Isn't this loss surprisingly small for the very first test?!
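One possible explanation (assuming the loss here is Caffe's EuclideanLoss, (1/2N) * sum((p - y)^2), which the post doesn't actually state, and assuming the soft labels are spread roughly uniformly over [0, 1]): because the targets live in [0, 1], even a completely untrained network whose output hovers near 0.5 already gets a small-looking loss. A quick check:

```python
import numpy as np

# Hypothetical label distribution: uniform on [0, 1]. A constant
# prediction of 0.5 against EuclideanLoss, 0.5 * mean((p - y)^2),
# scores about half the variance of U(0,1), i.e. 1/24 ~ 0.042.
rng = np.random.default_rng(1)
labels = rng.uniform(0.0, 1.0, size=10_000)
pred = 0.5                                   # untrained-network-like output
loss = 0.5 * np.mean((pred - labels) ** 2)
print(loss)  # ~ 0.042
```

So an initial test loss of ~0.096 is in the range one would expect from near-random outputs on bounded soft labels; it is small in absolute terms only because the label scale itself is small.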
(b) The loss dropped rapidly!
Iteration #90,000:
Test net output: loss = 7.98e-06
QUESTION: Is it plausible that the fine-tuned network can (after 90,000 iterations) predict the image label with a loss of only 0.00000798?!
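To put that number in label units (again assuming EuclideanLoss, which is an assumption, not something stated in the logs): a mean loss of 7.98e-06 corresponds to a typical per-image prediction error of sqrt(2 * loss):

```python
import math

# If the reported value is Caffe's EuclideanLoss, (1/2N) * sum((p - y)^2),
# then the RMS prediction error per label is sqrt(2 * loss).
loss = 7.98e-06
rmse = math.sqrt(2 * loss)
print(rmse)  # ~ 0.004, i.e. predictions within about 0.4% of the soft label
```

Predicting a continuous soft label to within ~0.004 is possible but suspicious, especially with only 1800 training images. It would be worth checking whether this is the test loss or the training loss, whether the test set leaks into training, and whether the labels are somehow trivially predictable (e.g. many labels clustered at one value).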
Any help is very much appreciated!