Difference in validation accuracy and loss during training vs after training (evaluate)

haava...@gmail.com

Nov 12, 2016, 8:33:03 AM
to Keras-users
Hello! 
I'm fine-tuning a VGG16 model on my custom dataset and I'm getting different values for val_loss and val_acc during training vs. after training. I'm wondering if this is expected behaviour?

My code is very similar to the fine-tuning example from fchollet.


During training I save the weights from the epoch with the best val_loss, which in this case was:
Epoch 12/20
7940/7940 [==============================] - 1062s - loss: 0.0662 - acc: 0.9874 - val_loss: 0.0640 - val_acc: 0.9804


Afterwards I run evaluate_generator on the same validation data and get these results:
[0.081565297748011883, 0.97784491440080568]
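
Roughly, the save/evaluate part of my setup looks like this (just a sketch, not my exact code; the checkpoint file name, sample counts, model and generators are placeholders from the fine-tuning script):

from keras.callbacks import ModelCheckpoint

# keep only the weights of the epoch with the lowest validation loss
checkpoint = ModelCheckpoint('vgg16_finetune_best.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)

# Keras 1 generator API, as in fchollet's example (newer Keras renames these
# arguments to steps_per_epoch / epochs / validation_steps)
model.fit_generator(train_generator,
                    samples_per_epoch=nb_train_samples,
                    nb_epoch=20,
                    validation_data=validation_generator,
                    nb_val_samples=nb_validation_samples,
                    callbacks=[checkpoint])

# afterwards: reload the best weights and evaluate on the same validation data
model.load_weights('vgg16_finetune_best.h5')
print(model.evaluate_generator(validation_generator, val_samples=nb_validation_samples))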

Has anyone experienced the same problem or have any idea what might cause this?



haava...@gmail.com

Nov 12, 2016, 8:35:36 AM
to Keras-users, haava...@gmail.com
I know it's a small difference, but it's pretty significant at this stage.

mhubri...@gmail.com

Nov 16, 2016, 2:52:58 PM
to Keras-users, haava...@gmail.com
Hey,

I also noticed something strange regarding this issue.

I trained a model on the GPU and afterwards evaluated it on the CPU, using the same validation set in both cases. On the CPU I got a higher loss and lower accuracy. Then I switched back to the GPU, evaluated it there, and suddenly everything was fine: I got the same results as during training.
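
In case someone wants to reproduce it: this is roughly how I switched the evaluation to the CPU (just a sketch, assuming the TensorFlow backend; the model path and data directory are placeholders, and with Theano you would pick the device via THEANO_FLAGS instead):

import os
# hide the GPUs *before* the backend is initialised so everything runs on the CPU;
# remove this line to evaluate on the GPU again
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

model = load_model('finetuned_model.h5')            # placeholder path
val_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'data/validation',                              # placeholder directory
    target_size=(224, 224), batch_size=32, shuffle=False)

print(model.evaluate_generator(val_gen, val_samples=val_gen.nb_sample))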



haava...@gmail.com

Nov 16, 2016, 3:51:31 PM
to Keras-users, haava...@gmail.com, mhubri...@gmail.com
I'm doing everything on the GPU, so I don't think that's the issue.

I'm loading the model architecture and weights exactly the same way as before training, and I'm loading and using the validation data exactly the same way (tested both with the validation generator and without a generator, using the same batch size).
I'm also compiling with the same hyperparameters. ModelCheckpoint works correctly on other models, so that is not the cause.
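
The evaluation script therefore looks roughly like this (sketch; build_model() stands in for my model-construction code, and the path and optimizer settings are placeholders rather than my actual values):

from keras.optimizers import SGD

# rebuild the same architecture and load the checkpointed weights
model = build_model()                               # hypothetical helper
model.load_weights('vgg16_finetune_best.h5')        # placeholder path

# compile with the same loss / optimizer / metrics as during training,
# otherwise the reported numbers would not be comparable
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print(model.evaluate_generator(validation_generator,
                               val_samples=nb_validation_samples))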

I'm using l2 weight regularization on my dense layers; maybe that could be affecting the results in training vs. evaluation?
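
For reference, the regularized layers are defined roughly like this (sketch; the layer sizes and the 0.01 factor are placeholders, and the input shape assumes TensorFlow dim ordering):

from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.regularizers import l2

# dense top on the VGG16 convolutional base; the l2 penalty on the weights is
# added to the reported loss, while accuracy is computed from the predictions
# only and never includes it
top_model = Sequential()
top_model.add(Flatten(input_shape=(7, 7, 512)))
top_model.add(Dense(256, activation='relu', W_regularizer=l2(0.01)))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid', W_regularizer=l2(0.01)))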