Difference in validation accuracy and loss during training vs after training (evaluate)

haava...@gmail.com

Nov 12, 2016, 8:33:03 AM
to Keras-users
Hello! 
I'm fine-tuning a VGG16 model on my custom dataset and I'm getting different values for val_loss and val_acc during training vs. after training. I'm wondering if this is expected behaviour?

My code is very similar to the fine-tuning example from fchollet.


During training I save the weights from the epoch with the best val_loss, which in this case was:
Epoch 12/20
7940/7940 [==============================] - 1062s - loss: 0.0662 - acc: 0.9874 - val_loss: 0.0640 - val_acc: 0.9804


Afterwards I run evaluate_generator on the same validation data and get these results:
[0.081565297748011883, 0.97784491440080568]
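
Roughly, the save/evaluate part of my setup looks like this (just a sketch, not my exact code; the checkpoint file name, sample counts, model and generators are placeholders from the fine-tuning script):

from keras.callbacks import ModelCheckpoint

# keep only the weights of the epoch with the lowest validation loss
checkpoint = ModelCheckpoint('vgg16_finetune_best.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)

# Keras 1 generator API, as in fchollet's example (newer Keras renames these
# arguments to steps_per_epoch / epochs / validation_steps)
model.fit_generator(train_generator,
                    samples_per_epoch=nb_train_samples,
                    nb_epoch=20,
                    validation_data=validation_generator,
                    nb_val_samples=nb_validation_samples,
                    callbacks=[checkpoint])

# afterwards: reload the best weights and evaluate on the same validation data
model.load_weights('vgg16_finetune_best.h5')
print(model.evaluate_generator(validation_generator, val_samples=nb_validation_samples))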

Has anyone experienced the same problem or have any idea what might cause this?



haava...@gmail.com

Nov 12, 2016, 8:35:36 AM
to Keras-users, haava...@gmail.com
I know it's a small difference, but it's pretty significant at this stage.

mhubri...@gmail.com

Nov 16, 2016, 2:52:58 PM
to Keras-users, haava...@gmail.com
Hey,

I also noticed something strange regarding this issue.

I trained a model on the GPU and afterwards evaluated it on the CPU, using the same validation set in both cases. On the CPU I got a higher loss and lower accuracy. Then I switched back to the GPU, evaluated it there, and suddenly everything was fine: I got the same results as during training.
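
In case someone wants to reproduce it: this is roughly how I switched the evaluation to the CPU (just a sketch, assuming the TensorFlow backend; the model path and data directory are placeholders, and with Theano you would pick the device via THEANO_FLAGS instead):

import os
# hide the GPUs *before* the backend is initialised so everything runs on the CPU;
# remove this line to evaluate on the GPU again
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

model = load_model('finetuned_model.h5')            # placeholder path
val_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'data/validation',                              # placeholder directory
    target_size=(224, 224), batch_size=32, shuffle=False)

print(model.evaluate_generator(val_gen, val_samples=val_gen.nb_sample))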



haava...@gmail.com

Nov 16, 2016, 3:51:31 PM
to Keras-users, haava...@gmail.com, mhubri...@gmail.com
I'm doing everything on the GPU, so I don't think that's the issue.

I'm loading the model architecture and weights exactly the same way as before training, and I'm loading and using the validation data exactly the same way (tested both with the validation generator and without a generator, using the same batch size).
I'm also compiling with the same hyperparameters. ModelCheckpoint works correctly on other models, so that is not the cause.
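
The evaluation script therefore looks roughly like this (sketch; build_model() stands in for my model-construction code, and the path and optimizer settings are placeholders rather than my actual values):

from keras.optimizers import SGD

# rebuild the same architecture and load the checkpointed weights
model = build_model()                               # hypothetical helper
model.load_weights('vgg16_finetune_best.h5')        # placeholder path

# compile with the same loss / optimizer / metrics as during training,
# otherwise the reported numbers would not be comparable
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print(model.evaluate_generator(validation_generator,
                               val_samples=nb_validation_samples))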

I'm using l2 weight regularization on my dense layers; maybe that could be affecting the results in training vs. evaluation?
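
For reference, the regularized layers are defined roughly like this (sketch; the layer sizes and the 0.01 factor are placeholders, and the input shape assumes TensorFlow dim ordering):

from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.regularizers import l2

# dense top on the VGG16 convolutional base; the l2 penalty on the weights is
# added to the reported loss, while accuracy is computed from the predictions
# only and never includes it
top_model = Sequential()
top_model.add(Flatten(input_shape=(7, 7, 512)))
top_model.add(Dense(256, activation='relu', W_regularizer=l2(0.01)))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid', W_regularizer=l2(0.01)))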