Training loss increases while fine-tuning from VGG-16 Caffe-converted Keras model


Md Atiqur Rahman

unread,
Mar 20, 2016, 1:09:32 PM3/20/16
to Keras-users
Hi all,

I am using the Keras model at https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3, provided by @Lorenzo Baraldi, which is equivalent to the VGG-16 network trained with Caffe. I am trying to fine-tune this network, starting from the Caffe-trained weights available at that link, for a smaller number of classes on a total of 15,000 images. However, after a few batches (batch_size = 100), the whole training breaks down: the training loss starts to increase rather than decrease. The behaviour is the same even when I train from scratch instead of fine-tuning.

Could you please advise what I am doing wrong?

Thanks

Md Atiqur Rahman

unread,
Mar 22, 2016, 1:09:19 AM3/22/16
to Keras-users
OK, I found the catch.

I was using a data generator to produce my batches. After some debugging, I found that I was shuffling the data at the end of every epoch rather than at the beginning. My image file paths are saved in a text file in class order, and my data are class-imbalanced (about 1:10), so the training loss was going down for a couple of batches (as long as images of the same class were being processed, I guess), then jumping to something very abnormal (when images of a new class started being processed, I guess), and it never came back down. Once I modified my code to shuffle the data at the beginning of each epoch, everything started to work as usual: the loss goes down and no longer jumps to very high values.
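For anyone hitting the same issue, a minimal sketch of a generator that shuffles at the start of each epoch (names and the file-path setup are illustrative, not the original poster's code):

```python
import random

def batch_generator(paths, labels, batch_size, seed=None):
    """Yield (paths, labels) batches forever, reshuffling at the
    START of every epoch so class-ordered file lists get mixed."""
    rng = random.Random(seed)
    index = list(range(len(paths)))
    while True:
        rng.shuffle(index)  # shuffle before the epoch, not after it
        for start in range(0, len(index), batch_size):
            batch = index[start:start + batch_size]
            yield ([paths[i] for i in batch],
                   [labels[i] for i in batch])
```

Because the shuffle happens before the first batch of every epoch, the very first epoch already sees mixed classes, which is exactly what the end-of-epoch version missed.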

Considering this behaviour of the optimizer (SGD) on un-shuffled data, I am concerned about the following: when the training data are heavily imbalanced (in my case it could go up to 1:20) and the training set is large (up to 100,000 images in my case), then even after shuffling at the beginning of each epoch, a long run of images from the majority class might still appear consecutively, causing the same training breakdown to recur.

Therefore, can anyone please advise me what might be the solution to tackle such numerical issues during training?

Thanks.
Atique

Kris Cao

unread,
Mar 22, 2016, 9:27:44 AM3/22/16
to Keras-users
Are you using fit_generator? You can specify a weight for each data point (read the API docs on the exact details), which downweights updates for imbalanced classes.

Md Atiqur Rahman

unread,
Mar 22, 2016, 1:48:33 PM3/22/16
to Keras-users
Thank you @Kris Cao for pointing to a solution. Yes, I am using the fit_generator() method for training. I assume I just need to pass a dictionary (class_weight, I guess) keyed by class index, with values that scale the loss during training. More specifically, I have 5 classes A, B, C, D, E, where class E contains about 10 times more images than the other four, each of which has roughly the same number of images. Therefore, I guess the dictionary in my case should look like this:

class_weight = {0: 10, 1: 10, 2: 10, 3: 10, 4: 1}
# assuming classes are indexed from A to E
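As a sanity check, the same weights can be derived from the class counts instead of being hard-coded (a sketch; the helper name is made up, and the fit_generator call is shown only as a commented-out example of where the dict goes in the Keras API of the time):

```python
from collections import Counter

def make_class_weight(labels):
    """Weight each class inversely to its frequency, normalised so
    the most frequent class gets weight 1.0."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {cls: max_count / float(n) for cls, n in counts.items()}

# e.g. classes 0-3 (A-D) with 100 images each, class 4 (E) with 1000:
labels = [0] * 100 + [1] * 100 + [2] * 100 + [3] * 100 + [4] * 1000
class_weight = make_class_weight(labels)
# class_weight == {0: 10.0, 1: 10.0, 2: 10.0, 3: 10.0, 4: 1.0}

# The dict would then be passed to training, e.g.:
# model.fit_generator(train_gen, samples_per_epoch=..., nb_epoch=...,
#                     class_weight=class_weight)
```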

Could you please advise if I am getting it right or wrong?

Thanks.
Atique

Kris Cao

unread,
Mar 22, 2016, 5:29:03 PM3/22/16
to Keras-users
You can either do it that way, or directly apply a weight to each sample by having your data iterator return a tuple of (input data, targets, sample weights). Read the fit_generator API description; it tells you what to do pretty directly.
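A minimal sketch of the second option, per-sample weights yielded by the generator itself (names are illustrative; Keras's fit_generator treats the third element of the tuple as per-sample loss weights):

```python
import random

def weighted_batch_generator(data, labels, class_weight, batch_size, seed=None):
    """Yield (inputs, targets, sample_weights) tuples; the third
    element scales each sample's contribution to the loss."""
    rng = random.Random(seed)
    index = list(range(len(data)))
    while True:
        rng.shuffle(index)  # still shuffle at the start of each epoch
        for start in range(0, len(index), batch_size):
            batch = index[start:start + batch_size]
            x = [data[i] for i in batch]
            y = [labels[i] for i in batch]
            w = [class_weight[labels[i]] for i in batch]
            yield x, y, w
```

This is equivalent in effect to the class_weight dictionary when the weight depends only on the class, but it also lets you weight individual samples differently if you ever need to.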

Md Atiqur Rahman

unread,
Mar 22, 2016, 6:00:48 PM3/22/16
to Keras-users
Thanks a lot Kris.