Hello
I am a graduate student at Ghent University, Belgium; my research focuses on emotion recognition with deep convolutional neural networks.
Recently I've run into a problem with class imbalance: of my 9216 training samples, approximately 5% are labeled positive (1) and the remaining 95% negative (0).
I'm using the SigmoidCrossEntropyLoss layer to compute the loss. During training, the loss decreases and the accuracy becomes extremely high after only a few epochs. This is due to the imbalance: the network simply learns to always predict negative (0).
To address this, I would like to scale each sample's contribution to the loss depending on the prediction-truth combination (in particular, punishing false negatives severely). My mentor/coach has also advised me to apply a scale factor when backpropagating through SGD: the factor would be tied to the imbalance within the batch, so that a batch containing only negative samples would not update the weights at all.
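To make the first idea concrete, here is a minimal standalone sketch (not actual Caffe layer code) of a per-sample weighted sigmoid cross-entropy, using Caffe's numerically stable formulation of the loss. The weights w_pos and w_neg are assumed hyperparameters; e.g. w_pos = 19, w_neg = 1 would mirror a 5% positive rate:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-sample weighted sigmoid cross-entropy (hypothetical helper, not a
// Caffe layer). Returns the mean loss and writes the gradient wrt each
// logit into *grad. Positive samples (t = 1) are weighted by w_pos,
// negative samples (t = 0) by w_neg.
double weighted_sigmoid_xent(const std::vector<double>& logits,
                             const std::vector<double>& targets,
                             double w_pos, double w_neg,
                             std::vector<double>* grad) {
  double loss = 0.0;
  grad->resize(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) {
    const double x = logits[i], t = targets[i];
    const double w = (t > 0.5) ? w_pos : w_neg;
    // Numerically stable form of -t*log(p) - (1-t)*log(1-p), p = sigmoid(x).
    const double l = std::max(x, 0.0) - x * t + std::log1p(std::exp(-std::fabs(x)));
    loss += w * l;
    const double p = 1.0 / (1.0 + std::exp(-x));
    (*grad)[i] = w * (p - t);  // the usual (p - t) gradient, scaled by w
  }
  return logits.empty() ? 0.0 : loss / logits.size();
}
```

Because the weight multiplies both the loss and the gradient, a false negative (t = 1, p near 0) contributes w_pos times the usual update, which is exactly the "punish false negatives severely" behaviour described above.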
So far I have only added one custom layer to Caffe, which reports additional metrics such as precision and recall. My experience with the Caffe codebase is limited, but I have extensive experience writing C++.
Could anyone help me, or point me in the right direction, on how to adjust the SigmoidCrossEntropyLoss and Sigmoid layers to accommodate the following changes:
1. Adjust each sample's contribution to the total loss depending on the prediction-truth combination (true positive, false positive, true negative, false negative).
2. Scale the weight update performed by SGD depending on the imbalance in the batch (negatives vs. positives).
Thanks in advance!