training class imbalanced data


Isaac Gerg

Mar 23, 2016, 7:23:18 PM3/23/16
to Keras-users
I have a dataset with 100,000 negative examples and 100 positive examples.  My concern is the detection of the positive examples. I am training on all 100,100 samples.  I'm wondering what the best way to train on this dataset would be using Keras.  The inputs are 100x100-pixel images and the output is a one-hot vector of 2 elements indicating the probability of the negative or positive class.

I have tried to "balance" out the classes by setting class_weight={0:1, 1:100000}.  I figured this should put the loss on par with the negative examples and therefore prevent the network from collapsing onto one class (i.e. making every input look like a positive example, with false positives through the roof).  However, this isn't working.  I've tried fiddling with the batch size (I'm using Adadelta), realizing that the loss gradient is averaged over the batch and that if every batch contains at least 1 positive example, the gradients will bias toward overfitting.  However, this does not seem to work either. I would think a smaller batch size would remedy this, but no luck.
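
In case it helps, here is a stripped-down sketch of my setup. The data and the network below are toy placeholders (random arrays, a small dense net, made-up variable names), not my actual code; only the image shape, the optimizer, and the class_weight are as described above.

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Flatten

# Placeholder data with the same shapes as my real set, but far fewer samples:
# the real set is 100,000 negatives + 100 positives of 100x100 single-channel images.
X = np.random.rand(1010, 1, 100, 100).astype('float32')
y = np.zeros((1010, 2))
y[:, 0] = 1.0          # everything negative...
y[:10] = [0.0, 1.0]    # ...except the first 10, which are positive

model = Sequential()   # toy stand-in for my actual network
model.add(Flatten(input_shape=(1, 100, 100)))
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='mse', optimizer='adadelta')

# index 0 = negative, index 1 = positive; this is the weighting I described above
model.fit(X, y, batch_size=32, nb_epoch=1, class_weight={0: 1, 1: 100000})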

Does anyone see where my flaw is?

Isaac

Md Atiqur Rahman

Mar 24, 2016, 11:28:07 AM3/24/16
to Keras-users
Hi Isaac,

What problem do you face during training? I mean, is it that the training loss very rapidly goes down to ~0, but the validation loss stays high (and validation accuracy low)? Or is it an issue with exploding gradients (due to seeing many negative examples and then suddenly getting a positive one) that causes the whole training to break?

I also need to train with class-imbalanced data and am interested in the details.

Thanks.
Atique

Isaac Gerg

Mar 24, 2016, 12:11:45 PM3/24/16
to Keras-users
Hi Atique,

What I am seeing is that my training loss gets stuck.  All inputs are mapped to the positive class, giving me very low accuracy.  If you Google "imbalanced data", there are several papers and book chapters that seem useful.

However, you gave me a good idea for how to divide and conquer the issue.  I should focus on getting the training loss to monotonically descend.  I think once I solve that problem, my other issue will solve itself.

Isaac

Md Atiqur Rahman

Mar 24, 2016, 1:33:13 PM3/24/16
to Keras-users
I think the reason you ended up predicting everything as "positive" is that your weight for the positive class is far too high. You have a class imbalance ratio of pos:neg = 1:1000, so your class_weight ratio should be the inverse of that, i.e. class_weight={0:1, 1:1000}, assuming index 0 is the negative class and 1 is the positive class. You are currently weighting it 100 times more than required. I assume you are using "categorical_crossentropy" as the loss function.
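
In code, a tiny illustration of what I mean (the counts are taken from your first message; this only computes the ratio, nothing Keras-specific):

n_neg, n_pos = 100000, 100            # from your dataset description
ratio = n_neg / float(n_pos)          # = 1000.0

# weight the positive class by the imbalance ratio, not by the raw negative count
class_weight = {0: 1, 1: int(ratio)}  # -> {0: 1, 1: 1000}
print(class_weight)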

Thanks.
Atique

Isaac Gerg

Mar 24, 2016, 1:39:03 PM3/24/16
to Keras-users
I think you have a typo in your email.

My class_weight is {0:1, 1:100000}; you suggest setting it to class_weight={0:1, 1:1000}?

Is this correct?

Isaac Gerg

Mar 24, 2016, 1:39:16 PM3/24/16
to Keras-users
binary_crossentropy

Md Atiqur Rahman

Mar 24, 2016, 1:54:09 PM3/24/16
to Keras-users
Yes, that is correct. I was wondering: using "binary_crossentropy", how could you get a two-element vector as output?

Thanks

Isaac Gerg

Mar 24, 2016, 2:54:05 PM3/24/16
to Md Atiqur Rahman, Keras-users
Apologies, I mistyped.  I am using MSE.


Md Atiqur Rahman

Mar 24, 2016, 2:57:38 PM3/24/16
to Keras-users, rat.c...@gmail.com
Hmm... it would be nice if you could just let me know how this new weight setting works for you.

Isaac Gerg

Mar 24, 2016, 2:58:37 PM3/24/16
to Md Atiqur Rahman, Keras-users
Running now.....

Sander Stepanov

Mar 27, 2016, 1:46:57 PM3/27/16
to Keras-users, rat.c...@gmail.com
This all sounds great, but how do you set class_weight for loss='binary_crossentropy' when 3 classes are used?
for example
train_y_ohe[0:4]
array([[ 0.,  1.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
and len(train_y_ohe) is 5435.

And the model is:
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Dropout, Activation

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128))  # try using a GRU instead, for fun
model.add(Dropout(0.5))
model.add(Dense(3))                # one output unit per class
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")

model.fit(X_train, train_y_ohe, batch_size=batch_size, nb_epoch=use_nb_epoch,
          validation_data=(X_test, test_y_ohe), show_accuracy=True)

krish...@pec.edu

Jul 17, 2016, 4:15:55 AM7/17/16
to Keras-users
I don't know if you already solved your problem, but this might be helpful for new users who find this thread. In your case you have 3 classes, which is a multi-class classification problem, and hence you should use categorical cross-entropy as your loss function with a softmax activation.
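
A minimal sketch of what that change looks like on the model posted above (max_features, maxlen and the optimizer are just placeholders; adjust them to your data):

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Dropout, Activation

max_features, maxlen = 20000, 100    # placeholder vocabulary size and sequence length

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(3))                  # one unit per class
model.add(Activation('softmax'))     # softmax instead of sigmoid

# categorical_crossentropy expects one-hot targets such as train_y_ohe
model.compile(loss='categorical_crossentropy', optimizer='adam')

# class_weight keys are the integer class indices (the columns of the one-hot targets), e.g.
# model.fit(X_train, train_y_ohe, batch_size=32, nb_epoch=5, class_weight={0: 1, 1: 5, 2: 5})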

jishan...@gmail.com

Apr 25, 2017, 12:06:54 PM4/25/17
to Keras-users, rat.c...@gmail.com
Hi Sander,

I'm also working on a multi-class prediction task with "binary_crossentropy" and "sigmoid" activation. I have tried passing "class_weight" into the model.fit() function. I tried "class_weight='auto'"; it seems to address the imbalance by training on mini-batches with the same number of positive and negative instances, but I did not find any documentation about this. I have also tried specifying "class_weight" manually, e.g. something like class_weight={0:1.0, 1:6.0}.

However, it does not work for a multi-class prediction task, as you said above. Could you provide more up-to-date information? How did you solve your 3-class problem?

Thanks,
Ao

Isaac Gerg

Apr 25, 2017, 12:18:17 PM4/25/17
to krish...@pec.edu, Keras-users
I was able to solve my problem.  I'm not sure if you are talking to Sander or to me.


glenn....@googlemail.com

Jun 27, 2017, 5:47:52 PM6/27/17
to Keras-users, krish...@pec.edu
@Isaac Gerg: Would you mind sharing the solution you found? 

I have the same problem: a highly imbalanced binary dataset with ~800K records in total, of which 1.9% are labeled 1. The neural net in Keras always settles at 98.1% accuracy (i.e. it predicts the majority class) instead of improving precision or recall. I've tried many things but can't get it to work reliably. Thanks in advance.



Isaac Gerg

Jun 27, 2017, 6:31:32 PM6/27/17
to glenn....@googlemail.com, Keras-users, Kilari MURALI KRISHNA TEJA
My solution was to oversample the minority class using shifts and flipud (specific to my problem).   I have not had good luck with weighting the error function by the class imbalance - too much parameter picking.
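
Roughly what that augmentation looks like, as a numpy sketch (the shift range and number of copies here are made up for illustration, not my exact values):

import numpy as np

def augment_positives(pos_images, n_copies=10, max_shift=5, seed=0):
    """Oversample the minority class with random shifts and up-down flips."""
    rng = np.random.RandomState(seed)
    out = []
    for img in pos_images:                        # img: (H, W) array
        for _ in range(n_copies):
            dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            if rng.rand() < 0.5:
                shifted = np.flipud(shifted)      # vertical flip
            out.append(shifted)
    return np.asarray(out)

# toy usage: 100 positives of 100x100 -> 1,000 augmented positives
extra_positives = augment_positives(np.random.rand(100, 100, 100))
print(extra_positives.shape)                      # (1000, 100, 100)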


Daπid

Jun 28, 2017, 2:43:27 AM6/28/17
to Isaac Gerg, glenn....@googlemail.com, Keras-users, Kilari MURALI KRISHNA TEJA
On 27 June 2017 at 23:47, glenn.neuber via Keras-users <keras...@googlegroups.com> wrote:

I have the same problem. Highly imbalanced binary dataset. Total of ~800K records and 1.9% with 1. Neural Net with Keras always tries to go for 98.1% accuracy instead of precision or recall. Tried many things but can't get it to work reliably. Thanks in advance.

One option here is to do hard negative mining. You start with a balanced set including all your positives, and after training you scan your negative set and add a few of the ones that get the highest positive scores (the most wrong ones). Retrain and repeat.
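
A rough sketch of that loop (train_model and score are placeholders for whatever training and prediction code you already have):

import numpy as np

def hard_negative_mining(X_pos, X_neg, train_model, score, rounds=5, n_new=100, seed=0):
    """Start balanced, then repeatedly add the negatives the model gets most wrong."""
    rng = np.random.RandomState(seed)
    selected = set(rng.choice(len(X_neg), size=len(X_pos), replace=False).tolist())

    model = None
    for _ in range(rounds):
        idx = sorted(selected)
        X_train = np.concatenate([X_pos, X_neg[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        model = train_model(X_train, y_train)

        # score *all* negatives; the highest "positive" scores are the hardest ones
        scores = score(model, X_neg)
        hardest = np.argsort(-scores)
        new = [int(i) for i in hardest if int(i) not in selected][:n_new]
        selected.update(new)
    return model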


On 28 June 2017 at 00:31, Isaac Gerg <isaac...@gergltd.com> wrote:
My solution was to oversample minority class using shifts and flipud (specific to my problem).   I have not had good luck with weighting the error function by the class imbalance - too much parameter picking.

What do you mean by too much parameter picking? This formula gives you the optimal values (and the reference to the paper) https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L13-L41
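
For reference, the "balanced" heuristic in that file boils down to n_samples / (n_classes * bincount(y)). A quick example with the counts from the original post (assumes scikit-learn is installed):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 100000 + [1] * 100)                     # 100,000 negatives, 100 positives
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))                          # ~{0: 0.5005, 1: 500.5}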

Isaac Gerg

Jun 28, 2017, 8:50:35 AM6/28/17
to Daπid, Glenn Neuber, Keras-users, Kilari MURALI KRISHNA TEJA
I have used the algorithm you mention and it didn't work.

skf.ge...@googlemail.com

Jul 24, 2017, 4:44:36 AM7/24/17
to Keras-users

On the topic of how to handle very unbalanced data sets:

All the answers I see seem to assume that the targets are categories (0 or 1 or ...), but I am wondering how to apply them when the target Y values are one-hot encoded vectors.

Say I have ten categories that are encoded as
(1,0,0,0,0,0,0,0,0,0)
(0,1,0,0,0,0,0,0,0,0)
etc.

and the first one accounts for >99% of the training and test samples. Now assume we want to provide a custom class_weights dictionary that assigns a lower weight to the default 99% case and a relatively higher weight to the other cases. But it seems a dictionary does not accept a vector as a hash key.

So something like

class_weights = {(1,0,0,0,0,0,0,0,0,0): 0.1,
                 (0,1,0,0,0,0,0,0,0,0): 1.0,
                 (0,0,1,0,0,0,0,0,0,0): 1.0,
                 ...etc...}

is NOT allowed.

Therefore: is there any way to set (in Keras/TensorFlow) class_weight for cases where the targets are one-hot encoded vectors? I found one post (https://stackoverflow.com/questions/43481490/keras-class-weights-class-weight-for-one-hot-encoding) that suggests using sample_weights instead (apparently suggesting that the samples containing the rare event are weighted higher). OK, but this too won't work in the case of my data: they are longer sequences in which one out of 100 tokens (or fewer) belongs to a rare non-default class, but almost every sample contains such a rare class. So weighting samples differently does not help. I'd really need to find a way to assign a higher weight to these rare classes (tokens) within my sequences.

Any suggestions?


dylan.b...@gmail.com

Aug 25, 2017, 10:23:13 AM8/25/17
to Keras-users, skf.ge...@googlemail.com
You can make a dict where the key is the index of the one-hot, i.e. in your example

{0: 0.1,
1: 1.0,
2: 1.0
...etc...}

If you look at the source code, these actually get converted downstream to a vector of class weights (i.e. [0.1, 1.0, 1.0, ...]) with the same shape as your one-hot vectors.
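
For example, a small self-contained sketch (the data and layer sizes are made up; only the class_weight handling is the point):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense

n_classes = 10
X = np.random.rand(1000, 20).astype('float32')
labels = np.random.randint(0, n_classes, size=1000)
Y = np.eye(n_classes)[labels]                      # one-hot targets, shape (1000, 10)

model = Sequential()
model.add(Dense(n_classes, input_dim=20, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# class index -> weight; class 0 is the dominant "default" class
class_weight = {i: 1.0 for i in range(n_classes)}
class_weight[0] = 0.1

model.fit(X, Y, batch_size=128, nb_epoch=2, class_weight=class_weight)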

azmi....@gmail.com

Feb 25, 2018, 6:48:18 PM2/25/18
to Keras-users
Would this work for LSTMs as well?

ifi.n...@gmail.com

Nov 14, 2018, 11:15:36 AM11/14/18
to Keras-users
Hi,

For your multi-class classification problem with 3 classes, you must define the last layer of your model as Dense(3, activation='softmax'). The loss function will be 'categorical_crossentropy'. The class_weight will be a dictionary of 3 elements whose keys are the integer class indices, i.e. the position of the 1 in each one-hot vector.

For example, class_weight = {0: 5, 1: 3, 2: 1}, where 0, 1 and 2 are the class indices corresponding to the one-hot vectors [1,0,0], [0,1,0], and [0,0,1] respectively.
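
A quick illustration of how the one-hot vectors map to those keys (the weight values are arbitrary examples):

import numpy as np

one_hot = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])
print(one_hot.argmax(axis=1))        # [0 1 2]  <- these are the class_weight keys

class_weight = {0: 5, 1: 3, 2: 1}    # e.g. weight class 0 five times more than class 2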