Binary classification with heavily biased training set


Klemen Grm

unread,
Dec 9, 2015, 9:02:32 AM12/9/15
to Keras-users
I'm trying to train a Sequential model comprising convolutional and dense layers for a binary classification task (scalar output with 0.0 / 1.0 values) on image data. However, my training set is heavily biased in favour of negative matches (~10% of the training set has output 1.0, the rest 0.0). This leads to training converging towards an MAE of 0.1 (i.e. the model learns to predict the negative class almost everywhere), with most positive matches in the test set falsely rejected. Is there a more appropriate loss function or model architecture I could use for such a problem?
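(One common remedy for this kind of imbalance, before changing the loss or the architecture, is to re-weight the classes in the loss via Keras's class_weight argument to fit. A minimal sketch, assuming an already-compiled model named model and training arrays trX/trY with the ~1:9 positive-to-negative ratio described above:)

# up-weight the rare positive class so each class contributes
# roughly equally to the loss
class_weight = {0: 1.0,   # negative class, ~90% of samples
                1: 9.0}   # positive class, ~10% of samples
model.fit(trX, trY, batch_size=128, nb_epoch=10, class_weight=class_weight)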

Kris Cao

unread,
Dec 9, 2015, 9:24:15 AM12/9/15
to Keras-users
You could just restrict the number of negative samples you train on? I think that will probably work better than tweaking the model architecture.
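(A minimal sketch of this undersampling idea, assuming numpy arrays trX/trY where trY holds the scalar 0.0/1.0 labels:)

import numpy as np

# keep every positive sample, plus an equally sized random subset of negatives
pos_idx = np.where(trY == 1.0)[0]
neg_idx = np.random.choice(np.where(trY == 0.0)[0], size=len(pos_idx), replace=False)

# shuffle the combined indices so the two classes are interleaved for training
keep = np.random.permutation(np.concatenate([pos_idx, neg_idx]))
trX_balanced, trY_balanced = trX[keep], trY[keep]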

Severin Bühler

unread,
Dec 9, 2015, 12:45:31 PM12/9/15
to Keras-users
1. Encode your output like this: class 1 = [1,0], class 2 = [0,1] (see the sketch after this list). I had a similar two-class task with a 1:9 ratio not long ago, and it worked without further adjustments.
2. Remove all your regularization (including dropout!) and check whether your model converges.
3. Make sure the two classes are set correctly. If you've loaded or shuffled your samples in a way that mislabels the classes, your model is going to learn nothing.
4. Post your code here. All we can do without seeing a piece of code is guess.
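(A minimal sketch of point 1, using the to_categorical helper that also appears in the code later in this thread:)

import numpy as np
from keras.utils.np_utils import to_categorical

y = np.array([0.0, 1.0, 0.0])   # scalar binary labels
print to_categorical(y)         # [[1. 0.], [0. 1.], [1. 0.]] -- one row per sample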

Amit Beka

unread,
Dec 10, 2015, 12:19:44 PM12/10/15
to Klemen Grm, Keras-users
If your training accuracy is 97% and test accuracy is 90%, it smells as if you are overfitting (which is expected if you don't have regularization/dropout). Now try adding regularization and/or dropout back, plus an EarlyStopping callback so you can stop training when the validation accuracy starts to drop.
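(A minimal sketch of the EarlyStopping suggestion, assuming the 2015-era Keras API used elsewhere in this thread, with model, trX/trY and teX/teY as in the code below:)

from keras.callbacks import EarlyStopping

# stop once validation loss has not improved for `patience` consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, verbose=1)
model.fit(trX, trY, batch_size=128, nb_epoch=100,
          validation_data=(teX, teY), show_accuracy=True,
          callbacks=[early_stop])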

On Thu, Dec 10, 2015 at 3:34 PM, Klemen Grm <kleme...@gmail.com> wrote:
I took this advice, turned my output data into a categorical vector, removed all forms of regularisation, and achieved 97% accuracy on the training data; however, the test-set accuracy converges towards ~90%. My code is as follows:

import numpy as np
import sys, os
from keras.models import Sequential
from keras.layers.core import Activation, Dense, Dropout, Flatten, Reshape
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.advanced_activations import PReLU
from keras.regularizers import l2
from keras.optimizers import SGD
from keras.utils.np_utils import to_categorical  # needed for to_categorical() below

# NB: `base` (data directory), `params` (weights filename) and the
# VGG_like_SL() model constructor are defined elsewhere in the script.

trX = np.load(base + "trX.npy")  # training data, shape (120000, 3, 128, 128)
trY = to_categorical(np.load(base + "trY.npy")).astype("float32")  # training set outputs
teX = np.load(base + "teX.npy")
teY = to_categorical(np.load(base + "teY.npy")).astype("float32")

print trX.shape
print trY.shape
print teX.shape
print teY.shape

print "compiling model."

if params in os.listdir(base):
    model = VGG_like_SL(weights_path=base + params)
    print "Model parameters loaded."
else:
    model = VGG_like_SL()
    print "Model parameters initialised."

print model.layers[0].get_weights()[0].shape

decay = 1e-4
sgd = SGD(lr=0.01, momentum=0.9, decay=decay)
model.compile(loss="categorical_crossentropy", optimizer=sgd)

for layer in model.layers:
    print layer.output_shape

val_loss, val_acc = model.evaluate(teX, teY, show_accuracy=True)
print "Initial (loss, acc): " + str((val_loss, val_acc))

print "Running gradient descent"

for i in xrange(1000):
    # report the effective learning rate after SGD's time-based decay
    lr = model.optimizer.lr.get_value()
    it = model.optimizer.iterations.get_value()
    rate = lr / (1. + decay * it)
    print "Epoch %02d, lr=%1.4f:" % (i, rate)
    print "Best validation loss so far: %1.4f" % val_loss
    model.fit(trX, trY, batch_size=128, nb_epoch=1,
              validation_data=(teX, teY), show_accuracy=True)

    # checkpoint the weights whenever validation accuracy improves
    new_loss, new_acc = model.evaluate(teX, teY, show_accuracy=True)
    if new_acc > val_acc:
        model.save_weights(base + params, overwrite=True)
        print "Model checkpoint saved."
        val_loss = new_loss
        val_acc = new_acc
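(As an aside, the manual evaluate-and-save loop above can likely be replaced with Keras's ModelCheckpoint callback; a sketch, assuming the same era API, though note it monitors validation loss by default rather than accuracy:)

from keras.callbacks import ModelCheckpoint

# save weights only when the monitored validation quantity improves
checkpoint = ModelCheckpoint(base + params, verbose=1, save_best_only=True)
model.fit(trX, trY, batch_size=128, nb_epoch=1000,
          validation_data=(teX, teY), show_accuracy=True,
          callbacks=[checkpoint])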



jerr...@gmail.com

unread,
Jul 31, 2018, 12:12:57 AM7/31/18
to Keras-users
Any reason why we need to remove regularization?

On Wednesday, December 9, 2015 at 12:45:31 PM UTC-5, Severin Bühler wrote: