One Vs Rest Classifier in Keras

Omar

Feb 16, 2016, 11:43:39 AM

to Keras-users

Hi, I have an MNIST-type dataset with 10 classes, split into training, validation, and test sets.

In sklearn, I use the following class to train n binary estimators for n classes:

http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html

class sklearn.multiclass.OneVsRestClassifier(estimator, n_jobs=1)

where:

estimator: An estimator object implementing fit and one of decision_function or predict_proba.

Is there a way to implement this approach with a Keras model?

Thank you.

Anuj Gupta

Feb 17, 2016, 1:25:08 AM

to Keras-users

To broaden the scope of what Omar is asking: is there a way to implement
class sklearn.svm.OneClassSVM? http://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html

Omar

Feb 18, 2016, 9:30:15 AM

to Keras-users

The neural networks require the labels to be in "categorical" form.

Does that mean it is training one network per class, in a one-vs-all fashion?
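For illustration, "categorical" form usually just means one-hot encoding of the integer labels: one target row per sample, fed to a single network. A minimal pure-Python sketch follows; the helper name `to_one_hot` is hypothetical, and Keras ships its own `to_categorical` utility for the same job.

```python
def to_one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0] * n_classes
        row[label] = 1  # mark the position of the true class
        rows.append(row)
    return rows

# Three example labels from a 10-class problem (values are made up).
labels = [3, 0, 9]
one_hot = to_one_hot(labels, n_classes=10)
```

Each row has exactly one 1, so the encoding describes a single multiclass target rather than ten separate binary problems.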

Best,

Omar

daniel...@datarobot.com

Feb 19, 2016, 8:41:30 AM

to Keras-users

Omar,

There are a number of ways you can set up your network(s) for this problem, but if you are using publicly available code, it almost certainly isn't creating ten networks.

Instead, it has a single network with some number of layers, and the last layer is a 10-way softmax. The softmax takes the information about the image that is encoded by the lower layers and translates it into a prediction of how likely the image is to be in class 1 (the written number "1"), class 2 (the written number "2"), class 3 (the written number "3"), etc.
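As a rough sketch of what that last layer computes, here is a softmax in plain Python. The scores are invented for illustration; in a real network they would be the raw outputs of the final dense layer.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for one image (10 classes).
scores = [0.1, 2.5, 0.3, 0.0, -1.2, 0.4, 0.2, 4.0, 0.1, -0.5]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

All ten probabilities come out of one forward pass, which is why a single network covers every class at once.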

---

In practice, this is more computationally efficient than training ten separate networks. If you wanted to train 10 totally separate networks, the easiest way to do that would be with a for loop that iterates through the ten classes, creates a binary variable indicating which observations are in that class, and then builds a network to predict that binary variable.
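The loop described above might look roughly like the sketch below. `train_one_vs_rest` and `fit_binary` are hypothetical names; `fit_binary` stands in for whatever code builds and fits one binary network (for example, a Keras model ending in a single sigmoid unit). The stand-in used here only records the positive rate so the sketch runs without any deep-learning library.

```python
def train_one_vs_rest(X, y, n_classes, fit_binary):
    """Fit one binary model per class; fit_binary(X, binary_y) returns a model."""
    models = []
    for k in range(n_classes):
        # 1 where the observation belongs to class k, 0 for "the rest"
        binary_y = [1 if label == k else 0 for label in y]
        models.append(fit_binary(X, binary_y))
    return models

# Stand-in "model" for demonstration only: the fraction of positive labels.
# A real fit_binary would build and train a network on (X, binary_y).
def fit_positive_rate(X, binary_y):
    return sum(binary_y) / len(binary_y)

# Tiny made-up dataset with 3 classes.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 1, 2]
models = train_one_vs_rest(X, y, n_classes=3, fit_binary=fit_positive_rate)
```

With ten classes this trains ten models on the same inputs, which is the extra cost the single softmax network avoids.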

But if I were in your shoes, I'd prefer the conventional 10-way softmax.

Dan

Omar

Feb 19, 2016, 9:48:58 AM

to Keras-users, daniel...@datarobot.com

Thank you for your recommendations, Daniel.

The problem I am working on requires binary classification from a multiclass problem. It requires building a model per class. I want to know the most efficient way to achieve this with Keras. The way that you mention seems like the "brute force" option. Is there a better way? One option could be to parallelize the loop, for example.

Best,

Omar

Daniel Becker

Feb 19, 2016, 10:05:25 AM

to Omar, Keras-users

Hi Omar,

I agree that the approach with the for-loop is inefficient and inelegant, which is why I recommended you not do it.

But I'm having trouble imagining a circumstance where it matters whether you build a separate model for each class or one network with a softmax at the end. In general, if you want to estimate probabilities for each class, the network with the softmax at the end is the efficient way to do it.

Parallelizing the loop shouldn't help as much as you think, because computation for a single network already parallelizes well. So I'd expect training a single network to use most of the CPU/GPU cycles you have in most cases.

Dan

Omar

Feb 19, 2016, 10:26:20 AM

to Keras-users, omar.cost...@gmail.com, daniel...@datarobot.com

I think the key point here is the layer before the softmax, since the dimension of dense(X) determines the n classes that the softmax has to assign probabilities to.

Maybe one way to optimize this is to train one network as multiclass and then change the last dense layer to make it binary for each of the 10 classes. But I don't know if this would work, since there are 10 different cases.

Any thoughts on how to do this?
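One possible reading of this idea, sketched under the assumption that a single softmax network has already been trained: the probability assigned to class k can be read as a "class k vs. rest" score, because the other nine probabilities sum to 1 - p_k. This involves no per-class retraining, so it is not the same as fitting ten independent binary models. The function name, the 0.5 threshold, and the example probabilities are all illustrative assumptions.

```python
def one_vs_rest_from_softmax(probs, threshold=0.5):
    """Derive a binary 'class k vs. rest' decision for every class k."""
    decisions = []
    for p_k in probs:
        p_rest = 1.0 - p_k  # probability mass on all other classes
        decisions.append({
            "p_class": p_k,
            "p_rest": p_rest,
            "is_class": p_k >= threshold,
        })
    return decisions

# Made-up softmax output for one image (10 classes, sums to 1).
probs = [0.02, 0.01, 0.05, 0.70, 0.02, 0.05, 0.05, 0.05, 0.03, 0.02]
decisions = one_vs_rest_from_softmax(probs)
```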

Daniel Becker

Feb 19, 2016, 10:47:43 AM

to Omar, Keras-users

Omar,

You get to choose the size of that last dense layer. In this case, adding a dense(10) layer will solve your problem.

For an example, see https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py

Sander Stepanov

Mar 27, 2016, 10:57:01 AM

to Keras-users, omar.cost...@gmail.com, daniel...@datarobot.com

But if the data is strongly imbalanced, then one-vs-rest may help.
