One Vs Rest Classifier in Keras


Omar

unread,
Feb 16, 2016, 11:43:39 AM2/16/16
to Keras-users
Hi, I have an MNIST-style dataset with 10 classes and separate training/validation/test sets.

In sklearn, I use the following class to train n binary estimators for n classes:


class sklearn.multiclass.OneVsRestClassifier(estimator, n_jobs=1)

where:
estimator: An estimator object implementing fit and one of decision_function or predict_proba.
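For context, here is a minimal sketch of how that sklearn class is typically used, with LogisticRegression standing in for the eventual Keras model and a toy synthetic dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
# Toy 3-class dataset: three Gaussian blobs in 2-D.
X = np.vstack([rng.randn(30, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
y = np.repeat([0, 1, 2], 30)

# OneVsRestClassifier clones the base estimator once per class and
# fits each clone on a binary "this class vs. the rest" target.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)
print(len(clf.estimators_))  # one binary estimator per class -> 3
```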

Is there a way to implement this function with a Keras model? 

Thank you.



Anuj Gupta

unread,
Feb 17, 2016, 1:25:08 AM2/17/16
to Keras-users
To broaden the scope of what Omar is asking: is there a way to implement
class sklearn.svm.OneClassSVM
http://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html
in Keras?


Omar

unread,
Feb 18, 2016, 9:30:15 AM2/18/16
to Keras-users
Neural networks require the labels to be in "categorical" (one-hot) form.
Does that mean it is training one network per class, in a one-vs-all fashion?

Best,
Omar

daniel...@datarobot.com

unread,
Feb 19, 2016, 8:41:30 AM2/19/16
to Keras-users
Omar,

There are a number of ways you can set up your network(s) for this problem, but if you are using publicly available code, it almost certainly isn't creating ten networks.

Instead, it has a single network with some number of layers, and the last layer is a 10-way softmax.  So the last (softmax) layer is what takes the information about the image that is encoded by the lower layers and translates it into a prediction of how likely the image is to be in class 1 (the written number "1"), class 2 (the written number "2"), class 3 (the written number "3"), etc.
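A quick numpy sketch of what that final softmax computes: one vector of logits in, one probability per class out (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from the last Dense layer for one image.
logits = np.array([2.0, 0.5, -1.0, 0.0, 1.0, -0.5, 0.2, 0.1, -2.0, 0.3])
probs = softmax(logits)
print(probs.sum())     # the 10 class probabilities sum to 1
print(probs.argmax())  # predicted class: index of the largest logit -> 0
```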

---

In practice, this is more computationally efficient than training ten separate networks. If you wanted to train 10 totally separate networks, the easiest way to do that would be with a for loop that iterates through the ten classes, creates a binary variable indicating which observations are in that class, and then builds a network to predict that binary variable.
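The for-loop approach described above can be sketched as follows, with a tiny numpy logistic regression standing in for each Keras network (a sketch of the one-vs-rest idea, not a production implementation):

```python
import numpy as np

def train_binary(X, t, lr=0.1, epochs=200):
    """Fit one logistic-regression 'network' on a binary target t."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        grad = p - t                            # dLoss/dLogit for log-loss
        w -= lr * (X.T @ grad) / len(t)
        b -= lr * grad.mean()
    return w, b

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) + c for c in ([0, 0], [4, 4], [0, 4])])
y = np.repeat([0, 1, 2], 40)

models = []
for k in range(3):                 # the loop over classes
    t = (y == k).astype(float)     # binary indicator: class k vs. rest
    models.append(train_binary(X, t))

# Predict by taking the class whose binary model scores highest.
scores = np.column_stack([X @ w + b for w, b in models])
pred = scores.argmax(axis=1)
print((pred == y).mean())
```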

But if I were in your shoes, I'd prefer what the conventional 10-way softmax is doing.
Dan

Omar

unread,
Feb 19, 2016, 9:48:58 AM2/19/16
to Keras-users, daniel...@datarobot.com
Thank you for your recommendations, Daniel.
The problem I am working on requires binary classification derived from a multiclass problem: it requires building one model per class. I want to know the most efficient way to achieve this with Keras. The approach you mention seems like the "brute force" option. Is there a better way? One option could be to parallelize the loop, for example.

Best,
Omar

Daniel Becker

unread,
Feb 19, 2016, 10:05:25 AM2/19/16
to Omar, Keras-users
Hi Omar,

I agree that the approach with the for-loop is inefficient and inelegant, which is why I recommended you not do it.

But I'm having trouble imagining a circumstance where it matters whether you build a separate model for each class versus building one network with a softmax at the end. In general, if you want to estimate probabilities for each class, the network with the softmax at the end is the efficient way to do it.

Parallelizing the loop shouldn't help as much as you might think, because computation for a single network already parallelizes well. So I'd expect training a single network to use most of the CPU/GPU cycles you have in most cases.

Dan

Omar

unread,
Feb 19, 2016, 10:26:20 AM2/19/16
to Keras-users, omar.cost...@gmail.com, daniel...@datarobot.com
I think the key point here is the layer before the softmax:
the dimension of that Dense(X) layer determines the n classes the softmax assigns probabilities to.
Maybe one way to optimize this is to train one network as a multiclass model and then change the last dense layer to a binary one for each of the 10 classes.  But I don't know if this would work, since there are 10 different cases.
Any thoughts on how to do this?

Daniel Becker

unread,
Feb 19, 2016, 10:47:43 AM2/19/16
to Omar, Keras-users
Omar,

You get to choose the size of that last dense layer.  In this case, adding a Dense(10) layer will solve your problem.
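Concretely, that looks something like this (a minimal sketch using the current Keras API; the 2016-era API differed slightly, and the hidden-layer size of 128 is an arbitrary choice):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Input(shape=(784,)),              # flattened 28x28 MNIST image
    Dense(128, activation='relu'),    # hidden layer (size is a choice)
    Dense(10, activation='softmax'),  # 10 units -> 10 class probabilities
])
print(model.output_shape)  # (None, 10)
```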

Sander Stepanov

unread,
Mar 27, 2016, 10:57:01 AM3/27/16
to Keras-users, omar.cost...@gmail.com, daniel...@datarobot.com
But if the data is strongly imbalanced, then one-vs-rest may help.