Multi-category Classifiers

Stephen Mallette

Aug 23, 2013, 7:30:13 AM

to cognitiv...@googlegroups.com

First of all, as a first-time poster to your mailing list, thanks for contributing this body of code to the open source community. I've been working with the Cognitive Foundry library for the past week or so, and by reading the source code and unit tests I've been able to make good progress. So far I have been pleased with the results.

As I've been going through the algorithms, I noticed that some, like the SVM algorithms, are binary classifiers. Are there any plans in the near future to extend the SVMs to multi-category classification?

Thanks,

Stephen

Justin Basilico

Aug 25, 2013, 11:58:58 PM

to cognitiv...@googlegroups.com

Hi Stephen,

Thanks for the interest and feedback; we like to know that people out there are finding the Foundry useful. What are you using it for?

In terms of multi-class SVMs, we do provide some adapter classes like BinaryVersusCategorizer, which adapts a multi-class problem into a set of binary ones. This one takes the pairwise approach. I think there is a "winner take all" version too, though it may not directly interact with the SVM learners in a convenient way. Was there a particular approach to multi-category SVMs that you were interested in?
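To make the pairwise idea concrete, here is a minimal plain-Java sketch of one-versus-one voting. It does not use BinaryVersusCategorizer or any other Foundry class; `BinaryClassifier`, `PairwiseVotingClassifier`, and the toy nearest-center pair classifiers in `PairwiseDemo` are hypothetical stand-ins for trained binary SVMs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical binary classifier: true means "the first category of the pair". */
interface BinaryClassifier<I> {
    boolean isFirst(I input);
}

/** One-versus-one voting over k categories using k(k-1)/2 binary classifiers. */
final class PairwiseVotingClassifier<I, C> {
    private final List<C> categories;
    private final Map<List<C>, BinaryClassifier<I>> pairClassifiers = new HashMap<>();

    PairwiseVotingClassifier(List<C> categories,
                             BiFunction<C, C, BinaryClassifier<I>> trainPair) {
        this.categories = List.copyOf(categories);
        // Build one binary classifier for every unordered pair (a, b).
        for (int i = 0; i < this.categories.size(); i++) {
            for (int j = i + 1; j < this.categories.size(); j++) {
                C a = this.categories.get(i);
                C b = this.categories.get(j);
                pairClassifiers.put(List.of(a, b), trainPair.apply(a, b));
            }
        }
    }

    C predict(I input) {
        // Each pairwise classifier casts one vote for the category it prefers.
        Map<C, Integer> votes = new HashMap<>();
        for (Map.Entry<List<C>, BinaryClassifier<I>> e : pairClassifiers.entrySet()) {
            List<C> pair = e.getKey();
            C winner = e.getValue().isFirst(input) ? pair.get(0) : pair.get(1);
            votes.merge(winner, 1, Integer::sum);
        }
        // Most votes wins; ties go to the category listed first.
        C best = null;
        int bestVotes = -1;
        for (C c : categories) {
            int v = votes.getOrDefault(c, 0);
            if (v > bestVotes) {
                best = c;
                bestVotes = v;
            }
        }
        return best;
    }
}

class PairwiseDemo {
    /** Toy setup: 1-D inputs; each pair's classifier picks the closer category center. */
    static PairwiseVotingClassifier<Double, String> build() {
        Map<String, Double> centers = Map.of("low", 0.0, "mid", 5.0, "high", 10.0);
        return new PairwiseVotingClassifier<Double, String>(
            List.of("low", "mid", "high"),
            (a, b) -> x -> Math.abs(x - centers.get(a)) < Math.abs(x - centers.get(b)));
    }

    public static void main(String[] args) {
        System.out.println(build().predict(4.0)); // prints mid
    }
}
```

With k categories this builds k(k-1)/2 binary classifiers. A one-versus-rest scheme would instead train k classifiers, one per category, and pick the category whose classifier scores highest.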

Thanks, : )

Justin

--
You received this message because you are subscribed to the Google Groups "Cognitive Foundry" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cognitive-foun...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Stephen Mallette

Aug 26, 2013, 8:30:18 AM

to cognitiv...@googlegroups.com

>Thanks for the interest and feedback, we like to know that people out there are finding the Foundry to be useful. What are you using it for?

I'm a primary contributor to both the TinkerPop (http://www.tinkerpop.com/) and Aurelius (http://thinkaurelius.com/) graph technology stacks. I'm working on a graph-oriented feature extraction and machine learning toolkit that will become part of those stacks. The library will have different algorithm providers that plug into it, and I'd like Cognitive Foundry to be the default provider.

> In terms of multi-class SVMs, we do provide some adapter classes like BinaryVersusCategorizer that adapts multi-class problems to be binary ones. This one takes the pairwise approach. I think there is a "winner take all" version too, though it may not directly interact with the SVM learners in a convenient way. Was there a particular approach to multi-category SVMs that you were interested in?

I will take a look at BinaryVersusCategorizer...thanks. I'm not sure if I'm looking for a particular multi-category SVM approach; perhaps I'm just looking for a reasonable way to do it. Maybe some more information on my project would help...

I'm trying to build a generalized "Classifier" interface that will allow users to plug in naive Bayes, decision trees, SVMs, etc., depending on their needs. I think this is akin to the Classifier interface in Weka:

http://weka.sourceforge.net/doc.dev/weka/classifiers/Classifier.html

Given my current understanding of Cognitive Foundry, I don't immediately see how to generalize in that way, since not all algorithms have the same capabilities. For example, I wanted to do something similar to the Weka Classifier interface I referenced and provide two functions: predictMostLikely and predictDistribution. I was able to implement both functions for naive Bayes with VectorNaiveBayesCategorizer. For decision trees, I was able to use CategorizationTree to implement predictMostLikely, but I couldn't figure out how to implement predictDistribution.
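One way to offer both functions from every provider is to make the distribution the primitive and derive the most likely category from it. The sketch below assumes a hypothetical `ProbabilisticClassifier` interface (not part of Weka or the Foundry) and a hypothetical `PointMassAdapter` for providers that can only return a label, such as a tree that does not keep per-leaf counts.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical plugin interface mirroring predictDistribution / predictMostLikely. */
interface ProbabilisticClassifier<I, C> {
    /** Category -> estimated probability (or score) for the given input. */
    Map<C, Double> predictDistribution(I input);

    /** Default: the arg-max of the distribution. */
    default C predictMostLikely(I input) {
        C best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<C, Double> e : predictDistribution(input).entrySet()) {
            if (e.getValue() > bestValue) {
                best = e.getKey();
                bestValue = e.getValue();
            }
        }
        return best;
    }
}

/** Wraps a label-only provider by putting all probability mass on its label. */
final class PointMassAdapter<I, C> implements ProbabilisticClassifier<I, C> {
    private final Function<I, C> labeler;

    PointMassAdapter(Function<I, C> labeler) {
        this.labeler = labeler;
    }

    @Override
    public Map<C, Double> predictDistribution(I input) {
        return Map.of(labeler.apply(input), 1.0);
    }

    @Override
    public C predictMostLikely(I input) {
        return labeler.apply(input);
    }
}

class ClassifierDemo {
    public static void main(String[] args) {
        // A provider with a real distribution, written as a lambda for brevity.
        ProbabilisticClassifier<Integer, String> soft = x -> Map.of("a", 0.2, "b", 0.8);
        System.out.println(soft.predictMostLikely(0)); // prints b

        // A label-only provider adapted to the same interface.
        ProbabilisticClassifier<Integer, String> hard =
            new PointMassAdapter<Integer, String>(x -> x >= 0 ? "pos" : "neg");
        System.out.println(hard.predictDistribution(3)); // prints {pos=1.0}
    }
}
```

The point-mass adapter keeps the interface uniform but is overconfident: it reports probability 1.0 for whatever label it returns, so callers that need meaningful probabilities should know which providers are adapted this way.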

Any additional thoughts you might have would be helpful...thanks,

Stephen

Justin Basilico

Aug 31, 2013, 1:41:49 AM

to cognitiv...@googlegroups.com

Cool. Those projects look interesting.

The most similar interface we have for multi-class classification is Categorizer, which provides evaluate (the equivalent of your predictMostLikely), though we separate the learning algorithm from the learned object. We don't have a specific interface for predicting a distribution as opposed to the single most likely category, though some categorizers do provide that capability, as you've found. There is also DiscriminantCategorizer, though it is currently used mostly for producing rankings or confidence values for its outputs.
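The learner/learned-object split can be sketched in plain Java. Here a hypothetical `SupervisedLearner` returns a `java.util.function.Function` that plays the role the thread gives to Categorizer's evaluate; the nearest-mean learner is a toy stand-in, not a Foundry algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical: a learning algorithm, kept separate from the categorizer it produces. */
interface SupervisedLearner<I, C> {
    Function<I, C> learn(List<Map.Entry<I, C>> labeledData);
}

/** Toy learner: nearest class mean for 1-D inputs. */
final class NearestMeanLearner implements SupervisedLearner<Double, String> {
    @Override
    public Function<Double, String> learn(List<Map.Entry<Double, String>> labeledData) {
        // Accumulate [sum, count] per label.
        Map<String, double[]> stats = new HashMap<>();
        for (Map.Entry<Double, String> e : labeledData) {
            double[] s = stats.computeIfAbsent(e.getValue(), k -> new double[2]);
            s[0] += e.getKey();
            s[1] += 1;
        }
        Map<String, Double> means = new HashMap<>();
        stats.forEach((label, s) -> means.put(label, s[0] / s[1]));

        // The returned function is the learned object: it keeps only the means.
        return x -> {
            String best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, Double> m : means.entrySet()) {
                double d = Math.abs(x - m.getValue());
                if (d < bestDistance) {
                    best = m.getKey();
                    bestDistance = d;
                }
            }
            return best;
        };
    }
}

class LearnerDemo {
    static Function<Double, String> train() {
        List<Map.Entry<Double, String>> data = List.of(
            Map.entry(1.0, "a"), Map.entry(2.0, "a"),
            Map.entry(9.0, "b"), Map.entry(11.0, "b"));
        return new NearestMeanLearner().learn(data);
    }

    public static void main(String[] args) {
        Function<Double, String> categorizer = train();
        System.out.println(categorizer.apply(3.0)); // prints a
    }
}
```

Because the learned function holds only the class means, it does not reference the training data, and the same learner can be rerun on other data to produce independent categorizers.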

I don't think CategorizationTree currently stores the category distribution at each leaf node, though that is probably something we could add. I have been thinking that some applications want to store data in the tree nodes besides just the label, so another approach might be to let the caller provide the mechanism for creating the tree nodes. That may be a little overkill in your case, though.
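As a rough illustration of storing the category distribution per leaf, the sketch below keeps training-label counts in each leaf and derives both the label and the distribution from them. `Node`, `SplitNode`, and `LeafNode` are hypothetical classes, not the CategorizationTree API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tree node; not the CategorizationTree API. */
abstract class Node<C> {
    /** Follow the splits down to the leaf that handles input x. */
    abstract LeafNode<C> leafFor(double[] x);
}

final class SplitNode<C> extends Node<C> {
    private final int feature;
    private final double threshold;
    private final Node<C> left;
    private final Node<C> right;

    SplitNode(int feature, double threshold, Node<C> left, Node<C> right) {
        this.feature = feature;
        this.threshold = threshold;
        this.left = left;
        this.right = right;
    }

    @Override
    LeafNode<C> leafFor(double[] x) {
        return (x[feature] <= threshold ? left : right).leafFor(x);
    }
}

final class LeafNode<C> extends Node<C> {
    // Counts of training labels for the examples that reached this leaf.
    private final Map<C, Integer> counts = new HashMap<>();
    private int total = 0;

    void addTrainingLabel(C label) {
        counts.merge(label, 1, Integer::sum);
        total++;
    }

    @Override
    LeafNode<C> leafFor(double[] x) {
        return this;
    }

    /** The majority label: what a label-only leaf would return. */
    C mostLikely() {
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    /** Empirical category distribution at this leaf. */
    Map<C, Double> distribution() {
        Map<C, Double> d = new HashMap<>();
        counts.forEach((c, n) -> d.put(c, n / (double) total));
        return d;
    }
}

class TreeDemo {
    static Node<String> build() {
        LeafNode<String> left = new LeafNode<>();
        LeafNode<String> right = new LeafNode<>();
        left.addTrainingLabel("a");
        left.addTrainingLabel("a");
        left.addTrainingLabel("b");
        right.addTrainingLabel("b");
        return new SplitNode<>(0, 0.5, left, right);
    }

    public static void main(String[] args) {
        LeafNode<String> leaf = build().leafFor(new double[] {0.2});
        System.out.println(leaf.mostLikely()); // prints a
        System.out.println(leaf.distribution());
    }
}
```

Small leaves give coarse estimates (a leaf with a single example reports probability 1.0), so some form of smoothing may be worth considering if the distributions are used downstream.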

It may also be good to add an interface that provides a standard way to get the category distribution out. The output could be either a Vector or some kind of ScalarMap. I'd suggest a DataDistribution, except that something like a neural network can produce negative values for its category estimates, which are really scores rather than probabilities.
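If a provider returns scores that can be negative, one common way to turn them into a distribution (not something proposed in this thread) is the softmax function, exp(s_i - max) / sum_j exp(s_j - max). In this sketch `ScoreNormalization.softmax` is a hypothetical helper, and a `Map<String, Double>` stands in for a Vector or ScalarMap.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ScoreNormalization {
    /**
     * Softmax over category scores. Subtracting the maximum score before
     * exponentiating avoids overflow without changing the result.
     * Throws NoSuchElementException if scores is empty.
     */
    static Map<String, Double> softmax(Map<String, Double> scores) {
        double max = Collections.max(scores.values());
        Map<String, Double> out = new LinkedHashMap<>();
        double sum = 0.0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double v = Math.exp(e.getValue() - max);
            out.put(e.getKey(), v);
            sum += v;
        }
        // Normalize so the values are non-negative and sum to 1.
        for (Map.Entry<String, Double> e : out.entrySet()) {
            e.setValue(e.getValue() / sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Scores such as a neural network might emit, including a negative one.
        Map<String, Double> scores = Map.of("a", -1.0, "b", 2.0, "c", 0.5);
        System.out.println(softmax(scores));
    }
}
```

Softmax preserves the ranking of the scores, so the most likely category does not change; the resulting values are normalized, but they are not necessarily calibrated probabilities.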