Support Vector Machine: classification problem with probabilities?


Elena García Peña

Aug 23, 2016, 7:12:46 AM
to Java PMML API
Hi,

I've been doing some research on JPMML and how classification is evaluated for Support Vector Machine models. I found that it always returns a decision with a binary probability for the classification, even though I put an "Output" element like this in the PMML file:

<Output>
  <OutputField name="Predicted_IS_A" feature="predictedValue"/>
  <OutputField name="Probability_A" feature="probability" value="A"/>
</Output>

The field Probability_A always returns 1.0 or 0.0 (1.0 if the classification result is A, 0.0 if it belongs to the other class).

There are some libraries, like e1071, that are able to assign probabilities to SVM results. For example:
fit.svm <- svm(A ~ .,
               data = training,
               kernel = "radial",
               cost = 10,
               gamma = 0.01,
               probability = TRUE)

pr.svm <- predict(fit.svm, training, probability=TRUE)

With that input, the SVM is able to assign a class to each input value, along with a probability of belonging to that class.

Is there any way to get probabilities with the JPMML-Evaluator library?

Thanks in advance!

Villu Ruusmann

Aug 23, 2016, 10:13:39 AM
to Java PMML API
Hi Elena,

>
> I've been making some research about jpmml and how the evaluate
> Classification method is made in Support Vector Machine.

SVM models are not that suitable for handling classification tasks
where it is critical to get a "true" probability distribution.

By default, SVM models give you a voting-based "pseudo" probability
distribution. For example, if an SVM model contains 50 support vectors,
and 41 of them vote for class "A" and 9 vote for class "B", then it
would be reported as the "pseudo" probability distribution
{A = 0.82, B = 0.18}.
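The vote-counting arithmetic can be sketched in a few lines of Python (an illustration of the scheme only, not JPMML-Evaluator's actual implementation):

```python
from collections import Counter

def pseudo_probabilities(votes):
    # Turn a list of per-voter class labels into a voting-based
    # "pseudo" probability distribution
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# 50 voters: 41 vote for class "A", 9 vote for class "B"
dist = pseudo_probabilities(["A"] * 41 + ["B"] * 9)
# -> {"A": 0.82, "B": 0.18}
```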

You're seeing a binary probability because your model contains only
two support vectors (i.e. there are exactly two
SupportVectorMachineModel/SupportVectorMachine elements in your PMML
document), right?

>
> There are some libraries, like e1071, that are able to assign probabilities to SVM results. For example:
> fit.svm <- svm(A ~ .,
>                data = training,
>                kernel = "radial",
>                cost = 10,
>                gamma = 0.01,
>                probability = TRUE)
>
> pr.svm <- predict(fit.svm, training, probability=TRUE)
>

Indeed, there are separate algorithms for estimating "truer"
probability distributions.

However, from a practical point of view, there are the following issues:
1) They need extra information (in addition to existing support
vectors). For example, svm(.., probability = TRUE) instructs the
preparation of such extra information.
2) They are executed separately from the main prediction logic. For
example, predict.svm(.., probability = TRUE) would first compute the
predicted class label using the "classical" SVM algorithm, and then
launch a separate computation for estimating the probability
distribution. It may happen that these two computations yield
contradictory results. For example, the "classical" SVM algorithm may
predict class "A", but the estimated probabilities may be {A = 0.45, B
= 0.55}. Then what?
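To make the two-step nature of this concrete: Platt-style probability estimation fits a sigmoid over the SVM decision values on held-out data. A minimal sketch of how the two computations can disagree (the coefficients a and b below are made-up placeholders, not fitted values):

```python
import math

def svm_predict(decision_value):
    # "Classical" SVM prediction: the sign of the decision value
    return "A" if decision_value > 0 else "B"

def platt_probability(decision_value, a=-1.5, b=0.1):
    # Sigmoid mapping a decision value to P(class = "A");
    # a and b would normally be fitted on held-out decision values
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

# A decision value just barely on the "A" side of the boundary...
d = 0.01
label = svm_predict(d)        # "A"
p_a = platt_probability(d)    # about 0.479
# ...so the two computations contradict: label "A", but P(A) < 0.5
```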

PMML does not support such separate algorithms out-of-the-box.
Theoretically, they could be implemented by embedding the necessary
extra information into the SupportVectorMachineModel element using
PMML extension mechanism, and adding appropriate handlers at
JPMML-Evaluator library level. Practically, it doesn't seem like a
good investment.


VR

Elena García Peña

Aug 23, 2016, 10:55:54 AM
to Java PMML API
Hi Villu,

First of all, thank you very much for your quick response. I just have one doubt, to make sure I understand the response well.

When you talk about "support vectors" vote, do you mean SupportVector PMML object, or SupportVectorMachine PMML object?

I have a PMML file with 1 SupportVectorMachine object and 1387 SupportVector objects. In the jpmml-evaluator code, it seems that the SupportVectorMachine is the one that votes (debugging shows me that).

What do you mean when you say "it can be more than one SupportVectorMachine per PMML model"?

I will reply to your whole answer in more detail, but first I need to be sure I understand it correctly!

Thanks in advance!

Villu Ruusmann

Aug 23, 2016, 2:25:47 PM
to Java PMML API
Hi Elena,

>
> When you talk about "support vectors" vote, do you mean
> SupportVector PMML object, or SupportVectorMachine PMML object?
>

Sorry about the improper choice of terms. I meant the
SupportVectorMachine element (together with its children) when I was
talking about "support vectors".

>
> What do you mean when you say "it can be more than one SupportVectorMachine per PMML model"?
>

Most SVM implementations are based on the LibSVM library
(https://www.csie.ntu.edu.tw/~cjlin/libsvm/). LibSVM performs the
training of classification-type SVM models using the one-vs-one
strategy - in total, (n * (n - 1)) / 2 SupportVectorMachine elements
will be generated, where n is the number of target categories.

In your case (binary classification), n = 2 and there will be exactly
one SupportVectorMachine element (not two as suggested in my first
e-mail). In case of the Iris dataset (multi-class classification), n =
3 and there will be three SupportVectorMachine elements generated.
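The element count is easy to verify (a trivial sanity check of the formula above, nothing JPMML-specific):

```python
def ovo_count(n_classes):
    # Number of one-vs-one SupportVectorMachine elements that LibSVM
    # generates for an n-class classification problem
    return (n_classes * (n_classes - 1)) // 2

assert ovo_count(2) == 1   # binary classification: one element
assert ovo_count(3) == 3   # Iris dataset: three elements
assert ovo_count(4) == 6
```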

As the number of SupportVectorMachine elements grows, you'll start
seeing some variance in estimated class probability values. For the
Iris dataset, class probabilities are multiples of 0.333:
*) Iris model: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-rattle/src/test/resources/pmml/KernlabSVMIris.pmml
*) Iris "pseudo" probability distribution:
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-rattle/src/test/resources/csv/KernlabSVMIris.csv


VR

Elena García Peña

Aug 25, 2016, 4:22:27 PM
to Java PMML API

Hi Villu,

Thanks a lot for your really well-explained answers. Now I understand perfectly. The extra algorithm that LibSVM executes to get the probability is the key to the question: it seems to be Platt scaling applied to the output of the SVM.

I think you're right that implementing this as a PMML extension and extending jpmml-evaluator is not a good investment, so one should use another kind of model to classify with probabilities. However, if LibSVM supports this, it means that lots of people "rely on" this probability, and there could be scenarios with good results.

And I really appreciate your explanation of how many SVM machines are generated depending on the number of classes, and of the voting-based pseudo-probability.

So thanks a lot again!

Best regards,

Elena.

Villu Ruusmann

Aug 26, 2016, 7:49:35 AM
to Java PMML API
Hi Elena,

>
> However, if libSVM supports this, means that lots of people "rely on"
> this probability, and there could be scenarios with good results.
>

Truer probabilities are an advanced feature of (Lib)SVM, and they are
typically "off".

For example, in your R code, you manually activated this feature by
setting "probability = TRUE". It was assumed that you had consulted
the R documentation about the existence of this option, and its
potential side-effects.

The same goes for Python/Scikit-Learn. All LibSVM-based classifiers
expect you to activate this feature manually. See the description (and
the associated warning "will slow down the method") of the
"probability" parameter:
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

It is not currently implemented, but the JPMML-SkLearn converter
should detect if the "probability" parameter is active, and if it is,
refuse to convert the SVC class instance with an error message: "The
classifier requests the use of Platt Scaling-powered probabilities,
which is not supported in PMML. Please re-train the classifier with
probability=False".

Scikit-Learn provides an alternative SVM classifier in the form of
LinearSVC: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
This class doesn't have a predict_proba() method, so it cannot be
mis-used in this context.

If you absolutely need an SVM classifier with probabilities (that is
representable in PMML), then you could try splitting your training
dataset into n subsets, training a separate SVM classifier on each
subset, and aggregating their predictions (a consensus method). For
example, if you have 10 subset-based SVM classifiers, and 8 of them
predict class A and 2 of them predict class B, then the probability
distribution would be {A = 0.8, B = 0.2}.
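A plain-Python sketch of this consensus scheme (the predictions here are stand-ins; in practice each label would come from an SVM trained on its own subset):

```python
import random

def split_into_subsets(dataset, n_subsets, seed=42):
    # Randomly partition the training dataset into n roughly equal
    # subsets, one per SVM classifier
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

def consensus_distribution(predictions):
    # Aggregate the hard class labels predicted by the subset-trained
    # classifiers into a probability distribution
    n = len(predictions)
    return {label: predictions.count(label) / n
            for label in set(predictions)}

subsets = split_into_subsets(range(1000), n_subsets=10)
# Suppose 8 of the 10 subset-based SVM classifiers predict "A":
dist = consensus_distribution(["A"] * 8 + ["B"] * 2)
# -> {"A": 0.8, "B": 0.2}
```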


VR