I've been doing some research on jpmml and how the evaluateClassification method is implemented for Support Vector Machine models. I found that it always returns a decision with a binary probability for the classification, even though I put an "Output" element like this in the PMML file:
<Output>
<OutputField name="Predicted_IS_A" feature="predictedValue"/>
<OutputField name="Probability_A" feature="probability" value="A"/>
</Output>
The field Probability_A always returns 1.0 or 0.0 (1.0 if the classification result is A, 0.0 if it belongs to the other class).
Some libraries, such as e1071, can assign probabilities to SVM results. For example:
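library(e1071)
# Fit a radial-kernel SVM; probability = TRUE also fits the probability model
fit.svm <- svm(A ~ .,
               data = training,
               kernel = "radial",
               cost = 10,
               gamma = 0.01,
               probability = TRUE)
# Predict classes and request the class probabilities as well
pr.svm <- predict(fit.svm, training, probability = TRUE)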
With that input, the SVM is able to assign a class to each input value, along with a probability of belonging to that class.
Is there any way to get probabilities with the jpmml-evaluator library?
Thanks in advance!
First of all, thank you very much for your quick response. I just have one question, to make sure I understand it correctly.
When you talk about the "support vectors" vote, do you mean the SupportVector PMML object or the SupportVectorMachine PMML object?
I have a PMML file with 1 SupportVectorMachine object and 1387 SupportVector objects. In the jpmml-evaluator code, it seems that SupportVectorMachine is the one that votes (debugging shows me that).
What do you mean when you say "it can be more than one SupportVectorMachine per PMML model"?
I will reply to your whole answer in more detail, but first I need to be sure I understand it correctly!
Thanks in advance!
Thanks a lot for your really well-explained answers. Now I understand perfectly. The extra algorithm that libSVM runs to obtain the probability is the key to the question: it seems to be Platt scaling applied to the output of the SVM.
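As far as I understand it, Platt scaling passes the raw SVM decision value f through a fitted sigmoid, P(A | f) = 1 / (1 + exp(a*f + b)). Here is a minimal Python sketch of that step. The function name and the values of a and b are made up for illustration (libSVM fits them from the training data), and none of this is jpmml-evaluator code:

```python
import math

def platt_probability(decision_value, a, b):
    """Map a raw SVM decision value to a probability with a fitted sigmoid.

    a and b are the sigmoid parameters; the values used below are
    hypothetical, not taken from a real model.
    """
    z = a * decision_value + b
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return math.exp(-z) / (1.0 + math.exp(-z))
    return 1.0 / (1.0 + math.exp(z))

# Hypothetical parameters: a negative a makes larger decision values more probable.
a, b = -2.0, 0.0
for f in (-1.5, 0.0, 1.5):
    print(f, round(platt_probability(f, a, b), 3))
```

So a point exactly on the decision boundary gets 0.5, and the probability moves smoothly towards 0 or 1 as the point gets further from the boundary, instead of jumping between 0.0 and 1.0.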
I think you're right that including this in a PMML extension and extending jpmml-evaluator is not a good investment, so one should use another kind of model to classify with probabilities. However, the fact that libSVM supports this means that lots of people "rely on" this probability, and there could be scenarios where it gives good results.
I also really appreciate your explanation of how many SVMs are generated depending on the number of classes, and of the voting-based pseudo-probability.
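To check my understanding of the voting, here is a small Python sketch. It assumes a one-against-one scheme, in which every pair of classes gets its own machine and each machine casts one vote. The function names and the toy decision rule are hypothetical, not the jpmml-evaluator implementation:

```python
from itertools import combinations

def pairwise_machine_count(n_classes):
    """Number of machines when each pair of classes gets its own SVM."""
    return n_classes * (n_classes - 1) // 2

def vote_fractions(classes, pairwise_winner):
    """Fraction of pairwise votes won by each class (a pseudo-probability)."""
    votes = {c: 0 for c in classes}
    for first, second in combinations(classes, 2):
        votes[pairwise_winner(first, second)] += 1
    total = sum(votes.values())
    return {c: votes[c] / total for c in classes}

# Toy stand-in for the trained machines: the alphabetically first class wins.
def toy_winner(first, second):
    return min(first, second)

print(pairwise_machine_count(2), vote_fractions(["A", "B"], toy_winner))
print(pairwise_machine_count(3), vote_fractions(["A", "B", "C"], toy_winner))
```

If this sketch is right, two classes mean a single machine and a single vote, so the vote fraction can only be 1.0 or 0.0, which matches what I was seeing for Probability_A.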
So thanks a lot again!
Best regards,
Elena.