Hi Nayan,
>
> Can it be tweaked to output probabilities, i.e. 0.6 for class 1?
>
> <Node score="1" recordCount="5.0">
> <True/>
> <ScoreDistribution value="0" recordCount="2.0"/>
> <ScoreDistribution value="1" recordCount="3.0"/>
> </Node>
>
ScoreDistribution@recordCount is a required attribute according to
the PMML specification:
http://dmg.org/pmml/v4-3/TreeModel.html#xsdElement_ScoreDistribution
So, it's not allowed to "replace" ScoreDistribution@recordCount with
ScoreDistribution@probability, which is only an optional attribute.
The best that can be done is to define both attributes side by side:
<Node>
<True/>
<ScoreDistribution value="0" recordCount="2.0" probability="0.4"/>
<ScoreDistribution value="1" recordCount="3.0" probability="0.6"/>
</Node>
It's not a good idea to duplicate data like this, because the
probability is fully derivable from the record counts, and the extra
attribute would increase the size of the PMML file a lot (might not be
an issue for DecisionTreeClassifier, but definitely will be for
RandomForestClassifier).
If you have absolute record counts, then you can calculate
probabilities on the fly (but you can't go the other way and recover
record counts from probabilities!):
https://github.com/jpmml/jpmml-evaluator/blob/1.5.1/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java#L356-L448
Perhaps this method should be extracted into a separate utility method
to make it easier to reuse, but that's another story.
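
To illustrate the on-the-fly calculation, here is a minimal Python sketch.
It is not the JPMML-Evaluator code linked above; the helper name
score_probabilities is made up for this example, and it assumes a bare
Node fragment without the PMML XML namespace (in a real PMML file the
element names would have to be namespace-qualified). Each probability is
simply the node's recordCount for that value divided by the sum of all
recordCount values in the node.

```python
import xml.etree.ElementTree as ET

# The Node fragment from the question, without any XML namespace (assumption).
NODE_XML = """
<Node score="1" recordCount="5.0">
  <True/>
  <ScoreDistribution value="0" recordCount="2.0"/>
  <ScoreDistribution value="1" recordCount="3.0"/>
</Node>
"""


def score_probabilities(node):
    """Hypothetical helper: map each ScoreDistribution@value to
    recordCount / sum(recordCount) over the node's ScoreDistribution children."""
    counts = {
        sd.get("value"): float(sd.get("recordCount"))
        for sd in node.findall("ScoreDistribution")
    }
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("Node has no positive record counts")
    return {value: count / total for value, count in counts.items()}


probabilities = score_probabilities(ET.fromstring(NODE_XML))
print(probabilities)
```

For the node above this gives 2.0 / 5.0 for class 0 and 3.0 / 5.0 for
class 1, i.e. the 0.4 and 0.6 shown in the probability attributes earlier.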
I've explained my view on this ScoreDistribution issue before here:
https://github.com/jpmml/r2pmml/issues/59
VR