Probability score instead of class prediction Decision Tree Classifier sklearn2pmml


NAYAN GUPTA

May 20, 2020, 7:12:24 AM
to Java PMML API
Hi Villu, 

I am using sklearn2pmml to export a Decision Tree model as a PMML file, and it is working.
However, the PMML file does not output the probability of each class; it outputs the class prediction directly.

Is there a way to handle this in the library?

For example, this is a leaf node from the decision tree PMML. There are 5 records in the leaf node, of which 2 belong to class 0 and 3 belong to class 1.
The tree predicts the score as "1".
Can it be tweaked to output probabilities, i.e. 0.6 for class 1?

Thanks 
<Node score="1" recordCount="5.0">
<True/>
<ScoreDistribution value="0" recordCount="2.0"/>
<ScoreDistribution value="1" recordCount="3.0"/>
</Node>


Villu Ruusmann

May 20, 2020, 12:49:36 PM
to Java PMML API
Hi Nayan,

>
> Can it be tweaked to output probabilities i.e 0.6 for class 1
>
> <Node score="1" recordCount="5.0">
> <True/>
> <ScoreDistribution value="0" recordCount="2.0"/>
> <ScoreDistribution value="1" recordCount="3.0"/>
> </Node>
>

The ScoreDistribution@recordCount is a required attribute according to
the PMML specification:
http://dmg.org/pmml/v4-3/TreeModel.html#xsdElement_ScoreDistribution

So, it's not allowed to "replace" ScoreDistribution@recordCount with
ScoreDistribution@probability. The best that can be done is to define
both attributes side by side:
<Node>
<True/>
<ScoreDistribution value="0" recordCount="2.0" probability="0.4"/>
<ScoreDistribution value="1" recordCount="3.0" probability="0.6"/>
</Node>

It's not a good idea to duplicate data like this, because it would
increase the size of the PMML file a lot (might not be an issue for
DecisionTreeClassifier, but will definitely be for
RandomForestClassifier).

If you have absolute record counts, then you can calculate
probabilities on the fly (but you can't do the opposite!):
https://github.com/jpmml/jpmml-evaluator/blob/1.5.1/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java#L356-L448

Perhaps this method should be extracted into a separate utility method
to make it easier to reuse, but that's another story.
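
For illustration only, here is a minimal Python sketch of the same idea
outside of JPMML-Evaluator: read the ScoreDistribution@recordCount values
of a leaf node and divide each by their sum. It is not the code behind the
link above. The helper name leaf_probabilities is hypothetical, and the
sketch assumes a bare Node fragment without an XML namespace (in a full
PMML file the elements are namespaced, so the lookups would need the
namespace prefix).

```python
import xml.etree.ElementTree as ET

# The leaf node from the original question, as a standalone fragment.
NODE_XML = """
<Node score="1" recordCount="5.0">
  <True/>
  <ScoreDistribution value="0" recordCount="2.0"/>
  <ScoreDistribution value="1" recordCount="3.0"/>
</Node>
"""


def leaf_probabilities(node):
    """Return {class value: probability} for one PMML Node element.

    Hypothetical helper: probabilities are derived from the absolute
    record counts, so nothing extra has to be stored in the PMML file.
    """
    counts = {
        sd.get("value"): float(sd.get("recordCount"))
        for sd in node.findall("ScoreDistribution")
    }
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("leaf node has no records")
    return {value: count / total for value, count in counts.items()}


node = ET.fromstring(NODE_XML)
probabilities = leaf_probabilities(node)
print(probabilities)  # {'0': 0.4, '1': 0.6}
```

Going the other way does not work: probabilities of 0.4 and 0.6 fit 2 and
3 records just as well as 20 and 30, which is why the record counts are
the attribute worth keeping.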

I've explained my view on this ScoreDistribution issue before here:
https://github.com/jpmml/r2pmml/issues/59


VR