r2pmml error on converting a ranger RF classification/probability model

Z Ye

Feb 27, 2017, 3:03:55 PM
to Java PMML API
Hi, All,

When I tried to convert a ranger RF model to PMML using the r2pmml package, the following error showed up. It is a classification model, but I'd like to have class probabilities instead of labels as the output. Does the r2pmml conversion not support models trained with probability = TRUE? Any idea how to get this to work?

Thanks,
ZYe

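library(ranger)
library(r2pmml)

x <- dtrc  # training data
# Recode the target labels "C0"/"C1" as 0/1, then convert to a factor
x$DD[x$DD == "C0"] <- 0
x$DD[x$DD == "C1"] <- 1
x$DD <- as.factor(x$DD)
# sw (case weights) and reg are defined elsewhere in the script
cla_rf <- ranger(DD ~ ., data = x, num.trees = 200, case.weights = sw,
  classification = TRUE, importance = "impurity", mtry = 6,
  write.forest = TRUE, probability = TRUE)
r2pmml(cla_rf, variable.levels = sapply(x, levels),
  paste("./src/cla_rf_", reg, ".pmml", sep = ""))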


====================
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
INFO: Parsing RDS..
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
INFO: Parsed RDS in 34 ms.
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
INFO: Initializing default Converter
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.RangerConverter
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
INFO: Converting..
Feb 27, 2017 2:45:57 PM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException
at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:136)
at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:44)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:78)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException
at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:136)
at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:44)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:78)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, ...) : 1

Villu Ruusmann

Feb 27, 2017, 5:37:09 PM
to Java PMML API
Hello,

>
> When I tried to convert a ranger RF model to PMML using r2pmml
> package, the following error shows up.
>
> java.lang.IllegalArgumentException
> at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:136)
> at org.jpmml.rexp.RangerConverter.encodeModel(RangerConverter.java:44)
> at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:78)

Your exception points to a failing "sanity check" in the JPMML-R library:
https://github.com/jpmml/jpmml-r/blob/master/src/main/java/org/jpmml/rexp/RangerConverter.java#L130

I took the liberty to transfer your error report to a new GitHub issue:
https://github.com/jpmml/jpmml-r/issues/4

> It is a classification tree but I'd like to have class probability instead
> of label as the output. Looks like the r2pmml conversion doesn't support
> the type w/ probability=T? Any idea how to get this work?
>

The RangerConverter class has difficulty determining the
type of your ranger model object based on the value of its
ranger$treetype attribute. Apparently, it's something other than
"Regression" or "Classification".
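To make the failure mode concrete, here is a minimal R sketch of a check that accepts only these two values. It is only an illustration under that assumption: the real check lives in the Java class RangerConverter, and the function name `mining_function_for` is hypothetical.

```r
# Hypothetical sketch (not JPMML-R code): map a ranger tree type to a
# PMML mining function, accepting only "Regression" and "Classification".
mining_function_for <- function(treetype) {
  switch(treetype,
    "Regression"     = "regression",
    "Classification" = "classification",
    # Any other tree type falls through to an error, similar in spirit
    # to the failing sanity check reported above
    stop("Unsupported ranger tree type: ", treetype)
  )
}

mining_function_for("Classification")
# mining_function_for("Probability estimation") raises an error
```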

Could you look up your "ranger" package version and the value of the
ranger$treetype attribute, and append this information to the above
GitHub issue?
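For reference, a minimal way to collect both pieces of information in R. It assumes the "ranger" package is installed and uses the built-in iris data as a stand-in for the real training data; with the model from the original post you would inspect `cla_rf$treetype` instead of `rf$treetype`.

```r
library(ranger)

# Version of the installed "ranger" package
print(packageVersion("ranger"))

# Stand-in model: a small probability forest grown on the built-in iris data
rf <- ranger(Species ~ ., data = iris, num.trees = 10, probability = TRUE)

# The tree type attribute that the converter inspects
print(rf$treetype)
```

For a forest grown with probability = TRUE this should print "Probability estimation", the same value shown on the "Type:" line when the model is printed.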

Looks like newer "ranger" package versions have introduced another
tree type. It's not mentioned in the package documentation
(https://cran.r-project.org/web/packages/ranger/ranger.pdf), but your
error report sure points that way.


VR

Z Ye

Feb 27, 2017, 7:06:14 PM
to Java PMML API

Hi Villu,

Many thanks for the quick response. I think the tree type changes when the parameter "probability" is set to TRUE.

====================
Ranger result

Call:
ranger(DD ~ ., data = x, num.trees = 200, case.weights = sw, classification = TRUE, importance = "impurity", mtry = get(paste("cla_rf", reg, sep = "."))$bestTune$mtry, write.forest = TRUE, probability = TRUE)

Type: Probability estimation
Number of trees: 200
Sample size: 718
Number of independent variables: 47
Mtry: 7
Target node size: 10
Variable importance mode: impurity
OOB prediction error: 0.006251474


==========
If probability = FALSE, the Type becomes "Classification". So I wonder whether, in r2pmml, you just need to add another condition ("Probability estimation") to the "Classification" branch to fix the issue?


Ranger result

Call:
ranger(DD ~ ., data = x, num.trees = 200, case.weights = sw, classification = TRUE, importance = "impurity", mtry = get(paste("cla_rf", reg, sep = "."))$bestTune$mtry, write.forest = TRUE, probability = FALSE)

Type: Classification
Number of trees: 200
Sample size: 718
Number of independent variables: 47
Mtry: 7
Target node size: 1
Variable importance mode: impurity
OOB prediction error: 0.00 %


Thanks,
ZYe

Villu Ruusmann

Feb 27, 2017, 7:38:30 PM
to jpmml
Hi ZYe,

>
> I think the tree type is changed when the parameter "probability" is set as TRUE.
>

I searched the ranger package documentation
(https://cran.r-project.org/web/packages/ranger/ranger.pdf) for the term
"probability", and found the following paragraph:
<quote>
With the probability option (and factor dependent variable) a
probability forest is grown. Predictions are class probabilities for
each sample. In contrast to other implementations, each tree returns a
probability estimate and these estimates are averaged for the forest
probability estimate.
</quote>
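A small numeric sketch of why the combination rule matters. The per-tree probabilities below are made up for illustration (they are not from this thread): averaging them, as a probability forest does, can select a different class than counting each tree's most likely class, as a majority vote does.

```r
# Made-up per-tree class probability estimates for one sample
# (rows = trees, columns = classes "0" and "1")
tree_probs <- rbind(
  c(0.90, 0.10),
  c(0.40, 0.60),
  c(0.45, 0.55)
)
colnames(tree_probs) <- c("0", "1")

# Probability forest: average the per-tree estimates
forest_probs <- colMeans(tree_probs)

# Classification forest: each tree votes for its most likely class
top_class <- colnames(tree_probs)[max.col(tree_probs, ties.method = "first")]
votes <- table(factor(top_class, levels = colnames(tree_probs)))

print(forest_probs)  # class "0" wins on average probability
print(votes)         # class "1" wins the vote, 2 trees to 1
```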

In PMML terms, this means two things.

First, every leaf Node element must encode the probability
distribution using ScoreDistribution elements:
<Node score="1">
  <ScoreDistribution value="0" recordCount="100"/>
  <ScoreDistribution value="1" recordCount="150"/>
</Node>

Second, the value of the Segmentation@multipleModelMethod attribute
must be changed from "majorityVote" to "average". Looks like ranger's
probability forests are structurally identical to Scikit-Learn's
random forest classification models.
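Putting the cases side by side, here is a minimal R sketch of the tree-type-to-PMML mapping this implies. The function name `pmml_settings_for` is hypothetical, and the "average" setting for "Regression" is an assumption (the usual way regression forests combine trees) rather than something stated in this thread.

```r
# Hypothetical sketch: PMML settings per ranger tree type. A
# "Probability estimation" forest is a classification model whose
# segments are combined by averaging instead of by majority vote.
pmml_settings_for <- function(treetype) {
  switch(treetype,
    "Regression" = list(miningFunction = "regression",
                        multipleModelMethod = "average"),  # assumption
    "Classification" = list(miningFunction = "classification",
                            multipleModelMethod = "majorityVote"),
    "Probability estimation" = list(miningFunction = "classification",
                                    multipleModelMethod = "average"),
    stop("Unsupported ranger tree type: ", treetype)
  )
}

str(pmml_settings_for("Probability estimation"))
```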

This should be fairly easy to implement. If everything goes according
to plan, then I should be able to address this on Thursday. You can
subscribe to the above GitHub issue in order to receive a notification
when it's done.


VR

Z Ye

Feb 27, 2017, 7:51:47 PM
to Java PMML API

Hi Villu,

That would be great! Thank you very much!


Best,
ZYe
