Support for Multiclass Classification Models in R (XGBoost)


Josh Izzard

Mar 8, 2016, 2:24:26 PM
to Java PMML API
I am trying to convert a multi-class classification XGBoost model into PMML, but I am not sure if this is supported yet.

When I follow along with the example here - https://github.com/jpmml/jpmml-xgboost - it all works fine, and the PMML document scores as I need it to. However, when I try to accomplish the same thing with a multi-class classification model, it throws the following error:

Exception in thread "main" java.lang.IllegalArgumentException: multi:softprob
at org.jpmml.xgboost.Learner.load(Learner.java:66)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:34)
at org.jpmml.xgboost.Main.run(Main.java:92)
at org.jpmml.xgboost.Main.main(Main.java:85)


I am running "java -jar converter-executable-1.0-SNAPSHOT.jar --model-input xgboost.model --fmap-input xgboost.fmap --pmml-output xgboost.pmml" to produce this error. I have attached a sample of my dataset to see what it looks like.

Please advise on whether it is user error or whether {objective: "multi:softprob"} is not supported yet for xgboost in R.

toydata.csv

Villu Ruusmann

Mar 8, 2016, 4:12:03 PM
to Java PMML API
Hi Josh,

> However when I try to accomplish the same thing with a multi class
> classification model it throws the following error:
>
> Exception in thread "main" java.lang.IllegalArgumentException: multi:softprob
> at org.jpmml.xgboost.Learner.load(Learner.java:66)
>

The nice thing about open source software is that you can always open
org/jpmml/xgboost/Learner.java in your favourite text editor and see
what is happening around line 66:
https://github.com/jpmml/jpmml-xgboost/blob/master/src/main/java/org/jpmml/xgboost/Learner.java

Indeed, the Learner class knows about the regression objectives "reg:linear"
and "reg:logistic", and the binary classification objective
"binary:logistic". It does not know anything about the multi-class
classification objective "multi:softprob".

The JPMML-R library (which powers the r2pmml package) uses the
JPMML-XGBoost library for all the heavy lifting in this area, and does
not add any functionality to it. So, the conclusion is that
multi-class classification is not supported at the moment.

The good news is that it shouldn't be difficult to implement, because
it follows common logic (see method SoftmaxMultiClassObj::Transform()
around lines 90 through 123):
https://github.com/dmlc/xgboost/blob/master/src/objective/multiclass_obj.cc
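For intuition, the transform that this objective applies is the standard softmax: it converts the per-class raw boosted values (margins) into probabilities. A minimal sketch in plain Java (an illustration, not the actual XGBoost code):

```java
import java.util.Arrays;

public class Softmax {

    // Convert raw per-class margins into probabilities.
    // Subtracting the maximum margin first keeps Math.exp from overflowing;
    // it does not change the result.
    static double[] softmax(double[] margins) {
        double max = Arrays.stream(margins).max().orElse(0d);
        double[] probabilities = new double[margins.length];
        double sum = 0d;
        for (int i = 0; i < margins.length; i++) {
            probabilities[i] = Math.exp(margins[i] - max);
            sum += probabilities[i];
        }
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] /= sum;
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // Margins 1, 2, 3 map to probabilities ~0.090, ~0.245, ~0.665
        System.out.println(Arrays.toString(softmax(new double[]{1d, 2d, 3d})));
    }
}
```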

Basically, you need to do the following:
1) Take the list of RegTree objects and slice it into n sublists,
where n is the number of classes.
2) Create a regression-type MiningModel element for every sublist that
you obtained in step #1. This is the "raw boosted value" for a
particular class.
3) Combine those n MiningModel elements that you obtained in step #2
to a classification-type MiningModel using the utility function
org.jpmml.converter.MiningModelUtil#createClassification(...).
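Step #1 can be sketched as follows. The sketch assumes (as the XGBoost model format appears to do for multi-class objectives) that trees are stored round-robin, one tree per class per boosting round, so tree i belongs to class (i % n); the class and method names here are hypothetical, not JPMML API:

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSlicer {

    // Split a flat list of trees into one sublist per class.
    // Assumption: trees are stored round-robin (round 0 class 0,
    // round 0 class 1, ...), so tree i belongs to class (i % numClasses).
    static <T> List<List<T>> sliceByClass(List<T> trees, int numClasses) {
        List<List<T>> slices = new ArrayList<>();
        for (int i = 0; i < numClasses; i++) {
            slices.add(new ArrayList<>());
        }
        for (int i = 0; i < trees.size(); i++) {
            slices.get(i % numClasses).add(trees.get(i));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<String> trees = List.of("t0", "t1", "t2", "t3", "t4", "t5");
        // With 3 classes: [[t0, t3], [t1, t4], [t2, t5]]
        System.out.println(sliceByClass(trees, 3));
    }
}
```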

The resulting workflow should look pretty much identical to the
encoding of Scikit-Learn's GBM models (see method
GradientBoostingClassifier#encodeModel(Schema) around lines 93 through
108):
https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/java/sklearn/ensemble/gradient_boosting/GradientBoostingClassifier.java


VR

Josh Izzard

Mar 9, 2016, 9:21:50 AM
to Java PMML API
Thanks Villu, I appreciate you providing the outline for the process. My team and I will chat about this in the next few days, and if we do end up implementing it we can chat about adding this functionality to r2pmml so that others can use it.

Villu Ruusmann

Apr 4, 2016, 4:46:05 AM
to Java PMML API
Hi Josh,

> I am trying to convert a multi class classification xgboost model
> into PMML, but I am not sure if this is supported yet.
>
> Exception in thread "main" java.lang.IllegalArgumentException: multi:softprob
> at org.jpmml.xgboost.Learner.load(Learner.java:66)
>

The JPMML-XGBoost library now supports both "multi:softmax" and
"multi:softprob" objective functions:
https://github.com/jpmml/jpmml-xgboost/commit/f1670f69109b4d1bceb8342c3a4309b9c534224a

It is also possible to specify the name of the target field, and to
rename target field categories from 0-based indexes (e.g. "0", "1", ...,
String.valueOf(num_class - 1)) to human-readable strings:
https://github.com/jpmml/jpmml-xgboost/commit/03eb732cf37b95265d8ffc864a602084c791c90f

Additionally, the JPMML-R library now supports multi-class
classification GBM models:
https://github.com/jpmml/jpmml-r/commit/1caaa0741c4d9e17e1ebef08d05be9520fc7dc84

All these updates have been included in the latest version of the
"r2pmml" package.


VR

Josh Izzard

Apr 8, 2016, 9:54:23 AM
to Java PMML API
Villu,

> The JPMML-XGBoost library now supports both "multi:softmax" and
> "multi:softprob" objective functions:
> https://github.com/jpmml/jpmml-xgboost/commit/f1670f69109b4d1bceb8342c3a4309b9c534224a

Thanks very much for this! We do not have a good Java resource on my team, so we hacked together a solution that allowed us to use the LocalTransformations functionality from the "pmml" R package together with JPMML's accurate implementation of the randomForest PMML conversion. I am excited to try out this new functionality.

Josh

Josh Izzard

May 3, 2016, 3:30:55 PM
to Java PMML API
Hi Villu,

The PMML file generated by the .jar converter for a multi-class XGBoost model has an element at the bottom of the file like so:

<RegressionModel functionName="classification" normalizationMethod="simplemax">
    <MiningSchema>
        <MiningField name="MaxOption" usageType="target"/>
        <MiningField name="transformedValue_0"/>
        <MiningField name="transformedValue_1"/>
    </MiningSchema>
    <Output>
        <OutputField name="probability_0" feature="probability" value="0"/>
        <OutputField name="probability_1" feature="probability" value="1"/>
    </Output>
    <RegressionTable intercept="0.0" targetCategory="0">
        <NumericPredictor name="transformedValue_0" coefficient="1.0"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="1">
        <NumericPredictor name="transformedValue_1" coefficient="1.0"/>
    </RegressionTable>
</RegressionModel>


It has normalizationMethod="simplemax".

When I try to use this PMML file to score requests, I get this error (only top of stack included):
"stack": [
"org.jpmml.evaluator.InvalidFeatureException: RegressionModel",
"at org.jpmml.evaluator.RegressionModelEvaluator.normalizeClassificationResult(RegressionModelEvaluator.java:337)",
"at org.jpmml.evaluator.RegressionModelEvaluator.computeBinomialProbabilities(RegressionModelEvaluator.java:264)",
"at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:159)",

Looking in RegressionModelEvaluator, I see the following code:
switch(regressionNormalizationMethod){
    case NONE:
        return value;
    case SIMPLEMAX:
        throw new InvalidFeatureException(regressionModel);
    case SOFTMAX:
        if(classes != 2){
            throw new InvalidFeatureException(regressionModel);
        }
        // Falls through
    case LOGIT:
        return 1d / (1d + Math.exp(-value));
    case PROBIT:
        return NormalDistributionUtil.cumulativeProbability(value);
    case CLOGLOG:
        return 1d - Math.exp(-Math.exp(value));
    case LOGLOG:
        return Math.exp(-Math.exp(-value));
    case CAUCHIT:
        return 0.5d + (1d / Math.PI) * Math.atan(value);
    default:
        throw new UnsupportedFeatureException(regressionModel, regressionNormalizationMethod);
}


Am I reading this wrong, or is the default normalization method not supported?

Thanks!

Villu Ruusmann

May 3, 2016, 5:09:36 PM
to Java PMML API
Hi Josh,

>
> The PMML file generated by the .jar file for a multi class xgboost model has a field at the bottom of PMML file like so:
>
> <RegressionModel functionName="classification" normalizationMethod="simplemax">
> <MiningSchema/>
> <Output>
> <OutputField name="probability_0" feature="probability" value="0"/>
> <OutputField name="probability_1" feature="probability" value="1"/>
> </Output>
> <RegressionTable intercept="0.0" targetCategory="0">
> <NumericPredictor name="transformedValue_0" coefficient="1.0"/>
> </RegressionTable>
> <RegressionTable intercept="0.0" targetCategory="1">
> <NumericPredictor name="transformedValue_1" coefficient="1.0"/>
> </RegressionTable>
> </RegressionModel>

You're saying that you're doing multi-class classification, but the
above RegressionModel element shows that the target variable has only
two categories - "0" and "1".

So, you should update the value of the XGBoost objective function
argument from "multi:softmax" to "binary:logistic", re-train the
model, and everything should be okay.

>
> When I try to use this PMML file to score requests, I get this error (only top of stack included):
> "stack": [
> "org.jpmml.evaluator.InvalidFeatureException: RegressionModel",
> "at org.jpmml.evaluator.RegressionModelEvaluator.normalizeClassificationResult(RegressionModelEvaluator.java:337)",
> "at org.jpmml.evaluator.RegressionModelEvaluator.computeBinomialProbabilities(RegressionModelEvaluator.java:264)",
> "at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:159)",
>

The stack trace shows that the JPMML-Evaluator library is following
the "binary logistic regression" evaluation path. According to the
PMML specification (see http://dmg.org/pmml/v4-2-1/Regression.html,
right above the first "green box"), a classification-type
RegressionModel element with two RegressionTable elements represents a
special case.

It is my interpretation of the PMML specification that you shouldn't
use the "simplemax" regression normalization method in this special
case. Perhaps I am wrong, but I didn't want to take chances with that
one.
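For intuition, "simplemax" as described in the PMML Regression specification simply divides each raw value by the sum of all raw values. A minimal sketch (a hypothetical helper, not JPMML code):

```java
public class SimpleMax {

    // "simplemax" normalization: p_i = v_i / (v_1 + ... + v_n).
    // Unlike softmax there is no exponentiation, so this only behaves
    // like a probability when all raw values are non-negative -
    // one reason to be cautious about it in the two-category case,
    // where the evaluation path expects a logit-style normalization.
    static double[] simplemax(double[] values) {
        double sum = 0d;
        for (double v : values) {
            sum += v;
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] / sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] p = simplemax(new double[]{1d, 3d});
        System.out.println(p[0] + " " + p[1]); // 0.25 0.75
    }
}
```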

Anyway, the lesson is that the JPMML-XGBoost library should
validate the number of categories, and throw an
IllegalArgumentException (or similar) if you're attempting to use the
"multi:softmax" objective function with a binary target variable.


VR