sklearn2pmml: How to choose the PMML version when outputting SKLearn model?


Dan

Dec 23, 2015, 12:48:53 PM
to Java PMML API
Hi, I am using sklearn2pmml (version 0.5.2) to output my Logistic Regression model to PMML format.

I require that the PMML file be in format version 4.1 (not the latest 4.2.1).

How can I get sklearn2pmml to write my PMML using the schema for 4.1 or any specific version?

I don't see an option to pass a PMML version in the API:
def sklearn2pmml(estimator, mapper, pmml, verbose = False)

Thanks!

Villu Ruusmann

Dec 23, 2015, 2:05:22 PM
to Java PMML API
Hi Dan,

> I am using sklearn2pmml (version 0.5.2) to output my Logistic Regression model to PMML format.
>

Are you working with regression- or classification-type models? It is
important to know, because they are encoded differently.

> I require that the PMML file be in format version 4.1 (not the latest 4.2.1).
>

Sklearn2pmml and its sister project r2pmml intentionally target the
latest version of PMML specification. The idea is to take advantage of
new PMML language features, so that it would be possible to express
the same concepts with less code.

PMML 4.2(.1) models can be back-ported to earlier versions by
identifying and re-encoding incompatible features.

> How can I get sklearn2pmml to write my PMML using the schema for 4.1 or any specific version?
>
> I don't see an option to pass a PMML version in the API:
> def sklearn2pmml(estimator, mapper, pmml, verbose = False)
>

There is no command-line option for that at the moment.

First, you could try manual back-porting by taking the following two actions:
1) On the first line of the PMML document, replace PMML 4.2 namespace
declaration with the PMML 4.1 one. For example:
<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2"> should become
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
2) Go through all MiningSchema elements of the PMML document. If the
value of the "usageType" attribute of a MiningField element is
specified as "target", replace it with "predicted". For example:
<MiningField name="Species" usageType="target"/> should become
<MiningField name="Species" usageType="predicted"/>

After these modifications, is your PMML 4.1-compliant scoring engine
able to score this PMML document? If so, then simply write a small
Python function that does those string replacements for you next time.
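
A minimal sketch of such a function follows. It only performs the two substitutions listed above and does not validate the result against the PMML 4.1 schema. The function names, the UTF-8 file encoding, and the handling of a version="4.2.1" attribute are assumptions made for illustration; they are not part of sklearn2pmml.

```python
import re


def backport_pmml_to_4_1(pmml_text):
    """Apply the two textual substitutions described above (a sketch)."""
    # 1) Replace the PMML 4.2 namespace declaration with the 4.1 one.
    text = pmml_text.replace(
        'xmlns="http://www.dmg.org/PMML-4_2"',
        'xmlns="http://www.dmg.org/PMML-4_1"',
    )
    # Rewrite the version attribute of the root element only.
    # Assumption: the attribute reads "4.2" or "4.2.1".
    text = re.sub(
        r'(<PMML\b[^>]*\bversion=")4\.2(?:\.1)?(")',
        r"\g<1>4.1\g<2>",
        text,
        count=1,
    )
    # 2) On MiningField elements only, rename usageType "target" to "predicted".
    text = re.sub(
        r'(<MiningField\b[^>]*\busageType=")target(")',
        r"\g<1>predicted\g<2>",
        text,
    )
    return text


def backport_file(src_path, dst_path):
    """Read a PMML file, back-port it, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(backport_pmml_to_4_1(text))
```

The file helper would be called on the document that sklearn2pmml has already written, for example backport_file("model.pmml", "model-4.1.pmml"), where both file names are placeholders.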

Otherwise, please open an issue in the JPMML-SkLearn issue tracker,
and describe your use case (example Python code?) in more detail:
https://github.com/jpmml/jpmml-sklearn/issues


VR

Dan

Dec 23, 2015, 5:18:32 PM
to Java PMML API
Hi Villu, thanks for the quick reply.

My model is classification type.

I am using the Java JPMML evaluator API to load the PMML and test my model. Funnily enough, I had already figured out that I needed to make those two changes, but since I hit the exception below afterwards, I thought it might be caused by some other, more complicated difference between 4.1 and 4.2.1.

I am using version 1.0.8 of the jpmml-evaluator because I cannot use the latest version which has the Guava dependency due to integration reasons.

This is the exception:
Exception in thread "main" org.jpmml.evaluator.EvaluationException
    at org.jpmml.evaluator.ParameterUtil.toDouble(ParameterUtil.java:651)
    at org.jpmml.evaluator.ParameterUtil.cast(ParameterUtil.java:527)
    at org.jpmml.evaluator.FunctionUtil.evaluate(FunctionUtil.java:64)
    at org.jpmml.evaluator.FunctionUtil.evaluate(FunctionUtil.java:37)
    at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:168)
    at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:69)
    at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:80)
    at org.jpmml.evaluator.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:54)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:194)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:109)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:50)

Apparently the field named 'value' is null, and the evaluator fails when it tries to cast that null to a double. It seems to be a problem with the DefineFunction in the TransformationDictionary in the PMML. Here it is:

<TransformationDictionary>
  <DefineFunction name="logit" optype="continuous" dataType="double">
    <ParameterField name="value" optype="continuous" dataType="double"/>
    <Apply function="/">
      <Constant dataType="double">1</Constant>
      <Apply function="+">
        <Constant dataType="double">1</Constant>
        <Apply function="exp">
          <Apply function="*">
            <Constant dataType="double">-1</Constant>
            <FieldRef field="value"/>
          </Apply>
        </Apply>
      </Apply>
    </Apply>
  </DefineFunction>
</TransformationDictionary>

Should I go ahead and file this as an issue?

Thanks Villu!

Villu Ruusmann

Dec 23, 2015, 6:27:34 PM
to Java PMML API
Hi Dan,

Sklearn2pmml is generating a correct DefineFunction element for the
"logit" function. Moreover, this DefineFunction element appears to be
fully PMML 4.1 compatible, because it uses basic math operators such
as "/", "+", "*" and "exp" that have been around forever.
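
For readers who find the nested Apply elements hard to parse, here is a rough Python transcription of the DefineFunction quoted earlier in the thread. It is only an illustration of the arithmetic, not code taken from sklearn2pmml or JPMML. The expression it encodes, 1 / (1 + exp(-value)), is the logistic sigmoid.

```python
import math


def logit(value):
    """Transcription of the "logit" DefineFunction: 1 / (1 + exp(-1 * value))."""
    # Innermost Apply: "*" of the constant -1 and the "value" parameter.
    negated = -1.0 * value
    # Next Apply: "exp" of that product.
    exponential = math.exp(negated)
    # Outer Applies: "+" with the constant 1, then "/" with numerator 1.
    return 1.0 / (1.0 + exponential)


print(logit(0.0))  # 0.5
```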

Judging by your exception stack trace, the failure is caused by broken
field value propagation (inside the OutputUtil class) in JPMML version
1.0.8. In other words, you can't resolve this exception by rearranging
PMML content. You would need to fix the JPMML library itself.

>
> I am using version 1.0.8 of the jpmml-evaluator because I cannot use the latest version which has the Guava dependency due to integration reasons.
>

JPMML version 1.0.8 is a really-really outdated version. You shouldn't
be using it today, because it is incomplete and/or incorrect in
several key aspects.

Are you stuck with it because it's the latest version that didn't
depend on Google Guava library? There are easy workarounds available
for solving library mismatches (where your application stack needs one
version of Guava, and JPMML(-Evaluator) needs some other version of
Guava). Typically, this is solved by using the Apache Maven Shade
Plugin (https://maven.apache.org/plugins/maven-shade-plugin/) to
relocate conflicting classes to another package.

For example, the JPMML-Spark example application uses this approach to
resolve multiple library conflicts between the Apache Spark/Apache Hadoop
core and the JPMML-Evaluator library. Please see
https://github.com/jpmml/jpmml-spark/blob/master/pmml-spark-example/pom.xml,
lines 56 -- 96. If you need more help with this, just ask me.


VR

Dan

Dec 24, 2015, 8:30:00 AM
to Java PMML API
Hi Villu,

Yes, we are stuck with Guava version 11.0.2. JPMML depends on 19.0, and our entire problem stems from the fact that 19.0 is not backward compatible with 11.0.2 (Google removed classes along the way).

As per your suggestion, my team and I decided to use the Maven Shade plugin to relocate the conflicting Guava classes.

Did you intend that we must also use the ClassLoader to load the relocated classes from within our code?

Thanks.

Villu Ruusmann

Dec 24, 2015, 9:43:38 AM
to Java PMML API
Hi Dan,

>
> As per your suggestion, my team and I decided to use the
> Maven Shade plugin to relocate the conflicting Guava classes.
>

Essentially, the Maven Shade Plugin walks through all Java bytecode
resources, and replaces all occurrences of "com.google.common" with
"com.google.common18_0". You may pick any package naming scheme that
makes sense for you and your team. For example, you might also use
"org.jpmml.com.google.common".

Please see Maven Shade Plugin documentation for extra configuration
options. For example, by defining the "artifactSet" child element it
is possible to narrow Java bytecode modification operations to
specific modules. The JPMML-Cascading example application uses this
approach to avoid messing with Cascading's Guava dependency. Please
see https://github.com/jpmml/jpmml-cascading/blob/master/pmml-cascading-example/pom.xml
lines 63 -- 68.

> Did you intend that we must also use the ClassLoader to load the relocated classes from within our code?
>

You don't need to change anything about your application code and/or
its execution environment.


VR

Dan

Dec 24, 2015, 10:00:44 AM
to Java PMML API
Hi Villu,

Just to clear things up: our project does not depend on Guava directly. We depend on JPMML, which depends on Guava 19.0. But a parent project, which strictly depends on Guava 11.0.2, depends on our project too.

OK, we tried the Shade configuration from the JPMML-Cascading example, but our parent project still loads 11.0.2 only.

We noticed that the class folder com/google/common did indeed change to com/google/common19 in our JAR. However, is Shade supposed to be responsible for changing the imports in the JPMML classes too?
import com.google.common..... ==> import com.google.common19.....

Thanks.

Villu Ruusmann

Dec 24, 2015, 10:18:48 AM
to Java PMML API
Hi Dan,

>
> However, is Shade supposed to be responsible for changing the imports in the JPMML classes too?
> import com.google.common..... ==> import com.google.common19.....
>

Exactly, the Maven Shade Plugin rewrites JPMML classes so that they
will import "com.google.common19_0" classes instead of default
"com.google.common" classes.

You can verify it experimentally by 1) unzipping your uber-JAR file,
2) deleting the "common19_0" subdirectory and 3) zipping it again. If you
try to execute such an uber-JAR file, it will fail with a
java.lang.ClassNotFoundException stating that some
"com.google.common19_0" class cannot be found.

By default, the Maven Shade Plugin rewrites all classes in the
uber-JAR file. If you want to keep some classes unmodified, then you
can use this "artifactSet" mechanism to include/exclude specific
modules.
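
A less destructive check is to inspect the uber-JAR without repacking it. The sketch below is one way to do that with Python's standard zipfile module. The function names and the default "org/jpmml/" prefix are hypothetical, and the package directory must match whatever relocation pattern you configured ("com/google/common19_0/" is just the name used in this thread). The byte search relies on class files storing referenced class names in slash-separated form, so treat it as a rough check rather than a bytecode analysis.

```python
import zipfile


def count_classes(jar_path, package_dir):
    """Count .class entries stored under package_dir, e.g. 'com/google/common/'."""
    with zipfile.ZipFile(jar_path) as jar:
        return sum(
            1
            for name in jar.namelist()
            if name.startswith(package_dir) and name.endswith(".class")
        )


def classes_referencing(jar_path, package_dir, within="org/jpmml/"):
    """List classes under `within` whose bytecode mentions package_dir."""
    needle = package_dir.encode("ascii")
    hits = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith(within) and name.endswith(".class"):
                # Rough check: look for the package path in the raw class bytes.
                if needle in jar.read(name):
                    hits.append(name)
    return hits
```

If the relocation worked as described, count_classes(jar, "com/google/common/") should be zero for the shaded modules, and classes_referencing(jar, "com/google/common19_0/") should list the JPMML classes that use Guava. The trailing slash matters: without it, the original prefix would also match the relocated directory.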


VR

Dan

Dec 30, 2015, 10:47:34 AM
to Java PMML API
Hi Villu!

We eventually got it working using the Shade plugin and will stick with the latest JPMML version.

Sorry that the topic drifted from sklearn2pmml to Maven Shade support.

Thanks a lot!
