Converting Python to PMML using Scikit-Learn

Cairu Liao

Jun 27, 2016, 6:26:19 PM
to Java PMML API
Hi,

First, apologies in advance for any naive questions. I'm a complete beginner in this area, and I hope you can help.

I've used Zementis's Py2PMML product before to convert models into PMML and then deploy them to ADAPA. Unfortunately, it failed because of a version incompatibility between sklearn and Zementis.

That's why I'm now focusing on converting sklearn models to PMML using JPMML. I've reviewed sklearn2pmml (https://github.com/jpmml/sklearn2pmml), but I can only find .jar files. Is the source code of the actual conversion process available anywhere? Or, how can I make use of these .jar files?

Thanks,
Best regards.

Cairu Liao

Cairu Liao

Jun 27, 2016, 7:49:55 PM
to Java PMML API
Hi,

Another (possibly silly) question:

Does this Java PMML API represent the DMG's PMML schema in Java code?

Looking forward to a response.

Thanks!

Sincerely,
Cairu Liao

Villu Ruusmann

Jun 28, 2016, 4:54:04 AM
to Java PMML API
Hi Cairu,

>
> I've reviewed sklearn2pmml (https://github.com/jpmml/sklearn2pmml),
> but I can only find .jar files. Is the source code of the actual
> conversion process available anywhere?

The majority of SkLearn-to-PMML conversion logic is contained in
jpmml-sklearn-1.0-SNAPSHOT.jar and jpmml-converter-1.0.7.jar. Other
JAR files are there to provide various helper functions. For example,
pyrolite-4.12.jar handles the parsing of Python pickle files, and
jcommander-1.48.jar handles the parsing of command-line arguments.

The collection of JAR files in the sklearn2pmml/resources directory is
automatically managed by Apache Maven. For example, if you remove all
of them, then you can get them back by executing "mvn clean package"
in the root directory of the project.
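Because the conversion logic lives in these JAR files, running it means putting all of them on the Java classpath. A minimal sketch of assembling such a classpath, assuming the JARs sit in a single flat directory such as sklearn2pmml/resources; the class and method names are hypothetical:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ClasspathBuilder {

    // Collects every *.jar file in the given directory (non-recursively)
    // and joins their absolute paths with the platform path separator
    // (":" on Unix-like systems, ";" on Windows).
    static String fromJarDirectory(File dir) {
        File[] files = dir.listFiles((d, name) -> name.endsWith(".jar"));
        List<String> paths = new ArrayList<>();
        if (files != null) { // null if dir does not exist or is not a directory
            for (File file : files) {
                paths.add(file.getAbsolutePath());
            }
        }
        // listFiles() guarantees no ordering, so sort for a deterministic result.
        Collections.sort(paths);
        return String.join(File.pathSeparator, paths);
    }
}
```

Sorting is not required by Java itself; it only keeps the resulting command line reproducible between runs.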

You can find the source code of the JPMML-SkLearn library here:
https://github.com/jpmml/jpmml-sklearn

You can find the source code of all the other libraries in the Maven
Central repository: http://search.maven.org/

For example, if you want to know what's inside
jpmml-converter-1.0.7.jar, then search the Maven Central repository
for "jpmml-converter", click on the search result, and choose the
"sources.jar" download option. This should redirect you to:
http://search.maven.org/remotecontent?filepath=org/jpmml/jpmml-converter/1.0.7/jpmml-converter-1.0.7-sources.jar
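That link follows the standard Maven repository layout: the groupId with dots turned into slashes, then artifactId/version/, then artifactId-version-sources.jar. A small sketch that derives the link from artifact coordinates; the class and method names are hypothetical, and the base URL is copied from the 2016 link above, so it may have changed since:

```java
final class MavenSources {

    // Base URL taken from the search.maven.org link in this thread.
    private static final String BASE =
        "http://search.maven.org/remotecontent?filepath=";

    // groupId "org.jpmml" becomes the path segment "org/jpmml".
    static String sourcesJarUrl(String groupId, String artifactId, String version) {
        String path = groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
            + artifactId + "-" + version + "-sources.jar";
        return BASE + path;
    }

    public static void main(String[] args) {
        // Prints the same URL as the jpmml-converter 1.0.7 link above.
        System.out.println(sourcesJarUrl("org.jpmml", "jpmml-converter", "1.0.7"));
    }
}
```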

> Or, how can I make use of these .jar files?
>

The conversion logic is implemented in Java.

The 'sklearn2pmml()' function builds a command-line command, and
invokes java.exe with it using the 'subprocess.check_call()' function:
https://github.com/jpmml/sklearn2pmml/blob/master/sklearn2pmml/__init__.py
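The command-building step described above can be sketched in Java too, as a rough analogue of what the Python function does with subprocess.check_call(). This is an illustration under assumptions, not the sklearn2pmml implementation: the class name is hypothetical, and the main class and its arguments are placeholders, since the real ones are defined in sklearn2pmml/__init__.py.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConverterCommand {

    // Builds: java -cp <jar1><sep><jar2>... <mainClass> <args...>
    // mainClass and args are caller-supplied placeholders, not the real
    // sklearn2pmml entry point or flags.
    static List<String> build(List<String> jars, String mainClass, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, jars));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Rough analogue of Python's subprocess.check_call(): run the command,
    // share this process's stdin/stdout/stderr, and fail on a non-zero exit code.
    static void checkCall(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + cmd + " exited with code " + exitCode);
        }
    }
}
```

For example, build(jars, "com.example.HypotheticalMain", "model.pkl", "model.pmml") yields java -cp <jars> com.example.HypotheticalMain model.pkl model.pmml, which checkCall() would then execute.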

As a Python end user, this "Java connection" shouldn't affect you in
any way (provided that you have Java 1.7+ available on the system path).
Simply install the 'sklearn2pmml' package, and start walking through
the Iris classification example as detailed in its README.md file:
https://github.com/jpmml/sklearn2pmml/blob/master/README.md


VR

Villu Ruusmann

Jun 28, 2016, 5:27:38 AM
to Java PMML API
Hi Cairu,

>
> Does this Java PMML API represent the DMG's PMML schema in Java code?
>

All JPMML producer and consumer APIs are based on the JPMML-Model
library (https://github.com/jpmml/jpmml-model), which provides direct
mapping between PMML schema and Java classes.

The current 1.2.X development line is fully interoperable with PMML
schema versions 3.0 through 4.2. I will kick off the 1.3.X development
line right after PMML schema version 4.3 is published (which should
happen in a month's time or so).
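Every PMML document declares its schema version in the version attribute of its root PMML element, so the published versions in that range are 3.0, 3.1, 3.2, 4.0, 4.1 and 4.2. A stdlib-only sketch that reads the attribute with the JDK's DOM parser and checks it against that list; it is not how JPMML-Model handles versions internally, and the class name is hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class PmmlVersionCheck {

    // Published PMML schema versions from 3.0 through 4.2.
    private static final String[] SUPPORTED = {"3.0", "3.1", "3.2", "4.0", "4.1", "4.2"};

    // Returns the root element's version attribute ("" if it is absent).
    static String readVersion(String pmmlXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(pmmlXml)))
            .getDocumentElement();
        return root.getAttribute("version");
    }

    static boolean isSupported(String version) {
        for (String v : SUPPORTED) {
            if (v.equals(version)) {
                return true;
            }
        }
        return false;
    }
}
```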

All JPMML converter libraries (including JPMML-SkLearn) return a live
instance of org.dmg.pmml.PMML class. Feel free to apply arbitrary
modifications to it before marshalling it to a file. For example,
adding a copyright statement to a model:

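import javax.xml.transform.stream.StreamResult;
import org.dmg.pmml.Header;
import org.dmg.pmml.PMML;
import org.jpmml.model.JAXBUtil;
PMML pmml = ...; // e.g. the PMML object returned by a JPMML converter
Header header = pmml.getHeader();
header.setCopyright("Copyright (c) 2016 My Company");
JAXBUtil.marshalPMML(pmml, new StreamResult(System.out)); // writes the XML to stdout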

If you have the time and interest, I would definitely suggest that you
study the Visitor API layer. There are many Visitor implementation
classes available in the 'org.jpmml.model.visitors' package:
https://github.com/jpmml/jpmml-model/tree/master/pmml-model/src/main/java/org/jpmml/model/visitors

For example, class org.jpmml.model.visitors.MiningSchemaCleaner
performs the cleaning/optimization of MiningSchema elements, which is
crucial for conversion applications:

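import org.dmg.pmml.PMML;
import org.jpmml.model.visitors.MiningSchemaCleaner;
PMML pmml = ...;
MiningSchemaCleaner schemaCleaner = new MiningSchemaCleaner();
schemaCleaner.applyTo(pmml); // walks the PMML object tree with this visitor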
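The Visitor idea itself can be illustrated without the JPMML-Model library. The toy sketch below uses hypothetical stand-in classes (not org.dmg.pmml classes) and assumes that "cleaning" a mining schema means dropping declared input fields that the model never references; the real MiningSchemaCleaner may do more, or something different.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a model element.
final class ToyModel {
    final List<String> miningFields = new ArrayList<>();  // declared inputs
    final Set<String> usedFields = new LinkedHashSet<>(); // inputs the model logic references

    void accept(ToyVisitor visitor) {
        visitor.visit(this);
    }
}

// Hypothetical stand-in for the document root; it passes the visitor on to its children.
final class ToyPmml {
    final List<ToyModel> models = new ArrayList<>();

    void accept(ToyVisitor visitor) {
        for (ToyModel model : models) {
            model.accept(visitor);
        }
    }
}

interface ToyVisitor {
    void visit(ToyModel model);
}

// Drops declared inputs that the model never uses.
final class UnusedFieldRemover implements ToyVisitor {

    @Override
    public void visit(ToyModel model) {
        model.miningFields.removeIf(name -> !model.usedFields.contains(name));
    }

    void applyTo(ToyPmml pmml) {
        pmml.accept(this);
    }
}
```

The point of the pattern is that the traversal lives in the element classes, while each visitor holds one self-contained transformation that can be applied to a whole document in a single call.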

Do you need to work with competing PMML software sometimes? Here's a
technical article about tidying R's AdaBoost models (hint: they're 90%
noise and 10% signal) using the JPMML-Model library:
http://openscoring.io/blog/2016/02/05/tidying_adaboost_pmml/


VR