Importing a python PMML into Spark (Scala)


Alireza Chakeri

Aug 15, 2017, 10:54:24 AM
to Java PMML API
I have exported a PMML file from a fitted model in Python and would like to import it into Spark (Scala). I am using Maven packaging in Scala IDE, with these dependencies in my pom file:

<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>[1.5.0, 1.6.3]</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>jpmml-evaluator-spark</artifactId>
<version>1.0.0</version>
</dependency>
</dependencies>


The scala code looks like this:

import java.io.File

import org.jpmml.evaluator.Evaluator
import org.jpmml.evaluator.spark._

val fileNamePmml = "mypmml.pmml"
val pmmlFile = new File(fileNamePmml)

// Parse the PMML file into an Evaluator
val myEvaluator: Evaluator = EvaluatorUtil.createEvaluator(pmmlFile)

// Configure how the evaluator is wrapped as a Spark ML Transformer
val pmmlTransformerBuilder = new TransformerBuilder(myEvaluator)
  .withTargetCols()
  .withOutputCols()
  .exploded(false)


But I am getting this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/Transformer
at spark.SimpleJpmml$.main(SimpleJpmml.scala:27)
at spark.SimpleJpmml.main(SimpleJpmml.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ml.Transformer
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 2 more


If I also add the dependency org.apache.spark:spark-mllib_2.10 (version 2.0.0) to my pom file, I get this error instead:

Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3
at org.jpmml.schema.Version.forNamespaceURI(Version.java:61)
at org.jpmml.model.PMMLFilter.updateSource(PMMLFilter.java:121)
at org.jpmml.model.PMMLFilter.startPrefixMapping(PMMLFilter.java:43)

But I can see that Maven has added the pmml-model-1.1.15.jar and pmml-schema-1.1.15.jar to my classpath.

I am wondering whether this is caused by a dependency conflict (Google Guava library, ...) with Apache Spark, or by some other issue that I am not aware of. Any help would be appreciated. Thanks!

Villu Ruusmann

Aug 15, 2017, 5:53:12 PM
to Java PMML API
Hi Alireza,

> I am using maven packaging in scala IDE, and made this pom file dependencies:
>
> <dependencies>
> <dependency>
> <groupId>org.apache.spark</groupId>
> <artifactId>spark-core_2.10</artifactId>
> <version>[1.5.0, 1.6.3]</version>
> <scope>provided</scope>
> </dependency>
> <dependency>
> <groupId>org.jpmml</groupId>
> <artifactId>jpmml-evaluator-spark</artifactId>
> <version>1.0.0</version>
> </dependency>
> </dependencies>
>

The version number of the 'org.apache.spark:spark-core_2.10'
dependency should match that of your Apache Spark installation.

A [min, max]-style version range only makes sense when building a
reusable library. For an application, pin a single version.

>
> But I am getting this error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/Transformer
> at spark.SimpleJpmml$.main(SimpleJpmml.scala:27)
> at spark.SimpleJpmml.main(SimpleJpmml.scala)
>

How do you execute your Apache Spark application? If you're executing
it using the $SPARK_HOME/bin/spark-submit script, then all
org.apache.spark classes (including the
org.apache.spark.ml.Transformer class) will be provided by the runtime.

If you're executing it via other means, then you probably should
package org.apache.spark classes into the application uber-JAR file.
You can do so by changing the scope of all org.apache.spark
dependencies from "provided" to "compile".
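
As a rough sketch of that change, the dependencies section could look
like the fragment below. The version 1.6.0 is only an assumption for
illustration; substitute the version of your own Apache Spark
installation. Note that the org.apache.spark.ml.Transformer class
lives in the spark-mllib module, so that module has to be declared
too, with the same version as spark-core.

```xml
<!-- Assumption: Apache Spark 1.6.0 built for Scala 2.10; use your installed version -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
  <!-- "compile" is Maven's default scope; written out here for clarity -->
  <scope>compile</scope>
</dependency>
<dependency>
  <!-- Provides org.apache.spark.ml.Transformer -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <version>1.6.0</version>
  <scope>compile</scope>
</dependency>
```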

>
> If I add this dependency <artifactId:spark-mllib_2.10,version:2.0.0> to my pom file too, then I get this error:
>
> Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3
> at org.jpmml.schema.Version.forNamespaceURI(Version.java:61)
> at org.jpmml.model.PMMLFilter.updateSource(PMMLFilter.java:121)
> at org.jpmml.model.PMMLFilter.startPrefixMapping(PMMLFilter.java:43)
>

By adding the 'org.apache.spark:spark-mllib_2.10' dependency to the
application classpath, you're also bringing in its transitive
dependencies, which include JPMML-Model library version 1.1.15.
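
One way to confirm which JAR file actually wins is to ask the JVM
where a class from the stack trace was loaded from. The following is a
minimal sketch that uses only JDK reflection; the object name
ClasspathCheck and the helper locationOf are hypothetical names made
up for this illustration, and the two class names are the ones that
appear in the stack trace above.

```scala
import scala.util.Try

object ClasspathCheck {

  /** Returns the JAR file or directory that the named class was loaded from, if known. */
  def locationOf(className: String): Option[String] =
    for {
      // Look the class up without running its static initializers
      cls    <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource) // null for bootstrap classes
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    // Class names taken from the stack trace in this thread
    val names = Seq("org.jpmml.schema.Version", "org.jpmml.model.PMMLFilter")
    names.foreach { name =>
      val where = locationOf(name).getOrElse("not found, or no code source available")
      println(s"$name -> $where")
    }
  }
}
```

If the printed location points at a pmml-schema-1.1.15.jar or
pmml-model-1.1.15.jar file (or at the Spark assembly JAR file), then
the old JPMML-Model classes are shadowing the ones that
JPMML-Evaluator-Spark expects.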

This classpath conflict is specifically mentioned in the README file
of the JPMML-Evaluator-Spark project. The resolution is given in the
README file of the JPMML-SparkML project:
https://github.com/jpmml/jpmml-sparkml#installation

In Apache Spark 1.6.X you don't have the easy option of deleting the
offending JPMML-Model JAR files from the $SPARK_HOME/jars directory.
You'd need to unzip the
$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar uber-JAR file,
delete all org.jpmml classes, and zip the remaining classes back into
the uber-JAR file.

For application packaging hints, see the pom.xml file of the
JPMML-SparkML project:
https://github.com/jpmml/jpmml-sparkml/blob/1.0.X/pom.xml#L180-L228

>
> I am wondering is it because of the dependencies
> conflict Google Guava library, ... with apache spark?
>

Another conflicting library on the application classpath? No worries,
the resolution is as easy as adding another relocation directive:

<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>com.shaded.google.common</shadedPattern>
</relocation>
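
For context, relocation directives live inside the configuration of
the maven-shade-plugin. A minimal sketch of the surrounding plugin
declaration follows. The shaded package names are illustrative
assumptions, and the plugin version is deliberately left out, so pin
the one that suits your build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- Build the uber-JAR file during "mvn package" -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Move the bundled JPMML classes out of the way of Spark's own copies -->
          <relocation>
            <pattern>org.dmg.pmml</pattern>
            <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
          </relocation>
          <relocation>
            <pattern>org.jpmml</pattern>
            <shadedPattern>org.shaded.jpmml</shadedPattern>
          </relocation>
          <!-- Same idea for Google Guava -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>com.shaded.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```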


VR

Aayush Shah

Jun 7, 2022, 9:25:23 AM
to Java PMML API
Hello, I am looking for a way to import a PMML file (generated on any platform, e.g. Python or KNIME) into Spark using Java.
I have seen the `jpmml` library and have installed it, but I cannot find proper documentation on how to use it. Does anyone know where to begin? If possible, could you please walk through a toy example with steps, from PMML loading to prediction?

Thank you.

Villu Ruusmann

Jun 7, 2022, 1:13:26 PM
to Java PMML API
Hi Aayush,

>
> Hello, I was looking for a way to import PMML file
> (generated from any platform python/knime etc) into
> Spark using java.
>

You're quoting an old thread that discusses the JPMML-Evaluator-Spark library.

Did you see its GitHub repository?
https://github.com/jpmml/jpmml-evaluator-spark

Last updated in April 2022. The README has full installation and usage
instructions. What else is needed?


VR

Aayush Shah

Jun 8, 2022, 8:41:15 AM
to Java PMML API
Thank you very much.
I was struggling with how to convert a PMML file into an actual pipeline and didn't know which part of the code was doing that. It turned out that the missing piece was the TransformerBuilder. I tried putting it into my code, but got a "not found" error for the TransformerBuilder class only; the other parts worked well.

Finally I figured out that I had used the wrong dependency in my pom file. The working one is:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>jpmml-evaluator-spark</artifactId>
<version>1.3.0</version>
</dependency>

Now, it all works. Thanks again.