Issue with building a Spark transformer from a PMML


Kean Jaime-Bustamante

Nov 21, 2017, 4:25:35 PM
to Java PMML API
Hi,

I am trying to load a PMML file that was created in sklearn into Spark. I have built a JAR of the jpmml-evaluator-spark package with all of its dependencies and launched a Spark session with it.

I am trying to run the following Spark code:
-----------------
import org.shaded.jpmml.evaluator.spark.EvaluatorUtil
import java.io.FileInputStream
import java.io.InputStream
import org.shaded.jpmml.evaluator.Evaluator
import org.shaded.jpmml.evaluator.spark.TransformerBuilder
import org.apache.spark.ml.Transformer

val fis: InputStream = new FileInputStream("test.pmml")
val evaluator: Evaluator = EvaluatorUtil.createEvaluator(fis);
val pmmlTransformerBuilder: TransformerBuilder = new TransformerBuilder(evaluator).
withTargetCols().
withOutputCols().
exploded(false)
val pmmlTransformer: Transformer = pmmlTransformerBuilder.build()
-----------------

However, I am getting the following error when using the .build method:
-----------------
Name: java.lang.NullPointerException
Message: null
StackTrace: at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:346)
at .$print$lzycompute(<console>:10)
at .$print(<console>:6)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreterSpecific.scala:386)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreterSpecific.scala:381)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withErr(Console.scala:80)
at org.apache.toree.global.StreamState$$anonfun$1$$anonfun$apply$1.apply(StreamState.scala:73)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withOut(Console.scala:53)
at org.apache.toree.global.StreamState$$anonfun$1.apply(StreamState.scala:72)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withIn(Console.scala:124)
at org.apache.toree.global.StreamState$.withStreams(StreamState.scala:71)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1.apply(ScalaInterpreterSpecific.scala:380)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1.apply(ScalaInterpreterSpecific.scala:380)
at org.apache.toree.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:140)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-----------------------

Can you advise on how to troubleshoot this?

thanks,
Kean

Villu Ruusmann

Nov 22, 2017, 2:17:06 AM
to Java PMML API
Hi Kean,

>
> I am trying to run the following Spark code:
> -----------------
> import org.shaded.jpmml.evaluator.spark.EvaluatorUtil
> import org.shaded.jpmml.evaluator.Evaluator
> import org.shaded.jpmml.evaluator.spark.TransformerBuilder
>
> Can you advise on how to troubleshoot this?
>

Looks like you've shaded Java classes from the JPMML-Evaluator library
(https://github.com/jpmml/jpmml-evaluator; package prefix
"org.jpmml.evaluator"). However, you should be shading Java classes
from the JPMML-Model library instead
(https://github.com/jpmml/jpmml-model; package prefixes "org.dmg.pmml"
and "org.jpmml.model").

In your application code you would be importing normal classes (e.g.
"org.jpmml.evaluator.Evaluator"). The shading is performed during
application packaging, and happens at the Java bytecode level (e.g.
inside bytecode files, all matches of "org/jpmml/evaluator/Evaluator"
are replaced with "org/shaded/jpmml/evaluator/Evaluator"). It is
definitely a "code smell" to see shaded class names in Java source
files.

I would also suggest that you start small and simple, and move to more
complex workflows that require shading later on. For starters, you
could modify your Apache Spark ML installation (assuming you're
prototyping this stuff on your personal computer, not on your
organization's production cluster) by simply deleting the two
offending legacy JPMML-Model library JAR files so that shading becomes
unnecessary. This hack is explained here:
https://github.com/jpmml/jpmml-sparkml#modifying-apache-spark-installation
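
To find out which JAR files those are on a given installation, one option is to ask the JVM where it loaded a class from. What follows is a minimal sketch for the Spark shell; the helper name `jarOf` is made up for illustration, and `org.dmg.pmml.PMML` is just one representative JPMML-Model class:

```scala
// Hypothetical helper: reports the JAR (or directory) a class was loaded from.
// Returns None if the class is not on the classpath, or if the JVM does not
// expose a code source for it (as is the case for core JDK classes).
def jarOf(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    for {
      source   <- Option(cls.getProtectionDomain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString
  } catch {
    case _: ClassNotFoundException => None
  }

// Example: which JAR provides the JPMML-Model class org.dmg.pmml.PMML?
println(jarOf("org.dmg.pmml.PMML"))
```

If the printed location is inside the Spark installation rather than your application JAR, the bundled legacy copy is the one being used.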

When you get the above setup running, then put those legacy
JPMML-Model library JAR files back into their original location, and
activate shading in the Apache Maven build.
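
For reference, relocating only the JPMML-Model packages with the maven-shade-plugin might look roughly like the fragment below. This is a sketch, not a drop-in configuration: the "org.shaded" prefix is an assumption (any otherwise unused prefix should work), and the rest of the plugin setup (version, filters) is omitted.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Relocate JPMML-Model classes only; leave org.jpmml.evaluator untouched -->
          <relocation>
            <pattern>org.dmg.pmml</pattern>
            <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
          </relocation>
          <relocation>
            <pattern>org.jpmml.model</pattern>
            <shadedPattern>org.shaded.jpmml.model</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a setup along these lines, application source code keeps importing "org.jpmml.evaluator.*" as usual; only the bytecode references to the two relocated packages are rewritten at packaging time.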


VR

Kean Jaime-Bustamante

Nov 30, 2017, 7:41:08 PM
to Java PMML API

Hi Villu, thanks so much for your reply. I am in fact using a production cluster at work, so I cannot change the Spark installation. I have tried shading just the correct dependencies (org.dmg.pmml and org.jpmml.model). However, now I am not able to build the Evaluator. When I run the code below, I get the following error:

import org.jpmml.evaluator.spark.EvaluatorUtil
import java.io.FileInputStream
import java.io.InputStream
import org.jpmml.evaluator.Evaluator
import org.jpmml.evaluator.spark.TransformerBuilder
import org.apache.spark.ml.Transformer
import org.jpmml.evaluator.spark.PMMLTransformer

val fis: InputStream = new FileInputStream("test.pmml")
val evaluator: Evaluator = EvaluatorUtil.createEvaluator(fis)

Error:
Name: java.lang.NoSuchFieldError
Message: PMML_4_3
StackTrace: at org.shaded.jpmml.model.ImportFilter.<init>(ImportFilter.java:29)
at org.shaded.jpmml.model.ImportFilter.<init>(ImportFilter.java:25)
at org.shaded.jpmml.model.ImportFilter.apply(ImportFilter.java:93)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:53)

----
I have attached the POM file I am using to package the JAR. Any ideas?

thanks again,
Kean

pom.xml

Kean Jaime-Bustamante

Nov 30, 2017, 7:46:30 PM
to Java PMML API

Also, I thought it might be useful to have the PMML file I'm trying to read:

test.pmml