I am trying to load a PMML file (exported from scikit-learn) into Spark. I have built a JAR of the jpmml-evaluator-spark package with all of its dependencies and launched a Spark session with it on the classpath.
I am trying to run the following Spark code:
-----------------
import org.shaded.jpmml.evaluator.spark.EvaluatorUtil
import java.io.FileInputStream
import java.io.InputStream
import org.shaded.jpmml.evaluator.Evaluator
import org.shaded.jpmml.evaluator.spark.TransformerBuilder
import org.apache.spark.ml.Transformer
val fis: InputStream = new FileInputStream("test.pmml")
val evaluator: Evaluator = EvaluatorUtil.createEvaluator(fis)
val pmmlTransformerBuilder: TransformerBuilder = new TransformerBuilder(evaluator).
  withTargetCols().
  withOutputCols().
  exploded(false)
val pmmlTransformer: Transformer = pmmlTransformerBuilder.build()
-----------------
However, I am getting the following error when calling the .build() method:
-----------------
Name: java.lang.NullPointerException
Message: null
StackTrace: at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:346)
at .$print$lzycompute(<console>:10)
at .$print(<console>:6)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreterSpecific.scala:386)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreterSpecific.scala:381)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withErr(Console.scala:80)
at org.apache.toree.global.StreamState$$anonfun$1$$anonfun$apply$1.apply(StreamState.scala:73)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withOut(Console.scala:53)
at org.apache.toree.global.StreamState$$anonfun$1.apply(StreamState.scala:72)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at scala.Console$.withIn(Console.scala:124)
at org.apache.toree.global.StreamState$.withStreams(StreamState.scala:71)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1.apply(ScalaInterpreterSpecific.scala:380)
at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$interpretAddTask$1.apply(ScalaInterpreterSpecific.scala:380)
at org.apache.toree.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:140)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-----------------------
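In case it is useful: to check whether shaded and unshaded copies of the JPMML classes are both ending up on the classpath, I can run a small helper like this in the same REPL (a sketch; the class names to probe would be the ones from my imports above, e.g. "org.shaded.jpmml.evaluator.Evaluator" vs. "org.jpmml.evaluator.Evaluator"):

```scala
import scala.util.Try

// Returns the code source (typically the JAR path) a class was loaded from,
// or None if the class is absent or comes from the bootstrap classloader.
def codeSourceOf(className: String): Option[String] =
  Try(Class.forName(className)).toOption
    .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)

// Example probes (class names here are from my imports, adjust as needed):
// codeSourceOf("org.shaded.jpmml.evaluator.Evaluator")
// codeSourceOf("org.jpmml.evaluator.Evaluator")
```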
Can you advise on how to troubleshoot this?
thanks,
Kean
Hi Villu, thanks so much for your reply. I am in fact using a production cluster at work, so I cannot change the Spark installation. I have tried shading just the correct dependencies (org.dmg.pmml and org.jpmml.model). However, now I am not able to build the Evaluator. When I use the code below, I get the following error:
import org.jpmml.evaluator.spark.EvaluatorUtil
import java.io.FileInputStream
import java.io.InputStream
import org.jpmml.evaluator.Evaluator
import org.jpmml.evaluator.spark.TransformerBuilder
import org.apache.spark.ml.Transformer
import org.jpmml.evaluator.spark.PMMLTransformer
val fis: InputStream = new FileInputStream("test.pmml")
val evaluator: Evaluator = EvaluatorUtil.createEvaluator(fis)
Error:
Name: java.lang.NoSuchFieldError
Message: PMML_4_3
StackTrace: at org.shaded.jpmml.model.ImportFilter.<init>(ImportFilter.java:29)
at org.shaded.jpmml.model.ImportFilter.<init>(ImportFilter.java:25)
at org.shaded.jpmml.model.ImportFilter.apply(ImportFilter.java:93)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:53)
----
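If it helps narrow things down: the NoSuchFieldError looks like the shaded ImportFilter referencing a PMML_4_3 field that does not exist in whichever copy of the PMML class model actually gets loaded. To rule out the file itself, I can check which schema version test.pmml declares using only the JDK (a sketch, no JPMML classes involved; PMML documents carry the schema version as a "version" attribute on the root PMML element):

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory

// Reads the "version" attribute of the root <PMML> element, e.g. "4.2" or "4.3".
// Returns an empty string if the attribute is missing.
def pmmlSchemaVersion(path: String): String = {
  val doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new File(path))
  doc.getDocumentElement.getAttribute("version")
}
```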
I have attached the POM file I am using to package the JAR. Any ideas?
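For context (this is illustrative, not a copy of my attached POM), by "shading" I mean a maven-shade-plugin relocation along these lines, which matches the org.shaded.* package names seen in the stack traces above:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.dmg.pmml</pattern>
        <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
      </relocation>
      <relocation>
        <pattern>org.jpmml.model</pattern>
        <shadedPattern>org.shaded.jpmml.model</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```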
thanks again,
Kean