'Element TreeModel is not supported' exception in jpmml-evaluator-spark


Pratyush Banerjee

Jun 16, 2022, 11:35:37 AM
to Java PMML API
Hi,

I have created a PMML file from scikit-learn using sklearn2pmml and am trying to load the model in a Spark cluster.
However, when I execute the following code as per the documentation:


...
val pmmlFile = new File("/home/pbanerjee/DecisionTreeIris.pmml")
val evaluatorBuilder = new LoadingModelEvaluatorBuilder().setLocatable(false).load(pmmlFile)
val evaluator = evaluatorBuilder.build()

I am hit with the following exception:

org.jpmml.model.UnsupportedElementException: Element TreeModel is not supported
  at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:113)
  at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:38)
  at org.jpmml.evaluator.ModelEvaluatorBuilder.build(ModelEvaluatorBuilder.java:121)
  ... 49 elided

I have imported jpmml-evaluator-spark version 1.3.0, and my Spark version is 2.4.8.

Any idea what I am doing wrong here?

Thanks & Regards,

Pratyush

Villu Ruusmann

Jun 16, 2022, 12:02:05 PM
to Java PMML API, Pratyush Banerjee
Hi PB,

Let's keep stack traces & other low-level technical stuff in GitHub issues.

>
> however, as per the documentation ...
>

Works in Python, but doesn't work in Apache Spark? Must be a runtime
configuration issue then.

Are you attaching JPMML-Evaluator-Spark using the "--packages"
mechanism, or did you bundle it with your Java application classes?

> org.jpmml.model.UnsupportedElementException: Element TreeModel is not supported
> at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:113)
>

I'm assuming that you've built your own uber-JAR file.

Does it contain a META-INF/services/org.jpmml.evaluator.ModelEvaluator
file, which contains the list of all "usable" ModelEvaluator
subclasses?

Do you have this file on your application classpath?
https://github.com/jpmml/jpmml-evaluator/blob/1.6.3/pmml-evaluator/src/main/resources/META-INF/services/org.jpmml.evaluator.ModelEvaluator

Specifically, the handler for the TreeModel element is configured in
two locations (one for simpler trees, another one for complex trees):
https://github.com/jpmml/jpmml-evaluator/blob/1.6.3/pmml-evaluator/src/main/resources/META-INF/services/org.jpmml.evaluator.ModelEvaluator#L13-L14
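
One way to answer the classpath question above from inside spark-shell is to ask the class loader for the service file directly. This is a minimal sketch that uses only the JDK class loader API; the helper name findServiceFiles is made up for illustration and is not part of JPMML-Evaluator.

```scala
import scala.collection.JavaConverters._

// Service descriptor that lists the usable ModelEvaluator subclasses
val serviceFile = "META-INF/services/org.jpmml.evaluator.ModelEvaluator"

// Hypothetical helper: collect every copy of a resource that is visible
// to the current thread's context class loader
def findServiceFiles(name: String): List[java.net.URL] = {
  val loader = Thread.currentThread().getContextClassLoader
  loader.getResources(name).asScala.toList
}

val found = findServiceFiles(serviceFile)
if (found.isEmpty) println(serviceFile + " is missing from the classpath")
else found.foreach(url => println("found: " + url))
```

If nothing is found, the uber-JAR build has most likely dropped the file. If it is found, the next thing to check is whether it still lists the TreeModel evaluator classes.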

+++

If the error persists, please open a GitHub issue instead.


VR

Pratyush Banerjee

Jun 16, 2022, 12:45:17 PM
to Villu Ruusmann, Java PMML API
Hi VR,

Ah yes, my bad! 
While creating the uber-JAR, I was discarding a lot of META-INF entries.
So the file you mentioned is not in the uber-JAR.
I will work on my build.sbt file to fix it. That should probably resolve the issue.
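
For reference, a build.sbt fragment that keeps the service files while still discarding other META-INF entries might look like the following. It assumes the uber-JAR is built with the sbt-assembly plugin and its sbt 1.x slash syntax (the thread does not say which plugin is in use), so treat it as a sketch rather than a drop-in fix.

```scala
// build.sbt fragment; assumes the sbt-assembly plugin is enabled
assembly / assemblyMergeStrategy := {
  // Keep service descriptors, merging the copies that come from different JARs
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // Drop the remaining META-INF entries (manifests, signatures, and so on)
  case PathList("META-INF", _*) => MergeStrategy.discard
  // Assumption: for everything else, the first copy wins
  case _ => MergeStrategy.first
}
```

The order of the cases matters: the "services" rule has to come before the general META-INF rule, otherwise the service files are discarded along with everything else.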

Thanks again for pointing me in the right direction, much appreciated!!

Thanks & Regards,

Pratyush

Pratyush Banerjee

Jun 16, 2022, 7:38:51 PM
to Villu Ruusmann, Java PMML API
Hi VR,

Apologies for bothering you with this again, but I seem to have hit another wall with the Spark evaluation!

I downloaded the source for jpmml-evaluator-spark-1.3.0 and did a clean install on my machine and then copied the shaded jar, jpmml-evaluator-spark-runtime-1.3.0_2.11.jar to my cluster. 
I started my spark shell with the following command:

sudo spark-shell --jars jpmml-evaluator-spark-runtime-1.3.0_2.11.jar

In the spark-shell I execute the following script:

import org.jpmml.evaluator.EvaluatorBuilder
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder
import org.apache.commons.io.IOUtils
import org.jpmml.evaluator.spark.TransformerBuilder
import java.io.File


val pmmlFile = new File("/home/pbanerjee/DecisionTreeIris.pmml")
val evaluatorBuilder = new LoadingModelEvaluatorBuilder().setLocatable(false).load(pmmlFile)
val evaluator = evaluatorBuilder.build()
val pmmlTransformerBuilder = new TransformerBuilder(evaluator).withLabelCol("label").exploded(true)
val pmmlTransformer = pmmlTransformerBuilder.build()


Every time, I seem to hit the following runtime exception now:

scala> val pmmlTransformer = pmmlTransformerBuilder.build()
java.lang.NullPointerException
  at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:346)
  at .$print$lzycompute(<console>:10)
  at .$print(<console>:6)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)


Initially I thought this to be a problem with my Uber Jar, but since then I have dropped that approach and gone with the approach mentioned above, but the problem persists!
I took a look at the issue previously mentioned here, but couldn't really solve the issue!

My Scala version (2.11.12) and Spark version (2.4.8) seem to match the versions in the pom.xml, so I'm not sure what might be causing the issue. Does this look familiar?

Thanks & Regards,

Pratyush

Villu Ruusmann

Jun 17, 2022, 1:56:01 AM
to Java PMML API, Pratyush Banerjee
Hi Pratyush,

This looks like another low-level technical issue, which should have
gone straight into GitHub issue tracker!

>
> I downloaded the source for jpmml-evaluator-spark-1.3.0
> and did a clean install on my machine and then copied the
> shaded jar, jpmml-evaluator-spark-runtime-1.3.0_2.11.jar to my cluster.
>

Why did you decide to try shading?

According to my notes, the classpath conflict that must be resolved
via shading only affects Apache Spark 2.0, 2.1 and 2.2 versions:
https://github.com/jpmml/jpmml-sparkml/blob/1.5.14/README.md#installation

Here, the fix version is stated as 2.3.0:
https://issues.apache.org/jira/browse/SPARK-15526

Your Apache Spark version 2.4.8 should be safe in all regards, no
shading is necessary.

> sudo spark-shell --jars jpmml-evaluator-spark-runtime-1.3.0_2.11.jar
>

Did you get your application running with my pre-packaged
JPMML-Evaluator-Spark 1.3.0 version?
$ spark-shell --packages org.jpmml:jpmml-evaluator-spark:1.3.0 --jars your-app.jar

I wonder if this technical error persists with the default (non-shaded) library.

>
> val pmmlTransformerBuilder = new TransformerBuilder(evaluator).withLabelCol("label").exploded(true)
> val pmmlTransformer = pmmlTransformerBuilder.build()
>
> Everytime, I seem to hit the following runtime exception now:
>
> scala> val pmmlTransformer = pmmlTransformerBuilder.build()
> java.lang.NullPointerException
> at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:346)
>

This exception is raised by Spark/Scala REPL environment, and doesn't
seem to refer to any org.jpmml.* namespace classes.

I'm not a Scala expert, so I can't advise in that area.

However, for debugging purposes, you could try the following:

1) Print out the value of the 'pmmlTransformerBuilder' variable. Does it
exist (i.e. is it non-null)? If it does, then it means that the issue
happens inside the TransformerBuilder#build() method.

2) When I read the source code of TransformerBuilder#build() method,
then I can see two execution pathways in there - one for the
exploded=true option (more complex), and another one for the
exploded=false option (simpler). Right now you're using exploded=true.
Does your code complete when you do exploded=false?

3) If this issue is about exploded=true, then it seems to me that you
need to configure TransformerBuilder#withOutputCols() and/or
TransformerBuilder#withProbabilityCols() options.
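
Steps 1 and 2 above can be tried without letting the REPL render the built object. That may matter here, because the top frame of the reported trace is ScalaRunTime.replStringOf, which is the REPL code that turns a result into text. A minimal sketch, using only the Scala standard library; the probe helper is hypothetical, and the commented usage lines reuse the calls quoted earlier in this thread:

```scala
import scala.util.{Failure, Success, Try}

// Runs a block and reports the outcome as a plain String, so the REPL
// only has to print a String and never the object returned by the block.
def probe[A](label: String)(body: => A): String =
  Try(body) match {
    case Success(v) if v == null => label + ": returned null"
    case Success(_)              => label + ": completed"
    case Failure(e) =>
      // The first stack frame shows where the exception was thrown
      val frame = e.getStackTrace.headOption.map(_.toString).getOrElse("unknown frame")
      label + ": failed with " + e.getClass.getName + " at " + frame
  }

// Usage in spark-shell (assumes 'evaluator' was built as shown earlier):
// println(probe("exploded=false")(new TransformerBuilder(evaluator).withLabelCol("label").exploded(false).build()))
// println(probe("exploded=true")(new TransformerBuilder(evaluator).withLabelCol("label").exploded(true).build()))
```

If both calls report "completed", the exception is more likely raised while the REPL prints the result than inside build() itself. That is only a hypothesis to test, not a confirmed diagnosis.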


VR

Pratyush Banerjee

Jun 17, 2022, 4:33:15 AM
to Villu Ruusmann, Java PMML API
Hi VR,

Thanks again for your reply on this. To answer your questions one-by-one:
  • I tried shading since I was hitting that error consistently. But it is good to know that Spark 2.4.8 does not need shading.
  • I tried running with the pre-packaged version as well, but got the same error.
  • The 'pmmlTransformerBuilder' variable is not null and looks fine:
scala> pmmlTransformerBuilder
res0: org.jpmml.evaluator.spark.TransformerBuilder = org.jpmml.evaluator.spark.TransformerBuilder@3fc0a2c
  • Ultimately, the issue boiled down to using exploded=true. When I set it to false, I am able to complete my workflow and obtain classifications.
  • Regarding the TransformerBuilder#withOutputCols() and/or TransformerBuilder#withProbabilityCols() options, I tried both of the following, but got the same error with each:

scala> val pmmlTransformerBuilder = new TransformerBuilder(evaluator).withLabelCol("variety").withProbabilityCol("Species_probability", Arrays.asList("Setosa", "Versicolor", "Virginica")).exploded(true)
pmmlTransformerBuilder: org.jpmml.evaluator.spark.TransformerBuilder = org.jpmml.evaluator.spark.TransformerBuilder@66699a8d
scala> val pmmlTransformer = pmmlTransformerBuilder.build()
java.lang.NullPointerException

scala> val pmmlTransformerBuilder = new TransformerBuilder(evaluator).withTargetCols().withOutputCols().exploded(true)
pmmlTransformerBuilder: org.jpmml.evaluator.spark.TransformerBuilder = org.jpmml.evaluator.spark.TransformerBuilder@240ac58e
scala> val pmmlTransformer = pmmlTransformerBuilder.build()
java.lang.NullPointerException

I guess for the time being I should be able to work around this by setting exploded=false.
But considering the issue persists when using exploded=true, should I create a GitHub issue on this? Or perhaps something is still wrong with my configs!

Also, regarding TransformerBuilder:
Should I use either #withOutputCols()/#withTargetCols() or #withLabelCol(...).withProbabilityCol(...), or can they be combined with each other?
Is there any documentation on which combinations to use, or should I just go by the source code?

Finally, I cannot thank you enough for taking your time to look into this! Could not have gone through without your help.

Thanks & Regards,

Pratyush

Villu Ruusmann

Jun 17, 2022, 5:29:28 AM
to Java PMML API, Pratyush Banerjee
Hi Pratyush,

>
> But considering the issue persists when using
> explode=true, should I create a Github issue on this?
>

I opened one myself here:
https://github.com/jpmml/jpmml-evaluator-spark/issues/46

Clearly, there is some JPMML-Evaluator-Spark code change needed. At
minimum, the #build() method should throw a more meaningful exception
(eg. "incomplete configuration") instead of failing with an obscure
NullPointerException.

> Also, regarding TransformerBuilder:
> I can either use #withOutputCols()/#withTargetCols()
> or I can use #.withLabelCol(...).withProbabilityCol(...)
> or can they be used with each other?
>

These method pairs are effectively analogous/synonymous to one another:

1) the primary predicted field is called "target" in PMML, but "label"
in Apache Spark.
2) the secondary predicted fields are called "output" in PMML. There
is no exact correspondence for this in Apache Spark, because Spark
maps individual prediction artifacts to individual columns.
3) The most common secondary prediction artifact is the probability
distribution. If you know that you're dealing with probabilistic
classification models, then you can conveniently extract this subset
of output fields as "probability" fields.

> Is there any documentation on how/what combinations
> to use them in, or should I just go by the source-code.
>

Go with the source code - this is what actually executes on your computer.

If you have reasonable complaints about missing docs, then please
consider opening another GitHub issue.


VR