Transformer class org.apache.spark.ml.feature.StringIndexerModel is not supported


Youping Xiao

May 5, 2017, 11:06:47 PM
to Java PMML API
Hi Villu,
My code that uses JPMML works fine with spark-shell --jars jpmml-sparkml-package-1.0-SNAPSHOT.jar.
However, after I built my code into a jar file and ran it with spark-submit, I got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Transformer class org.apache.spark.ml.feature.StringIndexerModel is not supported
at org.jpmml.sparkml.ConverterUtil.createConverter(ConverterUtil.java:209)
....

I added the following to build.sbt:
libraryDependencies ++= Seq(
  "org.jpmml" % "pmml-evaluator" % "1.3.5",
  "org.jpmml" % "jpmml-sparkml" % "1.1.7"
)
Did I miss something?
Many thanks,

Villu Ruusmann

May 6, 2017, 6:49:16 AM
to Java PMML API
Hi Youping,

> My code that uses jpmml works fine with
> spark-shell --jars jpmml-sparkml-package-1.0-SNAPSHOT.jar.

That's good news. It proves that the library itself is in great shape,
and the problem is caused by application packaging.

> However, after I built my code into a jar file and use
> spark-submit, I got the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException:
> Transformer class org.apache.spark.ml.feature.StringIndexerModel is not supported
> at org.jpmml.sparkml.ConverterUtil.createConverter(ConverterUtil.java:209)
> ....
>

I refactored the registration of converter classes between 1.0.8/1.0.9
and 1.1.6/1.1.7. The list of registerable converter classes is now
retrieved from the META-INF/sparkml2pmml.properties configuration
file:
https://github.com/jpmml/jpmml-sparkml/blob/master/src/main/resources/META-INF/sparkml2pmml.properties
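The file maps Apache Spark transformer classes to JPMML-SparkML converter classes. An illustrative excerpt (abbreviated; see the linked file for the authoritative contents):

```properties
org.apache.spark.ml.feature.StringIndexerModel = org.jpmml.sparkml.feature.StringIndexerModelConverter
org.apache.spark.ml.regression.DecisionTreeRegressionModel = org.jpmml.sparkml.model.DecisionTreeRegressionModelConverter
```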

The ConverterUtil should generate adequate logs about every
registration attempt:
https://github.com/jpmml/jpmml-sparkml/blob/master/src/main/java/org/jpmml/sparkml/ConverterUtil.java#L297-L331

So, I would advise you to enable TRACE-level logging for the
"org.jpmml.sparkml" package, and see what converter classes
succeed/fail.
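With Spark's stock logging setup, one way to do that is a one-line addition to conf/log4j.properties (the path and property syntax assume Spark 2.x's default log4j 1.2 configuration):

```properties
# conf/log4j.properties
log4j.logger.org.jpmml.sparkml=TRACE
```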

Your error message points at the StringIndexerModel class because it is
typically the first step in classification workflows. I believe that
registration failed for all converter classes (i.e. the Map field
ConverterUtil#converters is empty), not just for the StringIndexerModel
class.
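One quick way to test that hypothesis from the driver JVM is to check whether the registry file is visible to the application class loader at all. A minimal sketch (the class name ClasspathCheck is mine, not part of any library):

```java
// Sanity check: can the application class loader see the JPMML-SparkML
// converter registry file? If not, no converter can be registered.
public class ClasspathCheck {

    public static java.net.URL registryUrl() {
        return ClasspathCheck.class.getClassLoader()
            .getResource("META-INF/sparkml2pmml.properties");
    }

    public static void main(String[] args) {
        java.net.URL url = registryUrl();
        System.out.println(url != null
            ? "Converter registry found at: " + url
            : "META-INF/sparkml2pmml.properties is NOT on the classpath");
    }
}
```

If this prints the "NOT on the classpath" line from inside your spark-submit run, the problem is packaging, not the library.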

> I added the following to build.sbt:
> libraryDependencies ++= Seq(
> "org.jpmml" % "pmml-evaluator" % "1.3.5",
> "org.jpmml" % "jpmml-sparkml" % "1.1.7"
> )
>

I'm a fan of Apache Maven/XML builds, and know nothing about SBT/Scala builds.

It's more likely that there's something extra, not missing, on your
application classpath. Please see the README.md file of the
JPMML-SparkML library for Apache Maven-based build instructions, and
"translate" them to SBT.

However, please refer to Apache Spark ML logs first to see what
org.jpmml.sparkml.ConverterUtil is complaining about.


VR

Youping Xiao

May 6, 2017, 2:15:01 PM
to Java PMML API

Hi Villu,
Many thanks for the prompt response. Following your suggestion to translate the Maven instructions to SBT, I added an exclude clause to the dependencies as below:
"org.jpmml" % "jpmml-sparkml" % "1.1.5" exclude("com.beust", "jcommander"),

Now I see some improvement (I hope). The new exception is as follows:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/regression/GeneralizedLinearRegressionModel
at org.jpmml.sparkml.ConverterUtil.<clinit>(ConverterUtil.java:323)

The strange thing is that my code doesn't use GeneralizedLinearRegression.
Any idea what this new exception means?
Thanks again

Villu Ruusmann

May 6, 2017, 4:24:40 PM
to Java PMML API
Hi Youping,

> Following your suggestion to translate Maven to SBT,
> and added an exclude clause to the dependencies as below:
> "org.jpmml" % "jpmml-sparkml" % "1.1.5" exclude("com.beust", "jcommander"),
>

You've just downgraded from 1.1.7 to 1.1.5. Not a good idea - always
stay on the latest version.

The library installation instructions are given here:
https://github.com/jpmml/jpmml-sparkml#installation

Two important things:
1) Exclude the org.jpmml:pmml-model dependency that is inherited via the
org.apache.spark:spark-mllib_2.11 dependency.
2) To be extra sure, rename the org.dmg.pmml.* and org.jpmml.* packages
(that are inherited via the JPMML-SparkML 1.1.7 library) to something
like com.mycompany.org.dmg.pmml.* and com.mycompany.org.jpmml.*. In
Apache Maven, you can rename Java packages using the Maven Shade plugin;
I have no idea how it's done in SBT.
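One possible SBT equivalent uses the sbt-assembly plugin's ShadeRule mechanism. A hedged sketch, not verified against this project, assuming sbt-assembly 0.14+ is already on the plugin classpath:

```scala
// build.sbt - rename the PMML packages inside the assembled jar so they
// cannot clash with the older copies that Spark ships:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.dmg.pmml.**" -> "com.mycompany.org.dmg.pmml.@1").inAll,
  ShadeRule.rename("org.jpmml.**" -> "com.mycompany.org.jpmml.@1").inAll
)
```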

The above installation instructions are "the one and only way of doing
it right". If you try to cut corners, you'll run into classpath
conflicts sooner or later.

> Now I see some improvement (I hope). The new exception is as follow:
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/ml/regression/GeneralizedLinearRegressionModel
> at org.jpmml.sparkml.ConverterUtil.<clinit>(ConverterUtil.java:323)
>

In previous JPMML-SparkML library versions, the list of converter
classes was hardcoded in the static initializer block of the
ConverterUtil class. That was not a very good solution: if one converter
class could not be found (e.g. due to an outdated Apache Spark version),
the whole initialization procedure errored out, and the PMML conversion
functionality became unavailable.
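That failure mode is generic to eager class loading in Java. A stripped-down illustration (class and method names are mine, not actual JPMML code):

```java
// Pre-1.1.7 "hardcoded" style: classes are resolved eagerly, and a
// single missing class aborts the whole registry - even the converters
// that loaded successfully before it are lost.
public class Registry {

    public static java.util.Map<String, Class<?>> loadAll(String... names) {
        java.util.Map<String, Class<?>> converters = new java.util.LinkedHashMap<>();
        for (String name : names) {
            try {
                converters.put(name, Class.forName(name));
            } catch (ClassNotFoundException cnfe) {
                // Fail hard: the caller gets nothing at all
                throw new ExceptionInInitializerError(cnfe);
            }
        }
        return converters;
    }
}
```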

> The strange thing is that my code doesn't use GeneralizedLinearRegression.
> Any idea what this new exception means?
> Thanks again
>

You're trying to use JPMML-SparkML version 1.1.X with an Apache Spark
version that doesn't "contain" the GeneralizedLinearRegression class
yet. Must be some 1.6.X version?

JPMML-SparkML 1.1.6 and earlier, which contain a hardcoded list of
converter classes, will fail if some class is missing. JPMML-SparkML
1.1.7 and newer, which load converter classes from the
META-INF/sparkml2pmml.properties file, will emit a "class XYZ not
found" warning to the log and keep loading the other classes.


VR