Spark 3.1

Shih-gian Lee

Mar 11, 2021, 4:19:04 PM
to Java PMML API
Hi Villu,

Does the latest JPMML-SparkML (1.6) support Spark 3.1.x?

Many thanks,
Shihgian

Villu Ruusmann

Mar 11, 2021, 4:51:00 PM
to Java PMML API
Hi Shihgian,

>
> Does the latest JPMML-SparkML (1.6) support Spark 3.1.x?
>

I was unaware that Spark 3.1 is already available, so I haven't tested
it myself.

It should work, given that it's a minor version upgrade. Check out the
'master' branch, and change the Apache Spark version identifier at the
following location to "{3, 1}": https://github.com/jpmml/jpmml-sparkml/blob/1.6.3/src/main/java/org/jpmml/sparkml/ConverterFactory.java#L241

Then build the modified library JAR file using "mvn
-Dmaven.test.skip=true clean package" (this skips the integration
tests, which bundle Apache Spark 3.0 resources), and see if it works.
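
To illustrate why the version identifier matters, here is a minimal
sketch of a major/minor version check. It is not the actual
ConverterFactory code: the class name SparkVersionCheck, its methods,
and the EXPECTED constant are hypothetical, and the real check may be
implemented differently.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of JPMML-SparkML.
final class SparkVersionCheck {

    // Assumed target after the edit: Apache Spark 3.1.X
    static final int[] EXPECTED = {3, 1};

    // Extracts {major, minor} from a version string such as "3.1.1" or "3.1.1-SNAPSHOT"
    static int[] parseMajorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unparseable version: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // Only the major and minor components are compared; the patch level is ignored
    static boolean isSupported(String version) {
        return Arrays.equals(EXPECTED, parseMajorMinor(version));
    }

    public static void main(String[] args) {
        for (String version : new String[] {"3.0.2", "3.1.1"}) {
            System.out.println(version + " supported: " + isSupported(version));
        }
    }
}
```

With a check of this shape, a library built for "{3, 0}" would reject
a Spark 3.1.X runtime even if the converters themselves were
compatible, which is why the identifier needs to change before
rebuilding.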

I'll take a look at it myself this weekend. I'd hate to break the
current pattern of keeping a separate JPMML-SparkML development branch
for each Apache Spark version, so it's probably time to start
JPMML-SparkML 1.7.X, and to retire the 1.4.X and 1.5.X development
branches (targeting Apache Spark 2.3.X and 2.4.X, respectively), as
merging forward across three or four branches is getting a bit tedious.


VR

Villu Ruusmann

Mar 12, 2021, 10:46:38 AM
to Java PMML API
Hi Shihgian,

>
> Does the latest JPMML-SparkML (1.6) support Spark 3.1.x?
>

The 'master' branch of the JPMML-SparkML repository has been updated
to 1.7-SNAPSHOT, and version 1.7.0 is on its way to the Maven Central
repository (it should become visible there in a couple of hours).

Apache Spark ML version 3.1.0 appears to be broken with respect to
the ChiSqSelector feature selector class (it raises a Scala
IllegalArgumentException while loading from the archive ZIP file).
Therefore, it's advisable to go with 3.1.1.


VR

Shih-gian Lee

Mar 12, 2021, 1:10:18 PM
to Villu Ruusmann, Java PMML API
Hi Villu,

We will give it a try and let you know.

Many thanks!

Shihgian

Shih-gian Lee

Mar 15, 2021, 2:08:08 PM
to Java PMML API
Hi Villu,

We tested the latest JPMML-SparkML (1.7) with Spark 3.1.1 and found NO issues. 

Thank you so much for your help!

Regards,
Shihgian