Availability of Dataproc with Spark 3.1.1 image needed


Mich Talebzadeh

Mar 5, 2021, 4:11:01 PM
to Google Cloud Dataproc Discussions
Dear Dataproc team,

We have tested Spark 3.1.1 with Spark Structured Streaming, which we could not get working on Dataproc clusters built on the 3.1.1-RC2 image.

By test, I mean we set up a 3-node Spark cluster on-premises, sending streaming data to GCP BigQuery. Obviously this cannot be used in anger because of the latency incurred sending data from an on-premises Spark cluster.

In short, to get back to using Dataproc nodes, we need a Dataproc image built on Spark 3.1.1.

Spark only offers 3.1.1 plus two other releases as STABLE versions:

Release Notes for Stable Releases

Please bear in mind that Spark 3.0 and 3.1 are not available for download, meaning IMO they are defective releases. Also, Databricks has stated a few times in the Spark user group that "3.1.1-RC2 is not a release", whatever that means. But that is a moot point now.

Anyway, to get us into position, can you please build an image on 3.1.1? I would be more than happy to test it, and I am sure other users would welcome such a release.

Finally, I attach a screenshot of the Spark GUI showing Structured Streaming on version 3.1.1.

Thanks,

Mich
streaming.PNG

Mich Talebzadeh

Mar 9, 2021, 2:40:33 PM
to Google Cloud Dataproc Discussions
Hi,

Any update on this request, please?

Thanks

Mich Talebzadeh

Mar 17, 2021, 9:28:26 AM
to Google Cloud Dataproc Discussions
Hi Dataproc team,

Can you please let us know when Dataproc images will use Spark 3.1.1?

Currently we cannot work on 3.1.1-RC2 with Spark Structured Streaming.

Thanks,

Mich

Daniel Solow

Mar 18, 2021, 8:59:11 AM
to Google Cloud Dataproc Discussions
3.1.1 is available under 2.0.6-debian10/-ubuntu18/-centos8
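
For anyone creating the cluster from the command line rather than the console, a minimal sketch of pinning that image version might look like the following. The cluster name and region are placeholders assumed for illustration (they do not come from this thread); `--image-version` is the `gcloud dataproc clusters create` flag that selects the image. The script only prints the command unless the `echo` prefix is dropped.

```shell
# Sketch: pin a Dataproc cluster to the image version mentioned above.
# CLUSTER_NAME and REGION are hypothetical placeholders, not values from this thread.
CLUSTER_NAME="spark311-test"
REGION="europe-west2"
IMAGE_VERSION="2.0.6-debian10"   # sub-minor image reported to ship Spark 3.1.1

create_cluster() {
  # $1 is an optional command prefix: pass "echo" for a dry run,
  # or nothing to actually run gcloud.
  $1 gcloud dataproc clusters create "$CLUSTER_NAME" \
    --region="$REGION" \
    --image-version="$IMAGE_VERSION"
}

# Dry run: print the command instead of creating anything.
create_cluster echo
```

Calling `create_cluster` with no argument would run the real command, which requires an authenticated gcloud setup and a project with Dataproc enabled.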

Mich Talebzadeh

Mar 18, 2021, 9:17:39 AM
to Google Cloud Dataproc Discussions
Thanks Daniel.

When creating a Dataproc cluster, I cannot see 3.1.1 (see the attached image).

Please advise.

regards,

Mich

image.png

Mich Talebzadeh

Mar 18, 2021, 9:43:50 AM
to Google Cloud Dataproc Discussions
I can also see the announcement on this page.


Important Announcements
Dataproc 2.0 image version will become a default Dataproc image version on March 15, 2021 (was February 22, 2021).
March 16, 2021

New sub-minor versions of Dataproc images: 1.3.87-debian10, 1.3.87-ubuntu18, 1.4.58-debian10, 1.4.58-ubuntu18, 1.5.33-centos8, 1.5.33-debian10, 1.5.33-ubuntu18, 2.0.6-centos8, 2.0.6-debian10, and 2.0.6-ubuntu18.

Image 2.0: Upgraded Spark to version 3.1.1

Does the Dataproc 2.0 image provide Spark 3.1.1 by default?

Thanks

Mich Talebzadeh

Mar 18, 2021, 10:04:19 AM
to Google Cloud Dataproc Discussions
Yes, it is version 3.1.1 if you choose 2.0-debian10 as the image when creating the cluster.

I just confirmed:

echo $SPARK_HOME
/usr/lib/spark
spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = yarn, app id = application_1616075327443_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.13 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
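
The same check can probably be scripted rather than read off the banner by eye. The sketch below assumes the banner format shown above; `spark_version_from_banner` is a hypothetical helper name, and the `2>&1` in the usage comment is there in case the banner is written to stderr.

```shell
# Sketch: pull the Spark version out of a banner read on stdin.
# spark_version_from_banner is a hypothetical helper, not part of Spark.
spark_version_from_banner() {
  # Print the first "version X.Y.Z" match. The Spark banner line comes
  # before the "Using Scala version ..." line, so it is the one kept.
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# On a cluster node:
#   spark-submit --version 2>&1 | spark_version_from_banner
```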

Thanks


Mich Talebzadeh

Mar 18, 2021, 10:42:57 AM
to Google Cloud Dataproc Discussions
Plus, the existing clusters on 3.1.1-RC have been upgraded to 3.1.1.

HTH
