Databricks on GCP: how does it differ from Dataproc?


moses.s...@gmail.com

Feb 19, 2021, 1:06:36 PM
to Google Cloud Dataproc Discussions
Hi all,

I saw the announcement of Databricks on GCP. We already have a Dataproc Spark cluster, so how should we choose which one to implement?

Which one is cheaper?

Thanks in advance.

Regards,
Moses 

Xin

Feb 19, 2021, 1:11:01 PM
BTW, any idea how to try it out? I can't find Databricks anywhere in the console; search only returns documentation.

Thanks,
Xin

mich.ta...@gmail.com

Feb 19, 2021, 2:59:08 PM
Well, that is an interesting point.

That is only an announcement; as of today there is no Databricks on Google Cloud yet. Please see the attachment.

So it is still there to be discovered.

These are natural questions for those of us, like me, who use GCP daily.

I read the presentation again, and from a practical point of view the devil is in the details. GCP already offers Google Compute Engine as IaaS, which supports Spark on YARN. In addition, you can run Spark on cost-saving 'preemptible instances'. GCP also offers BigQuery as a data warehouse (DW) with built-in ML models. So there is a fair amount of 'either/or' choice here.

There is also the question of the migration path from GCP artifacts to Databricks. Will Databricks provide all of these as a service? For example, BigQuery is a fully managed, serverless DW; will Lakehouse provide the same on GCP? However, neither BigQuery nor Compute Engine is cheap. Personally, I believe the cloud landscape is getting congested, and unless there is a clear motivation to move from one platform to another, many will choose to stay where they are.

HTH

Mich
[Attachment: notavailable.PNG]

moses.s...@gmail.com

Feb 20, 2021, 2:28:43 AM
Thank you, Mich, for the detailed explanation. I am just trying to work out which one wins in which business cases.

From the overview, I understand that Databricks is going to provide the compute engine in Docker containers orchestrated on GKE.

Google Cloud will provide orchestration and storage, whereas Databricks will take care of compute.

Since I am new to the pricing concepts, could you point me to any links that explain the pricing of Dataproc jobs? I tried the Google pricing calculator but could not understand it at that level of detail.

Regards,
Moses Palla
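As a rough illustration of how a Dataproc bill is put together: Dataproc charges a premium on top of the underlying Compute Engine VMs, published at USD 0.010 per vCPU per hour at the time of writing and billed per second with a one-minute minimum. The Python sketch below assumes those figures (check the current Dataproc pricing page before relying on them) and applies the premium to every vCPU in the cluster, including preemptible workers, which is also an assumption. The per-VM Compute Engine hourly price is an input because it varies by machine type and region; the `NodeGroup` and `estimate_cluster_cost` names and all prices in the example are hypothetical, and persistent disks, Cloud Storage, and network egress are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed Dataproc premium in USD per vCPU-hour; verify against the current pricing page.
DATAPROC_PREMIUM_PER_VCPU_HOUR = 0.010
# Assumed billing rule: per-second billing with a 60-second minimum.
MINIMUM_BILLED_SECONDS = 60


@dataclass
class NodeGroup:
    """A group of identical VMs in the cluster (master, workers, preemptible workers)."""
    count: int                # number of VMs in the group
    vcpus_per_node: int       # vCPUs per VM
    vm_price_per_hour: float  # hypothetical Compute Engine price per VM-hour, USD


def estimate_cluster_cost(groups: List[NodeGroup], runtime_seconds: float) -> Dict[str, float]:
    """Rough cost of keeping a Dataproc cluster up for runtime_seconds.

    Ignores persistent disks, Cloud Storage, and network egress.
    """
    # Apply the assumed one-minute minimum, then convert to hours.
    billed_hours = max(runtime_seconds, MINIMUM_BILLED_SECONDS) / 3600
    # The Dataproc premium scales with the total vCPU count of the cluster.
    total_vcpus = sum(g.count * g.vcpus_per_node for g in groups)
    dataproc_fee = total_vcpus * DATAPROC_PREMIUM_PER_VCPU_HOUR * billed_hours
    # The Compute Engine charge scales with the number of VMs and their hourly price.
    compute_fee = sum(g.count * g.vm_price_per_hour for g in groups) * billed_hours
    return {
        "dataproc": round(dataproc_fee, 4),
        "compute": round(compute_fee, 4),
        "total": round(dataproc_fee + compute_fee, 4),
    }


if __name__ == "__main__":
    # Hypothetical cluster: 1 master + 2 workers with 4 vCPUs each,
    # plus 2 preemptible workers at a lower, made-up hourly price.
    cluster = [
        NodeGroup(count=1, vcpus_per_node=4, vm_price_per_hour=0.19),
        NodeGroup(count=2, vcpus_per_node=4, vm_price_per_hour=0.19),
        NodeGroup(count=2, vcpus_per_node=4, vm_price_per_hour=0.04),
    ]
    print(estimate_cluster_cost(cluster, runtime_seconds=30 * 60))
```

The split into a Dataproc fee and a Compute Engine fee mirrors how the charges appear separately on a GCP bill, and it makes visible why preemptible workers lower the Compute Engine part of the cost without changing the per-vCPU premium in this model.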

moses.s...@gmail.com

Feb 20, 2021, 2:29:35 AM
Hi Xin,

The product launch is planned for April.

Regards,
Moses Palla
