How to measure High Availability of the Google Cloud Dataproc service in the context of Regions/Zones


Deena Dhayal

May 19, 2020, 6:50:48 AM
to Google Cloud Dataproc Discussions
Hello everyone,

As per the Google docs, High Availability for Dataproc is measured by HDFS and YARN availability, not by regions/zones. Is it possible to keep one master in one zone and another master in a different zone to get HA with respect to location?
Also, please elaborate on whether configuring a Dataproc cluster with the Global Endpoint achieves HA with respect to location.

I have already gone through the Google docs, but they don't clear up the doubts above.

Thanks,
Deena

karth...@google.com

May 28, 2020, 9:36:57 PM
to Google Cloud Dataproc Discussions
Yeah, High Availability in that context just refers to Hadoop HA. A cluster must sit entirely in one zone because network traffic across zones costs money, and Hadoop workloads require a lot of bandwidth among all hosts, largely because of the shuffle phase.

If you want to create a Hadoop environment that is resilient to zonal failures, there are a couple of options:

1) Use "ephemeral" clusters for each pipeline. Give each job, or each related set of jobs, its own cluster. Then use auto zone placement to let Dataproc pick a zone for each cluster at runtime: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone. This makes sense if you're running ETL (batch) workloads that aren't as latency sensitive, or if the jobs run for more than 10 minutes. You can create a workflow template with a cluster configuration and the job(s) to run on that cluster; the cluster is torn down once the jobs finish: https://cloud.google.com/dataproc/docs/concepts/workflows/using-yamls.
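
As a rough illustration of option 1, here is a minimal Python sketch that builds the body of such a workflow template. The field names (jobs, stepId, hadoopJob, placement.managedCluster, gceClusterConfig, zoneUri) follow the layout used on the workflow YAML page linked above; the helper name, cluster name, step id, jar path, and arguments are placeholders I made up. Leaving zoneUri out is my reading of how auto zone placement is requested, so please check it against the auto-zone page before relying on it.

```python
import json


def build_ephemeral_template(cluster_name, jar_uri, args):
    """Build a workflow template body that runs one Hadoop job on a managed cluster.

    Hypothetical helper for illustration only; it is not part of any Google library.
    """
    return {
        "jobs": [
            {
                "stepId": "etl-step",  # placeholder step id
                "hadoopJob": {"mainJarFileUri": jar_uri, "args": list(args)},
            }
        ],
        "placement": {
            "managedCluster": {
                "clusterName": cluster_name,
                "config": {
                    # Assumption: zoneUri is deliberately left unset so that
                    # Dataproc can choose a zone when the cluster is created.
                    "gceClusterConfig": {},
                },
            }
        },
    }


template = build_ephemeral_template(
    "nightly-etl",  # placeholder cluster name
    "file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
    ["teragen", "1000", "hdfs:///gen/"],
)
print(json.dumps(template, indent=2))
```

The printed structure would still need to be saved in the YAML form described in the docs and imported as a workflow template; this sketch only shows which pieces go where.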

2) For interactive or shorter jobs, create a "cluster pool": create multiple clusters in different zones and give them a common label. Then use a workflow template again with the set of job(s) to run, and specify only the label that designates your cluster pool: https://cloud.google.com/dataproc/docs/concepts/workflows/cluster-selectors.
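
For option 2, a similar minimal sketch, again in Python. Here the placement section uses clusterSelector with clusterLabels instead of managedCluster, following the cluster selectors page linked above. The label key and value, the step id, the helper name, and the gs:// path are all hypothetical, and the clusters in the pool are assumed to have been created already with that same label.

```python
import json


def build_pool_template(label_key, label_value, main_python_uri):
    """Build a workflow template body that targets existing clusters by label.

    Hypothetical helper for illustration only; it is not part of any Google library.
    """
    return {
        "jobs": [
            {
                "stepId": "interactive-step",  # placeholder step id
                "pysparkJob": {"mainPythonFileUri": main_python_uri},
            }
        ],
        "placement": {
            # No cluster is created here; the workflow runs on a running
            # cluster whose labels match the ones given below.
            "clusterSelector": {"clusterLabels": {label_key: label_value}}
        },
    }


pool_template = build_pool_template(
    "cluster-pool",  # placeholder label key
    "interactive",  # placeholder label value
    "gs://my-bucket/jobs/report.py",  # placeholder script location
)
print(json.dumps(pool_template, indent=2))
```

Because the template names only the label, clusters in any zone can join or leave the pool without the template changing.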