ERROR: Secondary worker configuration must contain a positive number of instances

Erik Dubbelboer

Mar 2, 2016, 2:30:33 AM

to Google Cloud Dataproc Discussions

I'm trying to create a cluster using the below command. I would expect this to be almost the same as the below image from the web interface (minus the properties as they are not available in the web interface for some reason).
But when I run the command I get: ERROR: (gcloud.dataproc.clusters.create) Secondary worker configuration must contain a positive number of instances.
I'm guessing this is because I set the number of preemptible workers to 0? Why is this not supported on the command line?

The reason I'm setting the preemptible workers to 0 is that, for me, Hadoop never registers them, so they are never used in the YARN tasks I submit. I'm guessing they are only used for tasks that are submitted through the Google API?

gcloud dataproc clusters create hadoop-eu-1 \
  --bucket=hadoop-eu-atomx \
  --image-version=1.0 \
  --initialization-actions="gs://hadoop-eu-atomx/hadoop-initialization-v1.sh" \
  --master-boot-disk-size-gb 15 \
  --master-machine-type n1-standard-2 \
  --num-master-local-ssds 0 \
  --worker-machine-type n1-standard-8 \
  --worker-boot-disk-size-gb 2000 \
  --num-worker-local-ssds 0 \
  --num-preemptible-workers 0 \
  --num-workers 8 \
  --properties "mapred:mapreduce.reduce.java.opts=-Xmx8915m,core:fs.gs.metadata.cache.directory=/tmp/hadoop_gcs_connector_metadata_cache,mapred:mapreduce.jobtracker.address=hadoop-eu-1-m:9001,yarn:yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler,yarn:yarn.scheduler.minimum-allocation-mb=1024" \
  --zone europe-west1-c

Google Cloud Dataproc Discussions

Mar 2, 2016, 1:45:30 PM

to Google Cloud Dataproc Discussions

Hi Erik,

Thank you for bringing this up.

It does look like we do not handle "--num-preemptible-workers 0" correctly. I have filed a bug to fix this. The default value for --num-preemptible-workers is 0, so the flag can be omitted entirely if you do not want preemptible workers.
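
Until that is fixed, a script that takes the preemptible count as a variable can simply leave the flag out when the count is 0. The following is a minimal sketch of that workaround, not an official recommendation; the helper name preemptible_flag and the variable NUM_PREEMPTIBLE are made up for illustration.

```shell
# Hypothetical helper (not part of gcloud): prints the flag only when the
# requested count is greater than zero, and prints nothing otherwise.
preemptible_flag() {
  count="${1:-0}"
  if [ "$count" -gt 0 ]; then
    printf '%s\n' "--num-preemptible-workers $count"
  fi
}

# Intended usage, shown as a comment so the sketch has no side effects:
#   NUM_PREEMPTIBLE=0
#   gcloud dataproc clusters create hadoop-eu-1 \
#     --num-workers 8 \
#     $(preemptible_flag "$NUM_PREEMPTIBLE") \
#     --zone europe-west1-c
```

The command substitution is left unquoted on purpose so that an empty result adds no argument and a non-empty result splits into the flag and its value.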

Your comment that Hadoop does not register preemptible workers is worrisome. They should always join the cluster and be usable regardless of how the job is submitted. I would like to emphasize that preemptible instances are a limited resource, subject to availability. It is possible that your cluster will not have any preemptible workers. (You will not incur any charges until preemptible instances are actually created.) This also means that in some cases preemptible workers will join after the cluster is reported RUNNING.

You can verify if preemptible instances are actually created by this query:

$ gcloud compute instances list --zone <cluster-zone> | grep <cluster-name>-sw

If there are instances present, then verify they have joined the cluster via:

$ gcloud compute ssh <cluster-name>-m --zone <cluster-zone>

$ yarn node -list
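
Because preemptible workers can join late, it may help to count how many of them YARN currently reports rather than read the list by eye. This is only a sketch: the helper name count_matching is hypothetical, and it assumes that each registered node appears on its own line of the "yarn node -list" output with its hostname in it.

```shell
# Hypothetical helper: counts the lines on stdin that contain the given pattern.
# grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps the
# function's exit status clean in that case.
count_matching() {
  grep -c -- "$1" || true
}

# Intended usage on the master node, shown as a comment; pass the same
# instance-name pattern that was used with grep on the instance list:
#   yarn node -list | count_matching "<pattern>"
```

If the count is lower than the number of preemptible instances shown by the instance list, waiting a few minutes and running it again helps tell slow joiners from instances that never register.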

If you find that there are instances but they are not joining the cluster then kindly respond to this thread or file a report through cloud support. We'll be happy to assist and debug.

Erik Dubbelboer

Mar 6, 2016, 4:39:14 AM

to Google Cloud Dataproc Discussions

I tried it again to get a nice reproducible cluster for you, but this time the preemptible workers registered just fine. Maybe I was checking too quickly last time. The instances were created, but the Hadoop management console just didn't show them in its node section.

Thanks,
Erik
