If I recall correctly, this is just an issue with how Spark-on-YARN works.
In MapReduce, map and reduce tasks are generally pretty short (seconds, or a couple of minutes at most). So if one job starts and fills up the entire cluster, and then you submit a second job, the second job will get slots to allocate its map and reduce containers within seconds.
However, Spark is different. As long as there are enough Spark tasks pending, the first Spark job will keep allocating executors until it fills the entire cluster. Executors are essentially long-running daemons: under dynamic allocation, they do not exit until they have been idle (run no tasks) for 1 minute by default. The Spark docs have a great explanation of how dynamic allocation works, and we discuss it in the Autoscaling docs as well.
This means that if you submit a second job, it has no room to allocate its app master or executors until the first job finishes, or at least until some of the first job's executors have sat idle for 1 minute and been released.
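If you mainly want the first job to give up idle executors faster, that timeout is configurable; a sketch (the 30s value is an arbitrary example, not a recommendation):

```properties
# spark-defaults.conf (or pass as --conf flags): release idle executors sooner.
# The default for spark.dynamicAllocation.executorIdleTimeout is 60s.
spark.dynamicAllocation.executorIdleTimeout=30s
```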
There are a couple ways to get around this:
1) Disable dynamic allocation: set spark.dynamicAllocation.enabled=false and explicitly set spark.executor.instances=<some-number>.
2) Or, keep dynamic allocation on, and set the max number of executors to a smaller number (spark.dynamicAllocation.maxExecutors=<some-number>).
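On the command line, the two options might look like this (the executor count of 4 and the job artifact name are placeholders you'd tune to your cluster):

```shell
# Option 1: disable dynamic allocation and pin the executor count.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.executor.instances=4 \
  your-job.jar

# Option 2: keep dynamic allocation, but cap it below cluster capacity
# so a second job's containers always have somewhere to go.
spark-submit \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  your-job.jar
```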
Note that on n1-standard-1 VMs we run 1 executor per node, and on n1-standard-4 VMs we run 2 executors per node. Also note that the app master for each job takes one "slot". So a cluster of 2 n1-standard-1 worker VMs has 2 slots total; the app master takes 1, leaving room for only 1 executor, iirc.
After lunch, I'll run your repro and confirm this theory.