Google Cloud Dataproc - January 27th release


James Malone

Feb 1, 2016, 4:06:06 PM
to gcp-hadoo...@googlegroups.com

Hello everyone,


Late last week, on January 27th, we released a new set of updates to Google Cloud Dataproc.


New Features
  • Two new options have been added to the Cloud Dataproc gcloud command for adding tags and metadata to the virtual machines used in Cloud Dataproc clusters. These tags and metadata apply to both regular and preemptible instances (see the example after this list).
    • The --tags option adds tags to the Google Compute Engine instances in a cluster. For example, the argument --tags foo,bar,baz will add three tags to the virtual machine instances in the cluster.
    • The --metadata option adds metadata to the Compute Engine instances. For example, --metadata 'meta1=value1,key1=value2' will add two key-value pairs of metadata.
  • Support has been added for heterogeneous clusters in which the master node and worker nodes have different amounts of memory. Some memory settings were previously based on the master node, which caused problems, as described in this Stack Overflow question. Cloud Dataproc now better supports clusters whose master and worker nodes use different machine types (also shown in the example after this list).
  • Google Cloud Platform Console
    • The Output tab for a job now includes a Line wrapping option to make it easier to view job output containing very long lines.
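
To see how these cluster creation changes fit together, here is a minimal sketch of a single clusters create invocation. The --tags and --metadata values are taken from the examples above; the cluster name and machine types are placeholders, and --master-machine-type and --worker-machine-type are the standard flags for requesting different machine types for the master and worker nodes (depending on your gcloud version, the command may need to be invoked as gcloud beta dataproc):

    # Create a cluster whose instances carry three tags and two metadata
    # key-value pairs, with a different machine type for the master node
    # than for the worker nodes.
    gcloud dataproc clusters create example-cluster \
        --tags foo,bar,baz \
        --metadata 'meta1=value1,key1=value2' \
        --master-machine-type n1-standard-4 \
        --worker-machine-type n1-highmem-8

The tags can then be referenced by Google Compute Engine firewall rules, and the metadata is readable from within each instance through the Compute Engine metadata server.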
Bugfixes
  • Fixed two issues that could sometimes cause virtual machines to remain active after a cluster deletion request was submitted.
  • The Spark maxExecutors setting is now set to 10000 to avoid the AppMaster failing on jobs with many tasks (see the note after this list).
  • Improved handling of aggressive job submission by making several changes to the Cloud Dataproc agent, including:
    • Limiting the number of concurrent jobs so that it is proportional to the memory of the master node
    • Checking free memory before scheduling new jobs
    • Rate-limiting how many jobs can be scheduled per cycle
  • Improved how HDFS capacity is calculated before commissioning or decommissioning nodes, to prevent excessively long updates.
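
A note on the maxExecutors change above: the corresponding Spark property is presumably spark.dynamicAllocation.maxExecutors, and if the new 10000 default is not appropriate for a particular workload, it can be overridden on a per-job basis. The sketch below assumes a cluster named example-cluster, a placeholder main class and jar, and a gcloud version that supports the --properties flag on job submission:

    # Sketch: lower the executor cap for a single Spark job.
    gcloud dataproc jobs submit spark \
        --cluster example-cluster \
        --class org.example.WordCount \
        --jars gs://example-bucket/word-count.jar \
        --properties spark.dynamicAllocation.maxExecutors=2000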

Best,


Google Cloud Dataproc / Google Cloud Spark & Hadoop Team
