Google Cloud Dataproc - January 27th release


James Malone

Feb 1, 2016, 4:06:06 PM
to gcp-hadoo...@googlegroups.com

Hello everyone,


Late last week, on January 27th, we released a new set of updates to Google Cloud Dataproc.


New Features
  • Two new options have been added to the Cloud Dataproc gcloud command for adding tags and metadata to the virtual machines used in Cloud Dataproc clusters. These tags and metadata apply to both regular and preemptible instances (see the example after this list).
    • The --tags option adds tags to the Google Compute Engine instances in a cluster. For example, the argument --tags foo,bar,baz will add three tags to the virtual machine instances in the cluster.
    • The --metadata option adds metadata to the Compute Engine instances. For example, --metadata 'meta1=value1,key1=value2' will add two key-value pairs of metadata.
  • Support has been added for heterogeneous clusters in which the master node and worker nodes have different amounts of memory. Some memory settings were previously based on the master node, which caused problems, as described in this Stack Overflow question. Cloud Dataproc now better supports clusters whose master and worker nodes use different machine types (also shown in the example after this list).
  • Google Cloud Platform Console
    • The Output tab for a job now includes a Line wrapping option to make it easier to view job output containing very long lines.
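
To see how these cluster creation changes fit together, here is a minimal sketch of a single clusters create invocation. The --tags and --metadata values are taken from the examples above; the cluster name and machine types are placeholders, and --master-machine-type and --worker-machine-type are the standard flags for requesting different machine types for the master and worker nodes (depending on your gcloud version, the command may need to be invoked as gcloud beta dataproc):

    # Create a cluster whose instances carry three tags and two metadata
    # key-value pairs, with a different machine type for the master node
    # than for the worker nodes.
    gcloud dataproc clusters create example-cluster \
        --tags foo,bar,baz \
        --metadata 'meta1=value1,key1=value2' \
        --master-machine-type n1-standard-4 \
        --worker-machine-type n1-highmem-8

The tags can then be referenced by Google Compute Engine firewall rules, and the metadata is readable from within each instance through the Compute Engine metadata server.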
Bugfixes
  • Fixed two issues that could sometimes cause virtual machines to remain active after a cluster deletion request was submitted.
  • The Spark maxExecutors setting is now set to 10000 to avoid the AppMaster failing on jobs with many tasks (see the note after this list).
  • Improved handling of aggressive job submission by making several changes to the Cloud Dataproc agent, including:
    • Limiting the number of concurrent jobs so that it is proportional to the memory of the master node
    • Checking free memory before scheduling new jobs
    • Rate-limiting how many jobs can be scheduled per cycle
  • Improved how HDFS capacity is calculated before commissioning or decommissioning nodes, to prevent excessively long updates.
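
A note on the maxExecutors change above: the corresponding Spark property is presumably spark.dynamicAllocation.maxExecutors, and if the new 10000 default is not appropriate for a particular workload, it can be overridden on a per-job basis. The sketch below assumes a cluster named example-cluster, a placeholder main class and jar, and a gcloud version that supports the --properties flag on job submission:

    # Sketch: lower the executor cap for a single Spark job.
    gcloud dataproc jobs submit spark \
        --cluster example-cluster \
        --class org.example.WordCount \
        --jars gs://example-bucket/word-count.jar \
        --properties spark.dynamicAllocation.maxExecutors=2000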

Best,


Google Cloud Dataproc / Google Cloud Spark & Hadoop Team
