Hello everyone,
Late last week we released a new set of updates to Google Cloud Dataproc.
New features
The dataproc command in the Google Cloud SDK now includes a --properties option for adding or updating properties in some cluster configuration files, such as core-site.xml. Properties are mapped to configuration files by a prefix: for example, the core prefix maps to core-site.xml (so "core:io.serializations" targets the io.serializations property in core-site.xml) and the spark prefix maps to spark-defaults.conf.
For example, to change the spark.master value in the spark-defaults.conf file, you would specify the following property:
spark:spark.master=spark://example.com
You can specify multiple properties by separating them with commas. Each property must be specified in the full file_prefix:key=value format. Some properties are reserved and cannot be overridden, because changing them would impact the functionality of the Cloud Dataproc cluster. For more information, see the Cloud Dataproc documentation for the --properties option.
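As a sketch of how this might look on the command line (the cluster name and the property values here are illustrative placeholders, not recommendations):

```shell
# Create a cluster, overriding properties in two different configuration files.
# The "spark:" prefix targets spark-defaults.conf; the "core:" prefix targets core-site.xml.
gcloud dataproc clusters create example-cluster \
    --properties "spark:spark.master=spark://example.com,core:io.serializations=org.apache.hadoop.io.serializer.WritableSerialization"
```

Note that the entire comma-separated list is passed as a single value to --properties; quoting it keeps the shell from splitting it.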
Google Developers Console
An option has been added to the “Create Clusters” form to enable the cloud-platform scope for a cluster. This lets you view and manage data across all Google Cloud Platform services from Cloud Dataproc clusters. You can find this option by expanding the “Preemptible workers, bucket, network, version, initialization, & access options” section at the bottom of the form.
Bugfixes
SparkR jobs no longer immediately fail with a “permission denied” error (Spark JIRA issue)
Configuring logging for Spark jobs with the --driver-logging-levels option no longer interferes with Java driver options
Google Developers Console
The error shown for improperly formatted initialization actions now includes information about the problem
Very long error messages now include a scrollbar so the Close button remains on-screen
Connectors and documentation
The Cloud Dataproc release notes page contains these notes along with all past release notes.
Best,