Hello everyone,
Recently we launched Google Cloud Dataproc in beta as an easy-to-use, fast, and cost-effective managed Spark and Hadoop service. Since Cloud Dataproc is a Spark/Hadoop-related product on Google Cloud Platform, we will post to this list whenever we release updates to the Cloud Dataproc service or to the Spark/Hadoop components deployed on Dataproc clusters.
Today we released an update to the Cloud Dataproc service with a number of fixes, enhancements, and optimizations. Here are the release notes for the October 15, 2015 release.
Bugfixes
Fixed a bug where DataNodes failed to register with the NameNode on startup, resulting in less HDFS capacity than expected.
Jobs can no longer be submitted to clusters in an Error state (see the example after this list).
Clusters that previously failed to delete cleanly in some cases now delete properly upon request.
Reduced HTTP 500 errors when deploying Cloud Dataproc clusters.
Corrected distcp out-of-memory errors with better cluster configuration.
Fixed an issue where jobs failed to delete properly and became stuck in a Deleting state.
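For the Error-state fix above, clients can also fail fast by checking cluster state before submitting work. The following is a minimal sketch, not an official sample, assuming the google-api-python-client library, application-default credentials, and the Dataproc REST API's v1 resource shape; the project, cluster name, and example jar path are placeholders.

    from googleapiclient.discovery import build

    PROJECT = 'my-project'    # placeholder
    REGION = 'global'         # placeholder region
    CLUSTER = 'my-cluster'    # placeholder

    dataproc = build('dataproc', 'v1')

    # Look up the cluster's current status before submitting work to it.
    cluster = dataproc.projects().regions().clusters().get(
        projectId=PROJECT, region=REGION, clusterName=CLUSTER).execute()

    if cluster['status']['state'] == 'ERROR':
        # The service now rejects these submissions; failing fast on the
        # client avoids a round trip and a confusing error message.
        raise RuntimeError('Cluster %s is in an Error state' % CLUSTER)

    # Submit a simple Spark job once the cluster is healthy.
    job = dataproc.projects().regions().jobs().submit(
        projectId=PROJECT, region=REGION,
        body={'job': {
            'placement': {'clusterName': CLUSTER},
            'sparkJob': {
                'mainClass': 'org.apache.spark.examples.SparkPi',
                'jarFileUris': ['file:///usr/lib/spark/lib/spark-examples.jar'],
                'args': ['1000'],
            },
        }}).execute()
    print('Submitted job %s' % job['reference']['jobId'])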
Core service improvements
HTTP 500 errors now include more detail about the underlying problem, instead of being surfaced as uninformative 4xx errors.
"Resource already exists" errors now indicate which resources already exist.
Errors related to Google Cloud Storage now display specific information instead of a generic error message.
List operations now support pagination (see the example below).
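Here is a minimal sketch of the pagination pattern, again assuming the google-api-python-client library and the v1 resource shape, with a placeholder project; the same pageToken/nextPageToken loop applies to the other list methods.

    from googleapiclient.discovery import build

    PROJECT = 'my-project'   # placeholder
    REGION = 'global'        # placeholder region

    dataproc = build('dataproc', 'v1')

    clusters = []
    page_token = None
    while True:
        # Each response carries at most pageSize results plus a token
        # pointing at the next page, if any.
        response = dataproc.projects().regions().clusters().list(
            projectId=PROJECT, region=REGION,
            pageSize=20, pageToken=page_token).execute()
        clusters.extend(response.get('clusters', []))
        page_token = response.get('nextPageToken')
        if not page_token:   # no more pages
            break

    for c in clusters:
        print(c['clusterName'], c['status']['state'])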
Optimizations
Significantly improved YARN utilization for MapReduce jobs running directly against Cloud Storage.
Adjusted yarn.scheduler.capacity.maximum-am-resource-percent to enable better utilization and support for more concurrent jobs (see the example below).
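For context, this property caps the fraction of cluster resources that YARN ApplicationMasters may occupy, which in turn bounds how many jobs can hold an AM container at once. One way to inspect the value a cluster is actually using is to read the Hadoop configuration on the master node; the sketch below assumes the conventional /etc/hadoop/conf location.

    import xml.etree.ElementTree as ET

    CONF = '/etc/hadoop/conf/capacity-scheduler.xml'  # assumed standard location
    PROP = 'yarn.scheduler.capacity.maximum-am-resource-percent'

    root = ET.parse(CONF).getroot()
    for prop in root.findall('property'):
        if prop.findtext('name') == PROP:
            # Fraction (0.0-1.0) of cluster resources available to
            # ApplicationMasters; a higher value allows more concurrent jobs.
            print('%s = %s' % (PROP, prop.findtext('value')))
            break
    else:
        print('%s is not set; the YARN default applies' % PROP)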
The Cloud Dataproc release notes will serve as a consolidated record of every release from our beta launch forward. You can learn more about Cloud Dataproc on the Google Cloud Platform site.
Best,
Google Cloud Dataproc / Google Cloud Spark & Hadoop Team