Greetings, users of Hadoop on the Google Cloud Platform!
We’re excited to announce the latest versions of bdutil, gcs-connector, and bigquery-connector with several bug fixes and new features.
Abridged highlights for connector updates:
Upgraded Google API client libraries and Guava to latest versions (see detailed notes below for exact version numbers)
Removed an obsolete configuration setting for a 250GB upload limit on objects; objects larger than 250GB can now be uploaded without modifying config settings
Added better configurability and usability for directly calling the lower-level GoogleCloudStorage libraries when not going through the Hadoop FileSystem interface
Abridged highlights for bdutil:
Thanks to the generous contributions of MapR, bdutil now includes a plugin for deploying MapR clusters on GCE
Added support for running Storm on Cloud Bigtable using the Cloud Bigtable connector for HBase, with an end-to-end example under cloud-bigtable-examples
Default Flink version is now 0.9.0
Download bdutil-1.3.1.tar.gz or bdutil-1.3.1.zip now to try it out, or visit the developer documentation, where the download links now point to the latest version. For manual installation or local library usage, download the jar directly.
Please see the detailed release notes below for more information about the new bdutil, GCS connector, and BigQuery connector features.
As always, please send any questions or comments to gcp-hadoo...@google.com or post a question on stackoverflow.com tagged ‘google-hadoop’ for additional assistance.
All the best,
Your Google Team
Release Notes
bdutil-1.3.1: CHANGES.txt
1.3.1 - 2015-07-09
1. Added plugin for deploying MapR under platforms/mapr/mapr_env.sh; see
platforms/mapr/README.md for details.
2. Changed mapreduce.fileoutputcommitter.algorithm.version to "2"; this should
only have an effect when running with Hadoop 2.7+, where it significantly
speeds up job-commit time when using the GCS connector (a minimal sketch of
setting this per job follows this list).
See https://issues.apache.org/jira/browse/MAPREDUCE-4815 for more details.
3. Added an option ENABLE_STORM_BIGTABLE to extensions/storm/storm_env.sh to
set up use of Google Cloud Bigtable from Apache Storm.
4. Updated Flink version to 0.9.0.
5. Switched from using SPARK_CLASSPATH to using SPARK_DIST_CLASSPATH pointed
at the Hadoop classpath to inherit gcs-connector and other Hadoop libraries
on the default Spark classpath. This gets rid of a warning message about
SPARK_CLASSPATH deprecation when running Spark, and improves access to
related Hadoop libraries from Spark jobs.
6. Fixed reboot recovery for single-node clusters; this includes the ability
for single-node clusters to recover from issuing "Stop" and then "Start"
commands via the GCE API.
7. Added explicit value for mapreduce.job.working.dir in Ambari config; this
works around a bug in PigInputFormat where an exception is thrown with
"Wrong FS scheme" when the default filesystem doesn't have the same scheme
as the filesystem of the input file(s) (e.g. when reading GCS files and
the default FS is HDFS). Pig reading from GCS should now work in Ambari-based
bdutil deployments.
8. Fixed a bug where Hive deployed under ambari_env.sh was unable to
LOAD DATA INPATH 'gs://<...>' because the Hive server needed to be restarted
after GCS connector installation to pick the connector up on its classpath.
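As a companion to item 2 above, here is a minimal sketch of opting into the
version 2 commit algorithm for a single job rather than cluster-wide. The
property key is the standard Hadoop 2.7+ setting named in that note; the class
and job names are just placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CommitV2Example {
      public static Job newJob() throws java.io.IOException {
        Configuration conf = new Configuration();
        // Algorithm version 2 lets tasks commit output directly into the final
        // output directory, avoiding the serial rename pass that is slow on GCS.
        conf.set("mapreduce.fileoutputcommitter.algorithm.version", "2");
        return Job.getInstance(conf, "gcs-output-job");
      }
    }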
gcs-connector-1.4.1: CHANGES.txt
1.4.1 - 2015-07-09
1. Switched from the custom SeekableReadableByteChannel to
Java 7's java.nio.channels.SeekableByteChannel.
2. Removed the configurable but default-constrained 250GB upload limit;
uploads can now exceed 250GB without needing to modify config settings.
3. Added helper classes related to GCS retries.
4. Added workaround support for read retries on objects with content-encoding
set to gzip; such a content encoding isn't generally correct to use, since it
means the byte count reported by the filesystem will not match the bytes
actually read, but for cases that accept the mismatch, the read channel can
now manually seek to where it left off on a retry rather than having a
GZIPInputStream throw an exception for a malformed partial stream.
5. Added an option for enabling "direct uploads" in
GoogleCloudStorageWriteChannel; the option is not used by the Hadoop layer,
but can be used by clients which directly access the lower GoogleCloudStorage
layer.
6. Added CreateBucketOptions to the GoogleCloudStorage interface so that
clients using the low-level GoogleCloudStorage directly can create buckets
with different locations and storage classes (a hedged sketch follows this
list).
7. Fixed https://github.com/GoogleCloudPlatform/bigdata-interop/issues/5 where
stale cache entries caused stuck phantom directories if the directories
were deleted using non-Hadoop-based GCS clients.
8. Fixed a bug which prevented the Apache HTTP transport from working with
Hadoop 2 when no proxy was set.
9. Misc updates in library dependencies; google.api.version
(com.google.http-client, com.google.api-client) updated from 1.19.0 to
1.20.0, google-api-services-storage from v1-rev16-1.19.0 to
v1-rev35-1.20.0, google-api-services-bigquery from v2-rev171-1.19.0 to
v2-rev217-1.20.0, and Guava from 17.0 to 18.0.
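As a companion to item 6 above, here is a minimal sketch of creating a bucket
with an explicit location and storage class through the low-level
GoogleCloudStorage interface. The CreateBucketOptions constructor arguments
and the create(bucket, options) overload follow the wording of that note, but
treat the exact signatures, along with the bucket name, location, and storage
class values, as illustrative assumptions and check the 1.4.1 javadoc;
constructing the GoogleCloudStorage instance itself is elided because it
depends on your credential setup.

    import java.io.IOException;

    import com.google.cloud.hadoop.gcsio.CreateBucketOptions;
    import com.google.cloud.hadoop.gcsio.GoogleCloudStorage;

    public class CreateBucketSketch {
      // 'gcs' is assumed to be an already-constructed GoogleCloudStorage
      // instance wired up with credentials.
      static void createNearlineBucket(GoogleCloudStorage gcs) throws IOException {
        // The location and storage class values here are only examples.
        CreateBucketOptions options = new CreateBucketOptions("EU", "NEARLINE");
        gcs.create("my-example-bucket", options);
      }
    }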
bigquery-connector-0.7.1: CHANGES.txt
0.7.1 - 2015-07-09
1. Misc updates in library dependencies; google.api.version
(com.google.http-client, com.google.api-client) updated from 1.19.0 to
1.20.0, google-api-services-storage from v1-rev16-1.19.0 to
v1-rev35-1.20.0, google-api-services-bigquery from v2-rev171-1.19.0 to
v2-rev217-1.20.0, and Guava from 17.0 to 18.0.