Announcing bdutil-1.3.1, gcs-connector-1.4.1, and bigquery-connector-0.7.1


Hadoop on Google Cloud Platform Team

Jul 9, 2015, 7:20:45 PM
to gcp-had...@google.com, gcp-hadoo...@googlegroups.com

Greetings, users of Hadoop on the Google Cloud Platform!


We’re excited to announce the latest versions of bdutil, gcs-connector, and bigquery-connector with several bug fixes and new features.


Abridged highlights for connector updates:


  • Upgraded the Google API client libraries and Guava to their latest versions (see the detailed notes below for exact version numbers)

  • Removed the obsolete configuration setting for a 250GB upload limit on objects; objects larger than 250GB can now be uploaded without modifying any config settings (a short sketch follows below)

  • Made it easier to configure and use the lower-level GoogleCloudStorage libraries when calling them directly rather than going through the Hadoop FileSystem interface
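
For a quick illustration of the second point, here is a minimal Java sketch of writing to GCS through the standard Hadoop FileSystem interface with a gs:// path. The project ID, bucket name, and explicit configuration keys below are placeholders/assumptions to check against your deployment (a bdutil-deployed cluster normally sets them in core-site.xml already); the point is simply that no size-related setting is needed, even for objects larger than 250GB.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class GcsWriteSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Assumed connector settings; on a bdutil cluster these usually come from core-site.xml.
      conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
      conf.set("fs.gs.project.id", "my-project");                    // placeholder project
      conf.set("google.cloud.auth.service.account.enable", "true");  // assumed auth setting

      FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), conf);  // placeholder bucket
      // Writes of any size go through the same code path; the old 250GB cap no longer applies.
      try (FSDataOutputStream out = fs.create(new Path("gs://my-bucket/output/part-00000"))) {
        out.writeBytes("example record\n");
      }
    }
  }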


Abridged highlights for bdutil:


  • Added a plugin for deploying MapR under platforms/mapr

  • Switched to mapreduce.fileoutputcommitter.algorithm.version "2" for significantly faster job-commit times on Hadoop 2.7+ when using the GCS connector

  • Updated Flink to 0.9.0 and added a Storm option for using Google Cloud Bigtable

  • Switched Spark to SPARK_DIST_CLASSPATH so Spark jobs inherit the gcs-connector and other Hadoop libraries

  • Fixed reboot recovery for single-node clusters, plus Pig and Hive access to GCS in Ambari-based deployments

Download bdutil-1.3.1.tar.gz or bdutil-1.3.1.zip now to try it out, or visit the developer documentation, where the download links now point to the latest versions. For manual installation or local library usage, download the connector jars directly.



Please see the detailed release notes below for more information about the new bdutil, GCS connector, and BigQuery connector features.


As always, please send any questions or comments to gcp-hadoo...@google.com, or post a question on stackoverflow.com with the tag ‘google-hadoop’ for additional assistance.


All the best,

Your Google Team


Release Notes

bdutil-1.3.1: CHANGES.txt

1.3.1 - 2015-07-09


 1. Added plugin for deploying MapR under platforms/mapr/mapr_env.sh; see

    platforms/mapr/README.md for details.

 2. Changed mapreduce.fileoutputcommitter.algorithm.version to "2"; this should

    only have an effect when running with Hadoop 2.7+, where it significantly

    speeds up job-commit time when using the GCS connector (a configuration

    sketch follows after these release notes).

    See https://issues.apache.org/jira/browse/MAPREDUCE-4815 for more details.

 3. Added an option ENABLE_STORM_BIGTABLE to extensions/storm/storm_env.sh to

    set up using Google Cloud Bigtable from Apache Storm.

 4. Updated Flink version to 0.9.0.

 5. Switched from using SPARK_CLASSPATH to using SPARK_DIST_CLASSPATH pointed

    at the Hadoop classpath to inherit gcs-connector and other Hadoop libraries

    on the default Spark classpath. This gets rid of a warning message about

    SPARK_CLASSPATH deprecation when running Spark, and improves access to

    related Hadoop libraries from Spark jobs.

 6. Fixed reboot recovery for single-node clusters; this includes the ability

    for single-node clusters to recover from issuing "Stop" and then "Start"

    commands via the GCE API.

 7. Added explicit value for mapreduce.job.working.dir in Ambari config; this

    works around a bug in PigInputFormat where an exception is thrown with

    "Wrong FS scheme" when the default filesystem doesn't have the same scheme

    as the filesystem of the input file(s) (e.g. when reading GCS files and

    the default FS is HDFS). Pig reading from GCS should now work in Ambari-based

    bdutil deployments.

 8. Fixed a bug where Hive deployed under ambari_env.sh is unable to

    LOAD DATA INPATH 'gs://<...>' due to Hive server needing to be restarted

    after GCS connector installation to pick it up on its classpath.
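
As a companion to item 2 above, here is a minimal sketch of opting a single job into the version-2 committer algorithm on Hadoop 2.7+ when it is not already set cluster-wide; the gs:// paths and job wiring are placeholders, not bdutil's own configuration.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  public class CommitterV2Sketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Only takes effect on Hadoop 2.7+; speeds up the job-commit phase when writing to GCS.
      conf.set("mapreduce.fileoutputcommitter.algorithm.version", "2");

      Job job = Job.getInstance(conf, "committer-v2-sketch");
      job.setJarByClass(CommitterV2Sketch.class);
      job.setInputFormatClass(TextInputFormat.class);
      job.setOutputFormatClass(TextOutputFormat.class);
      // Default identity mapper/reducer; the committer setting is what this sketch is about.
      FileInputFormat.addInputPath(job, new Path("gs://my-bucket/input"));    // placeholder
      FileOutputFormat.setOutputPath(job, new Path("gs://my-bucket/output")); // placeholder
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }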




gcs-connector-1.4.1: CHANGES.txt

1.4.1 - 2015-07-09


 1. Switched from the custom SeekableReadableByteChannel to

    Java 7's java.nio.channels.SeekableByteChannel (a read sketch follows

    after these release notes).

 2. Removed the configurable but default-constrained 250GB upload limit;

    uploads can now exceed 250GB without needing to modify config settings.

 3. Added helper classes related to GCS retries.

 4. Added workaround support for read retries on objects with content-encoding

    set to gzip; such content encoding isn't generally correct to use, since

    it means the filesystem-reported size will not match the number of bytes

    actually read, but for cases that tolerate this mismatch, the read channel

    can now seek back to where it left off on retry instead of having a

    GZIPInputStream throw an exception for a malformed partial stream.

 5. Added an option for enabling "direct uploads" in

    GoogleCloudStorageWriteChannel which is not directly used by the Hadoop

    layer, but can be used by clients which directly access the lower

    GoogleCloudStorage layer.

 6. Added CreateBucketOptions to the GoogleCloudStorage interface so that

    clients using the low-level GoogleCloudStorage directly can create buckets

    with different locations and storage classes.

 7. Fixed https://github.com/GoogleCloudPlatform/bigdata-interop/issues/5 where

    stale cache entries caused stuck phantom directories if the directories

    were deleted using non-Hadoop-based GCS clients.

 8. Fixed a bug which prevented the Apache HTTP transport from working with

    Hadoop 2 when no proxy was set.

 9. Misc updates to library dependencies: google.api.version

    (com.google.http-client, com.google.api-client) updated from 1.19.0 to

    1.20.0, google-api-services-storage from v1-rev16-1.19.0 to

    v1-rev35-1.20.0, google-api-services-bigquery from v2-rev171-1.19.0 to

    v2-rev217-1.20.0, and Guava from 17.0 to 18.0.
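
Relating to item 1 above, seekable reads continue to go through the ordinary Hadoop FileSystem calls; the following minimal sketch (with placeholder bucket and object names) shows a positioned read and a seek against a gs:// path.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class GcsSeekSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), conf);  // placeholder bucket

      try (FSDataInputStream in = fs.open(new Path("gs://my-bucket/data/file.bin"))) {
        byte[] header = new byte[16];
        in.readFully(0L, header);  // positioned read of the first 16 bytes
        in.seek(1024L);            // reposition the stream to byte offset 1024
        int next = in.read();      // continue reading from the new position
        System.out.println("first byte after seek: " + next);
      }
    }
  }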




bigquery-connector-0.7.1: CHANGES.txt

0.7.1 - 2015-07-09


 1. Misc updates to library dependencies: google.api.version

    (com.google.http-client, com.google.api-client) updated from 1.19.0 to

    1.20.0, google-api-services-storage from v1-rev16-1.19.0 to

    v1-rev35-1.20.0, google-api-services-bigquery from v2-rev171-1.19.0 to

    v2-rev217-1.20.0, and Guava from 17.0 to 18.0.

