Announcing new versions of bdutil-1.0.1, gcs-connector-1.3.1, and bigquery-connector-0.5.0


Hadoop on Google Cloud Platform Team

Dec 17, 2014, 4:01:45 PM

Greetings, users of Hadoop on Google Cloud Platform!


We’re excited to announce the latest versions of bdutil and its companion connectors: bdutil now uses the gcloud interface from the Google Cloud SDK, the bigquery-connector adds support for Hadoop 2 MapReduce and for Avro-based exports, the gcs-connector improves cache and error handling, and the datastore-connector adds support for Hadoop 2 MapReduce.


Download bdutil-1.0.1.tar.gz or bdutil-1.0.1.zip now to try it out, or visit the developer documentation where the download links now point to the latest version.


Abridged highlights for bdutil updates:


  • bdutil now uses gcloud compute for interacting with GCE.

  • The default zone for bdutil was changed to us-central1-a.


Abridged highlights for bigquery-connector updates:


  • Support for Hadoop 2 was added for Java MapReduce.

  • Support was added for Avro-based exports in MapReduce applications.


Abridged highlights for gcs-connector updates:


  • Several improvements to NFS-cache handling and to the handling of 500-level errors.


Abridged highlights for datastore-connector updates:


  • Support for Hadoop 2 was added.


Please see the detailed release notes below for more information about the new bdutil and connector features.


You may download each of the connectors directly via the following links, or use the latest bdutil to install them on a new cluster.

gcs-connector: gcs-connector-1.3.1-hadoop1.jar and gcs-connector-1.3.1-hadoop2.jar

bigquery-connector: bigquery-connector-0.5.0-hadoop1.jar and bigquery-connector-0.5.0-hadoop2.jar

datastore-connector: datastore-connector-0.14.9-hadoop1.jar and datastore-connector-0.14.9-hadoop2.jar


As always, please send any questions or comments to gcp-hadoo...@google.com, or post a question on stackoverflow.com with the tag ‘google-hadoop’ for additional assistance.


All the best,

Your Google Team



bdutil-1.0.1: CHANGES.txt

1.0.1 - 2014-12-16


 1. Replaced usage of the deprecated gcutil with gcloud compute.
 2. Changed GCE_SERVICE_ACCOUNT_SCOPES from a comma-separated list to a
    bash array.
 3. Fixed cleanup of pig-validate-setup.sh, hive-validate-setup.sh, and
    spark-validate-setup.sh.
 4. Upgraded the default Spark version to 1.1.1.
 5. The default zone for instances is now us-central1-a.



gcs-connector-1.3.1: CHANGES.txt

1.3.1 - 2014-12-16


 1. Fixed a rare NullPointerException in FileSystemBackedDirectoryListCache
    which could occur if a directory being listed was purged from the cache
    between a call to "exists()" and "listFiles()".
 2. Fixed a bug in GoogleHadoopFileSystemCacheCleaner where the cache
    cleaner failed to clean any contents when a bucket was non-empty but
    expired.
 3. Fixed a bug in FileSystemBackedDirectoryListCache which caused garbage
    collection to require several passes for large directory hierarchies;
    now we can successfully garbage-collect an entire expired tree in a
    single pass, and cache files are also processed in place without
    having to create a complete in-memory list.
 4. Updated handling of new file creation, file copying, and file deletion
    so that all object modification requests sent to GCS contain
    preconditions that should prevent race conditions in the face of
    retried operations (see the sketch after these notes).
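For illustration: the connector exposes GCS through the standard Hadoop FileSystem API under the gs:// scheme, so the preconditions described in note 4 apply transparently to ordinary create and delete calls. A minimal round-trip sketch, assuming the connector jar is on the classpath and the gs:// scheme is mapped to it in core-site.xml (bdutil-deployed clusters are configured this way); the bucket name is a placeholder:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class GcsRoundTrip {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // "my-bucket" is a placeholder for a bucket the running service
      // account can write to.
      FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), conf);
      Path path = new Path("gs://my-bucket/tmp/round-trip.txt");
      try (FSDataOutputStream out = fs.create(path)) {
        out.writeUTF("hello from gcs-connector 1.3.1");
      }
      System.out.println("exists: " + fs.exists(path));
      fs.delete(path, false);
    }
  }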




bigquery-connector-0.5.0: CHANGES.txt

0.5.0 - 2014-12-16


 1. BigQueryInputFormat has been renamed GsonBigQueryInputFormat to better
    reflect its nature as a Gson-based format. A forwarding declaration
    was left in place to maintain compatibility.
 2. JsonTextBigQueryInputFormat was added to provide lines of JSON text as
    they appear in the BigQuery export.
 3. When using sharded BigQuery exports (the default), the keys will no
    longer be in increasing order per mapper. Instead, the keys will be
    as reported by the delegate RecordReader, which is generally the byte
    position within the current file. However, the sharded export creates
    many files per mapper, so this position will appear to reset to 0 when
    we switch between files. The record reader's getProgress() will still
    report progress across the entire dataset that the record reader is
    responsible for.
 4. The BigQuery connector can now ingest Avro-based BigQuery exports.
    Using an Avro-based export should result in less data transferred
    between your MapReduce job and Google Cloud Storage and should require
    less CPU time to parse the data files. To use Avro, set the input
    format to AvroBigQueryInputFormat and update your map code to expect
    LongWritable keys and Avro GenericData.Record values (see the mapper
    sketch after these notes).
 5. Hadoop 2 support was added for Java MapReduce. Streaming support for
    Hadoop 2 will be included in a future release.
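To sketch what note 4 describes: a minimal Hadoop 2 mapper that receives LongWritable keys and Avro GenericData.Record values from an Avro-based export. The column name "word" is hypothetical; substitute a field from your own table's schema, and wire the job up with job.setInputFormatClass(AvroBigQueryInputFormat.class) after configuring the input table.

  import java.io.IOException;
  import org.apache.avro.generic.GenericData;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Emits one output value per exported record.
  public class AvroExportMapper
      extends Mapper<LongWritable, GenericData.Record, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, GenericData.Record record,
        Context context) throws IOException, InterruptedException {
      // "word" is a hypothetical column name; use a field from your schema.
      Object value = record.get("word");
      if (value != null) {
        context.write(new Text(value.toString()), NullWritable.get());
      }
    }
  }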



datastore-connector-0.14.9: CHANGES.txt

0.14.9 - 2014-12-16


 1. Added support for Hadoop 2.

