Greetings, users of Hadoop on Google Cloud Platform!
We’re excited to announce the latest version of bdutil, which now uses the gcloud interface from the Cloud SDK; the latest version of the bigquery-connector, which adds support for Hadoop 2 MapReduce and Avro-based exports; and the latest version of the datastore-connector, which also adds support for Hadoop 2 MapReduce.
Download bdutil-1.0.1.tar.gz or bdutil-1.0.1.zip now to try it out, or visit the developer documentation where the download links now point to the latest version.
Abridged highlights for bdutil updates:
bdutil now uses gcloud compute for interacting with GCE.
The default zone for bdutil was changed to us-central1-a.
Abridged highlights for bigquery-connector updates:
Support for Hadoop 2 was added for Java MapReduce.
Adds support for Avro-based exports in MapReduce applications.
Abridged highlights for gcs-connector updates:
Several improvements to NFS-cache handling and improved handling of 500-level errors.
Abridged highlights for datastore-connector updates:
Adds support for Hadoop 2.
Please see the detailed release notes below for more information about the new bdutil and connector features.
You may download each of the connectors directly via the following links, or use the latest bdutil to install them on a new cluster.
gcs-connector: gcs-connector-1.3.1-hadoop1.jar and gcs-connector-1.3.1-hadoop2.jar
bigquery-connector: bigquery-connector-0.5.0-hadoop1.jar and bigquery-connector-0.5.0-hadoop2.jar
datastore-connector: datastore-connector-0.14.9-hadoop1.jar and datastore-connector-0.14.9-hadoop2.jar
As always, please send any questions or comments to gcp-hadoo...@google.com or post a question on stackoverflow.com with tag ‘google-hadoop’ for additional assistance.
All the best,
Your Google Team
bdutil-1.0.1: CHANGES.txt
1.0.1 - 2014-12-16
1. Replaced usage of deprecated gcutil with gcloud compute.
2. Changed GCE_SERVICE_ACCOUNT_SCOPES from a comma-separated list to a bash
   array.
3. Fixed cleanup of pig-validate-setup.sh, hive-validate-setup.sh and
spark-validate-setup.sh.
4. Upgraded default Spark version to 1.1.1.
5. The default zone for instances is now us-central1-a.
gcs-connector-1.3.1: CHANGES.txt
1.3.1 - 2014-12-16
1. Fixed a rare NullPointerException in FileSystemBackedDirectoryListCache
which can occur if a directory being listed is purged from the cache
between a call to "exists()" and "listFiles()".
2. Fixed a bug in GoogleHadoopFileSystemCacheCleaner where the cache cleaner
   fails to clean any contents when a bucket is non-empty but expired.
3. Fixed a bug in FileSystemBackedDirectoryListCache which caused garbage
collection to require several passes for large directory hierarchies;
now we can successfully garbage-collect an entire expired tree in a
single pass, and cache files are also processed in-place without having
to create a complete in-memory list.
4. Updated handling of new file creation, file copying, and file deletion
   so that all object modification requests sent to GCS contain preconditions
   that should prevent race conditions in the face of retried operations.
bigquery-connector-0.5.0: CHANGES.txt
0.5.0 - 2014-12-16
1. BigQueryInputFormat has been renamed GsonBigQueryInputFormat to better
   reflect its nature as a Gson-based format. A forwarding declaration
   was left in place to maintain compatibility.
2. JsonTextBigQueryInputFormat was added to provide lines of JSON text as
they appear in the BigQuery export.
3. When using sharded BigQuery exports (the default), the keys will no
   longer be in increasing order per mapper. Instead, the keys will be
   whatever the delegate RecordReader reports, which is generally the byte
   position within the current file. However, the sharded export creates
   many files per mapper, so this position will appear to reset to 0 when
   the reader switches between files. The record reader's getProgress()
   will still report progress across the entire dataset that the record
   reader is responsible for.
4. The BigQuery connector can now ingest Avro-based BigQuery exports. Using
   an Avro-based export should result in less data transferred between your
   MapReduce job and Google Cloud Storage and should require less CPU time
   to parse the data files. To use Avro, set the input format to
   AvroBigQueryInputFormat and update your map code to expect LongWritable
   keys and Avro GenericData.Record values (a minimal sketch follows these
   notes).
5. Hadoop 2 support was added for Java MapReduce. Streaming support for
   Hadoop 2 will be included in a future release.
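To illustrate note 4 above, here is a minimal sketch of a Hadoop 2 MapReduce job that reads an Avro-based BigQuery export. The package name for the connector class, the "word" field name, and the word-count logic are assumptions made for this example, and the BigQuery project/dataset/table configuration for the export is omitted.

import java.io.IOException;

import org.apache.avro.generic.GenericData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Assumed package for the connector's input format class.
import com.google.cloud.hadoop.io.bigquery.AvroBigQueryInputFormat;

public class AvroExportWordCount {

  // Per release note 4, the Avro input format delivers LongWritable keys
  // and Avro GenericData.Record values to the mapper.
  public static class RecordMapper
      extends Mapper<LongWritable, GenericData.Record, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(LongWritable key, GenericData.Record record, Context context)
        throws IOException, InterruptedException {
      // "word" is a hypothetical field in the exported table's schema.
      Object word = record.get("word");
      if (word != null) {
        context.write(new Text(word.toString()), ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The BigQuery project, dataset, and table to export must also be set
    // on 'conf' via the connector's configuration; that setup is omitted here.
    Job job = Job.getInstance(conf, "avro-bigquery-export-example");
    job.setJarByClass(AvroExportWordCount.class);
    job.setInputFormatClass(AvroBigQueryInputFormat.class);
    job.setMapperClass(RecordMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}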
datastore-connector-0.14.9: CHANGES.txt
0.14.9 - 2014-12-16
1. Added support for Hadoop 2.