Greetings, Hadoop on Google Cloud Platform users!
We’re excited to announce the preview release of two new libraries, the Google BigQuery connector and Google Cloud Datastore connector for Hadoop, to make it easier for you to run Hadoop jobs directly against your data in Google BigQuery and Google Cloud Datastore. The Google BigQuery and Google Cloud Datastore connectors implement Hadoop’s InputFormat and OutputFormat interfaces for accessing data. These two connectors complement the existing Google Cloud Storage connector for Hadoop, which implements the Hadoop Distributed File System interface for accessing data in Google Cloud Storage.
The connectors can be automatically installed and configured when deploying your Hadoop cluster using bdutil simply by including the extra “env” files:
./bdutil deploy bigquery_env.sh
./bdutil deploy datastore_env.sh
./bdutil deploy datastore_env.sh bigquery_env.sh
Here are some word-count MapReduce code samples to get you started.
To use bdutil to install the new connectors and also obtain the pre-built samples, you can download the new bdutil-0.33.1.tar.gz or bdutil-0.33.1.zip directly, or visit the developer documentation, where the download links now point to the latest version.
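For readers new to the word-count pattern, here is a rough sketch of what such a job computes, written in plain Java with no Hadoop or connector classes. It is not one of the bundled samples. The class and method names are hypothetical, and the map step (split each line into words) and the reduce step (sum the occurrences per word) are collapsed into a single in-memory loop purely for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the connector samples.
class WordCountSketch {

    // Splits each line on whitespace ("map") and sums the
    // occurrences of each word ("reduce") in a sorted map.
    static Map<String, Long> countWords(Iterable<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line or leading whitespace
                }
                counts.merge(token, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(countWords(Arrays.asList("a b a", "b c")));
    }
}
```

In a real job, the input records would come from the connector's InputFormat and the per-word totals would be written through its OutputFormat, rather than from an in-memory list.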
As always, please send any questions or comments to gcp-hadoo...@google.com
All the best,
Your Google Team
gcs-connector-1.2.4: CHANGES.txt
1.2.4 - 2014-04-09
1. The value of fs.gs.io.buffersize.write is now rounded up to 8MB if set to
a lower value; otherwise the backend will error out on unaligned chunks.
2. Misc refactoring to enable reuse of the resumable upload classes in other
libraries.
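As an illustration of item 1, the rounding it describes could look like the following. This is a minimal sketch, not the connector's actual code: the class, constant, and method names are hypothetical, and 8MB is assumed to mean 8 * 1024 * 1024 bytes.

```java
// Hypothetical illustration only; not taken from the gcs-connector source.
class WriteBufferSizeSketch {

    // Assumed minimum write buffer size: 8MB expressed in bytes.
    static final int MIN_WRITE_BUFFER_BYTES = 8 * 1024 * 1024;

    // Returns the configured size, raised to the minimum when it is lower.
    static int effectiveWriteBufferSize(int configuredBytes) {
        return Math.max(configuredBytes, MIN_WRITE_BUFFER_BYTES);
    }

    public static void main(String[] args) {
        // A 1MB setting is raised to the 8MB minimum.
        System.out.println(effectiveWriteBufferSize(1024 * 1024));
        // A 64MB setting is left unchanged.
        System.out.println(effectiveWriteBufferSize(64 * 1024 * 1024));
    }
}
```

Values at or above the minimum pass through untouched, so only settings below 8MB are affected.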
bdutil-0.33.1: CHANGES.txt
0.33.1 - 2014-04-09
1. Added deployment scripts for the BigQuery and Datastore connectors.
2. Added sample jarfiles for the BigQuery and Datastore connectors under
a new /samples/ subdirectory along with scripts for running the samples.
3. Set the default image type to backports-debian-7 for improved networking.