Greetings, users of Hadoop on Google Cloud Platform!
We're pleased to announce new versions of bdutil, the GCS connector for Hadoop, the BigQuery connector for Hadoop and the Datastore connector for Hadoop with bugfixes and minor improvements. Download bdutil-0.35.1.tar.gz or bdutil-0.35.1.zip now to try it out, or visit the developer documentation where the download links now point to the latest version.
There were several small usability changes to bdutil. The most significant is a change in the VM naming conventions used. The default VM prefix is now "hadoop", and the master and worker suffixes are now 'm' and 'w' respectively. To use bdutil-0.35.1 with any clusters spun up with a previous version, you will need to add the --old_hostname_suffixes flag. Without it, bdutil will not operate on existing clusters deployed with older bdutil versions.
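As a rough illustration of the rename described above, the sketch below prints the VM names each convention would produce for a given prefix and worker count. The helper name vm_names and the assumption that worker indices start at 0 are made up for this example; this is not bdutil code.

```shell
# Hypothetical helper, not part of bdutil: prints the master and worker VM
# names for a prefix and worker count under the new or old naming convention.
# Assumes worker indices start at 0. The body runs in a subshell so its
# variables do not leak into the caller.
vm_names() (
  prefix="$1"
  num_workers="$2"
  convention="${3:-new}"

  master_suffix="m"
  worker_suffix="w"
  if [ "$convention" = "old" ]; then
    # Convention used before 0.35.1, which bdutil selects with
    # --old_hostname_suffixes.
    master_suffix="nn"
    worker_suffix="dn"
  fi

  echo "${prefix}-${master_suffix}"
  i=0
  while [ "$i" -lt "$num_workers" ]; do
    echo "${prefix}-${worker_suffix}-${i}"
    i=$((i + 1))
  done
)

vm_names hadoop 2        # new default prefix and suffixes
vm_names hs-ghfs 2 old   # old default prefix and suffixes
```

Comparing the two listings shows why a new bdutil cannot find VMs created by an older version unless the old suffixes are requested explicitly.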
You can download the connector jarfile directly: gcs-connector-1.2.8-hadoop1.jar for use with Hadoop 1 (and other versions of the same series, like 0.20.205.0), or gcs-connector-1.2.8-hadoop2.jar for use with Hadoop 2. Alternatively, allow bdutil to perform the installation and configuration for you without having to deal directly with jarfiles.
As always, please send any questions or comments to gcp-hadoo...@google.com
All the best,
Your Google Team
gcs-connector-1.2.8: CHANGES.txt
1.2.8 - 2014-08-07
1. Changed the manner in which the GCS connector jar is built to A) reduce
included dependencies to only those parts which are used and B) repackage
dependencies whose versions conflict with those bundled with Hadoop.
2. Deprecated fs.gs.system.bucket config.
bdutil-0.35.1: CHANGES.txt
0.35.1 - 2014-08-07
1. Added a boolean bdutil option DEBUG_MODE with corresponding flags
-D/--debug which turns on high-verbosity modes for gcutil and gsutil
calls during the deployment, including on the remote VMs.
2. Added the ability for the Google connectors, bdconfig, and Hadoop
distributions to be stored and fetched from gs:// style URLs in addition
to http:// URLs.
3. In VERBOSE_MODE, on failure the detailed debuginfo.txt is now also printed
to the console in addition to being available in the /tmp directory.
4. Moved all configuration templates into conf/.
5. Changed the default PREFIX to 'hadoop' instead of 'hs-ghfs', and the
naming convention for masters/workers to follow $PREFIX-m and $PREFIX-w-$i
instead of $PREFIX-nn and $PREFIX-dn-$i. IMPORTANT: This breaks
compatibility with existing clusters deployed with bdutil 0.34.x and older,
but there is a new flag "--old_hostname_suffixes" to continue using the old
-nn/-dn naming convention. For example, to turn down an old cluster if
you've been using the old default prefix:
./bdutil --prefix=hs-ghfs --old_hostname_suffixes delete
6. Fixed a bug in VM environments where run_command could not find commands
such as 'hadoop' in their PATH.
7. Updated BigQuery / Datastore sample scripts to be used with
"./bdutil run_command" rather than locally.
8. Added a test to guarantee VMs had no more than 64 characters in their fully
qualified domain names.
9. Added the import_env helper to allow "_env.sh" files to depend on each
other.
10. Renamed spark1_env.sh to spark_env.sh.
11. Added a gsutil update check upon first entering a VM.
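The hostname length limit from item 8 can be sketched as a simple string-length check. This is only an illustration of the rule, not bdutil's actual test: the function name fqdn_fits, the example project name, and the ".c.<project>.internal" layout of the example FQDN are all assumptions.

```shell
# Hypothetical helper, not bdutil's actual test: succeeds when the given
# fully qualified domain name is no more than 64 characters long.
fqdn_fits() {
  [ "${#1}" -le 64 ]
}

# Example FQDN; the project name and domain layout are assumed for illustration.
fqdn="hadoop-w-0.c.my-project.internal"
if fqdn_fits "$fqdn"; then
  echo "ok: ${fqdn} (${#fqdn} characters)"
else
  echo "too long: ${fqdn} (${#fqdn} characters)" >&2
fi
```

Because the prefix is part of every VM name, a long --prefix value or project name is the most likely way to exceed the limit.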
datastore-connector-0.14.6: CHANGES.txt
0.14.6 - 2014-08-07
1. Misc updates in library dependencies.
bigquery-connector-0.4.3: CHANGES.txt
0.4.3 - 2014-08-07
1. Added better validation to BigQueryUtils.getSchemaFromString used by
BigQueryOutputFormat to throw descriptive IllegalArgumentExceptions
instead of NullPointerExceptions for most types of malformed schemas.
2. Fixed a bug in BigQueryUtils.getSchemaFromString to support 'repeated'
fields inside of nested records; used to throw IllegalStateException
if a nested record contained more than 1 inner field.