Greetings, users of Hadoop on Google Cloud Platform!
As you may have heard from Google Cloud Storage announcements or the v1 release notes, the v1beta2 API for Google Cloud Storage will stop working after October 31st, 2014. Since the Google Cloud Storage connector for Hadoop calls the Google Cloud Storage public APIs, you may be affected if you are using an old version of the GCS connector, whether on already-running clusters that were deployed with old versions or on new clusters deployed with old tools.
The GCS connector for Hadoop moved off of the deprecated v1beta2 API as of the 1.2.6 release on June 6th, 2014, corresponding to bdutil-0.34.2; the updated connector was also pushed to the GitHub repository for the GCS connector. If you can verify that your Hadoop clusters are running 1.2.6 or newer, or if you’re deploying using the default settings of bdutil-0.34.2 or newer, then no further action is necessary on your part. Otherwise, please be sure to upgrade to gcs-connector-1.2.6 or newer before October 31st, 2014 to ensure there are no unplanned disruptions of your Hadoop workloads.
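If you’re not sure which connector version a running cluster has, one quick check is to list the installed jar on the master node, since the jar file name includes the version. The following is only a sketch using the same bdutil run_command pattern shown further below; the path assumes the default Hadoop 1 layout under /home/hadoop/hadoop-install, and if the cluster was deployed with an older bdutil you may also need the --old_hostname_suffixes flag used in the in-place upgrade commands below:
# List the installed GCS connector jar; the file name includes the version.
./bdutil -b <bucket> -p <project> -P <my-prefix> run_command -t master -- \
"ls /home/hadoop/hadoop-install/lib/gcs-connector-*.jar"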
The best way to upgrade is simply to delete your old cluster(s), migrate to the latest version of bdutil, and redeploy fresh clusters to pick up the newest libraries. This is especially recommended if you also use the BigQuery or Datastore connectors, or also installed Pig, Hive, Spark, or Shark (or other higher-level libraries on top of Hadoop).
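With the latest bdutil release this amounts to a delete followed by a fresh deploy. As a sketch only (substitute your own bucket, project, and prefix, and note that deleting the cluster discards anything stored only on its local disks):
# Tear down the old cluster, then redeploy with the newer bdutil to pick up the latest connector libraries.
bdutil-0.35.2$ ./bdutil -b <bucket> -p <project> -P <my-prefix> delete
bdutil-0.35.2$ ./bdutil -b <bucket> -p <project> -P <my-prefix> deploy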
If you absolutely must upgrade in place, there are some options for using bdutil to help; this requires stopping your Hadoop daemons, deleting the old connector libraries, installing the new one, and then restarting your Hadoop daemons. Keep in mind that in-place upgrades haven’t been tested across all configurations, and you run the risk of breaking the configuration of your existing cluster. For example, if you originally used bdutil-0.34.1 to deploy your cluster with Hadoop 1.2.1 as follows:
bdutil-0.34.1$ ./bdutil -b <bucket> -p <project> -P <my-prefix> deploy
Then you can upgrade the gcs-connector on your cluster by running the following commands from bdutil-0.35.2:
# Stop the Hadoop daemons.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes run_command -t master -- \
"sudo -u hadoop /home/hadoop/hadoop-install/bin/stop-all.sh"
# Comment out HADOOP_CLASSPATH for old connector in hadoop-env.sh.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes run_command -t all -- \
"sed -i 's/^\(export HADOOP_CLASSPATH.*gcs-connector\)/# \1/' \
/home/hadoop/hadoop-install/conf/hadoop-env.sh"
# Delete the old connector from the lib/ directory.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes run_command -t all -- \
"rm /home/hadoop/hadoop-install/lib/gcs-connector-*.jar"
# Run the command group which installs the newest gcs connector.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes run_command_group install_connectors
# Start the Hadoop daemons.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes run_command -t master -- \
"sudo -u hadoop /home/hadoop/hadoop-install/bin/start-all.sh"
# Test your upgraded Hadoop cluster.
./bdutil -b <bucket> -p <project> -P <my-prefix> \
--old_hostname_suffixes shell < hadoop-validate-setup.sh
A similar approach can be taken for other cluster setups such as Hadoop 2, but it will require changing the paths to hadoop-env.sh and the lib/ directory, as well as the scripts used to stop and restart the Hadoop daemons.
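As a rough sketch for a default bdutil Hadoop 2 layout (these paths and script names are assumptions; verify them against your own deployment before running anything), the stop and delete steps might look like:
# Stop the Hadoop 2 daemons (Hadoop 2 ships its control scripts under sbin/ rather than bin/).
./bdutil -b <bucket> -p <project> -P <my-prefix> run_command -t master -- \
"sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh && \
sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-dfs.sh"
# Remove the old connector jar, which Hadoop 2 keeps under share/hadoop/common/lib/ instead of lib/.
./bdutil -b <bucket> -p <project> -P <my-prefix> run_command -t all -- \
"rm /home/hadoop/hadoop-install/share/hadoop/common/lib/gcs-connector-*.jar"
The HADOOP_CLASSPATH edit would similarly target etc/hadoop/hadoop-env.sh, and the remaining steps (install_connectors, restart, validate) follow the same pattern as above.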
As always, please send any questions or comments to gcp-hadoo...@google.com, or post a question on stackoverflow.com with the tag ‘google-hadoop’ for additional assistance.
All the best,
Your Google Team