Greetings, users of Hadoop on Google Cloud Platform!
We’ve recently discovered a bug in the globStatus implementation of the GCS connector for Hadoop. It causes globStatus to erroneously report “not found” for globs inside nested subdirectories when running on Hadoop 2.2.0. This bug does not affect Hadoop 1.2.1, so upgrading to this patched version is optional if you’re only using 1.2.1.
Download the patched connector libraries directly (gcs-connector-1.2.7-hadoop1.jar and gcs-connector-1.2.7-hadoop2.jar), or upgrade to bdutil-0.34.4.tar.gz or bdutil-0.34.4.zip so that any new Hadoop clusters you deploy use the latest connector libraries.
The following example shows the bug, which is fixed in this release:
$ echo hello | hadoop fs -put - dir0/dir1/foo.txt
$ hadoop fs -ls dir0/dir1/*
ls: `dir0/dir1/*': No such file or directory
$ hadoop fs -ls dir0/dir1
Found 1 items
-rwx------ 3 hadoop hadoop 6 2014-06-21 03:14 ../dir0/dir1/foo.txt
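For readers less familiar with how globs behave over an object store, here is a minimal Python sketch of what the corrected behavior should be. It is a hypothetical illustration, not the connector's actual globStatus code. The function name glob_flat_listing is made up, and the plain list of names stands in for a GCS bucket listing. The sketch assumes that directories are implied by the "/"-separated prefixes of object names. It uses Python's fnmatch, whose pattern syntax differs from Hadoop's glob syntax in details such as {a,b} alternation.

```python
import fnmatch


def glob_flat_listing(object_names, pattern):
    """Match a glob against a flat list of object names (hypothetical sketch).

    Object names look like "dir0/dir1/foo.txt"; every "/"-separated prefix
    of a name is treated as an implied directory.
    """
    # Collect every path that exists, either as an object itself or as an
    # implied parent directory of some object.
    paths = set()
    for name in object_names:
        parts = name.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            paths.add("/".join(parts[:i]))

    pattern_parts = pattern.strip("/").split("/")
    matches = []
    for path in sorted(paths):
        path_parts = path.split("/")
        # A glob component never crosses a "/" boundary, so compare the path
        # and the pattern one level at a time.
        if len(path_parts) == len(pattern_parts) and all(
            fnmatch.fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts)
        ):
            matches.append(path)
    return matches


objects = ["dir0/dir1/foo.txt"]
print(glob_flat_listing(objects, "dir0/dir1/*"))  # ['dir0/dir1/foo.txt']
print(glob_flat_listing(objects, "dir0/*"))       # ['dir0/dir1']
```

The key point is that dir0/dir1 exists (as an implied directory) even though only dir0/dir1/foo.txt was written, so dir0/dir1/* should match foo.txt rather than report the parent as not found.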
Thank you for your continued patience and feedback as we smooth out these kinds of issues in our newer Hadoop 2 support. As always, please send any questions or comments to gcp-hadoo...@google.com
All the best,
Your Google Team
bdutil-0.34.4: CHANGES.txt
0.34.4 - 2014-06-23
1. Switched default gcs-connector version to 1.2.7 for patch fixing a bug
where globs wrongly reported "not found" in some cases in Hadoop 2.2.0.
gcs-connector-1.2.7: CHANGES.txt
1.2.7 - 2014-06-23
1. Fixed a bug where certain globs incorrectly reported the parent directory
being not found (and thus erroring out) in Hadoop 2.2.0 due to an
interaction with the fs.gs.glob.flatlist.enable feature; doesn't affect
Hadoop 1.2.1.