Hi,
I've been testing the Intel VM distribution (2.3-12961) with Elasticsearch Hadoop [1] and noticed that hadoop/lib contains multiple versions of the same jars, which causes conflicts on the classpath. For example, /usr/lib/hadoop/lib contains:
-- jackson 1.0.1 (4+ years old and unsupported) vs 1.8.8
-rw-r--r--. 1 root root 136059 Feb 22 23:30 jackson-core-asl-1.0.1.jar
-rw-r--r--. 1 root root 227500 Feb 22 23:30 jackson-core-asl-1.8.8.jar
-rw-r--r--. 1 root root 270781 Feb 22 23:30 jackson-mapper-asl-1.0.1.jar
-rw-r--r--. 1 root root 668564 Feb 22 23:30 jackson-mapper-asl-1.8.8.jar
Both 1.0.1 and 1.8.8 are present even though Hadoop relies only on 1.8.8, so 1.0.1 can end up being picked up instead (there is no guarantee on classpath ordering). Unfortunately, Jackson is not the only library affected:
-- commons logging 1.0.4 vs 1.1.1
-rw-r--r--. 1 root root 60686 Feb 22 23:30 commons-logging-1.1.1.jar
-rw-r--r--. 1 root root 26202 Feb 22 23:30 commons-logging-api-1.0.4.jar
-- beanutils 1.7 vs 1.8
-rw-r--r--. 1 root root 188671 Feb 22 23:30 commons-beanutils-1.7.0.jar
-rw-r--r--. 1 root root 206035 Feb 22 23:30 commons-beanutils-core-1.8.0.jar
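To see exactly which jars overlap, one option is to scan the directory and list the class files that more than one jar provides. Below is a minimal Python sketch of that idea using only the standard `zipfile` module; `find_duplicate_classes` and `summarize` are hypothetical helpers, not part of Hadoop or any distribution tooling. Because it compares class entries rather than file names, it should also flag pairs such as commons-logging-api/commons-logging or commons-beanutils/commons-beanutils-core, whose artifact names differ.

```python
# Hypothetical helper: report .class entries that more than one jar in a
# directory provides. Uses only the Python standard library.
import sys
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(lib_dir):
    """Return {class_entry: [jar names]} for entries found in 2+ jars."""
    owners = defaultdict(list)
    # Jars are visited in sorted order, so each jar list comes out sorted.
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            # set() so an entry repeated inside one jar is counted once
            for entry in set(zf.namelist()):
                if entry.endswith(".class"):
                    owners[entry].append(jar.name)
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}


def summarize(duplicates):
    """Collapse per-class results into {(jar, jar, ...): shared class count}."""
    summary = defaultdict(int)
    for jars in duplicates.values():
        summary[tuple(jars)] += 1
    return dict(summary)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: find_dup_classes.py <lib-dir>")
    else:
        dups = find_duplicate_classes(sys.argv[1])
        for jars, count in sorted(summarize(dups).items()):
            print(f"{count:5d} classes shared by: {', '.join(jars)}")
```

Pointing it at the hadoop/lib directory should print one line per group of overlapping jars together with the number of classes they share.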
The end result is a lot of errors whenever the JVM picks up the old versions instead of the proper ones.
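Why this surfaces as order-dependent errors: a class loader searches its classpath entries in order and uses the first one that contains the requested class, and if the classpath is assembled with a `lib/*` wildcard, the Java documentation states that the order of the expanded JAR files is not specified. The toy Python model below is a simplification that ignores parent delegation and other JVM details; `resolve` and the jar contents are hypothetical. It shows how the same two jars can yield different Jackson versions depending only on their order:

```python
# Toy model (hypothetical names): a single class loader that scans classpath
# entries in order and returns the first jar containing the class. Parent
# delegation and other real JVM details are deliberately ignored.
from typing import Dict, List, Optional, Set


def resolve(class_name: str, classpath: List[str],
            contents: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first jar on the classpath that provides class_name, or None."""
    for jar in classpath:
        if class_name in contents.get(jar, set()):
            return jar
    return None


# Both Jackson jars from hadoop/lib define the same class name.
contents = {
    "jackson-core-asl-1.0.1.jar": {"org.codehaus.jackson.JsonFactory"},
    "jackson-core-asl-1.8.8.jar": {"org.codehaus.jackson.JsonFactory"},
}
order_a = ["jackson-core-asl-1.8.8.jar", "jackson-core-asl-1.0.1.jar"]
order_b = list(reversed(order_a))  # same jars, different order

print(resolve("org.codehaus.jackson.JsonFactory", order_a, contents))  # jackson-core-asl-1.8.8.jar
print(resolve("org.codehaus.jackson.JsonFactory", order_b, contents))  # jackson-core-asl-1.0.1.jar
```

Nothing in the jars themselves changes between the two runs; only the ordering does, which is why removing the stale jars is more reliable than trying to control classpath order.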
It would be great to address this critical problem, and perhaps upgrade the libraries (Hive, Pig and Hadoop) in the process.
Thanks,
Costin
[1] https://github.com/elasticsearch/elasticsearch-hadoop