Titan 0.5.0: Gremlin & Hadoop 2.5 JobCounter.MB_MILLIS_MAPS Exception


at...@infotrellis.com

Aug 23, 2014, 11:01:52 AM
to aureliu...@googlegroups.com
I have two Hadoop clusters (Hadoop 2.5.0 and Hadoop 2.4.1). I tried Titan 0.5.0's titan-hadoop-2 functionality as per the documentation. Using gremlin, I can load the graph from a file in HDFS and query it successfully (i.e. the MapReduce jobs finish successfully). However, just after the successful job completion message, I get the following exception when the job statistics are displayed.

I suspect this is due to a version mismatch between the Hadoop cluster (2.5.0 and/or 2.4.1) and the Hadoop jars (2.2.0) used by gremlin. The behaviour is the same regardless of which cluster the jobs run on (Hadoop 2.5.0 or 2.4.1); i.e. I get the same exception complaining about JobCounter.MB_MILLIS_MAPS.

When I run titan-hadoop-2 against its own local Hadoop (i.e. without providing any Hadoop config), everything seems to work fine.

I believe the gremlin scripts are causing the problem here, as they blindly load all jars from the lib directory, including Titan's own Hadoop 2.2.0 jars:
CP=`abs_path`/../conf
CP=$CP:$(find -L `abs_path`/../lib/ -name '*.jar' | tr '\n' ':')
CP=$CP:$(find -L `abs_path`/../ext/ -name '*.jar' | tr '\n' ':')
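One way the script could avoid this is to skip the bundled hadoop-* jars whenever a real installation is configured. This is only a sketch with hypothetical names (`build_cp`, `HADOOP_PREFIX` layout), not the actual gremlin.sh:

```shell
# Sketch: build a classpath that prefers the cluster's Hadoop jars over
# Titan's bundled 2.2.0 jars. Function and variable names are illustrative.
build_cp() {
    libdir=$1    # Titan's lib directory
    if [ -n "$HADOOP_PREFIX" ]; then
        # Exclude the bundled hadoop-* jars and pull in the cluster's instead.
        find -L "$libdir" -name '*.jar' ! -name 'hadoop-*.jar' | tr '\n' ':'
        find -L "$HADOOP_PREFIX" -name 'hadoop-*.jar' | tr '\n' ':'
    else
        # No cluster configured: fall back to everything in lib/, as today.
        find -L "$libdir" -name '*.jar' | tr '\n' ':'
    fi
}
```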


Your feedback on this would be much appreciated.
Thanks!

EXCEPTION
java.lang.RuntimeException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.ResultHookClosure.call(ResultHookClosure.java:44)
        at groovy.lang.Closure.call(Closure.java:428)
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:231)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:64)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
        at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:324)
        at org.codehaus.groovy.tools.shell.Groovysh.this$3$setLastResult(Groovysh.groovy)
        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
        at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:2416)
        at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:3347)
        at org.codehaus.groovy.tools.shell.Shell.setProperty(Shell.groovy)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.setGroovyObjectProperty(ScriptBytecodeAdapter.java:528)
        at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:152)
        at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:114)
        at org.codehaus.groovy.tools.shell.Shell$leftShift$0.call(Unknown Source)
        at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:88)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
        at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:100)
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:137)
        at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1079)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:66)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:61)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.<init>(Console.java:68)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.Console.main(Console.java:73)

Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
        at java.lang.Enum.valueOf(Enum.java:236)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
        at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
        at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
        at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
        at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
        at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
        at com.thinkaurelius.titan.hadoop.compat.h2.Hadoop2Compiler.run(Hadoop2Compiler.java:299)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.thinkaurelius.titan.hadoop.HadoopPipeline.submit(HadoopPipeline.java:1092)
        at com.thinkaurelius.titan.hadoop.HadoopPipeline.submit(HadoopPipeline.java:1075)
        at com.thinkaurelius.titan.hadoop.tinkerpop.gremlin.ResultHookClosure.call(ResultHookClosure.java:39)



GREMLIN CLASSPATH EXTRACT (Hadoop jars)
/lib/hadoop-annotations-2.2.0.jar
/lib/hadoop-auth-2.2.0.jar
/lib/hadoop-client-2.2.0.jar
/lib/hadoop-common-2.2.0.jar
/lib/hadoop-hdfs-2.2.0.jar
/lib/hadoop-mapreduce-client-app-2.2.0.jar
/lib/hadoop-mapreduce-client-common-2.2.0.jar
/lib/hadoop-mapreduce-client-core-2.2.0.jar
/lib/hadoop-mapreduce-client-jobclient-2.2.0.jar
/lib/hadoop-mapreduce-client-shuffle-2.2.0.jar
/lib/hadoop-yarn-api-2.2.0.jar
/lib/hadoop-yarn-client-2.2.0.jar
/lib/hadoop-yarn-common-2.2.0.jar
/lib/hadoop-yarn-server-common-2.2.0.jar
/lib/hadoop-yarn-server-nodemanager-2.2.0.jar

at...@infotrellis.com

Aug 23, 2014, 11:34:57 AM
to aureliu...@googlegroups.com
So I tried a quick-and-dirty test: I replaced the Hadoop 2.2.0 jars in Titan's lib directory with Hadoop 2.5.0 jars (as shown below). The exception is gone and I now get the job counters for my query properly (see output below).

I guess the gremlin scripts need to be smarter: if HADOOP_PREFIX is set, they should use the Hadoop jar files from there instead of Titan's default Hadoop jars.


JAR FILES REPLACED
./hadoop-mapreduce-client-core-2.2.0.jar.del
./hadoop-yarn-api-2.2.0.jar.del
./hadoop-yarn-server-common-2.2.0.jar.del
./hadoop-mapreduce-client-jobclient-2.2.0.jar.del
./hadoop-annotations-2.2.0.jar.del
./hadoop-mapreduce-client-shuffle-2.2.0.jar.del
./hadoop-mapreduce-client-common-2.2.0.jar.del
./hadoop-yarn-common-2.2.0.jar.del
./hadoop-auth-2.2.0.jar.del
./hadoop-common-2.2.0.jar.del
./hadoop-yarn-server-nodemanager-2.2.0.jar.del
./hadoop-yarn-client-2.2.0.jar.del
./hadoop-hdfs-2.2.0.jar.del
./hadoop-mapreduce-client-app-2.2.0.jar.del

ADD HADOOP 2.5.0 JARS
cd titan-0.5.0-hadoop2/lib
ln -s /hadoop-2.5.0/mapreduce/hadoop-mapreduce-client-core-2.5.0.jar .
ln -s /hadoop-2.5.0/yarn/hadoop-yarn-api-2.5.0.jar . 
ln -s /hadoop-2.5.0/yarn/hadoop-yarn-server-common-2.5.0.jar .
ln -s /hadoop-2.5.0/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0.jar .
ln -s /hadoop-2.5.0/common/lib/hadoop-annotations-2.5.0.jar .
ln -s /hadoop-2.5.0/mapreduce/hadoop-mapreduce-client-shuffle-2.5.0.jar .
ln -s /hadoop-2.5.0/mapreduce/hadoop-mapreduce-client-common-2.5.0.jar .
ln -s /hadoop-2.5.0/yarn/hadoop-yarn-common-2.5.0.jar .
ln -s /hadoop-2.5.0/common/lib/hadoop-auth-2.5.0.jar . 
ln -s /hadoop-2.5.0/common/hadoop-common-2.5.0.jar .
ln -s /hadoop-2.5.0/yarn/hadoop-yarn-server-nodemanager-2.5.0.jar . 
ln -s /hadoop-2.5.0/yarn/hadoop-yarn-client-2.5.0.jar .
ln -s /hadoop-2.5.0/hdfs/hadoop-hdfs-2.5.0.jar .
ln -s /hadoop-2.5.0/mapreduce/hadoop-mapreduce-client-app-2.5.0.jar .
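The manual rename-and-symlink steps above could be scripted. A rough sketch, assuming the same layout (bundled jars named `hadoop-*-2.2.0.jar`, replacements somewhere under a 2.5.0 install; `swap_jars` and `HADOOP_DIR` are made-up names):

```shell
# Sketch: rename Titan's bundled Hadoop 2.2.0 jars to *.del (as done above)
# and symlink the matching 2.5.0 jars from the cluster's install in their place.
swap_jars() {
    libdir=$1       # e.g. titan-0.5.0-hadoop2/lib
    hadoopdir=$2    # e.g. /hadoop-2.5.0
    for jar in "$libdir"/hadoop-*-2.2.0.jar; do
        [ -e "$jar" ] || continue
        mv "$jar" "$jar.del"                      # keep the original, renamed
        base=$(basename "$jar" -2.2.0.jar)        # e.g. hadoop-common
        # locate the matching 2.5.0 jar anywhere under the Hadoop install
        new=$(find "$hadoopdir" -name "$base-2.5.0.jar" | head -n 1)
        [ -n "$new" ] && ln -s "$new" "$libdir/"
    done
}
```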


QUERY OUTPUT
g = HadoopFactory.open('titan-graphson.properties')
g.V.map

gremlin> g.V.map
11:31:25 WARN  com.thinkaurelius.titan.hadoop.compat.h2.Hadoop2Compiler  - Path tracking is enabled for this Titan/Hadoop job (space and time expensive)
11:31:25 WARN  com.thinkaurelius.titan.hadoop.compat.h2.Hadoop2Compiler  - State tracking is enabled for this Titan/Hadoop job (full deletes not possible)
11:31:26 WARN  org.apache.hadoop.mapreduce.JobSubmitter  - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
11:31:28 INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
11:31:28 INFO  org.apache.hadoop.mapreduce.JobSubmitter  - number of splits:1
11:31:28 INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Submitting tokens for job: job_1408777709136_0007
11:31:29 INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://masternode2:8088/proxy/application_1408777709136_0007/
11:31:29 INFO  org.apache.hadoop.mapreduce.Job  - Running job: job_1408777709136_0007
11:31:38 INFO  org.apache.hadoop.mapreduce.Job  - Job job_1408777709136_0007 running in uber mode : false
11:31:38 INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
11:31:50 INFO  org.apache.hadoop.mapreduce.Job  -  map 100% reduce 0%
11:31:50 INFO  org.apache.hadoop.mapreduce.Job  - Job job_1408777709136_0007 completed successfully
11:31:50 INFO  org.apache.hadoop.mapreduce.Job  - Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=181566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2154
HDFS: Number of bytes written=2604
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters 
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9588
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=9588
Total vcore-seconds taken by all map tasks=9588
Total megabyte-seconds taken by all map tasks=9818112
Map-Reduce Framework
Map input records=12
Map output records=0
Input split bytes=126
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=165
CPU time spent (ms)=6720
Physical memory (bytes) snapshot=214536192
Virtual memory (bytes) snapshot=714326016
Total committed heap usage (bytes)=134217728
com.thinkaurelius.titan.hadoop.mapreduce.transform.PropertyMapMap$Counters
VERTICES_PROCESSED=12
com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Counters
EDGES_PROCESSED=0
VERTICES_PROCESSED=12
File Input Format Counters 
Bytes Read=2028
File Output Format Counters 
Bytes Written=0
==>0 {_id=[0], name=[saturn], type=[titan]}
==>1 {_id=[1], name=[jupiter], type=[god]}
==>2 {_id=[2], name=[neptune], type=[god]}
==>3 {_id=[3], name=[pluto], type=[god]}
==>4 {_id=[4], name=[sky], type=[location]}
==>5 {_id=[5], name=[sea], type=[location]}
==>6 {_id=[6], name=[tartarus], type=[location]}
==>7 {_id=[7], name=[hercules], type=[demigod]}
==>8 {_id=[8], name=[alcmene], type=[human]}
==>9 {_id=[9], name=[nemean], type=[monster]}
==>10 {_id=[10], name=[hydra], type=[monster]}
==>11 {_id=[11], name=[cerberus], type=[monster]}

Guy Taylor

Aug 23, 2014, 11:36:53 AM
to aureliu...@googlegroups.com
This is a versioning mismatch, likely caused by the libs in the Titan /lib dir. 

I spent a couple of hours on this yesterday, and then got the correct libraries loading and it worked much better.
--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/e53ec2d7-5c1d-40f4-8727-8a93dc4aff28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Guy Taylor
Systems Choreographer
--

at...@infotrellis.com

Aug 23, 2014, 12:18:36 PM
to aureliu...@googlegroups.com
Thanks for your reply. I came to the same conclusion after spending a few hours on this.

Two things stood out for me:
  1. The error message did not contain any indication pointing to the version-mismatch diagnosis.
    I understand this message comes from the Hadoop code, so we may be stuck with it.

  2. Enhancing the gremlin startup script is low-hanging fruit for fixing this problem.
    HADOOP_PREFIX is already in use, so why not take advantage of it here?


Dan LaRocque

Aug 25, 2014, 1:38:17 AM
to aureliu...@googlegroups.com
Hi,

Thanks for following up.  I think this may be a variation on https://issues.apache.org/jira/browse/MAPREDUCE-5831.

I've seen a couple of oddities on installations with mismatched client/cluster Hadoop versions.  Besides counter linkage errors, I've also seen the http link to the job tracker start with http://http://.  They must have shifted the bit of code that prepends the protocol around between minors in the 2.x series.  Both issues disappear when I've matched the client Hadoop version to the cluster's.

As Vinod mentioned in that bug, cross-version MR/YARN wire compatibility isn't really stable yet.  This matches my experience.

Dropping the cluster's jars into the client often works, but it's really moving the problem around.  It rules out wire compat issues between client and cluster since they're all the same code, but now we have the possibility of ClassNotFoundException/NoSuchMethodError if an ABI-breaking change across Hadoop minor versions touches code referenced by Titan-Hadoop or Titan-HBase's classfiles.  If changing minors produces a Hadoop linkage error in Titan, then I would like to add some defensive reflection around the affected type to work on either side of the ABI change.  Still, linkage errors are a better problem than wire protocol incompatibility, since the latter can't be fixed by making Titan smarter about how it uses Hadoop.

It's theoretically possible to avoid both linkage and wire compat problems by recompiling Titan (and whatever other bits of the stack under/above it that are exposed to Hadoop APIs) against the specific version installed on your cluster.  This is a niche approach and a tremendous pain; I don't think most people want to go down that road.

That's a good point about gremlin.sh & $HADOOP_PREFIX, though we would need to modify gremlin.sh so that it falls back gracefully to the Hadoop jars packaged with Titan when $HADOOP_PREFIX is unset.  The zipfile supports a self-contained, trivial Hadoop MR environment out of the box using the LocalJobRunner and LocalFileSystem, so it's possible to play with Titan-Hadoop without an actual cluster in place.  I also wonder if we need to check $HADOOP_HOME for MRv1.  Some details to check, but it seems feasible.
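The graceful fallback described here might look roughly like this (a sketch only; `pick_hadoop_jars` and the lookup order are assumptions, not the real gremlin.sh):

```shell
# Sketch: prefer $HADOOP_PREFIX, then the MRv1-era $HADOOP_HOME, and only
# then Titan's bundled Hadoop jars (self-contained local-mode setup).
pick_hadoop_jars() {
    bundled=$1   # Titan's lib directory with the packaged 2.2.0 jars
    if [ -n "$HADOOP_PREFIX" ] && [ -d "$HADOOP_PREFIX" ]; then
        find -L "$HADOOP_PREFIX" -name 'hadoop-*.jar' | tr '\n' ':'
    elif [ -n "$HADOOP_HOME" ] && [ -d "$HADOOP_HOME" ]; then
        find -L "$HADOOP_HOME" -name 'hadoop-*.jar' | tr '\n' ':'
    else
        # no cluster configured: use the bundled jars for local mode
        find -L "$bundled" -name 'hadoop-*.jar' | tr '\n' ':'
    fi
}
```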

thanks,
Dan