IllegalArgumentException when running a Giraph job

Collin Scangarella

Apr 25, 2017, 7:20:34 PM
to Gremlin-users
Hello, 

I'm trying to run a Giraph job on a Hadoop cluster, and I'm getting an exception while the job is deserializing the vertices:

2017-04-25 23:01:36,883 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: java.lang.IllegalArgumentException: Edge with id already exists: 47144961054
        at com.thinkaurelius.titan.hadoop.formats.util.TitanVertexDeserializer.readHadoopVertex(TitanVertexDeserializer.java:181)
        at com.thinkaurelius.titan.hadoop.formats.util.GiraphRecordReader.nextKeyValue(GiraphRecordReader.java:46)
        at org.apache.tinkerpop.gremlin.hadoop.process.computer.giraph.io.GiraphVertexReader.nextVertex(GiraphVertexReader.java:50)
        at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:124)
        at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
        at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
        at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
        at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Edge with id already exists: 47144961054
        at org.apache.tinkerpop.gremlin.structure.Graph$Exceptions.edgeWithIdAlreadyExists(Graph.java:1093)
        at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerHelper.addEdge(TinkerHelper.java:57)
        at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex.addEdge(TinkerVertex.java:127)
        at com.thinkaurelius.titan.hadoop.formats.util.TitanVertexDeserializer.readHadoopVertex(TitanVertexDeserializer.java:129)
        ... 11 more

Additionally, that edge doesn't seem to exist; looking it up by id returns nothing:

gremlin> g.E(47144961054)
gremlin>
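
To see whether the failure always involves this one edge id or several, the worker logs could be scanned for the exception message. The following is only a rough sketch: the function name `duplicate_edge_ids` and the idea of feeding it raw log lines are assumptions made for illustration, and the regular expression simply mirrors the message text in the trace above.

```python
import re
from collections import Counter

# Mirrors the message text in the stack trace; the exact pattern is an assumption.
EDGE_EXISTS = re.compile(r"Edge with id already exists: (\d+)")


def duplicate_edge_ids(log_lines):
    """Count how many log lines mention each edge id in the exception message.

    Hypothetical helper: `log_lines` is any iterable of strings,
    for example an open Giraph worker log file.
    """
    counts = Counter()
    for line in log_lines:
        match = EDGE_EXISTS.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "java.lang.RuntimeException: java.lang.IllegalArgumentException: "
        "Edge with id already exists: 47144961054",
        "Caused by: java.lang.IllegalArgumentException: "
        "Edge with id already exists: 47144961054",
    ]
    for edge_id, hits in duplicate_edge_ids(sample).items():
        print(edge_id, hits)
```

Each failure logs the message twice (once in the wrapping RuntimeException and once in the "Caused by" line), so the counts are per log line rather than per failure.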

Here's my configuration file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.thinkaurelius.titan.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

titanmr.ioformat.conf.storage.backend=cassandra
titanmr.ioformat.conf.storage.hostname=cassandra.private.ip.addresses
titanmr.ioformat.conf.storage.port=9160
titanmr.ioformat.conf.storage.keyspace=titan

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

giraph.minWorkers=2
giraph.maxWorkers=3
giraph.useOutOfCoreGraph=true
giraph.useOutOfCoreMessages=true
mapred.job.tracker=job.tracker.private.ip:9001
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m
giraph.numInputThreads=4
giraph.numComputeThreads=4
giraph.maxMessagesInMemory=100000
giraph.zkList=zk.private.ip.addresses

Does anyone know why this exception might be happening or how I can resolve it?

Thanks,
Collin