Edge details dropped from vertex object in SparkGraphComputer


anjani...@gmail.com

Sep 10, 2020, 6:11:46 AM
to JanusGraph users
Hi All,

I am trying to get the complete data from the graph (vertex details and edge details).
We are using ConnectedComponentVertexProgram with Spark 2.4 and are able to get the vertex details but not the edge details.

I see that in SparkGraphComputer the edges are dropped from the VertexWritable object before the reducer loop:
vertexWritable.get().dropEdges(Direction.BOTH);

After removing the above line, I am still not getting the edge details. While debugging I found that the edges are present in the vertex object up to the combine stage:
final JavaPairRDD combineRDD = mapReduce.doStage(MapReduce.Stage.COMBINE) ? SparkExecutor.executeCombine(mapRDD, newApacheConfiguration) : mapRDD;

But they are dropped in the reduce stage:
final JavaPairRDD reduceRDD = mapReduce.doStage(MapReduce.Stage.REDUCE) ? SparkExecutor.executeReduce(combineRDD, mapReduce, newApacheConfiguration) : combineRDD;

I see that the vertex object passed to the executeReduce() method has the edge details. I noticed that the edge information is dropped from the vertex object during the groupBy in the executeReduce() method.

I would appreciate any pointers or suggestions to fix this.

Thanks,
Anjani

Evgeniy Ignatiev

Sep 10, 2020, 10:26:09 AM
to janusgra...@googlegroups.com

Hi Anjani,

Which version of JanusGraph are you using?
Can you share some code and configuration to reproduce the issue?

Best regards,
Evgenii Ignatev.


HadoopMarc

Sep 10, 2020, 11:11:26 AM
to JanusGraph users
Hi Anjani,

I have no time to look into this myself right now, but I remember a similar issue in the TinkerPop JIRA. I also remember a workaround from Daniel Kuppitz that applies the CloneVertexProgram.

HTH,     Marc


anjani...@gmail.com

Sep 11, 2020, 2:25:06 AM
to JanusGraph users
Hi Evgeniy,

Thanks for the response. We are using JanusGraph 0.4.
I have not written any custom code; I am just using the available ConnectedComponentVertexProgram and the TinkerPop library.
I only commented out the line below in SparkGraphComputer in TinkerPop, as it was dropping the edges, but it still does not work:
vertexWritable.get().dropEdges(Direction.BOTH);

Please let me know if you need more details.

Thank you.
Anjani

anjani...@gmail.com

Sep 11, 2020, 2:39:03 AM
to JanusGraph users
Hi Marc,
Thanks for the response.
I will check the TinkerPop JIRA for details.

Thanks,
Anjani 

anjani...@gmail.com

Sep 11, 2020, 9:33:23 AM
to JanusGraph users
Hi Marc, 

I want to fetch the connected vertices together with their vertex properties and edge details.
CloneVertexProgram will provide the complete data, but I think it will not provide it as connected components. Please correct me if my understanding is wrong.

Thanks,
Anjani

Abhay Pandit

Sep 11, 2020, 10:32:24 AM
to janusgra...@googlegroups.com
Yes, you are right, Anjani: CloneVertexProgram will provide the complete data but not the connected components.

Thanks,
Abhay

HadoopMarc

Sep 12, 2020, 8:10:35 AM
to JanusGraph users
Hi Anjani,

What I tried to convey is not to use CloneVertexProgram instead of the ConnectedComponentVertexProgram, but rather to chain these two VertexPrograms. The related JIRA issue I referred to, including an example, is:


HTH,    Marc



anjani...@gmail.com

Sep 14, 2020, 2:34:31 AM
to JanusGraph users
Thanks, Marc, for sharing the details.

Regards,
Anjani

anjani...@gmail.com

Sep 17, 2020, 11:24:22 AM
to JanusGraph users
Hi All,

Thanks for all your inputs. After some more analysis I found that in SparkGraphComputer (in the TinkerPop library) the vertex object has edge details in the RDD until we add the result to memory:
mapReduce.addResultToMemory(finalMemory, outputRDD.writeMemoryRDD(graphComputerConfiguration, mapReduce.getMemoryKey(), reduceRDD));

writeMemoryRDD uses "SequenceFileOutputFormat.class" as the output format, which calls SequenceFile.class. I see that the vertex object has edge details up to SequenceFile.class. Up to this point the vertex object is of type ComputerVertex.

But the computerResult object does not have edge details in the vertex object. I see that in ComputerResult the vertex object type is changed to DetachedVertex:

return new DefaultComputerResult(InputOutputHelper.getOutputGraph(graphComputerConfiguration, this.resultGraph, this.persist), finalMemory.asImmutable());

I think the edges are dropped while deserializing and converting the object to DetachedVertex, but I was not able to figure out where it is converted to a DetachedVertex object.

Below are the configs I am using:

gremlin.graph: org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader: org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter: org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.defaultGraphComputer: org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

spark.serializer: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.kryo.registrator: org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

I would appreciate any suggestions or pointers to debug the issue.

Thanks & Regards,

Anjani



HadoopMarc

Sep 18, 2020, 2:19:51 AM
to JanusGraph users
Hi Anjani,

Your original post started with:

" I am trying get complete data from graph (vertex details and edge details). 
We are using connectedVertexProgram with Spark2.4, able to get vertex details but not edge details. "

The ConnectedComponentVertexProgram only adds a component property to the vertices; what does this have to do with "edge details"? Can you give an example of the output you want, in terms of the TinkerPop modern graph, like in the example in the docs:

I feel we might be looking in the wrong direction.

Best wishes,    Marc


anjani...@gmail.com

Sep 18, 2020, 2:43:52 AM
to JanusGraph users
Hi Marc,

Thanks for the response. I want to fetch all connected components with their edge details from the graph. For example, say node A and node B are connected by an edge E1; then I want to create output like:

{
   {node A attributes},
   {node B attributes},
   {edge E1 details}
}
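
To make the requested output shape concrete, here is a minimal sketch in plain Java that groups vertices and edges by connected component. It is only an illustration and not TinkerPop or JanusGraph code: the names ComponentExport, Edge, Component and group are hypothetical, it uses a simple union-find instead of a VertexProgram, and it assumes that every edge endpoint also appears in the vertex list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of TinkerPop or JanusGraph. */
public class ComponentExport {

    /** Minimal edge holder: id, label and the ids of the two endpoint vertices. */
    public static final class Edge {
        public final String id, label, from, to;

        public Edge(String id, String label, String from, String to) {
            this.id = id;
            this.label = label;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return id + ":" + from + "-[" + label + "]->" + to;
        }
    }

    /** Everything that belongs to one connected component. */
    public static final class Component {
        public final List<String> vertices = new ArrayList<>();
        public final List<Edge> edges = new ArrayList<>();
    }

    /** Union-find lookup with path compression. */
    private static String find(Map<String, String> parent, String v) {
        String root = v;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        while (!parent.get(v).equals(root)) {
            String next = parent.get(v);
            parent.put(v, root);
            v = next;
        }
        return root;
    }

    /**
     * Groups vertex ids and edges by connected component.
     * The map key is the smallest vertex id in each component.
     * Assumption: every edge endpoint is contained in vertexIds.
     */
    public static Map<String, Component> group(List<String> vertexIds, List<Edge> edges) {
        Map<String, String> parent = new HashMap<>();
        for (String v : vertexIds) {
            parent.put(v, v);
        }
        for (Edge e : edges) {
            String a = find(parent, e.from);
            String b = find(parent, e.to);
            // Keep the smaller id as root so that component keys are deterministic.
            if (a.compareTo(b) < 0) {
                parent.put(b, a);
            } else {
                parent.put(a, b);
            }
        }
        Map<String, Component> result = new TreeMap<>();
        for (String v : vertexIds) {
            result.computeIfAbsent(find(parent, v), k -> new Component()).vertices.add(v);
        }
        for (Edge e : edges) {
            // Both endpoints share a root, so looking up one of them is enough.
            result.get(find(parent, e.from)).edges.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("A", "B", "C");
        List<Edge> edges = Arrays.asList(new Edge("E1", "knows", "A", "B"));
        group(vertices, edges).forEach((root, c) ->
                System.out.println(root + " -> vertices=" + c.vertices + ", edges=" + c.edges));
    }
}
```

With the sample data in main, A and B end up in one component together with E1, while C forms its own component without edges. In a real job the vertex attributes would be carried along with the ids; they are left out here to keep the sketch short.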

I am using ConnectedComponentVertexProgram and am able to get the node details in computerResult but not the edge details. In the map-reduce stage in the SparkGraphComputer class
we create mapRDD, combineRDD and reduceRDD. I tried to read reduceRDD and saw that the vertex object was of type ComputerVertex and had the vertex and edge details in it.
Finally reduceRDD is written to memory:
mapReduce.addResultToMemory(finalMemory, outputRDD.writeMemoryRDD(graphComputerConfiguration, mapReduce.getMemoryKey(), reduceRDD));

writeMemoryRDD uses "SequenceFileOutputFormat.class" as the output format, which calls SequenceFile.class. I see that the vertex object has edge details up to SequenceFile.class. Up to this point the vertex object is of type ComputerVertex.

After that, computerResult is returned:
return new DefaultComputerResult(InputOutputHelper.getOutputGraph(graphComputerConfiguration, this.resultGraph, this.persist), finalMemory.asImmutable());

But the computerResult object does not have edge details in the vertex object. I see that in ComputerResult the vertex object type is changed to DetachedVertex.

I think the edges are dropped while deserializing and converting the object to DetachedVertex, but I was not able to figure out where it is converted to a DetachedVertex object.

Thanks,
Anjani

HadoopMarc

Sep 18, 2020, 4:02:44 AM
to JanusGraph users
Hi Anjani,

OK, more explicitly, is this what you are looking for:

gremlin> g = TinkerFactory.createModern().traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().
    connectedComponent().
        with(ConnectedComponent.propertyName, 'component').
    group().
        by('component').
        by(project('vertex', 'outedges').
               by(valueMap(true)).
               by(outE().valueMap(true).fold()).fold())
           
==>[1:[
    [vertex:[id:2,label:person,component:[1],name:[vadas],age:[27]],outedges:[]],
    [vertex:[id:1,label:person,component:[1],name:[marko],age:[29]],outedges:[[id:9,label:created,weight:0.4],[id:7,label:knows,weight:0.5],[id:8,label:knows,weight:1.0]]],
    [vertex:[id:3,label:software,component:[1],name:[lop],lang:[java]],outedges:[]],
    [vertex:[id:4,label:person,component:[1],name:[josh],age:[32]],outedges:[[id:10,label:created,weight:1.0],[id:11,label:created,weight:0.4]]],
    [vertex:[id:5,label:software,component:[1],name:[ripple],lang:[java]],outedges:[]],
    [vertex:[id:6,label:person,component:[1],name:[peter],age:[35]],outedges:[[id:12,label:created,weight:0.2]]]
]]

Best wishes,    Marc


