Duplicate edge detection using Gremlin (without simplePath())

192 views
Skip to first unread message

Angshuman Sarkar

unread,
Mar 11, 2022, 1:51:38 PM3/11/22
to Gremlin-users
Is there a way to detect and skip results if there is a duplicate edge in the result set? simplePath() applies to all objects - i.e. it will detect duplicate nodes as well. For example g.V().as('a').select('a').simplePath() would not return any results. Our cycle detection demands that we only do cycle detection using edges. 

Stephen Mallette

unread,
Mar 15, 2022, 8:10:26 AM3/15/22
to gremli...@googlegroups.com
Maybe you just need dedup() in relation to edges? Not sure, if I fully understand exactly what you're looking for. Perhaps you could supply a fully fledged example.

On Fri, Mar 11, 2022 at 1:52 PM Angshuman Sarkar <sendt...@gmail.com> wrote:
Is there a way to detect and skip results if there is a duplicate edge in the result set? simplePath() applies to all objects - i.e. it will detect duplicate nodes as well. For example g.V().as('a').select('a').simplePath() would not return any results. Our cycle detection demands that we only do cycle detection using edges. 

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/0b720548-c8ab-4e19-980b-a78fa5eb1595n%40googlegroups.com.

Angshuman Sarkar

unread,
Mar 30, 2022, 2:37:09 AM3/30/22
to Gremlin-users
Let me use two examples here - the first one does not return a result because simplePath() is used with select() which repeats the object. However we need to do this because of perf reasons. So we need a way to do this query along with cycle detection, but we cannot use simplePath.What we really need is duplicate edge detection inside a single iterator result. 

g.
addV().property(id,'a').as('a').property('pk', '1').property('temp', 49).property('pressure', 999).
addV().property(id,'b').as('b').property('pk', '1').property('temp', 50).property('pressure', 1000).
addV().property(id,'c').as('c').property('pk', '1').property('temp', 51).property('pressure', 1001).
addE('knows').from('b').to('a').
addE('knows').from('b').to('c')

This looks like (A)<-(B)->(C)

The query that does not return any result is - 

g.V().has(id, 'b').as('X').out().has('temp', 49).as('Y').select('X').out().has('pressure', 1001).as('Z').simplePath()



Here is another example - this one actually gives the correct results using simplePath(). However due to the limitations mentioned above we cannot use simplePath() even in this case.


g.
addV().property(id,'a').as('a').property('pk', '1').
addV().property(id,'b').as('b').property('pk', '1').
addV().property(id,'c').as('c').property('pk', '1').
addV().property(id,'d').as('d').property('pk', '1').
addV().property(id,'e').as('e').property('pk', '1').
addV().property(id,'f').as('f').property('pk', '1').
addV().property(id,'g').as('g').property('pk', '1').
addV().property(id,'h').as('h').property('pk', '1').
addE('knows').from('a').to('b').
addE('knows').from('b').to('c').
addE('knows').from('b').to('d').
addE('knows').from('d').to('e').
addE('knows').from('d').to('f').
addE('knows').from('e').to('g').
addE('knows').from('f').to('h')


GRAPH.png

The query I am trying to run is this - 

g.V().has(id, 'd').as('X').bothE().OtherV().as('Y').bothE().otherV().As('Z').simplePath()

This returns a,c,g,h. Without simplePath this would return a,c,d,g,h. Can we get the same result (a,c,g,h) without using simplePath()? In this case there is actually edes like - [b - d] [d - e ] [d- f] that can be traversed twice because there are two bothE statements in this case.

Stephen Mallette

unread,
Mar 31, 2022, 7:02:31 AM3/31/22
to gremli...@googlegroups.com
There are probably multiple ways to avoid traversing back to "d" without simplePath()? how about just filtering out the starting vertex when you try traversing back over bothE()?

gremlin> g.V().has(id, 'd').as('X').bothE().otherV().as('Y').bothE().otherV().where(neq('X')).as('Z')
==>v[g]
==>v[h]
==>v[c]
==>v[a]



Stark Arya

unread,
Aug 18, 2022, 5:11:27 AM8/18/22
to Gremlin-users
@sendt, @spmal; I think we really need to provide a step to Filter edge for a single path,  Aliyun GDB has some user cases need this ability。

As an example, Neo4j's view : Cypher makes use of relationship isomorphism for path matching, and you can find reference here:neo4j path uniqueness 

But how every, if we just use current steps, we could also accomplished this requirements, but so ugly:
In TinkerFactory.createModern() Graph, you just add one edge: g.addE("fake").from(V("5")).to(V("4"))
The dsl could get v1, v4, v5, v4, v3 through two different edge between v4-v5;


gremlin> g.V(1).repeat(bothE().otherV()).until(hasId(3).or().loops().is(gte(4))).hasId(3).path().as("p").flatMap(unfold().group().by(choose(hasLabel("person", "software"), __.constant("v"), __.constant("e")))).select("e").as("origin_e").dedup(local).as("dedup_e").filter(select("dedup_e", "origin_e").by(count(local)).where("dedup_e", eq("origin_e"))).select("p")
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5],e[0][5-fake->4],v[4],e[11][4-created->3],v[3]]
==>[v[1],e[8][1-knows->4],v[4],e[0][5-fake->4],v[5],e[10][4-created->5],v[4],e[11][4-created->3],v[3]]

Kelvin Lawrence

unread,
Aug 18, 2022, 10:59:38 AM8/18/22
to Gremlin-users
Are you looking for "global" scope or per traverser scope?

Global scope can be achieved using

gremlin> g.V(44).outE().store('e').inV().inE().where(without('e')).count()
==>802
gremlin> g.V(44).outE().store('e').inV().inE().count()
==>806  

If you need per traverser scope

gremlin> g.V(44).outE().sack(assign).inV().inE().as('x').filter(sack().unfold().where(neq('x'))).count()
==>802 


Stark Arya

unread,
Aug 18, 2022, 9:52:11 PM8/18/22
to Gremlin-users
@kelvin, @spmal the Adjacent two edge dedup may seems easy(also g.V(44).outE().store('e').inV().inE().where(without('e')).count() the e is global variable and take effects for all paths).

Let's think a general problem:  if we need to print all the path, and need each path have many edges and vertexs, we just need to make sure the edges in the path not dedup, but not restrict the vertexs, how to fullfill it. Last night I think a way as Following:

The edge dedup  solution:
gremlin> g.withSack {['~~']}{it.clone()}.V(1).repeat(bothE().as("e").filter(__.sack().as("list").select("e").where(without("list"))).sack{m,v -> m += v; m}.otherV()).times(2).path()
==>[v[1],e[9][1-created->3],v[3],e[11][4-created->3],v[4]]
==>[v[1],e[9][1-created->3],v[3],e[12][6-created->3],v[6]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]

==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
==>[v[1],e[8][1-knows->4],v[4],e[0][5-fake->4],v[5]]

The edge may dedup solution for contrast: 
gremlin> g.withSack {['~~']}{it.clone()}.V(1).repeat(bothE().as("e").sack{m,v -> m += v; m}.otherV()).times(2).path()
==>[v[1],e[9][1-created->3],v[3],e[9][1-created->3],v[1]]
==>[v[1],e[9][1-created->3],v[3],e[11][4-created->3],v[4]]
==>[v[1],e[9][1-created->3],v[3],e[12][6-created->3],v[6]]
==>[v[1],e[7][1-knows->2],v[2],e[7][1-knows->2],v[1]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]

==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
==>[v[1],e[8][1-knows->4],v[4],e[0][5-fake->4],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[8][1-knows->4],v[1]

Reply all
Reply to author
Forward
0 new messages