Find a bug about repeat and dedup

81 views
Skip to first unread message

Taonan

unread,
Dec 2, 2024, 5:24:26 AM12/2/24
to Gremlin-users
It seems that there is something wrong when both repeat and dedup step exist:
If unroll the repeat-step, result will be changed.

gremlin> g = traversal().withEmbedded(TinkerFactory.createModern())
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).hasLabel("person").repeat(both().dedup()).times(2)
==>v[1]
==>v[6]
==>v[5]
gremlin> g.V(1).hasLabel("person").both().dedup().both().dedup()
==>v[1]
==>v[4]
==>v[6]
==>v[5]
==>v[3]

my gremlin-console version is apache-tinkerpop-gremlin-console-3.6.4

Ken Hu

unread,
Dec 3, 2024, 8:32:32 PM12/3/24
to Gremlin-users
The reason you are seeing this behavior is due to the "globalness" of the dedup() step. dedup() is a step that can take a scope, which by default is "global". In this case, the DedupGlobalStep maintains a set of traversers that passes through it and filters out previously seen traversers. repeat(both().dedup()) uses the same DedupGlobalStep whereas both().dedup().both().dedup() will create two separate DedupGlobalSteps.

The output of g.V(1).hasLabel("person").both() is v[3], v[2], v[4]. On the first pass of both().dedup(), these three vertices are added to the set and will be filtered out in future passes.

This is why the end result is v[1], v[6], v[5]. Its because its filtering out v[3], v[2], v[4] from the first pass.

This can be a bit of a sharp edge when it comes to using the repeat() step with a step that is either a barrier or takes a scope.

Taonan

unread,
Dec 3, 2024, 9:39:07 PM12/3/24
to Gremlin-users

Thank you for your explanation; I now understand. 

This implies that it's best to avoid using barrier steps (such as range or dedup) within a repeat step. 

However, I believe the difference in behavior before and after unrolling the repeat step can be confusing for users. 

Is there an effective solution to make using repeat step more intuitive?

Ken Hu

unread,
Dec 4, 2024, 6:41:16 PM12/4/24
to Gremlin-users
I agree that there can be some confusing interactions among steps inside a repeat() but there isn't really a way to make that more intuitive right now. We can probably add some more documentation around repeat() to make users more aware of situations like the one you pointed out. Something else that complicates things is the "RepeatUnrollStrategy". It is a default strategy that will automatically unroll the repeat() for you but it doesn't apply when the DedupGlobalStep is inside the repeat().
Reply all
Reply to author
Forward
0 new messages