How to use dedup() to remove duplicates based on properties from several elements in the traversal

Kevin Schmidt

Jun 28, 2017, 2:17:14 PM
to Gremlin-users
I have a graph that may contain duplicate vertices or edges (I'm not using locking to prevent them), which means my traversals may end up with duplicate paths/results. I can't use dedup() alone because the duplicates are distinct edges/vertices, so I want to dedup on properties of those vertices/edges.

I am using as() at various points to save the elements I need later (I also use a final select() to get them for my result), but I don't see a way to reference properties from those different elements in a dedup().

My current traversal is a bit more complex than this, but this illustrates the basics:

g.V().has('Person', 'myId', '1234').as('e1').bothE().as('r').otherV().as('e2').select('e1','r','e2')

I would like to dedup based on specific properties of both 'r' and 'e2'. How can I do that?

It seems like it should be simple, and I'm sure it is, but I can't find docs/examples that help.

Thanks!

Robert Dale

Jun 28, 2017, 8:07:15 PM
to gremli...@googlegroups.com
It looks like dedup() can take only one by() modulator, so you can't do g.V().as('a').outE().as('b').inV().as('c').dedup('a','b','c').by('name').by(label).by('name').

If deduping on a single property per element is sufficient, then you can do something like g.V().as('a').outE().as('b').inV().as('c').select('a','b','c').by('name').by(label).by('name').dedup().select('a','b','c')


gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]

// current number of knows relationships
gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').count()
==>2

// add some duplicate vertices
gremlin> g.V().hasLabel('person').as('a').addV('person').property('name',select('a').values('name')).count()
==>4

// add plenty of duplicate relationships
gremlin> g.V().hasLabel('person').as('a').V().hasLabel('person').as('b').addE('knows').from('a').count()
==>64
gremlin> g.V().hasLabel('person').as('a').V().hasLabel('person').as('b').addE('knows').from('a').count()
==>64

// let's count them: 2+64+64 = 130
gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').count()
==>130

// dedup
gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').select('a','b','c').by('name').by(label).by('name').dedup().count()
==>16

// let's look at the results: it gives only the selected criteria
gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').select('a','b','c').by('name').by(label).by('name').dedup()
==>[a:marko,b:knows,c:vadas]
==>[a:marko,b:knows,c:josh]
...

// let's get back the elements
gremlin> g.V().hasLabel('person').as('a').as('a1').outE('knows').as('b').as('b1').inV().hasLabel('person').as('c').as('c1').select('a','b','c').by('name').by(label).by('name').dedup().select('a1','b1','c1')
==>[a1:v[1],b1:e[7][1-knows->2],c1:v[2]]
==>[a1:v[1],b1:e[8][1-knows->4],c1:v[4]]
...
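
Applied to the traversal from the original question, the same pattern might look like the sketch below. It is untested against the real graph and rests on assumptions: 'relType' is a hypothetical edge property, 'myId' is assumed to be the property to dedup on for the far vertex, and 'r1' and 'e2x' are extra labels introduced here only so the original elements can be selected back after the dedup, as in the console session above.

```groovy
// Sketch only: 'relType' is a hypothetical edge property; 'r1' and 'e2x' are
// extra labels added so the elements themselves can be retrieved at the end.
g.V().has('Person', 'myId', '1234').as('e1').
  bothE().as('r').as('r1').
  otherV().as('e2').as('e2x').
  select('r', 'e2').by('relType').by('myId').  // project one property per element as the dedup key
  dedup().                                      // keep one traverser per distinct (relType, myId) pair
  select('e1', 'r1', 'e2x')                     // get the elements back via the extra labels
```

When several edges or vertices share the same key, only the first one the traversal encounters is kept, so which duplicate survives depends on iteration order.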



Robert Dale
