How to use dedup() to remove duplicates based on properties from several elements in the traversal

Kevin Schmidt

Jun 28, 2017, 6:17:14 PM

to Gremlin-users

I have a graph where I may have duplicate vertices or edges (not using locking to prevent it) which means my traversals may end up with duplicate paths/results. I can't just use dedup() alone as there are distinct edges/vertices, so I want to do my dedup'ing using properties of those vertices/edges.

I am using as() at various points to save the elements I need later (I use a final select() to get them for my result too) but don't see a way to reference properties from those different elements in a dedup.

My current traversal is a bit more complex than this, but this illustrates the basics:

g.V().has('Person', 'myId', '1234').as('e1').bothE().as('r').otherV().as('e2').select('e1','r','e2')

I would like to do my dedup based on specific properties in both 'r' and 'e2'. How can I do that?

It seems like it should be simple, and I'm sure it is, but I can't find docs/examples that help.

Thanks!

Robert Dale

Jun 29, 2017, 12:07:15 AM

to gremli...@googlegroups.com

Looks like dedup() can take only one by() modulator, so you can't do g.V().as('a').outE().as('b').inV().as('c').dedup('a','b','c').by('name').by(label).by('name').

If deduping on a single property per element is sufficient, then you can do something like g.V().as('a').outE().as('b').inV().as('c').select('a','b','c').by('name').by(label).by('name').dedup().select('a','b','c')

gremlin> graph = TinkerFactory.createModern()

==>tinkergraph[vertices:6 edges:6]

gremlin> g = graph.traversal()

==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]

// current number of knows relationships

gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').count()

==>2

// add some duplicate vertices

gremlin> g.V().hasLabel('person').as('a').addV('person').property('name',select('a').values('name')).count()

==>4

// add plenty of duplicate relationships

gremlin> g.V().hasLabel('person').as('a').V().hasLabel('person').as('b').addE('knows').from('a').count()

==>64

gremlin> g.V().hasLabel('person').as('a').V().hasLabel('person').as('b').addE('knows').from('a').count()

==>64

// let's count them: 2+64+64 =

gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').count()

==>130

// dedup

gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').select('a','b','c').by('name').by(label).by('name').dedup().count()

==>16

// let's look at the results: it gives only the selected criteria

gremlin> g.V().hasLabel('person').as('a').outE('knows').as('b').inV().hasLabel('person').as('c').select('a','b','c').by('name').by(label).by('name').dedup()

==>[a:marko,b:knows,c:vadas]

==>[a:marko,b:knows,c:josh]

...

// let's get back the elements

gremlin> g.V().hasLabel('person').as('a').as('a1').outE('knows').as('b').as('b1').inV().hasLabel('person').as('c').as('c1').select('a','b','c').by('name').by(label).by('name').dedup().select('a1','b1','c1')

==>[a1:v[1],b1:e[7][1-knows->2],c1:v[2]]

==>[a1:v[1],b1:e[8][1-knows->4],c1:v[4]]

...
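
To show what the session above is doing outside Gremlin, here is a rough sketch in plain Python. It is not TinkerPop code. The dictionaries, the ids in the second row, and the helper names dedup_by and name_label_name are made up for illustration. The idea is the same: build a key from the chosen properties of each labelled element, keep the first row seen for each key, and return the original elements rather than the key.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def dedup_by(rows: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield the first row seen for each distinct key, preserving input order."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


# Hypothetical traversal results: each row holds the elements labelled a, b, c.
# Element ids differ between rows, but names and labels repeat, as they do
# once duplicate vertices and edges have been added to the graph.
rows = [
    {"a": {"id": 1, "name": "marko"}, "b": {"id": 7, "label": "knows"}, "c": {"id": 2, "name": "vadas"}},
    {"a": {"id": 13, "name": "marko"}, "b": {"id": 21, "label": "knows"}, "c": {"id": 2, "name": "vadas"}},
    {"a": {"id": 1, "name": "marko"}, "b": {"id": 8, "label": "knows"}, "c": {"id": 4, "name": "josh"}},
]


def name_label_name(row):
    # Plays the role of select('a','b','c').by('name').by(label).by('name').
    return (row["a"]["name"], row["b"]["label"], row["c"]["name"])


unique = list(dedup_by(rows, name_label_name))
ids = [(r["a"]["id"], r["b"]["id"], r["c"]["id"]) for r in unique]
print(ids)
```

To dedup on more than one property per element, the key function can simply return a longer tuple, for example adding a second field from "b" or "c".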

Robert Dale
