Avoid duplicate vertex and edge creation


Amit Chandak

Feb 24, 2018, 1:17:27 AM
to Gremlin-users
Hi,
I am using Gremlin with the Neo4j plugin (TinkerPop 3.3.1). I want to avoid creating duplicate nodes/edges. For example:

gremlin> g.addV("sample")
==>v[8]
gremlin> g.V()
==>v[8]
gremlin> g.V().label()
==>sample
gremlin> g.addV("sample")
==>v[9]
gremlin> g.V().label()
==>sample
==>sample

It ends up creating two nodes with the same label. In my graph, I am treating the label as unique, so I want only ONE vertex with that label.

Thanks
Amit

HadoopMarc

Feb 24, 2018, 2:53:39 PM
to Gremlin-users
Hi Amit,

Labels are intended to discriminate between different types of vertices. Each type of vertex can have different properties. Filtering on the label (for example with hasLabel()) is then an easy mechanism to select the vertices that have a specific property you want to know about.

In your case you can simply use g.V().id() to get values that are unique for each vertex.

HTH, Marc

On Saturday, February 24, 2018 at 07:17:27 UTC+1, Amit Chandak wrote:

Amit Chandak

Feb 24, 2018, 3:19:39 PM
to Gremlin-users
Thanks Marc. I guess what I am looking for is a way to avoid adding duplicate vertices/edges. One brute-force way would be to read the entire graph, do some sort of mark and sweep, and only send the deltas. But that would be quite expensive, so I was looking for update-or-insert (upsert) functionality.

Thanks
Amit

HadoopMarc

Feb 25, 2018, 5:51:43 AM
to Gremlin-users
Hi Amit,

OK, I think section 9.1.1.1 of the JanusGraph docs could be useful for you. An ordinary index (whether in JanusGraph or in TinkerGraph) is not expensive to query either (O(log N) vs. O(N) for a table scan), so you could use an index lookup to check whether an element already exists before adding it.
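
A minimal sketch of the TinkerGraph variant of this idea, written for the Gremlin Console. It assumes an in-memory TinkerGraph rather than the Neo4j setup from the question, and the 'name' property key and the value 'alice' are made up for illustration. The index is created on a property key, and the existence check before the insert filters on that key:

```groovy
// Assumption: an in-memory TinkerGraph stands in for the real backend.
graph = TinkerGraph.open()

// Index the (hypothetical) 'name' property key for vertices.
graph.createIndex('name', Vertex.class)
g = graph.traversal()

// First load: the vertex does not exist yet.
g.addV('sample').property('name', 'alice').iterate()

// Later load: check by the indexed key before adding again.
exists = g.V().has('name', 'alice').hasNext()
if (!exists) {
    g.addV('sample').property('name', 'alice').iterate()
}
```

The check and the insert are two separate traversals here, so this is not safe against concurrent writers unless the backend also enforces uniqueness.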

Cheers, Marc

On Saturday, February 24, 2018 at 21:19:39 UTC+1, Amit Chandak wrote:

Tim Schultz

Feb 26, 2018, 9:50:28 PM
to Gremlin-users
Hi Amit,

Computing graph deltas and applying update/delete routines is something I'm still trying to wrap my head around.  But I was working on this same problem recently - figured I'd give a suggestion and see what people think.

The nodes and edges I'm working with contain sets of metadata with varying degrees of complexity.

For example, for edges I capture things like "certainty" (double), "source" (string), "creator" (string), etc. It's one thing to see if an edge exists between two node IDs. It gets more complicated when you also want to consider these values associated with each edge. Perhaps I find the same relationship between the exact same nodes, but from different sources with different weights. They're not equivalent in this case and I want to capture both.

I basically compute an MD5 hash of the edge metadata structure and store it as an "e_md5" property on the edge at creation time (I also capture timestamps for when it was created and updated). If this relationship is seen a second time for a given source and the MD5 hash is the same, the loader treats it as unchanged from the previous load and does not re-add the edge.
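
A rough sketch of how this fingerprint check could look in the Gremlin Console. This is not Tim's actual code: the in-memory TinkerGraph, the 'node' and 'related' labels, the helper names fingerprint and addEdgeOnce, and the way the metadata is serialized before hashing are all assumptions made for illustration. Only the "e_md5" property name and the metadata keys come from the post.

```groovy
import java.security.MessageDigest

// Assumption: in-memory TinkerGraph with two hypothetical 'node' vertices.
graph = TinkerGraph.open()
g = graph.traversal()
a = g.addV('node').property('name', 'a').next()
b = g.addV('node').property('name', 'b').next()

// Hash the metadata in sorted key order so equal metadata always
// produces the same digest regardless of map ordering.
fingerprint = { Map meta ->
    def canonical = new TreeMap(meta).collect { k, v -> "${k}=${v}" }.join('|')
    MessageDigest.getInstance('MD5').digest(canonical.getBytes('UTF-8')).encodeHex().toString()
}

// Return the edge between the two vertices that carries the same e_md5,
// or create it (with all metadata as properties) if none exists.
addEdgeOnce = { srcId, dstId, Map meta ->
    def md5 = fingerprint(meta)
    def create = __.addE('related').from('src').property('e_md5', md5)
    meta.each { k, v -> create = create.property(k, v) }
    g.V(srcId).as('src').V(dstId).
      coalesce(__.inE('related').has('e_md5', md5).where(__.outV().as('src')),
               create).
      next()
}

meta = [certainty: 0.9d, source: 'src-1', creator: 'tim']
e1 = addEdgeOnce(a.id(), b.id(), meta)                        // creates the edge
e2 = addEdgeOnce(a.id(), b.id(), meta)                        // finds the edge created above
e3 = addEdgeOnce(a.id(), b.id(), meta + [source: 'src-2'])    // different metadata, new edge
```

Because the hash covers the whole metadata map, the same relationship reported by a different source gets its own edge, which matches the behaviour described above.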

Maybe there's something equivalent already implemented that I'm not aware of and I'm reinventing the wheel.  It's probably more taxing on storage, and I have yet to see how well it scales on extremely large graphs.

Just a thought - interested in any feedback.

Thanks!

Tim

Anamika Singh

Nov 29, 2021, 6:35:58 AM
to Gremlin-users
Hi Amit,

Did you find any solution to this? If yes, please post it.

Stephen Mallette

Nov 30, 2021, 8:53:20 AM
to gremli...@googlegroups.com
The solution is to use coalesce() in the standard "get or create" pattern:

g.V().hasLabel('unique-label').fold().coalesce(unfold(), addV('unique-label'))

Of course, I think the point HadoopMarc made in his answer from long ago is important. Typically, the label is not meant to be unique. It is meant to categorize elements. If you have a unique value, it should be either the id or a unique property.
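
A minimal sketch of the same get-or-create pattern keyed on a unique property instead of the label, for the Gremlin Console. The 'sample' label comes from the original question. The in-memory TinkerGraph, the 'name' property key, the value 'alice' and the getOrCreate helper are assumptions for illustration.

```groovy
// Assumption: in-memory TinkerGraph; with the Neo4j plugin, g would be
// the traversal source of that graph instead.
graph = TinkerGraph.open()
g = graph.traversal()

// Look the vertex up by its unique property, fold() the result into a
// list (empty when nothing matched), then either unfold() the existing
// vertex or add a new one carrying that property.
getOrCreate = { name ->
    g.V().has('sample', 'name', name).
      fold().
      coalesce(__.unfold(),
               __.addV('sample').property('name', name)).
      next()
}

first  = getOrCreate('alice')   // no match yet, so the vertex is created
second = getOrCreate('alice')   // the existing vertex is returned
```

The fold() step matters: it guarantees that coalesce() runs exactly once even when the lookup matches nothing. The pattern is a single traversal, but whether it is safe under concurrent writers depends on the transaction and uniqueness guarantees of the underlying graph.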
