Get or Create Vertex

peter...@gmx.de

Mar 18, 2016, 9:19:34 AM
to Gremlin-users
Hi,

I am currently trying to solve a problem in our application that results in duplicate vertices. For this reason I want to implement a kind of getOrCreate() query in Gremlin that verifies whether a vertex already exists before adding it to the graph.
In our application, we have indexed properties that are (or should be) unique per vertex label, so we can use these properties to check for an already existing vertex and avoid duplicates. For persons, for example, the property can be name.

So I thought of something like this:

g.V().choose(has('person','name','marco').hasNext(), has('person','name','marco'), addV(T.label, 'person', 'name', 'marco'))

which should either return the already existing vertex or the newly created one. (In the end I would return the ID of the vertex, so I can add edges afterwards.)

But this doesn't work since:
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.choose() is applicable for argument types: (java.lang.Boolean, org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal, org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal)

It seems that I completely misunderstood the choose step: it doesn't expect a boolean value, and it seems to be evaluated for every vertex unless I apply filter steps in front of it. But how can I filter the vertices before knowing whether the vertex in question exists? Or should I use a completely different approach here? Would it make sense to use Groovy for the if/else logic and then send multiple smaller Gremlin queries?


Best regards and thanks in advance for any suggestions,
Peter

Daniel Kuppitz

Mar 18, 2016, 10:28:29 AM
to gremli...@googlegroups.com
You really shouldn't use choose() for this use case. Your query, had it been written correctly, would do a full graph scan. Try this instead:

g.V().has('person','name','marco').tryNext().orElseGet {
   g.addV('person').property('name', 'marco').next()
}

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/e98e3e44-ba49-4ae1-8401-ba4148c55354%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jason Plurad

Mar 18, 2016, 10:30:02 AM
to Gremlin-users
Hi Peter,

How about something like this:

g.V().has('name','peter').tryNext().orElse(g.addV('name','peter'))


-- Jason

pShah

Mar 18, 2016, 11:01:11 AM
to Gremlin-users
Daniel,
  Why did you put ".next()" at the end?
 
  Is it required?

peter...@gmx.de

Mar 18, 2016, 11:45:11 AM
to Gremlin-users
Cool, thanks to both of you! That is exactly what I was looking for :-)

I didn't even know about the tryNext(), orElse() and orElseGet() steps.

Daniel Kuppitz

Mar 18, 2016, 11:46:26 AM
to gremli...@googlegroups.com
Why did you put ".next()" at the end? Is it required?

Yes, it is required. Without it, the expression would return a vertex in one case and a traversal in the other.

Cheers,
Daniel

peter...@gmx.de

Mar 22, 2016, 9:13:31 AM
to Gremlin-users
I just tested both of your queries, but I ran into problems with both of them.

Jason's query works fine but has the problem with the missing next() that Daniel mentioned. As soon as I insert this next(), the traversal in the orElse() part seems to get executed every time, irrespective of whether the vertex already exists or not. But I have to add the next() as I want to return the id of the vertex to be able to add edges afterwards.

I had to adjust Daniel's query, as addV() expects an even number of arguments:

g.V().has('person','name','marco').tryNext().orElseGet {
   g.addV(T.label,'person','name', 'marco').next()
}

But then I get the following exception:

No signature of method: java.util.Optional.orElseGet() is applicable for argument types: (com.thinkaurelius.titan.graphdb.vertices.StandardVertex) values: [v[10035208240]]
Possible solutions: orElseGet(java.util.function.Supplier), orElse(java.lang.Object)

I don't know if it's important, but I use Titan 1.0.0 and I tried the queries in the Gremlin console.

Daniel Kuppitz

Mar 22, 2016, 9:44:04 AM
to gremli...@googlegroups.com
No signature of method: java.util.Optional.orElseGet() is applicable for argument types: (com.thinkaurelius.titan.graphdb.vertices.StandardVertex) values: [v[10035208240]]

There is no argument of type StandardVertex in the method call I was showing. Can you show what you did? I suspect you did something like this:

g.V().has('person','name','marco').tryNext().orElseGet(g.addV(T.label,'person','name', 'marco').next())

Note that the argument should be a lambda.
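
For readers more familiar with plain Java, the point about the argument type can be sketched without any graph at all. The following is only an illustration under stated assumptions: PersonIndex, find(), create() and getOrCreate() are hypothetical names for an in-memory stand-in, not TinkerPop or Titan API. It shows that orElseGet() expects a Supplier (which is what the Groovy closure provides), so the creation code is handed over as something to call later rather than as an already computed vertex. In Java, passing the computed value instead would be rejected at compile time; in Groovy it surfaces at runtime as the "No signature of method" error quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for an indexed graph; not TinkerPop or Titan API.
class PersonIndex {
    private final Map<String, Long> idsByName = new HashMap<>();
    private long nextId = 1;

    // Plays the role of g.V().has('person','name',name).tryNext().
    Optional<Long> find(String name) {
        return Optional.ofNullable(idsByName.get(name));
    }

    // Plays the role of g.addV(...).next(): stores the entry and returns its id.
    long create(String name) {
        long id = nextId++;
        idsByName.put(name, id);
        return id;
    }

    // orElseGet() receives a Supplier, so create() is only called when find() is empty.
    long getOrCreate(String name) {
        return find(name).orElseGet(() -> create(name));
    }

    int size() {
        return idsByName.size();
    }

    public static void main(String[] args) {
        PersonIndex index = new PersonIndex();
        System.out.println(index.getOrCreate("marco")); // creates the entry
        System.out.println(index.getOrCreate("marco")); // finds the existing entry, same id
    }
}
```

Calling getOrCreate("marco") twice returns the same id and leaves a single entry in the index.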

Cheers,
Daniel


peter...@gmx.de

Mar 22, 2016, 10:03:00 AM
to Gremlin-users
Yes, that is exactly what I did. g.addV(T.label,'person','name', 'marco').next() returns a StandardVertex, which (probably) causes the exception.

Note that the argument should be a lambda.
 
So you mean that I have to put it in {}? It works now when I try it that way:

g.V().has('person','name','marco').tryNext().orElseGet({g.addV(T.label,'person','name', 'marco').next()})

But isn't there another way that doesn't require a lambda? I would like to avoid lambdas completely if possible for the reasons described here: http://tinkerpop.incubator.apache.org/docs/3.0.2-incubating/#a-note-on-lambdas

Daniel Kuppitz

Mar 22, 2016, 10:22:22 AM
to gremli...@googlegroups.com
Note that this is not a lambda within a traversal, hence it won't affect any traversal strategies. If you still don't want to use it, then split it into 2 statements:

def t = g.V().has('person', 'name', 'marco')
t.hasNext() ? t.next() : g.addV(T.label, 'person','name', 'marco').next()
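
The two-statement form relies only on hasNext() and next(), which are the ordinary java.util.Iterator methods. As a rough sketch of the same shape in plain Java, assuming a simple list of names in place of the graph (TwoStepGetOrCreate and its getOrCreate() method are hypothetical, not part of any library):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the hasNext()/next() pattern on a plain list.
class TwoStepGetOrCreate {
    // Returns the existing match if there is one; otherwise adds the name and returns it.
    static String getOrCreate(List<String> names, String wanted) {
        // Statement 1: build the lookup (analogous to def t = g.V().has(...)).
        Iterator<String> matches = names.stream().filter(wanted::equals).iterator();
        // Statement 2: reuse the same iterator for the check and for the result.
        if (matches.hasNext()) {
            return matches.next();
        }
        names.add(wanted);
        return wanted;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        getOrCreate(names, "marco");
        getOrCreate(names, "marco");
        System.out.println(names.size()); // the second call found the existing entry
    }
}
```

No lambda is passed to a fallback method here; the creation branch is a normal statement that only runs when the lookup is empty.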

Cheers,
Daniel


peter...@gmx.de

Mar 23, 2016, 11:36:43 AM
to Gremlin-users
Thank you very much, Daniel! I now use the version with the lambda and it seems to work perfectly fine :-)

henrik....@modiviz.com

Mar 25, 2016, 12:55:47 PM
to Gremlin-users
It seems counter-intuitive that orElse() executes unconditionally (something I confirmed on 3.1.1 backed by Neo4j) after a tryNext().
While I was able to solve my similar problem by using orElseGet() with a lambda as outlined here, it seems odd to not have 'else' semantics in orElse().

Thoughts?
/Henrik

Daniel Kuppitz

Mar 25, 2016, 1:12:07 PM
to gremli...@googlegroups.com
orElse() takes a value, so Java/Groovy has to evaluate the argument expression (here, the method call) to obtain that value before orElse() itself runs. Thus it's not counter-intuitive at all.

gremlin> hello = { name -> msg = "Hello ${name}!"; println "Constructed Hello message: ${msg}"; msg }
==>groovysh_evaluate$_run_closure1@68759011

gremlin> test = { message -> /* don't do anything with the message */}
==>groovysh_evaluate$_run_closure1@7fd4acee

gremlin> test(hello("Gremlin"))
Constructed Hello message: Hello Gremlin!
==>null

I really wouldn't expect anything else. On the other hand, with closures/lambdas you get this:

gremlin> helloLambda = { name -> { -> msg = "Hello ${name}!"; println "Constructed Hello message: ${msg}"; msg} }
==>groovysh_evaluate$_run_closure1@28a0fd6c

gremlin> testLambda = { messageLambda -> /* don't do anything with the message */}
==>groovysh_evaluate$_run_closure1@18a3962d

gremlin> testLambda(helloLambda("Gremlin"))
==>null

See? It has nothing to do with Gremlin; it's simply how method arguments are evaluated.
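
The same behaviour can be reproduced directly with java.util.Optional, which is the type that tryNext() returns. The sketch below is a plain-Java illustration, not Gremlin; FallbackEvaluationDemo and create() are hypothetical names, with create() standing in for a side-effecting call such as g.addV(...).next(). It counts how often the fallback runs when the Optional already holds a value.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; create() stands in for a side-effecting call like g.addV(...).next().
class FallbackEvaluationDemo {
    static String create(AtomicInteger calls) {
        calls.incrementAndGet();
        return "created";
    }

    // orElse(): the argument is evaluated before orElse() is invoked, so create() always runs.
    static int callsWithOrElse() {
        AtomicInteger calls = new AtomicInteger();
        Optional<String> existing = Optional.of("existing");
        String result = existing.orElse(create(calls));
        return calls.get();
    }

    // orElseGet(): the supplier is only invoked when the Optional is empty, so create() is skipped.
    static int callsWithOrElseGet() {
        AtomicInteger calls = new AtomicInteger();
        Optional<String> existing = Optional.of("existing");
        String result = existing.orElseGet(() -> create(calls));
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("orElse:    " + callsWithOrElse());    // prints 1
        System.out.println("orElseGet: " + callsWithOrElseGet()); // prints 0
    }
}
```

Both variants return "existing"; the difference is only whether the fallback's side effect happens, which is why an addV() passed to orElse() creates a vertex every time.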

Cheers,
Daniel


Count Rodrigo

Jan 13, 2017, 6:37:38 AM
to Gremlin-users
Does using the tryNext().orElse() code still result in two roundtrip RPC calls to the server or does it get executed in one single request server side? I am interested in particular when using withRemote() in Java and connecting to a remote Gremlin server.

Thanks!

Kevin Gallardo

Jan 13, 2017, 9:45:50 AM
to Gremlin-users
Hi,

Does using the tryNext().orElse() code still result in two roundtrip RPC calls to the server or does it get executed in one single request server side? I am interested in particular when using withRemote() in Java and connecting to a remote Gremlin server.

Using tryNext().orElseGet() will result in multiple roundtrips/transactions to the database (both with an embedded graph database like Titan and with a withRemote() interface to a Gremlin Server or other database). The only way to avoid that would be to have a `getOrCreate` syntax within the GraphTraversal API itself, which the graph database backend would handle on its own when it encounters it. With a local TinkerGraph, tryNext().orElseGet() is not a problem because everything is local, but as soon as you use a remote database you currently can't avoid multiple roundtrips. I had suggested adding such an option to the Traversal API, as follows:

Would syntaxes for conditional updates/inserts be easier if the steps like addV()/addE()/property() were taking an additional boolean argument indicating a get-or-create logic in the insertion?

Here's a suggestion:
  • always create a new Vertex and add the properties (what's happening right now): 
    g.addV("mylabel").property("name", "me")

  • get-or-create is false, always create, same behaviour as above, (what's happening right now):
    g.addV("mylabel", false).property("name", "me", false)

  • the true on addV() means "go into getOrCreate mode". Whether a new vertex is created is decided by which of the following .property() calls have the boolean set to true: each .property() with true is added to the condition for creation. Here, we would look for a vertex that has the label "mylabel" and a property "name" set to "me". If it exists, use it for the rest of the traversal; if not, create a new one and add the property ("name", "me") to it:
    g.addV("mylabel", true).property("name", "me", true)

  • look for a vertex that has the property ("name", "me") and ("surname", "Me"), if not found, create a vertex with both these properties:
    g.addV("mylabel", true).property("name", "me", true).property("surname", "Me", true)

  • look only for a vertex that has the property ("name", "me"); if not found, create one. In all cases, add the property ("surname", "Me") to it. (.property("surname", "Me") is equivalent to .property("surname", "Me", false): get-or-create is false, so always create):
    g.addV("mylabel", true).property("name", "me", true).property("surname", "Me")

Applying the same logic to addE(), one could add two vertices with properties and an edge between them, whether none or all of them already exist:

g.addV("mylabel", true).property("name", "me", true).property("surname", "Me").as("v1") // will always return a vertex, whether it's new or not
.addV("mylabel", true).property("name", "he", true).property("surname", "Me").as("v2") // will always return a vertex, whether it's new or not
.addE("knows", true).property("since", "born", true).from("v1").to("v2") // The edge will only be added if there wasn't already one with the property ("since", "born") (which would mean that the two vertices already existed as well)

Any feedback is welcome.

Robert Dale

Jan 13, 2017, 9:59:55 AM
to gremli...@googlegroups.com

Robert Dale


Kevin Gallardo

Jan 13, 2017, 10:53:31 AM
to Gremlin-users
Didn't know, thanks.

Robert Dale
