upsert() operations


Stephen Mallette

Dec 3, 2020, 10:10:28 AM
to gremli...@googlegroups.com
One of the most common patterns in use today in Gremlin must be the one developed by Daniel Kuppitz for "get or create" sorts of operations, where you effectively do:

fold().coalesce(unfold(), <create>)

There is so much documentation out there on this pattern right now, but if you aren't familiar you might start by looking at:

https://stackoverflow.com/a/49758568/1831717
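
For readers who think more easily in imperative code, the logic of the pattern can be sketched in plain Python. This is only a model of the semantics over an in-memory list of dicts, not Gremlin and not any TinkerPop API; the function name `get_or_create` and the dict layout are invented for illustration.

```python
# Plain-Python model of fold().coalesce(unfold(), <create>).
# Hypothetical helper, not part of TinkerPop or gremlin_python.
def get_or_create(vertices, label, match):
    """Return (list_of_vertices, created_flag) for the given label and properties."""
    # has(...).fold(): gather every match into one list, which may be empty
    found = [
        v for v in vertices
        if v["label"] == label and all(v.get(k) == val for k, val in match.items())
    ]
    # coalesce(unfold(), <create>): emit the matches if there are any...
    if found:
        return found, False
    # ...otherwise run the create branch
    vertex = {"label": label, **match}
    vertices.append(vertex)
    return [vertex], True


graph = []
first_result, created_first = get_or_create(graph, "person", {"name": "marko"})
second_result, created_second = get_or_create(graph, "person", {"name": "marko"})
```

The first call finds nothing and creates the vertex; the second call finds it and creates nothing, so the graph still holds a single vertex.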

While it is a nice pattern, it can be verbose and unintuitive to new Gremlin users. Dave Bechberger proposed on a dev list thread to codify this pattern into a single step that would try to cover most of the major use cases that we've come across. It's a reasonably long discussion but if you're interested it can be found here:


That discussion culminated in the following proposal:


and separately leaves open the idea for a merge() step which had some discussion prior to this:


At this point, we thought it would be good to see if there were any thoughts from the wider TinkerPop Community here on gremlin-users regarding this upsert() step. Are there use cases that we are missing? Is the syntax understandable and easier to follow than the fold()/coalesce()/unfold() pattern? Any other ideas?


HadoopMarc

Dec 3, 2020, 11:05:16 AM
to Gremlin-users
Great work. While studying the API and the underlying discussions, I wondered the following.

Why did you prefer:

upsertV(String label, Map matchOrCreateProperties, Map additionalProperties)

over:

upsertV(String label, Map matchOrCreateProperties).propertyMap(Map additionalProperties)  ?

Related to this, with three parameters to upsert() the number of method signatures explodes, and the very last example at the bottom does not seem to be covered by the method signatures under API.


Stephen Mallette

Dec 4, 2020, 8:13:18 AM
to gremli...@googlegroups.com
Thanks for the reply

> with three parameter to upsert() the number of method signatures explodes

I'm not sure we ever considered your suggested syntax, but you call attention to something that has bugged me a bit, and I haven't been able to quite put my finger on what it was until now. I need to think about it some more, but I think it might be more "Gremlin" to have the modulator syntax you propose rather than the heavy overload list. I'm not sure that we would repurpose propertyMap() to that end, but perhaps some other form works.

> the very last example at the bottom does not seem to be covered by the method signatures under API.

I deleted that last example from the gist. I think we agreed somewhere in that thread that it was not a direction to go, and I must have forgotten to remove it. If you want some kind of sophisticated match for the upsert, you may need to fall back to the fold()/coalesce()/unfold() pattern. Dave has generally been advocating for a solution that covers 90% of use cases, with the idea that for the other 10% you would fall back to more complex Gremlin. I've come to agree with that approach, I think.


Phil Crosland

Dec 4, 2020, 10:11:37 AM
to Gremlin-users
Great, a much-needed item. One thing I'd be interested in is an edge move operation. Something I do fairly often is keep an active version of an item, with an edge linking it to the parent data grouping.
Maybe I am missing something or a pattern, but being able to move the edge to a different in-vertex instead of deleting and recreating the edge would be useful.
e.g. project-[current sub item edge]->subitemA becoming project-[current sub item edge]->subitemB

David Bechberger

Dec 4, 2020, 3:29:33 PM
to gremli...@googlegroups.com
Hello,

>with three parameter to upsert() the number of method signatures explodes

The three-parameter signature should probably have its parameters named as below:

upsertV(Traversal label, Traversal matchOrCreateProperties, Traversal additionalCreateProperties)

The purpose of this signature is to provide additional properties to add to the vertex only if it is created.  So in the current syntax something like this:

g.V().has('person','name','marko'). fold(). coalesce(unfold(), addV('person'). property('name','marko'). property('age',29)).property('foo', 'bar')

would be expressed as this in the new syntax:

g.upsertV('person', has('name','marko'), property('age',29)).property('foo', 'bar')

If the vertex already exists then only the 'foo' property would be set.  If the vertex is created then the 'name', 'age', and 'foo' properties will be set.
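
To make the two cases concrete, here is a rough plain-Python model of those semantics. It is a sketch under assumptions, not the proposed implementation: `upsert_vertex` is a made-up helper, vertices are plain dicts, and the second person ('vadas', age 27) is just illustrative data.

```python
# Hypothetical model of upsertV(label, matchOrCreateProperties, additionalCreateProperties).
def upsert_vertex(vertices, label, match, create_only=None):
    """Return the matching vertex, or create one from match plus create-only properties."""
    for v in vertices:
        if v["label"] == label and all(v.get(k) == val for k, val in match.items()):
            return v  # found: create-only properties are NOT applied
    vertex = {"label": label, **match, **(create_only or {})}
    vertices.append(vertex)
    return vertex


graph = [{"label": "person", "name": "marko"}]

# Vertex exists: only the trailing property('foo', 'bar') takes effect.
existing = upsert_vertex(graph, "person", {"name": "marko"}, {"age": 29})
existing["foo"] = "bar"

# Vertex is missing: match, create-only and trailing properties are all set.
created = upsert_vertex(graph, "person", {"name": "vadas"}, {"age": 27})
created["foo"] = "bar"
```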

I am also completely in favor of figuring out a way to pass a map of properties to a step to save them (maybe properties()), as the one-at-a-time `property()` method is very verbose.

Dave


HadoopMarc

Dec 6, 2020, 6:13:31 AM
to Gremlin-users
Hi,
Indeed, my use of propertyMap() was confusing; I am still busy leaving the valueMap() era.

A difference between Stephen's and David's code examples is whether or not the matchOrCreateProperties are repeated in the additionalCreateProperties. I think "additional" already suggests no repetition.

David nicely explains the meaning of the additionalCreateProperties, but this does not address the explosion of method signatures. If I put everything together, I arrive at:

g.upsertV('person', ['name': 'marko']).with(['age': 29]).property('foo', 'bar')

g.upsertV('person', has('name','marko')).with(property('age',29)).property('foo', 'bar')

This assumes that the with() modulator belongs to the preceding upsertV() step unambiguously (that is, when all traversal strategies are applied during traversal execution, the with() step cannot end up modulating some other step).

Marc


Nicolas Trangosi

Dec 7, 2020, 12:41:39 PM
to gremli...@googlegroups.com
Hi,
I do not understand in which case a traversal for the label is useful. Even if we want to match on multiple labels, for instance with __.hasLabel("label1", "label2"), which label is used in case of creation?

Also, are:
g.upsertV('person', [name: 'marko'], 
                    [name: 'marko', age: 29])
and
g.upsertV('person', [name: 'marko'], 
                    [age: 29])
equivalent?

Nicolas





David Bechberger

Dec 7, 2020, 10:59:41 PM
to gremli...@googlegroups.com
Yes, those two would be equivalent, but the example I provided was not really the best.

The main driver behind that signature is to support scenarios that provide a stream of data, as shown here: https://gist.github.com/spmallette/5cd448f38d5dae832c67d890b576df31#upsert-with-stream

Dave



Stephen Mallette

Dec 8, 2020, 3:06:35 PM
to gremli...@googlegroups.com
I do think HadoopMarc has a point here:
 
> David nicely explains the meaning of the additionalCreateProperties, but this does not address the explosion of method signatures. If I put everything together, I arrive at:

g.upsertV('person', ['name': 'marko']).with(['age': 29]).property('foo', 'bar')

g.upsertV('person', has('name','marko')).with(property('age',29)).property('foo', 'bar')

though I'm not sure what to do about it. I'd be concerned about using with() in this context because that modulator is meant for "configuration" of step behavior rather than supplying standard operational arguments. We would also be adding two new with() signatures with that proposal which I don't think have value outside of this particular usage at this time. The by() modulator is better suited for this situation I think and would satisfy the second example:

g.upsertV('person', has('name','marko')).by(property('age',29)).property('foo', 'bar')

I had actually considered proposing that a while back, but it doesn't have a solution for a Map argument. Ultimately, that's what bugs me about upsert() in general as it introduces Map without trying to think of how Map fits as a whole. If we had by(Map) to allow:

g.upsertV('person', ['name': 'marko']).by(['age': 29]).property('foo', 'bar')

then where else would that modulator overload find use? I'm not even sure by() flows properly in this style. by() doesn't imply to me that these are "additional properties" and therefore isn't terribly intuitive. In that sense, despite the method signature explosion, I think I prefer Dave's proposed syntax.

Bringing this discussion to the user list has made me realize that we have more thinking to do on this syntax and perhaps we need to head back to the dev list a bit to refine the gist/plan further. Please see the updated gist given some of the discussion here and on the dev list:


I think that in going back to the dev list we should think about the bigger picture. In other words, if we bring in Map as an argument to Gremlin, how will the introduction of this new argument apply to other existing steps? Surely addV/E() will look odd without something analogous, and what other steps might be introduced or improved as a result of Map and the syntax that builds around it?





Stephen Mallette

Dec 8, 2020, 3:57:48 PM
to gremli...@googlegroups.com
As fast as I sent that last post, this idea came to me for using by() more nicely:

// match on name and age
g.upsertV('person', [name:'marko',age:29])

// match on name only
g.upsertV('person', [name:'marko',age:29]).by('name')

// explicitly match on name and age
g.upsertV('person', [name:'marko',age:29]).
  by('name').by('age')

// match on id only
g.upsertV('person', [(T.id): 100, name:'marko',age:29]).by(T.id)

// match on whatever the by(Traversal) predicate defines
g.upsertV('person', [name:'marko',age:29]).
  by(has('name', 'marko'))

// match on id, then update age
g.upsertV('person', [(T.id): 100, name:'marko']).by(T.id).
  property('age',29)


Interestingly, that change on its own makes me wonder about upsertV() a bit. Is upsertV() really just addV(String label, Map properties).by()? I'll admit it makes addV() do "more" than perhaps its name entails, but I thought it was worth pointing out. Modulating a step might be better than naming a new one, but as I alluded to in my last post, we'll see where the wider discussion on the topic of mutations goes on the dev list. I just thought I'd post this idea here since it directly addresses the issues HadoopMarc is concerned about.
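
The by() matching rules in the listing above can be modeled in a few lines of plain Python. This is a sketch, not the proposed implementation: `upsert_by` is a hypothetical helper, vertices are dicts, and it assumes that a matched vertex is returned unchanged (the thread does not settle whether the non-matching Map entries would also be written on a match).

```python
# Hypothetical model of upsertV(label, Map).by(key)...; not a TinkerPop API.
def upsert_by(vertices, label, props, by=None):
    """Match on the keys named in `by` (or on all of props), creating on a miss."""
    keys = list(by) if by else list(props)  # no by(): match on every supplied property
    for v in vertices:
        if v["label"] == label and all(v.get(k) == props[k] for k in keys):
            return v, False  # assumption: a matched vertex is left as it is
    vertex = {"label": label, **props}
    vertices.append(vertex)
    return vertex, True


# Match on name and age: the stored age differs, so a new vertex is created.
graph_a = [{"label": "person", "name": "marko", "age": 30}]
_, created_strict = upsert_by(graph_a, "person", {"name": "marko", "age": 29})

# Match on name only: the existing vertex is found and nothing is created.
graph_b = [{"label": "person", "name": "marko", "age": 30}]
matched, created_by_name = upsert_by(
    graph_b, "person", {"name": "marko", "age": 29}, by=["name"]
)
```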


HadoopMarc

Dec 9, 2020, 4:48:23 AM
to Gremlin-users
@ Stephen, your latest upsertV().by() or addV().by() proposal is really elegant!

Marc


Dave Bechberger

Dec 9, 2020, 9:14:33 PM
to gremli...@googlegroups.com
I’ll switch this conversation back to the dev list after I’ve thought through all the ramifications, but I’m liking the proposed use of by() to simplify this.

Dave Bechberger





Shay Nehmad

Dec 10, 2020, 11:17:39 AM
to Gremlin-users
I'd like to expand on our use case. In our case, some property upsert logic is rather complex - not just an overwrite. We're using the Python language variant, so the examples are Pythonic.

For different properties, we need different property upsert logic. For example, we have a property that indicates duration. If it already exists, we don't want to just overwrite, we want to sum the values. We use the following code:

```
traversal_to_append_to.property(prop_name, union(values(prop_name), constant(int(prop_value))).sum())
```

We have similar use cases for min and max as well. 

We also have use cases in which one property depends on another. For example, we want to update the "name" property only if the "update_time" property is bigger than what exists:

```
traversal_to_append_to.as_("original_traversal").choose(
    values(self.time_prop_name).is_(P.gt(current_log_time)),
    __.select("original_traversal"),
    __.select("original_traversal").property(prop_name, prop_value),
)
```
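
For anyone not fluent in choose(), the intent of that traversal can be written out as a plain-Python model. This is only a sketch of the logic on a dict, not gremlin_python; `update_if_newer` and its parameter names are invented for illustration.

```python
# Hypothetical model of the choose() traversal above, operating on a dict.
def update_if_newer(vertex, time_key, log_time, key, value):
    """Overwrite `key` unless the stored timestamp is later than `log_time`."""
    stored = vertex.get(time_key)
    if stored is not None and stored > log_time:
        return vertex  # stored data is newer than this log entry: leave it alone
    vertex[key] = value  # stored data is older, equal, or has no timestamp
    return vertex


stale = update_if_newer({"name": "old", "update_time": 10}, "update_time", 5, "name", "new")
fresh = update_if_newer({"name": "old", "update_time": 10}, "update_time", 20, "name", "new")
```

As in the traversal, a vertex with no timestamp property falls through to the update branch.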

It seems to me that adding "native" support for property upsert flows other than overwrite would be very useful. I imagine the interface could look something like:

// match on id, then sum duration

g.upsertV('person', [(T.id): 100, name:'marko']).by(T.id).
  property('duration',100,PropertySum)

Stephen Mallette

Dec 11, 2020, 11:09:16 AM
to gremli...@googlegroups.com
Thanks for sharing your use case. It's an interesting idea actually. I'll adjust your example slightly, as the more I stare at this the more I think "upsert" isn't a new step and is just addV/E(Map).by() to get the upsert logic:

  g.addV('person', [(T.id): 100, name:'marko']).by(T.id).
     property('duration',100,PropertySum)  

I wonder if the definition you're looking for here is just some form of:

property(String, Object, Operator)

which would then be like:

g.addV('person', [(T.id): 100, name:'marko']).by(T.id).
   property('duration',100,sum)    

Haven't thought it all through but it looks interesting. I think booleans end up interesting that way as this:

gremlin> g.addV().property('enabled',false)
==>v[0]
gremlin> g.V().property('enabled',coalesce(has('enabled',true).constant(false),constant(true))).elementMap()
==>[id:0,label:vertex,enabled:false]
gremlin> g.V().property('enabled',coalesce(has('enabled',true).constant(false),constant(true))).elementMap()
==>[id:0,label:vertex,enabled:true]
gremlin> g.V().property('enabled',coalesce(has('enabled',true).constant(false),constant(true))).elementMap()
==>[id:0,label:vertex,enabled:false]


changes to something like:

g.V().property('enabled',false,and)

Interesting that under the covers our current property(k,v) would then really be:

property(k,v,assign)

ha - does that really work as nicely as all that?
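
As a thought experiment, the property(String, Object, Operator) idea can be modeled in plain Python with ordinary binary functions standing in for the operator. Everything here is an assumption for illustration: `set_property` is a made-up helper, vertices are dicts, and a missing property is simply assigned, which mirrors what the union(...).sum() version does when no value exists yet.

```python
import operator


# Hypothetical model of property(key, value, operator); not a TinkerPop API.
def set_property(vertex, key, value, op=None):
    """Combine `value` with the stored value using `op`; plain assignment when op is None."""
    if op is None or key not in vertex:
        vertex[key] = value  # the "assign" case, also used when nothing is stored yet
    else:
        vertex[key] = op(vertex[key], value)
    return vertex


v = {"duration": 40}
set_property(v, "duration", 100, operator.add)  # sum with the stored value
set_property(v, "shortest", 7, min)             # nothing stored yet, so assigned
set_property(v, "shortest", 3, min)             # keeps the smaller value
set_property(v, "name", "marko")                # today's property(k, v) behavior
```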





Shay Nehmad

Feb 11, 2021, 9:33:52 AM
to Gremlin-users
Hey, any updates on this feature? 

I'd love to get involved in the development as well if I can help expedite it.

Stephen Mallette

Feb 11, 2021, 9:43:47 AM
to gremli...@googlegroups.com
Discussion went back to the dev list. You might want to read through that thread to get a better grip on where things are.


I'm not sure we really settled on the full syntax for it yet. There remain some things to sort out before any development can begin iirc. Thanks for offering to contribute. I'd say that the best way to do so on this particular issue would be to try to help move the conversation forward on the dev list given where we left off. 


