On the fly edge creation


Olav Laudy

Jan 9, 2018, 5:09:04 PM
to Gremlin-users
Hello,


Pardon my novice question here. I'm looking to create edges based on matching property values in the nodes.

A simple example: 

I have a set of nodes labeled as 'fruit' and the property 'name' equal to the fruit name. 

For example:

g.addV('fruit').property('name','apple')
g.addV('fruit').property('name','orange')


Next, I have a set of nodes labeled 'person', with the property 'name' equal to the person's name and the property 'like' equal to one (and only one) particular fruit.

For example:

g.addV('person').property('name','john').property('like','apple')
g.addV('person').property('name','jane').property('like','orange')


I'm struggling to create a query where I use .addE to create an edge between each person and the fruit they like.


Thanks!


Olav



Kelvin Lawrence

Jan 9, 2018, 6:20:33 PM
to Gremlin-users
See if this helps, Olav. It uses airports rather than fruits, but the pattern you need is here.

graph=TinkerGraph.open()
g=graph.traversal()
g.addV('airport').property('code','AUS').as('aus').
addV('airport').property('code','DFW').as('dfw').
addV('airport').property('code','LAX').as('lax').
addV('airport').property('code','JFK').as('jfk').
addV('airport').property('code','ATL').as('atl').
addE('route').from('aus').to('dfw').
addE('route').from('aus').to('atl').
addE('route').from('atl').to('dfw').
addE('route').from('atl').to('jfk').
addE('route').from('dfw').to('jfk').
addE('route').from('dfw').to('lax').
addE('route').from('lax').to('jfk').
addE('route').from('lax').to('aus').
addE('route').from('lax').to('dfw')

Olav Laudy

Jan 9, 2018, 6:55:29 PM
to Gremlin-users
Hi Kelvin,

I tremendously enjoy your book!!! This is a great help!

Is there a way to use a .each kind of syntax so I don't have to enumerate all edges manually?

In my example, the person likes a fruit and there's a fruit vertex. Somehow I'm looking to say:

For each fruit, look at all the persons who like that fruit and create a link between that fruit and the person.



Olav

Daniel Kuppitz

Jan 9, 2018, 9:18:35 PM
to gremli...@googlegroups.com
Hi Olav,

here you go:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('fruit').property('name','apple')
==>v[0]
gremlin> g.addV('fruit').property('name','orange')
==>v[2]
gremlin> g.addV('person').property('name','john').property('like','apple')
==>v[4]
gremlin> g.addV('person').property('name','jane').property('like','orange')
==>v[7]
gremlin> g.V().hasLabel('person').as('p').
           V().hasLabel('fruit').as('f').
             where('p', eq('f')).
               by('like').
               by('name').
             addE('likes').
               from('p')
==>e[10][4-likes->0]
==>e[11][7-likes->2]
gremlin> g.V().hasLabel('person').outE().inV().path().by('name').by(label)
==>[john,likes,apple]
==>[jane,likes,orange]
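
For readers who think in loops, the traversal above behaves much like a nested-loop join. The following is a plain-Python analogy only, not Gremlin and not a TinkerPop API; the list-of-dicts layout and the helper name match_likes are assumptions made for illustration.

```python
# Plain-Python analogy of the traversal above; this is NOT Gremlin.
# Vertices are modelled as dicts (an assumption made for illustration).
fruits = [
    {"id": 0, "label": "fruit", "name": "apple"},
    {"id": 2, "label": "fruit", "name": "orange"},
]
people = [
    {"id": 4, "label": "person", "name": "john", "like": "apple"},
    {"id": 7, "label": "person", "name": "jane", "like": "orange"},
]


def match_likes(people, fruits):
    """Return (person, 'likes', fruit) triples where person.like == fruit.name."""
    edges = []
    for p in people:                    # g.V().hasLabel('person').as('p')
        for f in fruits:                # V().hasLabel('fruit').as('f')
            if p["like"] == f["name"]:  # where('p', eq('f')).by('like').by('name')
                # addE('likes').from('p'): the edge ends at the current fruit
                edges.append((p["name"], "likes", f["name"]))
    return edges


print(match_likes(people, fruits))
```

The two by() modulators line up with the two sides of the comparison: the first applies to 'p', the second to 'f'.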

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/87425f2b-c7d7-4a7c-8302-22fc946413d6%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Olav Laudy

Jan 9, 2018, 9:20:17 PM
to gremli...@googlegroups.com

SUPER-AWESOME!!!!

Thank you so much!



Olav Laudy

Jan 10, 2018, 4:12:37 PM
to Gremlin-users
I've been trying all morning to get one extension to work:

What if the person's preference is a multi-property?


gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('fruit').property('name','apple')
==>v[0]
gremlin> g.addV('fruit').property('name','orange')
==>v[2]
gremlin> g.addV('person').property('name','john').property('like','apple').property('like','orange')
==>v[4]
gremlin> g.addV('person').property('name','jane').property('like','orange')
==>v[7]

This query:

gremlin> g.V().hasLabel('person').as('p').
           V().hasLabel('fruit').as('f').
             where('p', eq('f')).
               by('like').
               by('name').
             addE('likes').
               from('p')


now results in:

('g','Multiple properties exist for the provided key, use Vertex.properties(like)')


Thank you for your time and consideration!










Daniel Kuppitz

Jan 10, 2018, 6:45:14 PM
to gremli...@googlegroups.com
In this case, you'll have to unfold the values first.

gremlin> g.V().hasLabel('person').as('p').
......1>     values('like').as('fn').
......2>   V().hasLabel('fruit').as('f').
......3>     where('fn', eq('f')).
......4>       by().
......5>       by('name').
......6>     addE('likes').
......7>       from('p')
==>e[11][4-likes->0]
==>e[12][4-likes->2]
==>e[13][8-likes->2]
gremlin> g.V().hasLabel('person').outE().inV().path().by('name').by(label)
==>[john,likes,apple]
==>[john,likes,orange]
==>[jane,likes,orange]
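
As a rough mental model, unfolding the values adds one more loop level: each 'like' value is compared on its own. Again, this is a plain-Python analogy rather than Gremlin; modelling a multi-property as a list and the name match_multi_likes are assumptions for illustration.

```python
# Plain-Python analogy of the multi-property case; this is NOT Gremlin.
fruits = [{"name": "apple"}, {"name": "orange"}]
# A multi-property is modelled as a list of values (assumption for illustration).
people = [
    {"name": "john", "like": ["apple", "orange"]},
    {"name": "jane", "like": ["orange"]},
]


def match_multi_likes(people, fruits):
    """Return one (person, 'likes', fruit) triple per matching 'like' value."""
    edges = []
    for p in people:                 # g.V().hasLabel('person').as('p')
        for fn in p["like"]:         # values('like').as('fn'): one step per value
            for f in fruits:         # V().hasLabel('fruit').as('f')
                if fn == f["name"]:  # where('fn', eq('f')).by().by('name')
                    edges.append((p["name"], "likes", f["name"]))
    return edges


print(match_multi_likes(people, fruits))
```

The empty by() corresponds to comparing 'fn' as it is, since it already holds a plain value rather than a vertex.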

Cheers,
Daniel



Olav Laudy

Jan 11, 2018, 12:06:13 PM
to gremli...@googlegroups.com

Amazing how powerful!!!




Olav Laudy

Jan 11, 2018, 7:26:11 PM
to Gremlin-users
Hello,

Although I understand the query, I'm still confused about the use of variables.

Say, I extend your query to add a property:

g.V().hasLabel('person').as('p').
           V().hasLabel('fruit').as('f').
             where('p', eq('f')).
               by('like').
               by('name').
             addE('likes').
               from('p').property('test','p')

In this case, 'p' is added as a literal property value. How is the use of 'p' in the property() step different from the from('p') you use?

Does my question make sense?


Thanks!











Daniel Kuppitz

Jan 11, 2018, 10:20:38 PM
to gremli...@googlegroups.com
by(), from() and to() are modulators that interpret a String as a reference to a named step. If you want to use another element's property value in property(k, v), you can do so:

g.V().hasLabel('person').as('p').
  V().hasLabel('fruit').as('f').
    where('p', eq('f')).
      by('like').
      by('name').
    addE('likes').
      from('p').
      property('test', select('p').by('aPersonProperty'))

Cheers,
Daniel



Daniel Kuppitz

Jan 11, 2018, 10:26:29 PM
to gremli...@googlegroups.com
"by(), from() and to() are modulators that interpret a String as a reference to a named step"

Sorry, strike that by(), that was meant to be where(x, predicate(y)).

Cheers,
Daniel

Olav Laudy

Jan 17, 2018, 10:15:19 PM
to Gremlin-users
I'm running into some timeout issues in Amazon Neptune that are due to slow matching on properties:


The following is a simplification of what we are trying to achieve:

We have many 'observation' nodes:

g.addV('observation') 
  .property('type','apple')
  .property('obs_date','2018-01-14')
  .property('color','green')

 
g.addV('observation') 
  .property('type','apple')
  .property('obs_date','2018-01-15')
  .property('color','green')

 
 
g.addV('observation') 
  .property('type','orange')
  .property('obs_date','2018-01-15')
  .property('color','orange')
  .property('country','spain')

 
Some characteristics are constant within the type (in this case: color and country, while 'obs_date' varies per observation).

First, I create 'object' nodes: one node for each type:

 
g.V().hasLabel('observation')
        .dedup().by('type').as('observation')
        .addV('object')
        .property('name',
               select('observation')
                    .by('type'))

This creates the following nodes:

g.addV('object').property('name','apple')
g.addV('object').property('name','orange')


Since not all properties are the same for all objects, I can't copy them while creating the nodes (or I don't know how to). So, the next query -and this is where the timeout happens - is used to copy properties:

g.V().hasLabel('observation')
         .has('color').dedup().by('type').as('obs') 
    .V().hasLabel('object').as('object') 
                 .where('obs',eq('object')) 
                 .by('type')
                 .by('name') 
                 .property('color',
                        select('obs').by('color'))

This is done for every characteristic in turn. 

In terms of node counts:

With some reductions, there are currently 
- 250K vertices in the database
- 140K are of type 'observation'
- 8K are of type 'object'

As such, the query that we are trying to do:
- look for 140K nodes
- dedup() them => give me a random node of each type, which results in a list of 8K objects
- look which object has the same name => search in an 8K list (and do this 8K times).
- when the match is found, assign the relevant property over from the 'observation' to the 'object' node.

So, in the loop, there are 64M comparisons (8K x 8K). If a lookup (finding the right entry out of 8K entries) takes 50 ms, then the list of 8K entries should be resolved in 8K * 50 ms = 400 secs, or about 7 mins. As such, I feel this should be more or less doable.
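
The estimate above can be checked with a few lines of arithmetic. This only restates the numbers from the post; the 50 ms per lookup is the post's own assumption, not a measured value, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the estimate above.
objects = 8_000   # deduplicated observations and 'object' vertices
lookup_ms = 50    # assumed cost of one lookup, taken from the post

comparisons = objects * objects             # every candidate against every object
total_seconds = objects * lookup_ms / 1000  # one 50 ms lookup per candidate
total_minutes = total_seconds / 60

print(comparisons)              # 64000000
print(total_seconds)            # 400.0
print(round(total_minutes, 1))  # 6.7
```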

Can you please comment on the lookup process?


Thanks!

Daniel Kuppitz

Jan 18, 2018, 1:10:58 PM
to gremli...@googlegroups.com
Hi Olav,

you can do it all in a single iteration:

g.V().hasLabel('observation').
  group().
    by('type').
    by(valueMap().fold()).
  unfold().select(values).as('v').
  addV('object').as('o').
  select('v').unfold().unfold().as('kv').
  select('o').property(set,
      select('kv').select(keys).
        choose(__.is('type'), constant('name')),
      select('kv').select(values).unfold())

The result for your small sample graph will be:

gremlin> g.V().hasLabel('object').valueMap()
==>[obs_date:[2018-01-15],country:[spain],color:[orange],name:[orange]]
==>[obs_date:[2018-01-14,2018-01-15],color:[green],name:[apple]]

That looks like the expected result, but note that this query will only work once. If you execute it again, you'll end up with duplicate object nodes. Let me know if that's an issue and I can try to come up with a query that prevents the duplication. It won't be a simple coalesce(), since that would again lead to expensive scan operations.
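
To make the shape of that traversal easier to follow, here is a plain-Python analogy of what it computes: group the observations by type, then merge every property of the group into one object with set semantics, renaming 'type' to 'name'. This is not Gremlin; the dict layout and the name build_objects are assumptions made for illustration.

```python
from collections import defaultdict

# Plain-Python analogy of the single-pass traversal; this is NOT Gremlin.
observations = [
    {"type": "apple", "obs_date": "2018-01-14", "color": "green"},
    {"type": "apple", "obs_date": "2018-01-15", "color": "green"},
    {"type": "orange", "obs_date": "2018-01-15", "color": "orange", "country": "spain"},
]


def build_objects(observations):
    """Build one merged 'object' dict per observation type."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["type"]].append(obs)  # group().by('type').by(valueMap().fold())

    objects = []
    for members in grouped.values():      # unfold().select(values)
        merged = defaultdict(set)         # addV('object'), properties added with 'set'
        for props in members:
            for key, value in props.items():
                # choose(__.is('type'), constant('name')): rename 'type' to 'name'
                target = "name" if key == "type" else key
                merged[target].add(value)
        objects.append({k: sorted(v) for k, v in merged.items()})
    return objects


for obj in build_objects(observations):
    print(obj)
```

Every observation is visited exactly once, which is the difference from matching each object against all observations.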

Cheers,
Daniel



Olav Laudy

Jan 22, 2018, 3:38:30 PM
to Gremlin-users
Wow! It took me about an hour to take this query apart and fully appreciate it! I learned a lot! Thank you so much!!

Olav Laudy

Jan 25, 2018, 11:28:50 AM
to Gremlin-users
Hi Daniel,


I see that the query above is executed as many times as there are properties. Yet the first time the query runs, all the properties are already copied:

This is the result from observation 'orange' only:

['Vertex:',
 {'id': 'b6b09711-b7b4-e14a-98ae-5a60560640d9',
  'label': 'object',
  'properties': {'color': 'orange',
   'country': 'spain',
   'name': 'orange',
   'obs_date': '2018-01-15'}},
 'Vertex:',
 {'id': 'b6b09711-b7b4-e14a-98ae-5a60560640d9',
  'label': 'object',
  'properties': {'color': 'orange',
   'country': 'spain',
   'name': 'orange',
   'obs_date': '2018-01-15'}},
 'Vertex:',
 {'id': 'b6b09711-b7b4-e14a-98ae-5a60560640d9',
  'label': 'object',
  'properties': {'color': 'orange',
   'country': 'spain',
   'name': 'orange',
   'obs_date': '2018-01-15'}},
 'Vertex:',
 {'id': 'b6b09711-b7b4-e14a-98ae-5a60560640d9',
  'label': 'object',
  'properties': {'color': 'orange',
   'country': 'spain',
   'name': 'orange',
   'obs_date': '2018-01-15'}}]








Olav

Daniel Kuppitz

Jan 25, 2018, 1:13:39 PM
to gremli...@googlegroups.com
I am not sure what you're trying to say here. Is it good or bad? :) And what is your observation based on?
The query is actually supposed to run only once, not once for each property.

Cheers,
Daniel



Olav Laudy

Jan 25, 2018, 1:34:11 PM
to Gremlin-users
Hi Daniel,

The query works as advertised, however, it looks like it runs as many times as there are properties. That's some overhead and I wondered if that was as intended. Since you mention it should run once:

I create the query in Python as a string and send it via REST to Neptune (Gremlin 3.3.0). When creating edges/nodes, the object that is returned is always the full set of vertices/edges + properties. I know in the console this is shortened to display the ID only.

I wrote a function in Python that takes the raw return (like: {'@type': 'g:List', '@value': [{'@type': 'g:Int64', '@value': 303456}]}) and unpacks it into a more human-readable format. That's how I showed you the edges for object 'orange' that is created 4 times.
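
The function itself is not shown in the thread. A minimal sketch of what such an unpacker might look like follows, assuming the response uses the typed {'@type': ..., '@value': ...} wrappers quoted above; the name unpack is hypothetical, and special cases such as maps encoded as flat key/value lists are deliberately not handled.

```python
def unpack(node):
    """Recursively strip {'@type': ..., '@value': ...} wrappers (hypothetical helper)."""
    if isinstance(node, dict):
        if set(node) == {"@type", "@value"}:
            # Typed wrapper: keep only the wrapped value and keep unpacking it.
            return unpack(node["@value"])
        # Ordinary dict: unpack each of its values.
        return {key: unpack(value) for key, value in node.items()}
    if isinstance(node, list):
        return [unpack(item) for item in node]
    return node  # plain string, number, bool or None


raw = {'@type': 'g:List', '@value': [{'@type': 'g:Int64', '@value': 303456}]}
print(unpack(raw))  # [303456]
```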








Olav

Daniel Kuppitz

Jan 25, 2018, 2:24:50 PM
to gremli...@googlegroups.com
Hi Olav,

I still don't get it. Are you saying that you run the query for every vertex you're inserting? That, of course, would be a huge overhead. You should either run it once when you're done inserting all vertices OR once for each new observation vertex, but then the query needs to be tweaked and check for existing vertices.

Cheers,
Daniel



Olav Laudy

Jan 25, 2018, 2:43:08 PM
to Gremlin-users
Hi,

I run the exact query that you provided, on the vertices that I provided, and it runs as many times as there are properties per object.

Olav


Daniel Kuppitz

Jan 25, 2018, 3:12:48 PM
to gremli...@googlegroups.com
I believe you're talking about the number of results..? Of course, this number matches the number of properties, but that doesn't mean that the query runs this many times; it only runs once. Ultimately, the query's last step is to copy all properties, and that's what the query emits. If that's what bothers you, then just append an .iterate() step (which would be smart anyway, as I don't think you really care about those return values, so the result serialization would only be unnecessary overhead).
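
A plain-Python analogy of the point, not Gremlin: a single pass over the properties can emit the same object once per property it copies, and draining that pass without collecting the results is roughly what iterate() does. The generator copy_properties and the dict layout are assumptions made for illustration.

```python
from collections import deque


def copy_properties(source, target):
    """One pass over source; yields target once per copied property."""
    for key, value in source.items():
        target[key] = value
        yield target  # like the final property(...) step emitting the vertex


source = {"name": "orange", "color": "orange", "country": "spain", "obs_date": "2018-01-15"}

# Collecting the results: four entries, all referring to the same object.
target = {}
results = list(copy_properties(source, target))
print(len(results))                       # 4
print(all(r is target for r in results))  # True

# Draining without collecting, roughly what appending iterate() achieves.
quiet_target = {}
deque(copy_properties(source, quiet_target), maxlen=0)
print(quiet_target == source)             # True
```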

Cheers,
Daniel



Olav Laudy

Jan 26, 2018, 12:35:37 AM
to gremli...@googlegroups.com

Ah! Thank you for the clarification!


