Recommended way to get the ID of the edge with a maximum value

1,227 views
Skip to first unread message

Kelvin Lawrence

unread,
Jul 8, 2016, 2:11:54 PM7/8/16
to Gremlin-users
I have an airline route network where the distance between nodes is stored as a property on the edge called 'dist'. The airport IATA code is stored on each vertex as 'code'. I have been doing some experiments using the Gremlin console and Tinkergraph (3.2.0)

I wanted to write a simple query to return the distance of the longest route between any two airports in the graph. I successfully achieved this with:

r=g.E.hasLabel('route').values('dist').max().next()

However, what I really wanted was to also know a bit about the nodes on each end of that edge as well. My first thought was to get the ID of the edge with the maximum route value so I could then retrieve more details using the ID to avoid looking at multiple edges more than once. So I tried this:

g.E.hasLabel('route').as('e').dist.max.select('e').id

This did not return anything - I assume max() does not allow me to go back with a select()?

So then I did this (which works but is not optimal as it looks at a lot of the edges twice):

r=g.E.hasLabel('route').values('dist').max().next()
g
.E().has('dist',r).bothV().code    

  
So then I tried this which also works and avoids looking at a lot of edges twice but has a potential issue in that if two or more routes share the maximum distance then it will not pick that up:

r=g.E().hasLabel('route').as('e').order().by('dist', decr).limit(1).select('e').id().next()
g
.E(r).dist  // The distance
g
.E(r).bothV.code  // The airport codes

Given what I tried is there a way I could have made max() work or some better way that I did not think of to do this?

Many thanks in advance.
Kelvin

Daniel Kuppitz

unread,
Jul 8, 2016, 3:08:54 PM7/8/16
to gremli...@googlegroups.com
This should do the trick:

g.E().hasLabel("route").order().by("dist", decr).limit(1).bothV()

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/460554a5-f78c-455c-9638-959d13d2d95c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kelvin Lawrence

unread,
Jul 8, 2016, 3:38:15 PM7/8/16
to Gremlin-users
Hi Daniel, thanks for the quick reply.  Unfortunately that will not quite always work (as I tried but maybe did not do well) to explain above for cases where more than one route shares the exact same maximum distance (between two different pairs of airports) . For instance if A->B and C->D both are 8574 miles the query only finds A and B

This is an edge case (no pun intended!) that I actuallly do not have in the real graph right now but I wanted to be sure my query allowed for it.

Kelvin

Daniel Kuppitz

unread,
Jul 8, 2016, 5:40:06 PM7/8/16
to gremli...@googlegroups.com
I'll use the modern sample graph to show the solution:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.E().order().by("weight", decr).store("w").by("weight").filter(values("weight").as("cw").select("w").by(limit(local, 1)).as("mw").where("cw", eq("mw"))).project("from","to","weight").by(outV()).by(inV()).by("weight")
==>[from:v[1], to:v[4], weight:1.0]
==>[from:v[4], to:v[5], weight:1.0]
gremlin>

Hence your query should be:

g.E().hasLabel("route").
  order().by("dist", decr).
  store("d").by("dist").
  filter(values("dist").as("cd").select("d").by(limit(local, 1)).as("md").
    where("cd", eq("md"))).
  project("from","to","dist").by(outV()).by(inV()).by("dist")


Cheers,
Daniel


Kelvin Lawrence

unread,
Jul 8, 2016, 7:09:31 PM7/8/16
to Gremlin-users
Thank's Daniel - that is extremely helpful.

I have to say though that I am not sure I would ever have come up with that without your help! I get the feedback a lot from people that Gremlin is easy to use for very simple things but that some queries that look like they should be simple take a lot of effort to get right and things get difficult for people.

Again thanks so much for your help
Kelvin

Jason Plurad

unread,
Jul 11, 2016, 10:35:55 AM7/11/16
to Gremlin-users
I was trying something along the lines of this. Why is path() information lost when max() is used?

gremlin> g.V().outE().as('e').values('weight').path()
==>[v[1], e[9][1-created->3], 0.4]
==>[v[1], e[7][1-knows->2], 0.5]
==>[v[1], e[8][1-knows->4], 1.0]
==>[v[4], e[10][4-created->5], 1.0]
==>[v[4], e[11][4-created->3], 0.4]
==>[v[6], e[12][6-created->3], 0.2]

gremlin
> g.V().outE().as('e').values('weight').max().path()
==>[1.0]

-- Jason

Daniel Kuppitz

unread,
Jul 11, 2016, 10:48:36 AM7/11/16
to gremli...@googlegroups.com
Marko and I have had this discussion before. We currently treat all reducing barriers the same (min, max, mean, etc.). But they are indeed different when it comes to path computations. Some of them could preserve the path history, others could not. Not sure if we already have a ticket for that.

Cheers,
Daniel


Jason Plurad

unread,
Jul 11, 2016, 12:16:56 PM7/11/16
to Gremlin-users
Thanks, Daniel. I found a couple related tickets with lots of solid reading material:

Some basic mathematical functions / steps
https://issues.apache.org/jira/browse/TINKERPOP-761

Local aggregation should not destroy path
https://issues.apache.org/jira/browse/TINKERPOP-781

-- Jason

Kelvin Lawrence

unread,
Jul 11, 2016, 6:39:59 PM7/11/16
to Gremlin-users
FWIW, I think from an end users point of view it would be nice if what I had initially tried, as it seemed consistent with other things I had learned to do with Gremlin that involve using select() to go back in the path had worked after a max(). Given the gap between that and what it ultimately took with Daniel's help to get to a query that worked I think it is doubly important, that if possible, from an end user's perspective, things that seem simple to do are in fact simple to do.

So, it would be great if you could indeed in a future update make:

g.V().outE().as('e').values('weight').max().path()


return the "expected" path.


Cheers,
Kelvin

Marko Rodriguez

unread,
Jul 11, 2016, 6:44:54 PM7/11/16
to gremli...@googlegroups.com
Hi,

FWIW, I think from an end users point of view it would be nice if what I had initially tried, as it seemed consistent with other things I had learned to do with Gremlin that involve using select() to go back in the path had worked after a max(). Given the gap between that and what it ultimately took with Daniel's help to get to a query that worked I think it is doubly important, that if possible, from an end user's perspective, things that seem simple to do are in fact simple to do.

So, it would be great if you could indeed in a future update make:

g.V().outE().as('e').values('weight').max().path()


return the "expected" path.

For max() and min(), we could preserve the path history. In fact, in this respect, max() and min() would NOT be ReducingBarrierSteps, but in fact be some sort of “barrier” FilterStep.

If you make a ticket, we can think through how this would look internally. Also, if there are other many-to-one steps that could save a meaningful path, please list them in the ticket.

Thanks,
Marko.





Kelvin Lawrence

unread,
Jul 11, 2016, 10:08:26 PM7/11/16
to Gremlin-users
Hi Marko,

I am out of the office this week so Jason kindly opened a ticket for this.

Take care,
Kelvin
Reply all
Reply to author
Forward
0 new messages