gremlin> r = new Random()==>java.util.Random@35d5ac51gremlin> directions = ['north','south','east','west']==>north==>south==>east==>westgremlin> g = TinkerGraph.open().traversal()==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]gremlin> g.addV().property(id, 1).as('1').......1> addV().property(id, 2).as('2').......2> addV().property(id, 3).as('3').......3> addV().property(id, 4).as('4').......4> addV().property(id, 5).as('5').......5> addE('connection').from('1').to('2').property('direction', directions[r.nextInt(directions.size())]).......6> addE('connection').from('2').to('4').property('direction', directions[r.nextInt(directions.size())]).......7> addE('connection').from('4').to('5').property('direction', directions[r.nextInt(directions.size())]).......8> addE('connection').from('2').to('3').property('direction', directions[r.nextInt(directions.size())]).......9> addE('connection').from('3').to('4').property('direction', directions[r.nextInt(directions.size())]).iterate()
gremlin> g.V(1).......1> repeat(outE().inV().simplePath()).......2> until(hasId(5)).......3> path()==>[v[1],e[0][1-connection->2],v[2],e[1][2-connection->4],v[4],e[2][4-connection->5],v[5]]==>[v[1],e[0][1-connection->2],v[2],e[3][2-connection->3],v[3],e[4][3-connection->4],v[4],e[2][4-connection->5],v[5]]gremlin> g.V(1).......1> repeat(outE().inV().simplePath()).......2> until(hasId(5)).......3> path().......4> by().......5> by('direction')==>[v[1],north,v[2],east,v[4],east,v[5]]==>[v[1],north,v[2],east,v[3],west,v[4],east,v[5]]
gremlin> cost = ['north': 3, 'south': 2, 'east': 1, 'west': 2]==>north=3==>south=2==>east=1==>west=2gremlin> g.withSideEffect('cost', cost).V(1).......1> repeat(outE().inV().simplePath()).......2> until(hasId(5)).......3> path().......4> by().......5> by(values('direction').as('d').......6> union(identity(),......7> select('cost').select(select('d'))).fold())==>[v[1],[north,3],v[2],[east,1],v[4],[east,1],v[5]]==>[v[1],[north,3],v[2],[east,1],v[3],[west,2],v[4],[east,1],v[5]]
gremlin> g.withSack(0).......1> withSideEffect('cost', cost).V(1).......2> repeat(outE().sack(sum).......3> by(values('direction').as('d').......4> select('cost').select(select('d'))).......5> inV().simplePath()).......6> until(hasId(5)).......7> path().......8> by().......9> by(values('direction').as('d')......10> union(identity(),.....11> select('cost').select(select('d'))).fold())......12> project('path','cost')......13> by()......14> by(sack())==>[path:[v[1],[north,3],v[2],[east,1],v[4],[east,1],v[5]],cost:5]==>[path:[v[1],[north,3],v[2],[east,1],v[3],[west,2],v[4],[east,1],v[5]],cost:7]
gremlin> g.withSack(0).......1> withSideEffect('cost', cost).V(1).......2> repeat(outE().sack(sum).......3> by(values('direction').as('d').......4> select('cost').select(select('d'))).......5> inV().simplePath()).......6> until(hasId(5)).......7> path().......8> by().......9> by(values('direction').as('d')......10> union(identity(),.....11> select('cost').select(select('d'))).fold())......12> order()......13> by(sack(), incr)......14> limit(1)==>[v[1],[north,3],v[2],[east,1],v[4],[east,1],v[5]]gremlin> g.withSack(0).......1> withSideEffect('cost', cost).V(1).......2> repeat(outE().sack(sum).......3> by(values('direction').as('d').......4> select('cost').select(select('d'))).......5> inV().simplePath()).......6> until(hasId(5)).......7> path().......8> order().......9> by(sack(), incr)......10> limit(1)==>[v[1],e[0][1-connection->2],v[2],e[1][2-connection->4],v[4],e[2][4-connection->5],v[5]]gremlin> g.withSack(0).......1> withSideEffect('cost', cost).V(1).......2> repeat(flatMap(outE().sack(sum).......3> by(values('direction').as('d').......4> select('cost').select(select('d'))).......5> inV()).simplePath()).......6> until(hasId(5)).......7> path().......8> order().......9> by(sack(), incr)......10> limit(1)==>[v[1],v[2],v[4],v[5]]
I really need to run (or wrap) the query with(in) multiple starting nodes
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/8a88fdea-75a8-4d15-aa90-4997afc42965%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
To unsubscribe from this group and stop receiving emails from it, send an email to gremli...@googlegroups.com.
g.withSack(0).withSideEffect('cost', cost).V(1).repeat(flatMap(outE().sack(sum).
by(values('direction').as('d').
select('cost').select(select('d'))).
or(__.not(select('m').unfold()),select('m').select('cost').project('current','min').by(sack()).by().where('current', lt('min'))).inV()).simplePath()).until(hasId(5).group('m').by(constant('cost')).by(sack().min())).path().limit(1)
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/2ff4ec55-fd2c-4b58-ba6a-49a136c36f8b%40googlegroups.com.
gm.withSack(0).withSideEffect('cost', cost).V().as('a').
repeat(outE().sack(sum).by(values('direction').as('dt').select('cost').select(select('dt'))).
inV().as('a').simplePath()).
until(hasId(within(3,4,5))).
select(all, 'a').map(unfold().id().fold()).group().by(sack()).unfold().
order().by(keys, incr) //.limit(1)
==>1=[[4, 5], [4, 3]]
==>100=[[1, 4]]
==>10000=[[1, 3], [6, 3]]
// but because I want ALL the shortest paths and don't really care about the actual cost itself
order().by(keys, incr).limit(1).select(values).unfold()
==>[4,5]
==>[4,3]
Does that look ok? It works fine on my own graph, I'm just crippled by having to evaluate every possible path in full
Thanks in advance
p.s. is it possible to do an initialising group('m') step before going into the repeat loop to avoid doing that or()...unfold() test on every iteration?
I do not want the edges returned and cannot work out how to skip them
I also hear path() is a cpu hog
Does that look ok? It works fine on my own graph, I'm just crippled by having to evaluate every possible path in full
is it possible to do an initialising group('m') step before going into the repeat loop to avoid doing that or()...unfold() test on every iteration?
.withSideEffect("m", ["cost":Integer.MAX_VALUE])
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/9e4396bf-54b7-418a-8970-68b547df224c%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/CAD4-su%2BwaBwJeruLStkm8Wkxj04kkC_jej-oG88GFugxELSySg%40mail.gmail.com.
g.withSack(0).withSideEffect('m', ['cost': Integer.MAX_VALUE]).
withSideEffect('cost', cost).V().as('a').repeat(outE().sack(sum).
by(values('direction').as('d').select('cost').select(select('d'))).
filter(select('m').select('cost').
project('current','min').by(sack()).by().where('current', lt('min'))).
inV().simplePath().as('a')).until(hasId(5).group('m').by(constant('min')).by(sack().min())).group().by(sack()).by(select(all, 'a').fold()).select(select('m').select('min')).unfold()
what has changed in between versions?
I expected (hoped) that I would only see the first and last entry here as the second and third would not have passed the where() test.
gm.withSack(0).withSideEffect('m', ['min': Integer.MAX_VALUE]).
withSideEffect('cost', cost).V().as('a').
repeat(outE().sack(sum).by(values('direction').as('d').
select('cost').select(select('d'))).
filter(select('m').select('min').
project('current','min').by(sack()).by().where('current', lte('min'))).
inV().simplePath().as('a')).until(hasId(3).group('m').by(constant('min')).by(sack().min())).
map(union(path(), sack()).fold())
gremlin> gm.withSack(0).......1> withSideEffect('m', ['min': Integer.MAX_VALUE]).......2> withSideEffect('cost', cost).V().as('a').......3> repeat(outE().sack(sum).......4> by(values('direction').as('d').......5> select('cost').select(select('d'))).......6> filter(select('m').select('min').......7> project('current','min').......8> by(sack()).......9> by()......10> where('current', lte('min')))......11> inV().simplePath().as('a'))......12> until(hasId(3).group('m').by(constant('min')).by(sack().min()))......13> map(union(path(), sack()).fold())==>[[v[1],e[9][1-created->3],v[3]],1]==>[[v[6],e[12][6-created->3],v[3]],1]
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/dd8bb9ed-3a64-4f2a-845c-0d0ac8fc119b%40googlegroups.com.
Hi Daniel,
Thanks for your help again. The good news is that this works and I am able to find solutions that were previously timing out. However, the time taken on my graph ranges from a second when solution(s) are found around 9 edges out to nearly a minute when a solution is at least 18 edges out. I'm looking for something that is closer to a second each time! There are only around 4000 vertices in this particular graph, so for other graphs it could be a huge problem!
Here's the tail end of a profile showing where the time was lost in that slower traversal (the values for the easier query are approximately 1/100th the value for each major step):
SackValueStep(sum,[JanusGraphPropertiesStep([... 1582816 1582816 18466.441
JanusGraphPropertiesStep([DependencyType],v... 1582816 1582816 7648.156
SelectOneStep(last,cost) 1582816 1582816 1891.864
TraversalSelectStep([SelectOneStep(last,d)]) 1582816 1582816 4942.446
TraversalFilterStep([SelectOneStep(last,m), P... 404393 404393 20234.997
SelectOneStep(last,m) 1582816 1582816 3234.110
SelectOneStep(last,min) 1582816 1582816 3052.630
ProjectStep([current, min],[[SackStep, Prof... 1582816 1582816 7969.661
SackStep 1582816 1582816 1200.949
WherePredicateStep(current,lte(min)) 3135.707
EdgeVertexStep(IN) 404393 404393 613.888
PathFilterStep(simple)@[a] 247777 247777 2279.732
RepeatEndStep 2 2 1904.180
TraversalMapStep([UnionStep([[PathStep, Profile... 2 2 15.625 0.03
UnionStep([[PathStep, ProfileStep, EndStep, P... 4 4 0.076
PathStep 2 2 0.013
EndStep 2 2 0.012
SackStep 2 2 0.003
EndStep 2 2 0.004
FoldStep 2 2 5.346
>TOTAL - - 53267.709
Looking at the graph as a tree, I guess the problem is that the traversal is effectively doing a breadth-first search, so when it's hit a target at the 18th edge on the given path, every other possible path has already been expanded up to at least the 17th edge(?). At that point we set the 'min' constant and short-circuit other paths that exceed this, but most of the damage has been done in that most if not all of those paths exceeded this min cost some time before.
I hoped to optimise this with some kind of pre-query 'greedy search' that would set a reasonable (possibly optimal) initial 'min' value, but I don't think it could work. Shame, as I re-ran the 52 sec query replacing Integer.MAX_VALUE with a value I knew was only slighter bigger than the optimal one, and it brings the time down to a reasonable 2 secs.
So that leaves me looking to implement something like dijkstra's shortest path algorithm. I wouldn't know where to start in Gremlin. I think it would require data structures that could have entries removed/popped, not just added/pushed as sack() and store() seem to be. Sad thing is, I could do this quite easily in something like C# by quickly sucking in the 4000 nodes, but then what is the point in having a graph database in the first place?
Any ideas, please?
Looking at the graph as a tree
If it's a tree, then your traversal should start at the leaf nodes, because then the maximum number of paths equals the number of start vertices.
I guess the problem is that the traversal is effectively doing a breadth-first search
You can run into the same problem if it would be DFS (if a target is only found in any of the last possible paths).
so when it's hit a target at the 18th edge on the given path, every other possible path has already been expanded up to at least the 17th edge(?).
data structures that could have entries removed/popped, not just added/pushed as sack() and store() seem to be.
Yeah, that's a big thing missing in Gremlin.
Any ideas, please?Well, if it's really a tree, then starting from the opposite side should give you your desired sub-second results.
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/4ade6e91-40a1-4159-b4a7-746c1bb30834%40googlegroups.com.