Hi there,
Firstly as a new joiner to this discussion group, I should say many thanks for contributing this excellent framework!
I have been experimenting over the past couple of weeks with using Gremlin to handle some graph traversal algorithms that attempt to filter possible paths through a graph using a number of different metrics. As an example consider the problem of finding all paths through a graph that do not contain a loop.
To do this I started experimenting with the following trivial graph.
g = new TinkerGraph()
a = g.addVertex()
b = g.addVertex()
c = g.addVertex()
d = g.addVertex()
// now create a simple path through the graph
g.addEdge(a, b, "path", [:])
g.addEdge(b, c, "path", [:])
g.addEdge(c, d, "path", [:])
// finally create a loop in the graph
g.addEdge(c, d, "path", [:])
After a lot of reading of code/docs and tinkering I figured this should do what I wanted:
g.v(0).out.simplePath().loop(2){it.loops < 5}{true}.path()
But when I ran this the simplePath filter does not work as I'd have expected. This expression emits paths that are clearly circular:
==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[0]]
==>[v[0], v[1], v[2], v[0], v[1]]
I could easily be doing something wrong but I suspect it's a bug - I'll come to this at the end.
After further tinkering I found a much more clumsy method of achieving basically what I wanted by recreating the logic of simplePath as a filter
g.v(0).out.loop(1){it.loops < 10}{true}.path().filter{new HashSet(it).size() == it.size()}
running this in gremlin gives me (almost right, I can live with the duplicate last element)
==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[3]]
However this version does not scale for larger graphs and deeper traversals from the initial vertex as using a post filter means gremlin traverses all paths(!) then filters them down to the valid set. I'd rather stop traversing paths as soon as I discover a path contains a circular reference.
On further debug I found what I think is a bug in the handling of paths when combined with loop. It means that simplePath only see's a few steps back in the path so cannot detect cycles properly. I've written a test and submitted a patch [1] to a forked version of the pipes framework which seems to fix the bug. When I use this in gremlin instead it allows the initial expression to work as expected.
As a novice to this framework I think the patch could definitely be improved, there are a couple of untidy edges but I have added a test as part of this patch to assert the behaviour and all the remaining tests in the pipes framework still pass, I haven't run a wider regression test though.
Please let me know if this solution looks valid or if there is an existing way of achieving what I want that I've missed.
Kind regards,
Dave