Re: Efficient detection of circular paths during traversal

838 views
Skip to first unread message

David Savage

unread,
May 18, 2013, 11:04:40 AM5/18/13
to gremli...@googlegroups.com
Damm, how annoying, I meant to say:

// finally create a loop in the graph
g.addEdge(c, a, "path", [:])

On Saturday, May 18, 2013 3:59:30 PM UTC+1, David Savage wrote:
Hi there,

Firstly as a new joiner to this discussion group, I should say many thanks for contributing this excellent framework!

I have been experimenting over the past couple of weeks with using Gremlin to handle some graph traversal algorithms that attempt to filter possible paths through a graph using a number of different metrics. As an example consider the problem of finding all paths through a graph that do not contain a loop. 

To do this I started experimenting with the following trivial graph.

g = new TinkerGraph()
a = g.addVertex()
b = g.addVertex()
c = g.addVertex()
d = g.addVertex()

// now create a simple path through the graph
g.addEdge(a, b, "path", [:])
g.addEdge(b, c, "path", [:])
g.addEdge(c, d, "path", [:])

// finally create a loop in the graph
g.addEdge(c, d, "path", [:])

After a lot of reading of code/docs and tinkering I figured this should do what I wanted:

g.v(0).out.simplePath().loop(2){it.loops < 5}{true}.path()

But when I ran this the simplePath filter does not work as I'd have expected. This expression emits paths that are clearly circular:

==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[0]]
==>[v[0], v[1], v[2], v[0], v[1]]

I could easily be doing something wrong but I suspect it's a bug - I'll come to this at the end.

After further tinkering I found a much more clumsy method of achieving basically what I wanted by recreating the logic of simplePath as a filter

g.v(0).out.loop(1){it.loops < 10}{true}.path().filter{new HashSet(it).size() == it.size()}

running this in gremlin gives me (almost right, I can live with the duplicate last element)

==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]
==>[v[0], v[1], v[2], v[3]]

However this version does not scale for larger graphs and deeper traversals from the initial vertex as using a post filter means gremlin traverses all paths(!) then filters them down to the valid set. I'd rather stop traversing paths as soon as I discover a path contains a circular reference.

On further debug I found what I think is a bug in the handling of paths when combined with loop. It means that simplePath only see's a few steps back in the path so cannot detect cycles properly. I've written a test and submitted a patch [1] to a forked version of the pipes framework which seems to fix the bug. When I use this in gremlin instead it allows the initial expression to work as expected.

As a novice to this framework I think the patch could definitely be improved, there are a couple of untidy edges but I have added a test as part of this patch to assert the behaviour and all the remaining tests in the pipes framework still pass, I haven't run a wider regression test though.

Please let me know if this solution looks valid or if there is an existing way of achieving what I want that I've missed.

Kind regards,

Dave



Daniel Kuppitz

unread,
May 18, 2013, 12:37:04 PM5/18/13
to gremli...@googlegroups.com
Hi Dave,

you almost had the solution. Try:

g = new TinkerGraph()
a = g.addVertex()
b = g.addVertex()
c = g.addVertex()
d = g.addVertex()

// now create a simple path through the graph
g.addEdge(a, b, "path", [:])
g.addEdge(b, c, "path", [:])
g.addEdge(c, d, "path", [:])

// finally create a loop in the graph
g.addEdge(c, a, "path", [:])

g.v(0).out().loop(1){it.loops < 5}{true}.simplePath().path()

Returns:

==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]

Cheers,
Daniel

David Savage

unread,
May 18, 2013, 2:04:31 PM5/18/13
to gremli...@googlegroups.com
Hi Daniel,

Thanks for the quick reply, but sorry I should have said I tried that too. My problem is that I want to make the traversal as efficient as possible. 

Isn't putting simplePath at the end doing a post filter after doing all the loops (i.e. post filter) vs stop walking as soon as simplePath detects an invalid path?

If I extend the number of loops, say it.loops < 1000 then this will take a /very/ long time with a post filter. However if simplePath can be called prior to the loop it filters out any steps so the looping stops as soon as it has passed all valid paths.

The patch I've created allows simple path to be called during the loop so makes filtering via path possible for complex graphs.

Kind regards,

Dave

Daniel Kuppitz

unread,
May 18, 2013, 3:04:11 PM5/18/13
to gremli...@googlegroups.com
Hi Dave,

if your loop condition is it.loops < 1000 and you expect the path to cycle after 5 or even 10 steps, then IMO there's something wrong with your graph model :).
However, let's assume you have a real world scenario for this case. You can store the visited nodes in a Set and use this Set in the except pipe within the loop.

Same initialization as above and then:

gremlin> start = g.v(0); s = [start] as Set; start.out().except(s).store(s).loop(3){it.loops < 10}{true}.path()
==>[v[0], v[1]]
==>[v[0], v[1], v[2]]
==>[v[0], v[1], v[2], v[3]]

Cheers,
Daniel

David Savage

unread,
May 18, 2013, 3:42:41 PM5/18/13
to gremli...@googlegroups.com
Hi Daniel,

I like that syntax for solving the cyclic path problem, and I considered something like that too but unfortunately the cyclic path example is just that - an example. For my actual use case I need to be able to know the path that has been traversed so simply storing the passed nodes in a set is not enough as this doesn't allow me to test how the iteration got there.

I appreciate this is a pretty bizarre use case and sorry I can't give any details, but I guess I'm just asking whether there is anything /wrong/ with the patch I created - it doesn't seem to break any tests that I'm aware of. I appreciate there's cost in supporting a feature but if this could be incorporated in some form it would save me a whole bunch of hassle maintaining a branch for this case, and hey others may have need of this too at some point?

Happy to iterate on this to get this right, there's no immediate urgency as I've solved my use case via this branch but I figured contributing back was a good thing to do and will save me maintenance in the future.

Kind regards,

/Dave

David Savage

unread,
May 20, 2013, 7:45:12 AM5/20/13
to gremli...@googlegroups.com
Hi Daniel,

On further reflection the exclude/store approach is recreating what simplePath does internally and seems to be a work around for the fact that that simplePath is dependent on whether it is used inside or outside of a loop. 

I think it would be good to clear up whether Pipe.getCurrentPath() is intentionally not available inside of a loop, if it is not then it would be good if this were documented and/or caught as an exception to avoid confusion for others.

However if it is just a bug that it's not supported, I've added a pull request for this patch [1] which I believe addresses the issue, if it can be incorporated that would be great!

Kind regards,

/Dave

David Savage

unread,
May 20, 2013, 7:45:57 AM5/20/13
to gremli...@googlegroups.com

Daniel Kuppitz

unread,
May 20, 2013, 8:57:57 AM5/20/13
to gremli...@googlegroups.com
Hi Dave,

I think the behaviour - as we have it now - is expected. I mean, see the output of toString():

gremlin> g.v(0).out().simplePath().loop(2){it.loops < 5}{true}.path().toString()
==>[StartPipe, LoopPipe([OutPipe, CyclicPathFilterPipe]), PathPipe]

IMO that clearly shows where the path for the CyclicPathFilterPipe starts.

Cheers,
Daniel

David Savage

unread,
May 21, 2013, 5:47:33 AM5/21/13
to gremli...@googlegroups.com
Hi Daniel,

I'm not sure I follow why the output of toString defines the behaviour of currentPath, but this could be because I'm new to this project, can you explain?

Also can you think of any other way I can access the full path from the initial vertex inside of the loop? I've been experimenting with various options of storing the path, but all suffer from the same problem that currentPath only see's a few hops back.

Many thanks for your help.

Kind regards,

/Dave

Daniel Kuppitz

unread,
May 21, 2013, 6:33:16 AM5/21/13
to gremli...@googlegroups.com
Hi Dave,

the loop pipe wraps a part of the full path.

[StartPipe, LoopPipe([OutPipe, CyclicPathFilterPipe]), PathPipe]

Consequently everything inside this wrapped part can't/shouldn't see what's outside the loop. Only the loop closures (2nd and 3rd parameter for the loop pipe) have access to the full path (that's why your initial approach with the HashSet does also (almost) work). However, I guess that it's slower to create a new HashSet in each iteration than simply storing new elements in an existing Set, but I haven't really compared the performance of both approaches.

I don't know why the store/except approach does not work for you. Can you give any specific example of what you need? I'm quite sure we'll find a working solution. You can also send me a GraphSON export to my email address, if you don't want to share sensitive data in a public group.


Cheers,
Daniel
Reply all
Reply to author
Forward
0 new messages