Re: [TinkerPop] General advice for writing a Dijkstra-like traversal

495 views
Skip to first unread message

Matthias Broecheler

unread,
Aug 1, 2012, 9:42:08 PM8/1/12
to gremli...@googlegroups.com
I think what you want to implement is A* (http://en.wikipedia.org/wiki/A*_search_algorithm) which is an extension of Dijkstra for the use case you are describing. If you could come up with a reasonable admissible heuristic, then A* is typically super fast.

The most important part of the implementation is the data structure for holding the vertices - simple PriorityQueue, or if you are touching thousands of vertices and performance is of concern, a Fibonacci queue. For A* you should just be able to implement the pseudo code with gremlin/blueprints. You would want to modify it slightly to keep searching for good paths once its found one to accumulate the scores.


On Wed, Aug 1, 2012 at 1:57 PM, Andrew Ross <andrewsl...@gmail.com> wrote:
Context: I'm trying to find the "most related" vertices to a particular vertex A. The algorithm we're using involves finding _all_ paths (in practice, all short paths) from A to some other vertex X, and summing 1 / the cost of that path, over all paths. So a node X is highly related to A if there are lots of paths from A to X, and if all of those paths have a very low total cost. It's analogous to finding the nodes of least resistance to A, if path cost is like a resistor and we're adding them in series within the path but in parallel between the paths.

Currently this is implemented in a Cypher query plus some post-processing. But given that it's very slow (we've limited how many paths it looks for, so it doesn't take forever, but in principle it's extremely slow), we were thinking of speeding it up by using something like Dijkstra's algorithm. Essentially the idea would be to find the ~50 cheapest-to-get-to nodes X_i, because those are likely to be the most related, and then afterwards find _all_ paths from A to each X_i and calculate the actual relatedness. I think there is some built-in support for Dijkstra in Gremlin, but we need to be able to exclude connections through certain kinds of nodes, and also get the path cost not from the graph's edges but from the nodes' properties (since the cost of getting to a node is related to the that node's centrality in the graph, it's stored as a property on the node, not on the edge), so I was thinking of just writing my own.

Never having used Gremlin, my initial impulse is just to implement a priority queue and then do some looping through v.out, but I imagine there might be some better ways of doing things. Anyway, if anyone has insight into the best direction to go in, I'd be very grateful.

Thanks,
Andrew



--
Matthias Broecheler, PhD
http://www.matthiasb.com
E-Mail: m...@matthiasb.com

Andrew Ross

unread,
Aug 2, 2012, 9:34:43 AM8/2/12
to gremli...@googlegroups.com
Well, I don't know if we need A*, or if we can come up with a suitable heuristic. A* might be best if we were looking for the shortest path from vertex A to a faraway vertex X, but instead we're just looking for the closest 50 vertices X_i, so I think Dijkstra is just fine.

So Groovy comes with a PriorityQueue class -- that is awesome. Seems like I can just create a new priority queue with a custom comparator, start a Dijkstra search through the graph starting at A, add whatever I get from priority_queue.poll() to a list, then stop when my list has 50 elements. Sweet.

Thanks!

On Wednesday, August 1, 2012 9:42:08 PM UTC-4, Matthias Broecheler wrote:
I think what you want to implement is A* (http://en.wikipedia.org/wiki/A*_search_algorithm) which is an extension of Dijkstra for the use case you are describing. If you could come up with a reasonable admissible heuristic, then A* is typically super fast.

The most important part of the implementation is the data structure for holding the vertices - simple PriorityQueue, or if you are touching thousands of vertices and performance is of concern, a Fibonacci queue. For A* you should just be able to implement the pseudo code with gremlin/blueprints. You would want to modify it slightly to keep searching for good paths once its found one to accumulate the scores.
Reply all
Reply to author
Forward
0 new messages