Context: I'm trying to find the "most related" vertices to a particular vertex A. The algorithm we're using involves finding _all_ paths (in practice, all short paths) from A to some other vertex X, and summing 1 / the cost of that path, over all paths. So a node X is highly related to A if there are lots of paths from A to X, and if all of those paths have a very low total cost. It's analogous to finding the nodes of least resistance to A, if path cost is like a resistor and we're adding them in series within the path but in parallel between the paths.
Currently this is implemented in a Cypher query plus some post-processing. But given that it's very slow (we've limited how many paths it looks for, so it doesn't take forever, but in principle it's extremely slow), we were thinking of speeding it up by using something like Dijkstra's algorithm. Essentially the idea would be to find the ~50 cheapest-to-get-to nodes X_i, because those are likely to be the most related, and then afterwards find _all_ paths from A to each X_i and calculate the actual relatedness. I think there is some built-in support for Dijkstra in Gremlin, but we need to be able to exclude connections through certain kinds of nodes, and also get the path cost not from the graph's edges but from the nodes' properties (since the cost of getting to a node is related to the that node's centrality in the graph, it's stored as a property on the node, not on the edge), so I was thinking of just writing my own.
Never having used Gremlin, my initial impulse is just to implement a priority queue and then do some looping through v.out, but I imagine there might be some better ways of doing things. Anyway, if anyone has insight into the best direction to go in, I'd be very grateful.
Thanks,
Andrew