Shortest Paths Between Unique Combinations of Nodes

52 views
Skip to first unread message

Tim Schultz

unread,
Apr 13, 2018, 4:36:36 PM4/13/18
to Gremlin-users
Hello again, everyone!

I have a Gremlin question relating to sub-graphing and/or shortest paths.  I'm looking to identify the best way to go about this specific type of inquiry. 

I have a set of IDs of nodes I'm interested in (sources) understanding the possible relationships with other node IDs (targets) representing concepts I'm already familiar with.  The target nodes are sort of like "gold standard" nodes - I'm interested to see how seemingly unrelated concepts individually and (ideally) jointly link to them.

This sounds like a shortest-path problem between all unique combinations of node source and target node IDs.

Say I have the following:

sourceIDs = 101,102,103
targetIDs = 1,2

You'd want to aggregate N number of shortest paths (where path <= 6, say) between [101,1], [101,2], [102,1], [102,2], [103,1], [103,2].  And let's see whether there are any meaningful paths to the target nodes, as well as any interesting merges which occur between the paths returned.

This seems like it utilizes some type of iteration I'm not familiar with how to do in the context of Gremlin.

Solutions I've tried so far:

1.) Create a subgraph from all the edges between all the included source + target nodes:

g.V(1,2,101,102,103).repeat(__.outE().subgraph('subgraph').inV()).times(3).cap('subgraph').next()

This doesn't really get me what I'm looking for, since I have to hardcode the number of .times() and hope for some meaningful connections to surface to the targets, and it returns back EVERY connected node - which is memory intensive.

2.) Use path:

g.V(101,102,103).repeat(out().simplePath()).until(hasId(1)).path().limit(5);
g.V(101,102,103).repeat(out().simplePath()).until(hasId(2)).path().limit(5);

But this does not guarantee that the next node ID (i.e. 102) will ever be considered (at least this is my understanding).

What I need is more like an aggregation of all paths via #2 of all unique source/target combinations.

Is there any way I can accomplish this in Gremlin?

Thanks all and have a great weekend!

Tim

Daniel Kuppitz

unread,
Apr 13, 2018, 9:16:01 PM4/13/18
to gremli...@googlegroups.com
Using the modern graph, this would be the simplest solution:

gremlin> sourceIDs = [1,2,3]
==>1
==>2
==>3
gremlin> targetIDs = [5,6]
==>5
==>6

gremlin> g = TinkerFactory.createModern().traversal()==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().hasId(within(sourceIDs)).as("a").
......1>   repeat(both().simplePath()).
......2>     until(hasId(within(targetIDs))).as("b").
......3>   dedup("a","b").
......4>   path().limit(sourceIDs.size() * targetIDs.size())
==>[v[1],v[3],v[6]]
==>[v[1],v[4],v[5]]
==>[v[2],v[1],v[3],v[6]]
==>[v[2],v[1],v[4],v[5]]
==>[v[3],v[6]]
==>[v[3],v[4],v[5]]

However, it generates a lot more traversers than necessary (it follows every non-cyclic path for every start vertex until it reaches a target vertex or a dead end, even if it already found X paths for a specific start vertex - where X is the number of target vertices). To keep the number of traversers at a minimum, you could do something like this:

gremlin> g.V().hasId(within(sourceIDs)).as("a").
......1>   repeat(__.not(select("x").map {it.get().get(it.path("a"))}.
......2>                 count(local).is(eq(targetIDs.size()))).
......3>          both().simplePath().
......4>          sideEffect(hasId(within(targetIDs)).
......5>                     group("x").
......6>                       by(select("a")).
......7>                       by(unfold().dedup().fold()))).
......8>     emit(hasId(within(targetIDs))).
......9>   path()
==>[v[1],v[3],v[6]]
==>[v[1],v[4],v[5]]
==>[v[2],v[1],v[3],v[6]]
==>[v[2],v[1],v[4],v[5]]
==>[v[3],v[6]]
==>[v[3],v[4],v[5]]

Which of these is gonna be faster pretty much depends on your graph. My bet is that the second variant would outperform the first on larger graphs.

Cheers,
Daniel



--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/3c88b36d-f5cd-4b36-9863-467c9f37c6b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim Schultz

unread,
Apr 30, 2018, 2:25:24 PM4/30/18
to Gremlin-users
Hi Daniel,

Thank you so much for the detailed response - and apologies for the delayed reply.  I had to wipe the graph in question and start anew due to some unforeseen issues with the data loader I created.

I'm going to revisit this in the next few days - will let you know how this goes and which query seems to be the better approach.

Thanks again!

Regards,

Tim
Reply all
Reply to author
Forward
0 new messages