TP3 SubGraph Step

324 views
Skip to first unread message

Stephen Mallette

unread,
Apr 10, 2014, 10:27:59 AM4/10/14
to gremli...@googlegroups.com
There have been a fair amount of "subgraphing" questions on the mailing list and StackOverflow over the years.  There's been no especially good one-line answer.

We tried to address this in TinkerPop3 with a new "subGraph" step that analyzes the traversal path and side-effects an edge-induced subgraph into another graph. So to get the "knows" subgraph around v[1] you could do:

g.v(1).outE().subGraph(g1, e -> e.getLabel().equals("knows"))

This step was made super-easy by the path information stored in the step (which Marko has mentioned on a few occasions now).  The subGraph step just consults the path, finds edges and loads them into graph instance supplied to it in the first parameter (i.e. g1), assuming the edge Predicate tests true.

The limitation to this step is that it must maintain mappings of element ids to build the subgraph (as most graphs don't support id assignment).  In that sense this subgraph function is meant for operations that can nicely fit in memory.  Of course, if you wrote a subgraph function yourself you'd likely end up with the same issues so having the feature to handle this common use case seemed to make sense.

Regards,

Stephen

Matthias Broecheler

unread,
Apr 10, 2014, 1:21:29 PM4/10/14
to gremli...@googlegroups.com
Hi Stephen,

the subgraph would become its own independent graph, right? Meaning, there is no way to go back to the source graph?
Thanks,
Matthias


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Matthias Broecheler
http://www.matthiasb.com

Stephen Mallette

unread,
Apr 10, 2014, 2:02:16 PM4/10/14
to gremli...@googlegroups.com
Right...it is an independent graph.  The use cases I've seen typically involve wanting to subgraph into a TinkerGraph to then export to GraphML for visualization or the like, so an independent and usually throwaway graph instance is what I wanted to cover.  That said, I guess there's nothing to stop you from passing the source graph into the subgraph function, thought the subgraph would have no connection to any of the existing graph elements that were already in there.

roald...@student.uclouvain.be

unread,
Jun 4, 2014, 5:44:27 AM6/4/14
to gremli...@googlegroups.com
Hi Stephen,

I was wondering if it was possible to combine the subGraph methods with traversal.

For example in a biochemical database I want to know what will happen if a delete a protein, so I want to extract the subgraph of all the nodes that are linked to that protein (only out edges)( with a depth of 5 for example). I'm searching on the net but I do not find a good tutorial.

Do you have any idea on how i can do this?

Thanks and regards,

Roald Targé

Stephen Mallette

unread,
Jun 4, 2014, 6:29:56 AM6/4/14
to gremli...@googlegroups.com
The new TinkerPop3 documentation on the subGraph step itself shows just that example (though with in edges as opposed to out edges):





--

Marko Rodriguez

unread,
Jun 4, 2014, 8:34:14 AM6/4/14
to gremli...@googlegroups.com
I just had a thought reading Matthias/yours back and forth.

What about making a HashSet<Edge> that is the subgraph so you have a "subgraph" (if the edge exists in the HashSet<Edge>) that is just references to existing edges in the original graph. … something like that.

?,
Marko.

Stephen Mallette

unread,
Jun 4, 2014, 9:01:34 AM6/4/14
to gremli...@googlegroups.com
I like that idea.  Basically use subGraph to pop-off an edge list from a traversal.  I wonder if you could then use that to somehow blind traversals using that edge list.  Like create a traversal strategy that limits the traversal to just that set of edges???  might not be super performant, but i think might have a good use case for ad-hoc analysis.  

Joshua Shinavier

unread,
Jun 4, 2014, 2:15:01 PM6/4/14
to gremli...@googlegroups.com
A hash map is one possible criterion for "blinding" traversals, but there are others.  What if you want to include all of the edges with timestamps newer than one hour (poor man's TTL), or all of the vertices which have been marked with a certain property, and all of their incident edges.  What if you want to create a logical subgraph which is too large to fit into memory?

I have just checked in a wrapper implementation, currently here:


...which lets you create create subgraphs based on arbitrary vertex and/or edge inclusion criteria.  E.g.

        Graph g = TinkerFactory.createClassic();

        Function<Vertex, Boolean> vertexCriterion = vertex -> (int) vertex.id() < 4;
        Function<Edge, Boolean> edgeCriterion = edge -> true;

        Subgraph sg = new Subgraph(g, vertexCriterion, edgeCriterion);

        // three vertices are included in the subgraph
        assertEquals(6, count(g.V().toList()));
        assertEquals(3, count(sg.V().toList()));

        // only two edges are present, even though edges are not explicitly excluded
        // (edges require their incident vertices)
        assertEquals(6, count(g.E().toList()));
        assertEquals(2, count(sg.E().toList()));

        assertEquals(2, count(g.v(1).out("knows").toList()));
        assertEquals(1, count(sg.v(1).out("knows").toList()));

or

        Set<Integer> includedEdgeIds = new HashSet<>();
        includedEdgeIds.add(8);
        includedEdgeIds.add(9);
        includedEdgeIds.add(10);

        Graph g = TinkerFactory.createClassic();

        Function<Vertex, Boolean> vertexCriterion = vertex -> true;
        Function<Edge, Boolean> edgeCriterion = edge -> includedEdgeIds.contains((int) edge.id());

        Subgraph sg = new Subgraph(g, vertexCriterion, edgeCriterion);

        // all vertices are here
        assertEquals(6, count(g.V().toList()));
        assertEquals(6, count(sg.V().toList()));

        // only the given edges are included
        assertEquals(6, count(g.E().toList()));
        assertEquals(3, count(sg.E().toList()));

        // wrapped Traversal<Vertex, Vertex> takes into account the edges it must pass through
        assertEquals(2, count(g.v(1).out("knows").toList()));
        assertEquals(1, count(sg.v(1).out("knows").toList()));


The implementation might require a bit more tweaking and testing, but you get the idea.


Best,

Josh


Stephen Mallette

unread,
Jun 4, 2014, 2:47:47 PM6/4/14
to gremli...@googlegroups.com
Hey Josh - what you have reminds me of a more generalized version of PartitionGraph.  Seems like there would be a use case for subgraphing in the manner you present that is complementary to the subGraph step itself.  I think that we can implement your SubGraph "wrapper" with less code using Graph/Traversal Strategy interfaces.  We might also be able to push down the filtering "criterion" to the graphdb which would keep us from filtering a lot of stuff in memory on those graphs that support such things.  All we need to do is generalize the existing PartitionGraphStrategy to SubGraphStrategy.  Ping me offline to discuss when you get a chance.

Matthias Broecheler

unread,
Jun 6, 2014, 12:23:55 AM6/6/14
to gremli...@googlegroups.com
Regarding the hashmap for subgraph - that might get tricky against a transactional graph. As Stephen said, a lot of use cases center around "popping" off a subgraph of interest and then doing something with it (i.e. viz, further analysis, hanging it on the wall). If the subgraph is just pointers to the source graph that would mean that the transaction which gave rise to the subgraph has to be kept open the entire time which might cause problems - in particular if the underlying database using pessimistic locking.

Joshua Shinavier

unread,
Jun 23, 2014, 3:20:16 PM6/23/14
to gremli...@googlegroups.com
By TinkerPopular demand, Subgraph is now SubgraphStrategy.  For example:

        Graph g = TinkerFactory.createClassic(); 
        
        Predicate<Vertex> vertexCriterion = vertex -> true;
        Predicate<Edge> edgeCriterion = edge -> (int) edge.id() >= 8 && (int) edge.id() <= 10;

        GraphStrategy strategy = new SubgraphStrategy(vertexCriterion, edgeCriterion);
        StrategyWrappedGraph sg = new StrategyWrappedGraph(g);
        sg.strategy().setGraphStrategy(strategy);

Pass into the strategy a vertex criterion, an edge criterion, or both.  All vertices in the base graph which pass the vertex criterion will appear in the StrategyWrappedGraph.  All edges which pass the edge criterion *and* whose in and out vertices pass the vertex criterion will appear in the StrategyWrappedGraph.

The strategy does not create any internal data structures other than traversal iterators, and inherits the base graph's transactions.

Josh

Stephen Mallette

unread,
Jun 23, 2014, 3:25:58 PM6/23/14
to gremli...@googlegroups.com
Very cool, Josh!  This will be a nice complement to the Subgraph step.  It's also yet another example of the power of GraphStrategy and TraversalStrategy.  It's really amazing how easy it is to pick apart a Traversal, analyze its contents, alter it based on those contents then execute it.  

Stephen

Marko Rodriguez

unread,
Jun 23, 2014, 3:33:26 PM6/23/14
to gremli...@googlegroups.com
Hey,

I just gave a quick look. I notice a lot of replacement of steps in the strategy. Note that in TraversalHelper I recently added TraversalHelper.replaceStep() which may make your implementation a little cleaner (less code). ?

Thanks Josh,
Marko.

MERAL Guillaume

unread,
Jun 24, 2014, 4:29:10 AM6/24/14
to gremli...@googlegroups.com
Please forgive me if i'm starting a fire here.

Could we considere a Subgraph as a type of graph of it's own (let's say like a Submarine).
Or is it just a Subpart of a Graph and then accurately named SubGraph ?


PS : (I just loved the June 12 commits)

Joshua Shinavier

unread,
Jun 24, 2014, 12:19:35 PM6/24/14
to gremli...@googlegroups.com
Ha.  Subgraph / SubGraph has turned into a serial comedy of the commit log.  Personally, I like "Subgraph", because "subgraph" is a word.  It is not a compound word.  Writing SubGraph in camel case singles out Sub as a distinct word and concept, which only makes sense if we are talking about graphs of submarines, teachers, sandwiches...?

Mmm, SubGraph.

Josh




Joshua Shinavier

unread,
Jun 24, 2014, 3:13:31 PM6/24/14
to gremli...@googlegroups.com
So we have converged on SubgraphXXX for both the step and the strategy.  You shouldn't see SubGraph again unless you are indeed dealing with sandwiches and the connections between them.  Create SubgraphStrategy like this:

    new SubgraphStrategy(vertexCriterion, edgeCriterion)

and use the step like this:

    g.E.subgraph(sg, {it.label == 'knows'})

It is likely that the step will be integrated with the strategy in future.

Josh


MERAL Guillaume

unread,
Jun 25, 2014, 3:39:51 AM6/25/14
to gremli...@googlegroups.com
Thanks Josh i like the sandwich graph concept very much.

I found this Subgraph Step very usefull.
I'm currently working on a Graph implementation named TulipGraph.

This step also allows me to convert any Tinkerpop3 Graph to a TulipGraph.
Thanks to this we can potentially visualize any Graph in Tulip.

Stephen Mallette

unread,
Jun 26, 2014, 8:03:44 AM6/26/14
to gremli...@googlegroups.com
Glad to hear that subgraph is useful to you.  Looking forward to hearing more about how your progress continues on Tulip.


--

Joshua Shinavier

unread,
Jun 26, 2014, 11:12:57 AM6/26/14
to gremli...@googlegroups.com
Same here.  I look forward to seeing what sort of visualizations are supported.  E.g. neighborhood views of a large graph -- dynamic or not -- would be especially handy; I wonder if that's why you are interested in subgraphs.

Josh
Reply all
Reply to author
Forward
0 new messages