Gremlin graph projections

Jordi Aranda

Dec 1, 2016, 4:06:32 AM

to Aurelius

A few days ago I posted a question on Stack Overflow about Gremlin graph projections with Titan; hopefully it will get an answer here.

The use case is pretty simple: I would like to create smaller graphs from a source graph by extracting implicit edges. For instance, say we have the following scenario (with different node types):

A -> B -> C -> D

Since the path from A to D exists, I would create in my new graph the nodes A and D and the edge between them. These paths can be found with a simple traversal, e.g.:

g.V().as('a').out('aTob').out('bToC').out('cToD').as('d').select('a', 'd');
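To make the intended result concrete, here is a minimal plain-Python sketch of the projection on a toy in-memory graph. It is not Gremlin and says nothing about how Titan evaluates the traversal; the vertex names, the `edges` layout, and the `project` helper are all hypothetical, with edge labels patterned on the traversal above. Unlike `select('a', 'd')`, which emits one row per path, the sketch collapses duplicate pairs into a set, since each pair becomes a single edge in the new graph.

```python
# Hypothetical toy graph: edges[label][source] -> list of target vertices.
edges = {
    "aToB": {"a1": ["b1"], "a2": ["b2"]},
    "bToC": {"b1": ["c1"], "b2": ["c2"]},
    "cToD": {"c1": ["d1", "d2"]},  # c2 has no outgoing cToD edge
}


def project(edges, labels, start_vertices):
    """Return the set of (start, end) pairs joined by a path that
    follows `labels` in order, starting from each given vertex."""
    projected = set()
    for start in start_vertices:
        frontier = {start}
        for label in labels:
            # Follow every edge with this label out of the current frontier.
            frontier = {
                target
                for vertex in frontier
                for target in edges.get(label, {}).get(vertex, [])
            }
        projected.update((start, end) for end in frontier)
    return projected


implicit_edges = project(edges, ["aToB", "bToC", "cToD"], ["a1", "a2"])
print(sorted(implicit_edges))
```

Here `a1` reaches both `d1` and `d2`, while `a2` stops at `c2` and contributes nothing. Note that the helper takes an explicit list of start vertices; obtaining that list without scanning the whole graph is exactly the open question.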

The problem is that a traversal that does not start from a specific set of nodes implies a full graph scan, since it cannot use any index, which is not desirable. Similar use cases may use the subgraph step, but the docs do not say whether these kinds of queries perform poorly.

What strategies would let me build such projections efficiently?

Best,

HadoopMarc

Dec 1, 2016, 4:06:06 PM

to Aurelius

Hi Jordi,

What you want from this query makes it an OLAP-type query. TitanGraph will always give you the index warning for these kinds of queries. Normally, for non-mutating queries, you would use TinkerPop's HadoopGraph and feed it through Titan's HBaseInputFormat. For mutating queries, what I currently use myself is the following:
- batch load your graph with Spark and store all vertex IDs
- use Spark again to run the query in a distributed way, using the stored IDs to access the vertices you want (of course, you could also use a stored, indexed key property instead). Here, each Spark task has its own embedded TitanGraph.
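
The two steps above can be sketched in plain Python, purely as an illustration of the control flow. A thread pool stands in for Spark tasks and a dict stands in for the graph; no Spark or Titan API is used, and every name here (`EDGES`, `stored_a_ids`, `run_partition`, and so on) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data standing in for the graph:
# EDGES[label][source] -> tuple of target vertices.
EDGES = {
    "aToB": {"a1": ("b1",), "a2": ("b2",), "a3": ("b3",), "a4": ("b1",)},
    "bToC": {"b1": ("c1",), "b2": ("c2",)},
    "cToD": {"c1": ("d1",), "c2": ("d2",)},
}
LABELS = ["aToB", "bToC", "cToD"]

# Step 1 stand-in: the IDs of the start vertices, recorded at load time.
stored_a_ids = ["a1", "a2", "a3", "a4"]


def walk(start, labels):
    """Follow `labels` in order from one start vertex; return (start, end) pairs."""
    frontier = {start}
    for label in labels:
        frontier = {t for v in frontier for t in EDGES.get(label, {}).get(v, ())}
    return {(start, end) for end in frontier}


def run_partition(ids):
    """Stand-in for one Spark task. In the setup described above, the task
    would open its own embedded graph and look each vertex up by ID here."""
    pairs = set()
    for vertex_id in ids:
        pairs |= walk(vertex_id, LABELS)
    return pairs


def partition(ids, n):
    """Split the stored IDs into n roughly equal slices."""
    return [ids[i::n] for i in range(n)]


# Step 2 stand-in: each worker handles only its own slice of IDs,
# so no worker ever scans the full vertex set.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = pool.map(run_partition, partition(stored_a_ids, 2))
    projected = set().union(*results)

print(sorted(projected))
```

The point of the design is that every lookup starts from a known ID, so the full scan is replaced by direct vertex access spread across workers.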

Hope this helps,

Marc

On Thursday, December 1, 2016 at 10:06:32 UTC+1, Jordi Aranda wrote:
