Using ES for traversal queries?

Suny

Sep 20, 2017, 10:41:43 AM
to JanusGraph users
Hi,

I am using JG with Cassandra and ES. 

I have a 'type' attribute on all vertices, which I use to differentiate groups of vertices.

The query I want to run is:

:> g.V().has('type',textContains('car')).inE().has('timestamp','').inV().valueMap()


I created an index on the type attribute. If I query only :> g.V().has('type',textContains('car')), the result comes back very fast. When I add the traversal part, it slows down.

So JG is using ES to retrieve the vertices of type 'car', and for the traversal it is using only JG. Can I make JG use ES for the traversal too?

If I add an index on the edge attribute 'timestamp', will that speed up the query?


Thanks





Daniel Kuppitz

Sep 20, 2017, 2:35:27 PM
to JanusGraph users
You should
  • provide an edge label for inE()
  • have a vertex centric index on timestamp
  • use a simple filter, instead of traversing back to the previous vertex (unless you rely on duplicates in your result)
Something like this:

g.V().has('type', textContains('car')).
  filter(inE('edge-label').has('timestamp','')).
  valueMap()

If you expect very large results, you'd be better off using an OLAP query.

> Can I make JG use ES for the traversal too?

ES is only useful for initial / global vertex lookups.

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/5aeb2fdc-6199-45f4-8953-866118c2d81a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Suny

Sep 21, 2017, 10:07:31 AM
to JanusGraph users
Thanks. All the in-edges will have the same label. Does it still traverse through all edges, or does it directly find the edges with an empty timestamp?

So in cases where I need to do a lot of traversal, ES is not helpful?

Also, can you explain a bit more about OLAP queries?

Daniel Kuppitz

Sep 21, 2017, 10:28:47 AM
to JanusGraph users
> Thanks. All the in-edges will have the same label. Does it still traverse through all edges, or does it directly find the edges with an empty timestamp?

I assume that Janus can only leverage a vertex-centric index if you specify the label, since that was the case in Titan and I don't think I heard about any changes in this area.

> So in cases where I need to do a lot of traversal, ES is not helpful?

Again, ES is helpful for the initial vertex lookup. Once you start to traverse through the graph, you can only rely on vertex-centric indices (or a good model that doesn't require much filtering).

> Also, can you explain a bit more about OLAP queries?

Suny

Sep 21, 2017, 10:57:27 AM
to JanusGraph users
Thanks. Is the vertex-centric index not stored in Elasticsearch? Is it only in JG?

Daniel Kuppitz

Sep 21, 2017, 11:05:37 AM
to JanusGraph users
Only in the underlying storage backend.


Cheers,
Daniel



Suny

Sep 21, 2017, 1:40:27 PM
to JanusGraph users
Is there a way to find out whether the query is using the vertex-centric index?

Daniel Kuppitz

Sep 21, 2017, 1:57:48 PM
to JanusGraph users
Maybe .explain() can give you some hints, but I'm not sure about that and don't have a test environment.
Perhaps Jason or somebody else using Janus can jump in at this point..?

Cheers,
Daniel



Suny

Sep 21, 2017, 1:59:37 PM
to JanusGraph users
I implemented a vertex-centric index on the edge label.

Here is the query I am running:

g.V().has('type',textContains('car')).inE().has('timestamp',eq('')).inV().valueMap()


The documentation says that JanusGraph is intelligent enough to use vertex-centric indices when available.

So can I assume that JanusGraph uses the vertex-centric index for this query?

Jason Plurad

Sep 22, 2017, 8:54:54 AM
to JanusGraph users
No, explain() doesn't give any clear indication currently whether an index will be utilized. I opened up an issue for that.

What counts are you dealing with here?

g.V().has('type',textContains('car')).count()
g.V().has('type',textContains('car')).inE().count()
g.V().has('type',textContains('car')).inE().has('timestamp',eq('')).count()

Suny

Sep 22, 2017, 10:07:26 AM
to JanusGraph users
Thanks. I am dealing with count here. I need the attributes list on those vertices.

g.V().has('type',textContains('car')) - This comes back very fast, which I assume is because of the index in ES.

g.V().has('type',textContains('car')).inE().has('timestamp',eq('')).inV().valueMap() - This is slow. I have 1500 vertices and each vertex has about 3-5 attributes on it. I implemented a vertex-centric index on the timestamp property, but I am not sure whether it is being used.

Jason Plurad

Sep 22, 2017, 10:10:57 AM
to JanusGraph users
Is this slow?

g.V().has('type',textContains('car')).inE().has('timestamp',eq('')).inV().id()

Suny

Sep 22, 2017, 10:12:44 AM
to JanusGraph users
Yes. The first time I ran the query it took 129826 ms, and then 600-700 ms on later runs (which I believe is because of caching). Without that it takes around 128000 ms.

Jason Plurad

Sep 22, 2017, 10:18:46 AM
to JanusGraph users
How did you define the 'timestamp' property and the vertex-centric index that uses it?
It seems strange to me that you're looking for has('timestamp', '') rather than comparing Long values.
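
To illustrate the point about comparing Long values, here is a minimal sketch for the Gremlin console, assuming `graph` and `g` are already bound to an open JanusGraph instance. The names `createdAt`, `linked_to` and `linkedToByCreatedAt`, and the cutoff value, are hypothetical and do not come from this thread. `Order.decr` matches the TinkerPop version used elsewhere in this thread; newer versions may name it differently.

```groovy
// Hypothetical schema: 'createdAt', 'linked_to' and 'linkedToByCreatedAt'
// are illustrative names only.
mgmt = graph.openManagement()
createdAt = mgmt.makePropertyKey('createdAt').dataType(Long.class).make()
linkedTo = mgmt.makeEdgeLabel('linked_to').make()

// Vertex-centric index: edges of this label are sorted by createdAt,
// newest first, for each vertex.
mgmt.buildEdgeIndex(linkedTo, 'linkedToByCreatedAt', Direction.BOTH, Order.decr, createdAt)
mgmt.commit()

// Range comparison on the Long value, with the edge label given explicitly.
// The cutoff is an arbitrary epoch-millisecond value chosen for illustration.
cutoff = 1500000000000L
g.V().has('type', textContains('car')).
  inE('linked_to').has('createdAt', gt(cutoff)).
  inV().id()
```

Because the label and the index are created in the same management transaction here, the index should not need a separate reindex step; that would differ for an index added to an existing label.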

Suny

Sep 22, 2017, 10:21:45 AM
to JanusGraph users
I defined timestamp as a String.

Here is how I created the property key and the index:

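// Edge property holding the timestamp, stored as a String
final PropertyKey myTimestamp = mgmt.makePropertyKey("timestamp").dataType(String.class).make();

// Edge label that the vertex-centric index is built on
EdgeLabel connectedTo = mgmt.getOrCreateEdgeLabel("connected_to");

// Vertex-centric index on timestamp, covering both edge directions
mgmt.buildEdgeIndex(connectedTo, "forAllEdgesOnTimestamp", Direction.BOTH, myTimestamp);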
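
One thing that may be worth checking with a definition like the one above: if the vertex-centric index was added after the edge label already existed, it is generally not used until it has been reindexed and reaches the ENABLED state. A rough sketch for the Gremlin console, assuming `graph` is bound to the open JanusGraph instance and reusing the label and index names from the snippet above:

```groovy
import org.janusgraph.core.schema.SchemaAction
import org.janusgraph.graphdb.database.management.ManagementSystem

// Look up the vertex-centric index and print its current status
// (one of INSTALLED, REGISTERED, ENABLED, DISABLED).
mgmt = graph.openManagement()
connectedTo = mgmt.getEdgeLabel('connected_to')
index = mgmt.getRelationIndex(connectedTo, 'forAllEdgesOnTimestamp')
println index.getIndexStatus()
mgmt.rollback()

// Only needed if the status is not ENABLED: wait for the index to be
// registered, then reindex the existing edges.
ManagementSystem.awaitRelationIndexStatus(graph, 'forAllEdgesOnTimestamp', 'connected_to').call()
mgmt = graph.openManagement()
connectedTo = mgmt.getEdgeLabel('connected_to')
index = mgmt.getRelationIndex(connectedTo, 'forAllEdgesOnTimestamp')
mgmt.updateIndex(index, SchemaAction.REINDEX).get()
mgmt.commit()
```

This only shows whether the index is available. It does not by itself confirm that a given traversal uses it.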

Bhawesh Agarwal

Dec 1, 2017, 6:11:16 AM
to JanusGraph users
Any update on this post?
I am facing the same issue.

Bhawesh Agarwal

Dec 1, 2017, 9:51:40 AM
to JanusGraph users
Hello,
I am trying to create vertices with a single label.

I have created a schema like this:
mgmt = graph.openManagement()
entityInstance = mgmt.makeVertexLabel('entityInstance').make()
contains = mgmt.makeEdgeLabel('contains').multiplicity(MULTI).make()

vid = mgmt.makePropertyKey('vid').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
typeInt = mgmt.makePropertyKey('typeInt').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
eTypeInt = mgmt.makePropertyKey('eTypeInt').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
state = mgmt.makePropertyKey('state').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()

Then I created the indexes:

mgmt=graph.openManagement()
vid=mgmt.getPropertyKey('vid')
typeInt=mgmt.getPropertyKey('typeInt')
state=mgmt.getPropertyKey('state')
eTypeInt=mgmt.getPropertyKey('eTypeInt')
contains=mgmt.getEdgeLabel('contains')
mgmt.buildEdgeIndex(contains,'containsByName',Direction.BOTH, Order.decr, eTypeInt)

mgmt.buildIndex('vidIndex',Vertex.class).addKey(vid).buildMixedIndex("search")
mgmt.buildIndex('typeIntIndex',Vertex.class).addKey(typeInt).buildMixedIndex("search")
mgmt.buildIndex('stateIndex',Edge.class).addKey(state).buildMixedIndex("search")
mgmt.buildIndex('eTypeIntIndex',Edge.class).addKey(eTypeInt).buildMixedIndex("search")
mgmt.commit() 

Then I inserted the data:

def entityEntry(i, eTypeInt, graph) {
    def instance = graph.addVertex(label, 'entityInstance')
    instance.property('vid', i)
    instance.property('typeInt', eTypeInt)
    return instance
}

def containsEntryEdge(a1, a2, state, typeInt) {
    def instance = a1.addEdge('contains', a2); // Type can be source or target or contains
    instance.property('state', state)
    instance.property('eTypeInt', typeInt)
    return instance
}

a1 = entityEntry(1,1,graph);
a2 = entityEntry(2,1,graph);
a3 = entityEntry(3,1,graph);
a4 = entityEntry(4,1,graph);

b1 = entityEntry(1,2,graph);
b2 = entityEntry(2,2,graph);
b3 = entityEntry(3,2,graph);
b4 = entityEntry(4,2,graph);
b5 = entityEntry(5,2,graph);
b6 = entityEntry(6,2,graph);
b7 = entityEntry(7,2,graph);
b8 = entityEntry(8,2,graph);

c1 = entityEntry(1,3,graph);
c2 = entityEntry(2,3,graph);
c3 = entityEntry(3,3,graph);
c4 = entityEntry(4,3,graph);
c5 = entityEntry(5,3,graph);
c6 = entityEntry(6,3,graph);
c7 = entityEntry(7,3,graph);
c8 = entityEntry(8,3,graph);
c9 = entityEntry(9,3,graph);
c10 = entityEntry(10,3,graph);
c11 = entityEntry(11,3,graph);
c12 = entityEntry(12,3,graph);
c13 = entityEntry(13,3,graph);
c14 = entityEntry(14,3,graph);
c15 = entityEntry(15,3,graph);
c16 = entityEntry(16,3,graph);

containsEntryEdge(a1,b1,'pass',100);
containsEntryEdge(a1,b2,'fail',100);
containsEntryEdge(a2,b3,'pass',100);
containsEntryEdge(a2,b4,'fail',100);
containsEntryEdge(a3,b5,'pass',100);
containsEntryEdge(a3,b6,'fail',100);
containsEntryEdge(a4,b7,'pass',100);
containsEntryEdge(a4,b8,'fail',100);

containsEntryEdge(b1,c1,'fail',101);
containsEntryEdge(b1,c2,'fail',101);
containsEntryEdge(b2,c3,'fail',101);
containsEntryEdge(b2,c4,'fail',101);
containsEntryEdge(b3,c5,'fail',101);
containsEntryEdge(b3,c6,'fail',101);
containsEntryEdge(b4,c7,'fail',101);
containsEntryEdge(b4,c8,'fail',101);
containsEntryEdge(b5,c9,'fail',101);
containsEntryEdge(b5,c10,'fail',101);
containsEntryEdge(b6,c11,'fail',101);
containsEntryEdge(b6,c12,'fail',101);
containsEntryEdge(b7,c13,'fail',101);
containsEntryEdge(b7,c14,'fail',101);
containsEntryEdge(b8,c15,'fail',101);
containsEntryEdge(b8,c16,'fail',101);

Now I am running this query:

g.V().has('typeInt',1).out('contains').has('typeInt',2).profile()

The traversal metrics for this command are:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[typeInt.eq(1)])                                                         0.559    21.05
    \_condition=(typeInt = 1)
    \_isFitted=true
    \_query=[(typeInt = 1)]:typeIntIndex
    \_index=typeIntIndex
    \_orders=[]
    \_isOrdered=true
    \_index_impl=search
  optimization                                                                                 0.185
JanusGraphVertexStep(OUT,[contains],vertex)                                                0.840    31.62
    \_condition=type[contains]
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@81b4bf0f
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.122
  optimization                                                                                 0.066
  optimization                                                                                 0.074
  optimization                                                                                 0.065
HasStep([typeInt.eq(2)])                                                                   1.258    47.34
                                            >TOTAL                                         2.657        -


Now another query, this time with filter():

g.V().has('typeInt',1).filter(out('contains').has('typeInt',2)).profile()


Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[typeInt.eq(1)])                                                         0.422    25.58
    \_condition=(typeInt = 1)
    \_isFitted=true
    \_query=[(typeInt = 1)]:typeIntIndex
    \_index=typeIntIndex
    \_orders=[]
    \_isOrdered=true
    \_index_impl=search
  optimization                                                                                 0.153
TraversalFilterStep([JanusGraphVertexStep(OUT,[...                                         1.229    74.42
  JanusGraphVertexStep(OUT,[contains],vertex)                                              0.547
    \_condition=type[contains]
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@81b4bf0f
    \_orders=[]
    \_isOrdered=true
    optimization                                                                               0.102
    optimization                                                                               0.049
    optimization                                                                               0.045
    optimization                                                                               0.044
  HasStep([typeInt.eq(2)])                                                                     0.527
                                            >TOTAL                                         1.651        -



Query 3


g.V().has('typeInt',2).profile()

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[typeInt.eq(2)])                                                         0.427   100.00
    \_condition=(typeInt = 2)
    \_isFitted=true
    \_query=[(typeInt = 2)]:typeIntIndex
    \_index=typeIntIndex
    \_orders=[]
    \_isOrdered=true
    \_index_impl=search
  optimization                                                                                 0.148
                                            >TOTAL                                         0.427        -


In the 1st and 2nd queries, the step for typeInt 1 shows isFitted=true and \_query=<indexName>, but for typeInt 2 it shows isFitted=true with \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@xxxxxx.

In the 3rd query, typeInt 2 shows isFitted=true and \_query=<indexName>.

To me it looks like Query 1 and Query 2 do not use an index at the second level. (I have created a vertex-centric index: mgmt.buildEdgeIndex(contains,'containsByName',Direction.BOTH, Order.decr, eTypeInt))

How can I improve the performance of Query 1 and Query 2?

Is there any way to improve performance by increasing the number of instances?
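
One possible reading of the profiles above: the vertex-centric index `containsByName` is built on the edge property `eTypeInt`, so it can only narrow a traversal step that filters on that edge property. In Query 1 and Query 2, `has('typeInt',2)` comes after `out('contains')` and tests a vertex property, so it is applied to each adjacent vertex after the edges have been read. A sketch of a variant that filters on the edge property instead, assuming the sample data above, where the edges from typeInt 1 vertices to typeInt 2 vertices carry eTypeInt 100:

```groovy
// Filter on the indexed edge property before moving to the adjacent vertex,
// so the vertex-centric index on eTypeInt has a chance to restrict
// which 'contains' edges are read.
g.V().has('typeInt', 1).
  outE('contains').has('eTypeInt', 100).
  inV().
  profile()
```

Whether this helps depends on the data: it relies on eTypeInt on the edge identifying the same neighbours that typeInt identifies on the vertex, which holds for the sample data above but is an assumption for any other graph.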