Trying to improve query performance with vertex-centric indexes

Clemens Viehl

Jul 25, 2018, 11:05:49 AM
to JanusGraph users
Hello,

I'm trying to understand and use vertex-centric indexes to speed up traversals (such as shortest path), but whatever I try, I can't get them to be used in my queries.
My setup is JanusGraph 0.2.1 with the Elasticsearch bundled with JanusGraph and Cassandra 3.11.2, running on Java 1.8.0_162.

In the following example I'm creating just a very small graph:

graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
mgmt = graph.openManagement()
// Vertex related initialisations
nodeIdKey = mgmt.makePropertyKey('nodeId').dataType(String.class).make()
idxNodeId = mgmt.buildIndex('idx_nodeId', Vertex.class).addKey(nodeIdKey).unique().buildCompositeIndex()
mgmt.setConsistency(idxNodeId, ConsistencyModifier.LOCK)
idxNodeId.getIndexStatus(nodeIdKey)
nodePropertyKey = mgmt.makePropertyKey('nodeProperty').dataType(String.class).make()
idxNodeProperty = mgmt.buildIndex('idx_nodeProperty', Vertex.class).addKey(nodePropertyKey).buildCompositeIndex()
idxNodeProperty.getIndexStatus(nodePropertyKey)
// Edge related initialisations
relationLabel = mgmt.makeEdgeLabel('relation').make()
edgePropertyKey = mgmt.makePropertyKey('edgeProperty').dataType(String.class).make()
idxVertexCentric = mgmt.buildEdgeIndex(relationLabel, 'idx_VertexCentric', Direction.BOTH, edgePropertyKey)
idxVertexCentric.getIndexStatus()
mgmt.commit()

// Creating some nodes
g = graph.traversal()
graph.addVertex('node').property('nodeId','4711')
g.V().has('nodeId','4711').property('nodeProperty','aaaa')
graph.addVertex('node').property('nodeId','4712')
g.V().has('nodeId','4712').property('nodeProperty','bbbb')
graph.addVertex('node').property('nodeId','4713')
g.V().has('nodeId','4713').property('nodeProperty','cccc')

from = g.V().has('nodeId','4711').next()
to = g.V().has('nodeId','4712').next()

// Creating an edge
from.addEdge('relation', to).property('edgeProperty', 'dddd')

// Check if indexes are used - in the profile output I can see that the composite index is used
g.V().has('node', 'nodeProperty','bbbb').profile()

// Expected usage of vertex centric index - but it isn't used here; my question is: why?
g.E().has('node', 'edgeProperty','dddd').profile()
g.E().has('node', 'edgeProperty', textContains('dddd')).profile()

Something I noticed during my testing that might be related: calling awaitRelationIndexStatus takes a long time (1 minute), and although the actualStatus looks good (ENABLED), the succeeded=false looks alarming to me.

gremlin> mgmt.awaitRelationIndexStatus(graph, 'idx_VertexCentric', 'relation').call()
==>RelationIndexStatusReport[succeeded=false, indexName='idx_VertexCentric', relationTypeName='relation', actualStatus=ENABLED, targetStatus=[REGISTERED], elapsed=PT1M0.129S]


Is this the reason why the vertex-centric index isn't used? If so, what's the most likely cause?
For these tests I'm starting Cassandra, Elasticsearch and the Gremlin shell via their Windows scripts as they are (out of the box), without configuration changes.

Please note that I'm still quite new to the graph DB world, so I might be missing something pretty basic...
Thanks in advance for any help!

Best regards,

Clemens

Ted Wilmes

Jul 25, 2018, 11:12:31 AM
to JanusGraph users
Hi Clemens,
Vertex-centric indexes (VCIs) are local to each vertex and won't help with g.E().has(...) queries, which are global lookups. You will see them used when you traverse a vertex's edges and filter on an edge property, as in this pattern: g.V(123).outE().has('tstamp', gt(100)).inV().
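
For the global g.E().has(...) case, a graph index over edges is the kind of index that can apply instead. A minimal sketch, reusing the 'edgeProperty' key from the original post; the index name 'idx_edgeProperty' is made up for illustration, and on a graph that already holds data such an index generally has to be reindexed and reach ENABLED before it is used:

```groovy
// Sketch only: 'idx_edgeProperty' is a hypothetical index name.
mgmt = graph.openManagement()
edgePropertyKey = mgmt.getPropertyKey('edgeProperty')
// Graph (composite) index over edges, as opposed to a vertex-centric index
mgmt.buildIndex('idx_edgeProperty', Edge.class).addKey(edgePropertyKey).buildCompositeIndex()
mgmt.commit()

// Global edge lookup by exact value: a candidate for the graph index above
g.E().has('edgeProperty', 'dddd').profile()
```

A composite index only answers equality lookups; text predicates such as textContains would need a mixed index backed by Elasticsearch.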

--Ted

Clemens Viehl

Jul 26, 2018, 5:20:46 AM
to JanusGraph users
Hi,

thanks for your fast reply, Ted!

Using different traversal queries based on what you suggested, I receive the following output:

gremlin> g.V(4296).outE().has('edgeProperty','eeee').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[4296])                                                                       2,694    89,40
JanusGraphVertexStep([edgeProperty.eq(eeee)])                                                  0,319    10,60
                                            >TOTAL                     -           -           3,014        -

I've not seen such short profile output before and can't tell whether the index is used in this query. Is it?

However, when I address nodes like I did before, the profile output shows me (as far as I can tell) that the index is not used:

gremlin> g.V(from).outE().has('edgeProperty','eeee').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[v[4168]])                                            1           1           0,329    50,96
JanusGraphVertexStep([edgeProperty.eq(eeee)])                          1           1           0,316    49,04
    \_condition=(edgeProperty = eeee AND EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0,080
                                            >TOTAL                     -           -           0,645        -
                                                                          
Using full-text search predicates in my query (like textContainsRegex) to ensure that Elasticsearch and indexing are involved (as I understand it) yields similar profile results: no index is used.

gremlin> g.V(from).outE().has('edgeProperty', textContainsRegex('e..e')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[v[4168]])                                            1           1           0,106    23,04
JanusGraphVertexStep([edgeProperty.textContains...                     1           1           0,354    76,96
    \_condition=(edgeProperty textContainsRegex e..e AND EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0,078
                                            >TOTAL                     -           -           0,461        -

What about the RelationIndexStatusReport[succeeded=false, ...]? Shouldn't I be worried about it?

Bye,

Clemens

Jason Plurad

Jul 26, 2018, 1:14:02 PM
to JanusGraph users
You need to make sure you match your query against the index definition. With this VCI definition:

// Edge related initialisations
relationLabel = mgmt.makeEdgeLabel('relation').make()
edgePropertyKey = mgmt.makePropertyKey('edgeProperty').dataType(String.class).make()
idxVertexCentric = mgmt.buildEdgeIndex(relationLabel, 'idx_VertexCentric', Direction.BOTH, edgePropertyKey)


You'd need a query like the one below, which includes both the edge label and the edge property. The composite index is used for the vertex lookup by nodeId, then the VCI kicks in on the edge property. Note that if you don't specify the edge label on the outE() step, the query is going to scan all outgoing edges on the vertex.

gremlin> g.V().has('nodeId', '4711').outE('relation').has('edgeProperty', 'dddd').profile()

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[nodeId.eq(4711)])                                   1           1           1.285    64.09
    \_condition=(nodeId = 4711)
    \_isFitted=true
    \_query=multiKSQ[1]@2147483647
    \_index=idx_nodeId
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.424
JanusGraphVertexStep([edgeProperty.eq(dddd)])                          1           1           0.720    35.91
    \_condition=(edgeProperty = dddd AND type[relation])
    \_isFitted=true
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@b68d20af
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.406
                                            >TOTAL                     -           -           2.006        -
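
To compare the two cases side by side, the pair of queries below can be profiled one after the other. This is only a sketch against the sample graph from this thread; the comments describe what the profile outputs posted here showed, and your own output may differ in detail:

```groovy
// No edge label on outE(): in the outputs posted earlier in this thread the
// condition read "... AND EDGE ..." and the step reported \_isFitted=false
g.V().has('nodeId', '4711').outE().has('edgeProperty', 'dddd').profile()

// Edge label given: in the output above the condition reads
// "... AND type[relation]" and the step reports \_isFitted=true
g.V().has('nodeId', '4711').outE('relation').has('edgeProperty', 'dddd').profile()
```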



Regarding the index status:


gremlin> mgmt.awaitRelationIndexStatus(graph, 'idx_VertexCentric', 'relation').call()
==>RelationIndexStatusReport[succeeded=false, indexName='idx_VertexCentric', relationTypeName='relation', actualStatus=ENABLED, targetStatus=[REGISTERED], elapsed=PT1M0.129S]

With the awaitRelationIndexStatus() method, if you don't specify a target status, it defaults to waiting for the REGISTERED status. As you can see from the result, the index is already ENABLED, which is the state beyond REGISTERED, so the wait times out and reports succeeded=false. Check out the index lifecycle wiki page.
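
If you want the call to wait for ENABLED rather than the default, the watcher it returns accepts an explicit target status. A minimal sketch for the Gremlin console, assuming the same graph, index name and edge label as above:

```groovy
// Wait for ENABLED explicitly instead of the default REGISTERED target
ManagementSystem.awaitRelationIndexStatus(graph, 'idx_VertexCentric', 'relation').status(SchemaStatus.ENABLED).call()
```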

Clemens Viehl

Jul 27, 2018, 11:26:32 AM
to JanusGraph users
Hi Jason,

The difference is in the details. Now, the VCI is used and you helped me understand JanusGraph a little better. Thank you very much!

For anyone who lands here and wants a better understanding of profile output: in this thread, Robert Dale was kind enough to share some good insights:

Bye,

Clemens