Trying to improve query performance with vertex-centric indexes

Clemens Viehl

Jul 20, 2018, 8:36:52 AM
to Gremlin-users
Hello,

in my test scenario I have trouble reaching good performance results. The setup is as follows:

- JanusGraph 0.2.1 with Cassandra 3.11.2 and ElasticSearch 6.0.1 (as provided with Janus)
- Graph size is ~17 million nodes and ~40 million edges; there are many properties on nodes, fewer on edges.

What I am trying to achieve is to get several (up to ten) shortest paths between two given nodes, while checking some of the properties during the traversal.

My Gremlin query looks like this (in Java):

final List<Path> list = new ArrayList<>();

g.V(fromNode)
    .repeat(timeLimit(30000).both()
        .has("field1", value1)
        .has("field2")
        .has("field3", value3)
        .simplePath())
    .until(timeLimit(30000).is(toNode).and().filter(maxHopsPredicate))
    .limit(10)
    .path()
    .fill(list);
  
Most of the tests (> 95%) run into the timeout. In order to speed up things I tried using vertex-centric indexes (for testing purposes, the same properties are available on vertices and edges):

final PropertyKey field1Key = mgmt.getPropertyKey("field1");
final PropertyKey field2Key = mgmt.getPropertyKey("field2");
final PropertyKey field3Key = mgmt.getPropertyKey("field3");

final EdgeLabel relationLabel = mgmt.getEdgeLabel("relation");

mgmt.buildEdgeIndex(relationLabel, "edge_permissions_index", Direction.BOTH, field1Key, field2Key, field3Key);

And modified my query accordingly:

final List<Path> list = new ArrayList<>();

g.V(fromNode)
    .repeat(timeLimit(30000).bothE()
        .has("field1", value1)
        .has("field2")
        .has("field3", value3)
        .bothV()
        .simplePath())
    .until(timeLimit(30000).is(toNode).and().filter(maxHopsPredicate))
    .limit(10)
    .path()
    .fill(list);
   
However, the difference was almost negligible. A few queries were faster than before, but most of them (> 95%) still ran into the timeout.

So I'm not even sure whether the index is used at all. Is there an easy way to verify this?
In the Gremlin shell I can see that the index exists, but how can I tell whether my query uses it?
(I tried interpreting the profile() output, but I can't tell from it whether the index is used.)

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@1cd2143b
gremlin> rindex = mgmt.getRelationIndex(mgmt.getRelationType('relation'), 'edge_permissions_index')
==>edge_permissions_index
gremlin> mgmt.awaitRelationIndexStatus(graph, 'edge_permissions_index', 'relation').call()
==>RelationIndexStatusReport[succeeded=false, indexName='edge_permissions_index', relationTypeName='relation', actualStatus=ENABLED, targetStatus=[REGISTERED], elapsed=PT1M0.124S]


Any hints on what I might be doing wrong or how to improve the query performance are highly welcome.

Bye,

Clemens

Daniel Kuppitz

Jul 20, 2018, 1:37:56 PM
to gremli...@googlegroups.com
The index cannot be used because you don't provide a value for field2. Try creating an index that covers only field1 and field3.
Furthermore, you should really use otherV() instead of bothV().

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/77974063-26f1-4efc-8b58-ffa04a38a9f0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
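
A minimal sketch of the two suggestions above, assuming JanusGraph 0.2.x and TinkerPop 3.2.x on the classpath. The class name, the method names and the index name "edge_field1_field3_index" are hypothetical. The sketch also names the edge label in bothE(...), on the assumption that JanusGraph can only match a vertex-centric index when the label is known; that is worth checking with profile(). The time limit, the hop limit and the has("field2") check from the original query are left out for brevity. If "relation" edges already exist, the new index will most likely need a reindex before it covers them.

```java
import java.util.List;

import org.apache.tinkerpop.gremlin.process.traversal.Path;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.janusgraph.core.EdgeLabel;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public final class Field1Field3IndexSketch {

    // Builds a vertex-centric index on "relation" edges that covers only the
    // two properties the query constrains with a concrete value.
    static void createIndex(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();
        PropertyKey field1Key = mgmt.getPropertyKey("field1");
        PropertyKey field3Key = mgmt.getPropertyKey("field3");
        EdgeLabel relationLabel = mgmt.getEdgeLabel("relation");
        // "edge_field1_field3_index" is a hypothetical name.
        mgmt.buildEdgeIndex(relationLabel, "edge_field1_field3_index",
                Direction.BOTH, field1Key, field3Key);
        mgmt.commit();
    }

    // Same shape as the simplified query in the thread, but with the edge
    // label given explicitly and otherV() instead of bothV().
    static List<Path> findPaths(GraphTraversalSource g, Object fromNode, Object toNode,
                                Object value1, Object value3) {
        return g.V(fromNode)
                .repeat(__.bothE("relation")
                        .has("field1", value1)
                        .has("field3", value3)
                        .otherV()
                        .simplePath())
                .until(__.is(toNode))
                .limit(10)
                .path()
                .toList();
    }
}
```

otherV() returns only the vertex at the far end of each edge, whereas bothV() also returns the vertex the traverser just came from, which simplePath() then has to discard again.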

Clemens Viehl

Jul 23, 2018, 8:07:01 AM
to Gremlin-users
Hello,

thanks for your input, Daniel.

When I change the check on field2 so that it no longer tests just for its existence but compares against a value, the existing index still is not used.

However, in order to get any indexes to be used I simplified things further and created a new index on just one property:

final PropertyKey field1Key = mgmt.getPropertyKey("field1");
final EdgeLabel relationLabel = mgmt.getEdgeLabel("relation");

mgmt.buildEdgeIndex(relationLabel, "simple_edge_permissions_index", Direction.BOTH, field1Key);
mgmt.commit();

And I can see it's available and enabled:

gremlin> mgmt.awaitRelationIndexStatus(graph, 'simple_edge_permissions_index', 'relation').call()
==>RelationIndexStatusReport[succeeded=false, indexName='simple_edge_permissions_index', relationTypeName='relation', actualStatus=ENABLED, targetStatus=[REGISTERED], elapsed=PT1M0.072S]

By the way: what about the succeeded=false? In the JanusGraph sources (RelationIndexStatusWatcher) I can see that it is caused by a timeout.

When I run my simplified query:

g.V(fromNode).
  repeat(bothE().has('field1', 'value1').otherV().simplePath()).
  until(is(toNode)).
  limit(10).path().profile()

the index still is not used:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[v[4216]])                                            1           1           0,064    11,19
RepeatStep([JanusGraphVertexStep([field1.eq(val...                                             0,304    52,52
  JanusGraphVertexStep([field1.eq(value1)])                                                    0,187
    \_condition=(field1 = value1 AND EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@801a60ee
    \_orders=[]
    \_isOrdered=true
    optimization                                                                               0,036
  EdgeOtherVertexStep                                                                          0,026
  PathFilterStep(simple)                                                                       0,011
  RepeatEndStep                                                                                0,027
RangeGlobalStep(0,10)                                                                          0,188    32,54
PathStep                                                                                       0,021     3,75
                                            >TOTAL                     -           -           0,579        -
 
Is the succeeded=false/timeout issue the reason why the indexes are not used? If so, what am I doing wrong?

Daniel Kuppitz

Jul 23, 2018, 2:30:07 PM
to gremli...@googlegroups.com
> Is the succeeded=false/timeout issue the reason why the indexes are not used?

Probably yes. But you have a better chance of getting answers to this if you post your question on the JanusGraph users list. It might also be worth looking for existing answers: questions about indexing problems have been answered quite often.

Cheers,
Daniel
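
One reading of the status reports quoted earlier in the thread: both show actualStatus=ENABLED and targetStatus=[REGISTERED], so the watcher appears to have waited for a status the index had already moved past and then gave up after about a minute. If that reading is right, succeeded=false only says that the wait timed out, not that the index is unusable. The sketch below asks the watcher for ENABLED instead. It assumes JanusGraph 0.2.x on the classpath and reuses the properties file, index name and label from the thread; the class name is hypothetical.

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.SchemaStatus;
import org.janusgraph.graphdb.database.management.ManagementSystem;

public final class AwaitEnabledSketch {

    public static void main(String[] args) throws Exception {
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql-es.properties");
        try {
            // Wait for ENABLED instead of the watcher's default target status.
            Object report = ManagementSystem
                    .awaitRelationIndexStatus(graph, "simple_edge_permissions_index", "relation")
                    .status(SchemaStatus.ENABLED)
                    .call();
            // Print the report; the point of interest is whether the wait succeeded.
            System.out.println(report);
        } finally {
            graph.close();
        }
    }
}
```

If the report then shows that the wait succeeded while profile() still shows \_isFitted=false, the cause is more likely to be in how the query matches the index than in the index status.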

