Hello,
I'm having trouble getting good performance in my test scenario. The setup is as follows:
- JanusGraph 0.2.1 with Cassandra 3.11.2 and ElasticSearch 6.0.1 (as provided with Janus)
- Graph size is ~17 million nodes and ~40 million edges; many properties on nodes, fewer on edges.
What I'm trying to achieve is to get several (up to ten) shortest paths between two given nodes, while checking some of the properties during the traversal.
My Gremlin query looks like this (in Java):
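final List<Path> list = new ArrayList<>();
g.V(fromNode)
    // expand to neighbours that pass the property checks, never revisiting a vertex
    .repeat(timeLimit(30000).both()
        .has("field1", value1)
        .has("field2")
        .has("field3", value3)
        .simplePath())
    // stop a traverser once it is on the target and the hop predicate holds
    .until(timeLimit(30000).is(toNode).and().filter(maxHopsPredicate))
    .limit(10)
    .path()
    .fill(list);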
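To make the intended result concrete, here is a minimal plain-Java sketch (no JanusGraph or TinkerPop involved) of what I expect the traversal to return: up to N simple paths, shortest first, where every vertex reached must pass a property check. The class name, the tiny demo graph, and the `allowed` predicate are made up for illustration. The sketch also assumes that the hop limit is meant to prune longer paths, which may not be exactly how maxHopsPredicate behaves inside until().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in JanusGraph.
final class ShortestPathsSketch {

    /**
     * Breadth-first enumeration of simple paths from `from` to `to`.
     * Returns at most `limit` paths, shortest first, each using at most
     * `maxHops` edges. Every vertex reached (including the target) must
     * satisfy `allowed`; the start vertex itself is not checked.
     */
    static List<List<Integer>> shortestPaths(Map<Integer, List<Integer>> adjacency,
                                             Predicate<Integer> allowed,
                                             int from, int to, int maxHops, int limit) {
        List<List<Integer>> results = new ArrayList<>();
        Deque<List<Integer>> queue = new ArrayDeque<>();
        queue.add(List.of(from));
        while (!queue.isEmpty() && results.size() < limit) {
            List<Integer> path = queue.poll();
            int last = path.get(path.size() - 1);
            // A path only counts after at least one hop (do-while style).
            if (last == to && path.size() > 1) {
                results.add(path);
                continue;
            }
            // Assumption: the hop limit prunes the search.
            if (path.size() - 1 >= maxHops) {
                continue;
            }
            for (int next : adjacency.getOrDefault(last, List.of())) {
                // Skip vertices already on the path (simple paths only)
                // and vertices that fail the property check.
                if (path.contains(next) || !allowed.test(next)) {
                    continue;
                }
                List<Integer> extended = new ArrayList<>(path);
                extended.add(next);
                queue.add(extended);
            }
        }
        return results;
    }

    /** Small undirected demo graph, stored as adjacency lists. */
    static Map<Integer, List<Integer>> demoGraph() {
        return Map.of(
                1, List.of(2, 3),
                2, List.of(1, 4),
                3, List.of(1, 4, 5),
                4, List.of(2, 3, 6),
                5, List.of(3, 6),
                6, List.of(4, 5));
    }

    public static void main(String[] args) {
        // Vertex 2 stands in for a vertex that fails the has(...) checks.
        List<List<Integer>> paths = shortestPaths(demoGraph(), v -> v != 2, 1, 6, 3, 10);
        System.out.println(paths); // prints [[1, 3, 4, 6], [1, 3, 5, 6]]
    }
}
```

On this toy graph the search finishes at once, because the property check and the hop limit cut off most branches early. On the real graph I would expect the same kind of pruning, but I don't see it.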
Most of the tests (> 95%) run into the timeout. To speed things up, I tried using a vertex-centric index (for testing purposes, the same properties are available on both vertices and edges):
final PropertyKey field1Key = mgmt.getPropertyKey("field1");
final PropertyKey field2Key = mgmt.getPropertyKey("field2");
final PropertyKey field3Key = mgmt.getPropertyKey("field3");
final EdgeLabel relationLabel = mgmt.getEdgeLabel("relation");
mgmt.buildEdgeIndex(relationLabel, "edge_permissions_index", Direction.BOTH, field1Key, field2Key, field3Key);
I then modified my query accordingly:
final List<Path> list = new ArrayList<>();
g.V(fromNode)
    // filter on the edge properties so that the vertex-centric index can apply
    .repeat(timeLimit(30000).bothE()
        .has("field1", value1)
        .has("field2")
        .has("field3", value3)
        .bothV()
        .simplePath())
    .until(timeLimit(30000).is(toNode).and().filter(maxHopsPredicate))
    .limit(10)
    .path()
    .fill(list);
However, the difference was almost negligible. A few queries were faster than before, but most of them (> 95%) still ran into the timeout.
So I'm not even sure whether the index is being used at all. Is there an easy way to verify this?
In the Gremlin shell I can see that the index exists, but how can I tell whether my query uses it?
(I tried interpreting the profile() output, but I can't tell from it whether the index is used.)
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@1cd2143b
gremlin> rindex = mgmt.getRelationIndex(mgmt.getRelationType('relation'), 'edge_permissions_index')
==>edge_permissions_index
gremlin> mgmt.awaitRelationIndexStatus(graph, 'edge_permissions_index', 'relation').call()
==>RelationIndexStatusReport[succeeded=false, indexName='edge_permissions_index', relationTypeName='relation', actualStatus=ENABLED, targetStatus=[REGISTERED], elapsed=PT1M0.124S]
Any hints on what I might be doing wrong, or on how to improve the query performance, are very welcome.
Bye,
Clemens