Hello everyone,
I'm rather new to Gremlin and I have encountered a performance problem.
I have a graph with the following configuration: storage backend: Cassandra, index backend: Elasticsearch (the default configuration for Gremlin Server with JanusGraph).
My graph contains around 30 million nodes, and I want to retrieve nodes based on a numeric (Double) property called "beginning", on which I have created an index with the following code:
//properties creation
val beginning = mgmt.makePropertyKey("beginning").dataType(jDouble.getClass).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey("active").dataType(jBoolean.getClass).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey("lastTimestamp").dataType(jDouble.getClass).cardinality(Cardinality.SINGLE).make()
//index creation
mgmt.buildIndex("journeyBeginning",classOf[Vertex]).addKey(beginning).buildCompositeIndex()
Later in my code I reindex like this:
graph.tx().rollback()
mgmt.commit()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("journeyBeginning"),SchemaAction.REINDEX).get()
mgmt.commit()
I know that the index is working because when I try this query,
:> g.V().has("beginning", 1.522135450632E12)
I get my result instantly.
However, when I try this query,
:> g.V().has("beginning",inside(1522060884498,1522363884498)).next(1)
the execution time exceeds the timeout (which is set to around 10 minutes).
This last query sort of works when I put a next clause in it, like this:
:> g.V().has("beginning",inside(1522060884498,1522363884498)).next(100)
However, even this is not really fast (getting 100 items takes several seconds), and I cannot limit the number of objects returned (I'm expecting the query to return several thousand, maybe tens of thousands, of objects).
I suppose that I am not using indexes properly, but I am at a loss as to how to do this any other way.
Thanks in advance for your answers!