inside() function is really slow.


johan....@gmail.com

May 18, 2018, 9:56:27 AM
to Gremlin-users
Hello everyone, 

I'm rather new to Gremlin and I have run into a performance problem.
I have a graph with the following configuration: storage backend: Cassandra, index backend: Elasticsearch (the default Gremlin Server configuration for JanusGraph).

My graph contains around 30 million nodes, and I'm looking to retrieve nodes based on a numeric (Double) value called "beginning", on which I have created an index with the following code:


//properties creation
val beginning = mgmt.makePropertyKey("beginning").dataType(classOf[java.lang.Double]).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey("active").dataType(classOf[java.lang.Boolean]).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey("lastTimestamp").dataType(classOf[java.lang.Double]).cardinality(Cardinality.SINGLE).make()

//index creation
mgmt.buildIndex("journeyBeginning",classOf[Vertex]).addKey(beginning).buildCompositeIndex()

Later in my code, I reindex like this:

graph.tx().rollback()
mgmt.commit()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("journeyBeginning"),SchemaAction.REINDEX).get()
mgmt.commit()

I know that the index is working, because when I try this query:
:> g.V().has("beginning", 1.522135450632E12)
I get my result instantly.

However, when I try this query:
:> g.V().has("beginning",inside(1522060884498,1522363884498)).next(1)
the execution time exceeds the timeout (which is set to around 10 minutes).

This last query sort of works when I put a next clause in it, like this:
:> g.V().has("beginning",inside(1522060884498,1522363884498)).next(100)
However, even this is not really fast (getting 100 items takes several seconds), and I cannot simply limit the number of objects returned: I'm expecting several thousand, maybe tens of thousands, of objects from this query.

I suppose that I am not using indexes properly, but I am at a loss as to how to do this any other way.

Thanks in advance for your answers !




Stephen Mallette

May 18, 2018, 3:19:48 PM
to Gremlin-users
You might get some help here from JanusGraph folks, but this question is definitely better posted on their user mailing list:


TinkerPop really doesn't provide any optimizations when it comes to filtering/indexing as part of the framework - that is up to the individual graph providers when they implement our interfaces. That's their time to shine!

My guess is that JanusGraph isn't properly optimizing the inside() predicate for your composite index (though I think that's a known issue; I feel like I've seen folks ask about it here before).
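Presumably, if the composite index cannot serve inside(), the query falls back to iterating over every vertex and filtering each one. Because Gremlin traversals are lazy, next(100) can stop as soon as 100 matches turn up, while an unbounded query has to walk all 30 million vertices, which would be consistent with the timings reported above. The toy Scala sketch below uses plain collections, not JanusGraph code, and `ScanModel` and `scan` are hypothetical names; it counts how many elements such a lazy scan touches with and without a limit.

```scala
// Toy model only (not JanusGraph code): a lazy full scan, optionally limited.
object ScanModel {
  // hits: ids of vertices that matched; scanned: how many vertices were visited.
  final case class Result(hits: List[Long], scanned: Long)

  def scan(vertices: Iterator[(Long, Double)], lo: Double, hi: Double, limit: Option[Int]): Result = {
    var scanned = 0L
    val matches = vertices
      .map { v => scanned += 1; v }               // count every vertex that gets pulled
      .filter { case (_, x) => x > lo && x < hi } // inside(lo, hi): both bounds exclusive
      .map(_._1)
    // With a limit, take(n) stops pulling from the scan once n matches are produced.
    val hits = limit match {
      case Some(n) => matches.take(n).toList
      case None    => matches.toList
    }
    Result(hits, scanned)
  }
}
```

When matches are common, the limited scan touches only a small prefix of the data, whereas the unlimited one always visits every element; an index that can answer the range directly avoids the scan altogether.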


HadoopMarc

May 18, 2018, 3:39:48 PM
to Gremlin-users
Hi Johan,

The JanusGraph CompositeIndex is only valid for equality matches. JanusGraph supports additional predicates when using a MixedIndex, which delegates indexing to the indexing backend, see:

According to the example in section 21.4, the inside() predicate, as a composition of gt() and lt(), should also work with a MixedIndex.
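To make the distinction concrete, here is a toy Scala model. It uses plain collections, not the way JanusGraph actually stores its indexes, and `IndexModel`, `EqualityIndex` and `RangeIndex` are hypothetical names. A composite-style index behaves like a map from exact value to vertex ids: an equality lookup is a single probe, but a range has no key to probe, so every entry has to be checked. A mixed-style index keeps values ordered, so only the requested range is read. The predicate follows TinkerPop's definition of inside(lo, hi) as gt(lo) and lt(hi), so both bounds are exclusive.

```scala
import scala.collection.immutable.TreeMap

// Toy model only: NOT how JanusGraph stores composite or mixed indexes.
object IndexModel {
  // TinkerPop's inside(lo, hi) is gt(lo) and lt(hi): both bounds are exclusive.
  def insideP(lo: Double, hi: Double)(x: Double): Boolean = x > lo && x < hi

  // Group (value, vertexId) pairs into value -> vertex ids.
  private def group(entries: Seq[(Double, Long)]): Map[Double, List[Long]] =
    entries.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).toList }

  // Stand-in for a composite index: exact value -> vertex ids.
  final class EqualityIndex(entries: Seq[(Double, Long)]) {
    private val byValue: Map[Double, List[Long]] = group(entries)

    // Equality query: one keyed probe, like has("beginning", 1.522135450632E12).
    def lookup(v: Double): List[Long] = byValue.getOrElse(v, Nil)

    // Range query: there is no single key to probe, so every entry is checked.
    def insideByScan(lo: Double, hi: Double): List[Long] =
      byValue.toList
        .filter { case (k, _) => insideP(lo, hi)(k) }
        .flatMap(_._2)
        .sorted
  }

  // Stand-in for a mixed index: values are kept in sorted order.
  final class RangeIndex(entries: Seq[(Double, Long)]) {
    private val byValue: TreeMap[Double, List[Long]] =
      TreeMap.empty[Double, List[Long]] ++ group(entries)

    // Range query: read only keys in [lo, hi), then drop lo itself to get (lo, hi).
    def inside(lo: Double, hi: Double): List[Long] =
      byValue.range(lo, hi).iterator
        .filter { case (k, _) => k > lo }
        .flatMap(_._2)
        .toList
        .sorted
  }
}
```

In JanusGraph itself the corresponding change would presumably be to build the index with `buildMixedIndex(...)` instead of `buildCompositeIndex()`, passing the name of the configured index backend (often `"search"` in the shipped configurations, but check your own properties file), and then reindex as before.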

HTH, Marc

On Friday, May 18, 2018 15:56:27 UTC+2, johan....@gmail.com wrote: