How to know whether a specific Gremlin query run from the Gremlin console used the defined indices?

Amit Kumar

Dec 9, 2014, 1:53:26 AM
to gremli...@googlegroups.com
Experts,

I have an index on the property 'idxp' for every vertex and edge in my graph (Neo4j, for example). Is there a way to find out which of the following queries on the Gremlin console actually uses the index on 'idxp' (in practice, not just in theory)?

1. gremlin>g.V('idxp', 'foo');
2. gremlin>g.V.has('idxp','foo');
3. gremlin>g.getVertices('idxp','foo');
4. gremlin>g.query().has('idxp','foo').vertices();

My theoretical understanding so far is:

(1) will surely use the index.
(2) will most probably not use the index (full scan on V).
(3) uses the core Blueprints API, which the Gremlin performance tuning guidelines indicate should be faster than (1) or (2). It most likely uses the index.
(4) is like (2): my guess is that it won't use the index. However, I read in the TinkerPop forum that getVertices() and getEdges() will eventually be replaced by this query() API, and that it will be left to the underlying database to decide whether to use an index or not.

I feel lost and confused by what I have collected so far on this topic. I would like to hear from the experts on the theory, with a possible proof. Also, I do not know of any 'knob' that can indicate index usage (similar to explain plans in the relational world).




Stephen Mallette

Dec 9, 2014, 6:43:53 AM
to gremli...@googlegroups.com
It is up to the underlying graph database to decide how to implement these functions, so you would need to learn the nature of the backend you intend to use to know the answer for all four of these approaches. I will say that the expectation from the TinkerPop perspective is that 1 and 3 should use an index for every graph implementation. Generally speaking, 2 and 4 would be an optional optimization, and I think Titan is the only graph that will optimize them to use an index (though I think OrientDB does do some query optimization around graph query).

As of TinkerPop2 there is some mystery as to what goes on in traversal.  We hope to rectify that to some degree in TP3.  In TP2 you can do this:

gremlin> g.V("idxp","foo").toString()
==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]
gremlin> g.V.has("idxp","foo").toString()
==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

to give you some idea of how the Gremlin compiles down to pipes, which can tell you a bit more about what's going on under the hood.
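
As a rough illustration of the difference between an index-backed lookup and an unoptimized has-style filter, here is a toy Java model. It is not TinkerPop or Blueprints code: the class IndexProbeSketch, its methods, and the 'examined' counter are all hypothetical names made up for this sketch, and a real backend may behave differently. The point is only that counting how many vertices a lookup touches is one kind of evidence that separates "used the index" from "scanned everything".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from TinkerPop or Blueprints.
class IndexProbeSketch {
    // Vertices are modeled as plain property maps.
    static final List<Map<String, String>> vertices = new ArrayList<>();
    // Hypothetical key index on 'idxp': property value -> matching vertices.
    static final Map<String, List<Map<String, String>>> idxpIndex = new HashMap<>();
    // Counts how many vertices a lookup had to examine.
    static int examined = 0;

    // Adds a vertex and keeps the hypothetical index in sync.
    static void addVertex(String idxp) {
        Map<String, String> v = new HashMap<>();
        v.put("idxp", idxp);
        vertices.add(v);
        idxpIndex.computeIfAbsent(idxp, k -> new ArrayList<>()).add(v);
    }

    // Index-backed lookup: touches only the vertices stored under the value.
    static List<Map<String, String>> lookupWithIndex(String value) {
        List<Map<String, String>> hits = idxpIndex.getOrDefault(value, new ArrayList<>());
        examined += hits.size();
        return hits;
    }

    // Unoptimized filter: walks every vertex and tests the property.
    static List<Map<String, String>> lookupWithScan(String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> v : vertices) {
            examined++;
            if (value.equals(v.get("idxp"))) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 1000 vertices, exactly one of which has idxp == "foo".
        for (int i = 0; i < 1000; i++) {
            addVertex(i == 7 ? "foo" : "bar" + i);
        }

        examined = 0;
        int indexHits = lookupWithIndex("foo").size();
        System.out.println("index: hits=" + indexHits + " examined=" + examined);

        examined = 0;
        int scanHits = lookupWithScan("foo").size();
        System.out.println("scan:  hits=" + scanHits + " examined=" + examined);
    }
}
```

Both lookups return the same single vertex, but the index path examines one vertex while the scan examines all 1000. Against a real graph there is no such counter, so comparable evidence would have to come from whatever the backend itself exposes.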




Amit Kumar

Dec 9, 2014, 12:22:42 PM
to gremli...@googlegroups.com

Thanks Stephen for your comments. Here is what I see for each option mentioned.

1) gremlin> g.V('idxp','foo').toString()

==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

2) gremlin> g.V.has('idxp','foo').toString()

==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

3) gremlin> g.getVertices('idxp','foo').toString()

==>com.tinkerpop.blueprints.impls.neo4j2.Neo4j2VertexIterable@45c6c4cf

4) gremlin> g.query().has('idxp','foo').toString()

==>com.tinkerpop.blueprints.util.DefaultGraphQuery@7eb03b9a

Does the above output indicate that (4) is not using indices, based on https://groups.google.com/forum/#!searchin/gremlin-users/getVertices/gremlin-users/SIB0W5CUdlk/E-AWsu9Q2zQJ ("For those engines that don't have indices, then there are always DefaultQueries")?

Considering the following recommendation from https://github.com/tinkerpop/gremlin/wiki/Gremlin-Groovy-Path-Optimizations -

"In general, for REPLing around a graph, use the more concise representation. For production traversals, use native Blueprints API calls and avoid reflection."

Does this mean that, for production, I should avoid using (1) and (2) if I have to locate vertices with 'idxp' == 'foo'?
