Warning: Query requires iterating over all vertices [()]. For better performance, use indexes


Lisa Fiedler

Jul 3, 2019, 3:47:07 AM
to Gremlin-users
Hi everyone,

I was wondering about the Warning message:
Query requires iterating over all vertices [()]. For better performance, use indexes

I am using composite indexes on all properties, and also some mixed indexes (via Elasticsearch).

However, every time I am posing a query such as
g.V()...
this warning always shows up.
Even when doing
g.V().limit(1)...

Am I doing something wrong? Are there any other kinds of indexes one should use for a performance boost?

Thanks

Antriksh Shah

Jul 3, 2019, 4:03:10 AM
to Gremlin-users
Hey,

For the basics of indexing, see: https://docs.janusgraph.org/latest/indexes.html

g.V() -> returns all the vertices, which is a full scan.
g.V().limit(1) -> still has no condition that an index could answer, so it starts the same scan and just stops after the first vertex.

To use an index, you need a has() condition on an indexed key:
For a composite index: g.V().has(<indexedKey>, <value>)
For a mixed index: g.V().has(<indexedKey>, textContains(<value>))

For more reading go to:

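As for your other question: no, there are no other graph indexes apart from composite and mixed. (JanusGraph also has vertex-centric indexes, but those speed up edge traversals from a given vertex, not g.V() lookups.)
To understand the performance, you can use the profile() step to see where your query spends its time: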
g.V().has("key","value").profile()
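
To make the difference concrete, here is a toy Python model of the two access paths. It is not JanusGraph code and says nothing about how JanusGraph is implemented; the names ToyGraph, scan_all and has are made up for illustration. It only sketches why a traversal with no has() condition on an indexed key has nothing to look up in an index and therefore warns, even with a limit.

```python
import warnings
from collections import defaultdict


class ToyGraph:
    """Toy stand-in for a graph with one exact-match (composite-style) index."""

    def __init__(self, vertices, indexed_key):
        self.vertices = list(vertices)      # each vertex is a plain dict
        self.indexed_key = indexed_key
        self.index = defaultdict(list)      # value -> vertices with that value
        for v in self.vertices:
            if indexed_key in v:
                self.index[v[indexed_key]].append(v)
        self.touched = 0                    # vertices inspected by the last query

    def scan_all(self, limit=None):
        """Like g.V() or g.V().limit(n): no condition, so no index applies."""
        warnings.warn("Query requires iterating over all vertices")
        self.touched = 0
        out = []
        for v in self.vertices:
            if limit is not None and len(out) >= limit:
                break                       # limit stops the scan early
            self.touched += 1
            out.append(v)
        return out

    def has(self, key, value):
        """Like g.V().has(key, value): uses the index only if key is indexed."""
        self.touched = 0
        if key == self.indexed_key:
            hits = list(self.index.get(value, []))
            self.touched = len(hits)        # only the matches are read
            return hits
        # No index on this key: fall back to checking every vertex.
        warnings.warn("Query requires iterating over all vertices")
        hits = []
        for v in self.vertices:
            self.touched += 1
            if v.get(key) == value:
                hits.append(v)
        return hits
```

With an index on 'weight', has('weight', 1) reads only the matching vertices and stays silent, while scan_all(limit=1) reads a single vertex but still warns, because the warning is about the missing index condition and not about how many results come back.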

Lisa Fiedler

Jul 3, 2019, 4:12:44 AM
to Gremlin-users
Hi,

Thanks for the suggestions.
Am I getting this right, then?
If I run a query like
g.V().groupCount().by('weight')
I will always get this warning message because, by construction, groupCount() needs to look at every vertex?

Antriksh Shah

Jul 3, 2019, 4:27:38 AM
to Gremlin-users
The way I understand it:
Indexing helps when you want to fire pointed lookups.
If a query requires scanning all the vertices of a graph, indexing is not going to help you much.
If you can narrow down your input with g.V().has(key,value) and then apply a groupCount, you might still be in the OLTP world, depending on how much data you narrow down to.

To answer your question:
Yes, with the query you are firing, g.V().groupCount().by('weight'), you will get the warning message.
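
As a rough illustration of the narrowing idea, here is a small Python sketch. It is a toy model, not JanusGraph code; the 'kind' property, the by_kind index and the group_count helper are hypothetical names chosen for the example. It contrasts counting over every vertex with counting over only the vertices that an exact-match index hands back.

```python
from collections import Counter, defaultdict

# Hypothetical data: 'kind' plays the role of an indexed key,
# 'weight' is the property we group on.
vertices = [
    {"kind": "person", "weight": 1},
    {"kind": "person", "weight": 2},
    {"kind": "person", "weight": 1},
    {"kind": "place", "weight": 1},
    {"kind": "place", "weight": 3},
]

# Toy exact-match index on 'kind': value -> list of vertices.
by_kind = defaultdict(list)
for v in vertices:
    by_kind[v["kind"]].append(v)


def group_count(vs, key):
    """Count vertices per value of `key`; reads every vertex it is given."""
    return Counter(v[key] for v in vs)


# Like g.V().groupCount().by('weight'): every vertex has to be read.
full = group_count(vertices, "weight")

# Like g.V().has('kind', 'person').groupCount().by('weight'):
# the index supplies only the matching vertices, so only those are read.
narrowed = group_count(by_kind["person"], "weight")

print(full)
print(narrowed)
```

The cost of the narrowed count depends on how many vertices match the has() condition, which is why it can stay OLTP-sized even on a large graph.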



You can also set query.force-index = true to ensure you never fire a full-scan OLTP query. From the JanusGraph configuration reference:

Name: query.force-index
Description: Whether JanusGraph should throw an exception if a graph query cannot be answered using an index. Doing so limits the functionality of JanusGraph's graph queries but ensures that slow graph queries are avoided on large graphs. Recommended for production use of JanusGraph.
Datatype: Boolean
Default value: false
Mutability: MASKABLE
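
The effect of that setting can be sketched with a few lines of Python. This is only a toy decision function under my own assumptions, not JanusGraph's query planner; plan_query and NoIndexError are hypothetical names. It shows the behavioural difference: without the flag an unindexed query falls back to a full scan, with the flag it fails instead.

```python
class NoIndexError(Exception):
    """Hypothetical error type for this sketch (not a JanusGraph class)."""


def plan_query(condition_key, indexed_keys, force_index=False):
    """Return 'index' or 'full-scan' for a lookup on condition_key.

    condition_key is None for a bare g.V() with no has() condition.
    """
    if condition_key is not None and condition_key in indexed_keys:
        return "index"
    if force_index:
        # Mirrors the documented behaviour: throw instead of scanning.
        raise NoIndexError("query cannot be answered using an index")
    return "full-scan"
```

For example, plan_query(None, {"weight"}) returns "full-scan", while the same call with force_index=True raises NoIndexError.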


Lisa Fiedler

Jul 3, 2019, 4:48:25 AM
to Gremlin-users
Hi,

I was trying to use a GraphComputer to run these queries as OLAP.
I therefore used graph.traversal().withComputer();
Using the TimeUtil.clock() utility, however, I found that this version took considerably longer.

I am using JanusGraph with a Cassandra backend in embedded mode. So far, however, I have not used Hadoop.
Does that mean it is possible to run the above query without Hadoop, but poorly? Or what happens if I am not using Hadoop?

Antriksh Shah

Jul 3, 2019, 4:59:27 AM
to Gremlin-users
Hey Lisa,

When you say the OLAP query took more time than the OLTP query, what size of data are you working with?
My guess is that it is a small data set.

Once your data no longer fits into memory, your OLTP query will not perform adequately.
I would not say the query is constructed poorly. If you have very little data, or if you can load your entire graph into memory, you will always be fine with the query you are executing.
But once your data size grows, queries that require a full edgestore table scan are faster and better off as OLAP queries.