Selection of indexCandidates is empty when using disjunctive top level query

Sylvain Julmy

Sep 18, 2020, 8:31:39 AM
to JanusGraph users
Dear all,

In our project, we found that queries of the following form

Or(
    And(has('field1','value1'),has('~label','label1')),
    And(has('field2','value2'),has('~label','label1')),
...
)

with composite indexes on 'field1' and 'field2', do not use the indexes for the And sub-queries.

It seems that the condition at GraphCentricQueryBuilder.java:261 filters out the indexes when the top-level condition is an Or. Is there any reason for that?

Sylvain Julmy

HadoopMarc

Sep 18, 2020, 9:32:47 AM
to JanusGraph users
Hi Sylvain,

Could you please add your findings to:


Maybe the Gremlin union() step can offer a workaround?

Best wishes,    Marc


On Friday, September 18, 2020 at 14:31:39 UTC+2, sylvai...@gmail.com wrote:

BO XUAN LI

Sep 18, 2020, 9:34:04 AM
to janusgra...@googlegroups.com
Hi Sylvain,

It looks like the other Or condition of your query does not use an index and needs a full scan. In that case, JanusGraph does not bother firing index queries for your given Or condition.

Best regards,
Boxuan

BO XUAN LI

Sep 18, 2020, 9:46:59 AM
to janusgra...@googlegroups.com
Sorry, I was wrong. JanusGraph should still fire index queries for your given Or query, even if one or more of the other Or conditions require a full scan.

Sylvain, how did you know JanusGraph did not use indexes for your sub And query? Does it use indexes when you only have this And condition?

Best regards,
Boxuan

Sylvain Julmy

Sep 21, 2020, 12:38:12 AM
to JanusGraph users
Hi HadoopMarc,

We are using the Transaction API, and there is no and() or union() step defined in it. We just do the work with the or() and has() steps.

Like the following (it's in Scala, but I don't think it matters):

val queryBuilder = transaction.query().asInstanceOf[GraphCentricQueryBuilder]
val subQuery1 = transaction.query().asInstanceOf[GraphCentricQueryBuilder].has("field1",v1).has("~label", label)
val subQuery2 = transaction.query().asInstanceOf[GraphCentricQueryBuilder].has("field2",v2).has("~label", label)
queryBuilder.or(subQuery1).or(subQuery2)

and, internally, the query is transformed into Or(And(...),And(...))

Best wishes,
Sylvain

Sylvain Julmy

Sep 21, 2020, 12:41:26 AM
to JanusGraph users
Hi Boxuan,

Well, we know that JanusGraph does not use indexes because it logs a warning message saying that it will do a full scan. Also, with the debugger, you can look at the precise part of the code I pointed to: when index candidates are selected, nothing is selected if the top-level condition is an Or.

When we only have the And() condition, indexes are selected correctly.

Best wishes,
Sylvain

BO XUAN LI

Sep 21, 2020, 9:17:52 AM
to janusgra...@googlegroups.com
Hi Sylvain,

> well we know that JanusGraph does not use indexes because it log a warning message that it would do a fullscan

This is not completely correct. If you see this warning message, it means JanusGraph does not use indexes for at least one condition in your query. It could have used indexes for other conditions.

If I understand correctly, you have two “And” conditions, each of which is satisfied by some index when used independently. However, when they are combined using an “Or” clause, the indexes are not used. If so, this looks like a bug to me, but I cannot reproduce it on 0.5.2. Which version are you using? Can you provide a minimal example that showcases it?

> you can look at the precise part of the code I give, when selecting index candidates, if the top level condition is an Or, nothing is selected

It does not work the way you presume. You could set a breakpoint at that line and observe that it is invoked multiple times. JanusGraph first tries to pick a single mixed index that can cover both conditions in the “Or” (as you described, nothing is selected), and then picks indexes for each condition in the “Or” clause separately, so that it can merge the results later. If one condition uses an index while another does not, a full scan is still needed and you would still see the full scan warning message.

Hope this helps,
Boxuan

BO XUAN LI

Sep 21, 2020, 9:25:13 AM
to janusgra...@googlegroups.com
If you are using a version older than 0.3.0, then it would make sense to me, because it seems index support for the “Or” clause was added in 0.3.0. See https://github.com/JanusGraph/janusgraph/pull/927

Sylvain Julmy

Sep 21, 2020, 10:21:39 AM
to JanusGraph users
Hi Boxuan,

I put an example of a query we are trying to get working with indexes at the end of this message. It is a test case I wrote in the QueryTest.java file.

> It does not work in the way you presume.

from GraphCentricQueryBuilder.java:261
indexType -> indexType.getElement() == resultType && !(conditions instanceof Or && (indexType.isCompositeIndex() || !serializer.features((MixedIndexType) indexType).supportNotQueryNormalForm()))));

Maybe I am just stupid and I don't see it, but `conditions instanceof Or` is always true (if the top-level query is an Or, which is the case for our queries) and we only have composite indexes, so the indexType would never be picked into the indexCandidates set, right?
Therefore every indexType would be filtered out of the collection and no index would be used for the query.
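
To make that reading concrete, here is a minimal stand-alone sketch of the predicate's logic. It is not JanusGraph code: the Index class and the candidates method are hypothetical stand-ins that only model the two properties the quoted line reads, and the element-type check (indexType.getElement() == resultType) is left out for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

public class OrIndexFilterSketch {

    // Hypothetical stand-in for an index type; not a JanusGraph class.
    static final class Index {
        final String name;
        final boolean composite;
        final boolean supportsNotQueryNormalForm;

        Index(String name, boolean composite, boolean supportsNotQueryNormalForm) {
            this.name = name;
            this.composite = composite;
            this.supportsNotQueryNormalForm = supportsNotQueryNormalForm;
        }
    }

    // Same boolean shape as the quoted predicate: when the top-level condition
    // is an Or, an index is kept only if it is a mixed (non-composite) index
    // whose backend supports queries that are not in query normal form.
    static List<String> candidates(List<Index> indexes, boolean topLevelIsOr) {
        return indexes.stream()
                .filter(i -> !(topLevelIsOr && (i.composite || !i.supportsNotQueryNormalForm)))
                .map(i -> i.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Index> indexes = List.of(
                new Index("prop1_idx", true, false),
                new Index("prop2_idx", true, false));

        System.out.println(candidates(indexes, false)); // prints [prop1_idx, prop2_idx]
        System.out.println(candidates(indexes, true));  // prints []
    }
}
```

With only composite indexes and a top-level Or, the candidate list in this sketch is empty, which matches what the debugger shows.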

And we are using JanusGraph 0.5.2 (and impatient to go with the 0.6 :) ! )

Thanks for your time and best wishes!
Sylvain

--------------------

@Test
public void testTopLevelOrUseIndexesForSubQuery() {
    // Schema: two string properties, each covered by its own composite index.
    JanusGraphManagement mgmt = graph.openManagement();
    PropertyKey prop1Key = mgmt.makePropertyKey("prop1").dataType(String.class).make();
    PropertyKey prop2Key = mgmt.makePropertyKey("prop2").dataType(String.class).make();

    mgmt.buildIndex("prop1_idx", Vertex.class).addKey(prop1Key).buildCompositeIndex();
    mgmt.buildIndex("prop2_idx", Vertex.class).addKey(prop2Key).buildCompositeIndex();

    mgmt.commit();

    for (int i = 0; i < 20; i++) {
        tx.addVertex("file").property("prop1", "p1_" + i).element().property("prop2", "p2_" + i);
    }

    // Case 1: a single And query on an indexed property.
    GraphCentricQueryBuilder andQueryBuilder = (GraphCentricQueryBuilder) tx.query();
    andQueryBuilder.has("prop1", "p1_9").has("~label", "file");

    // this is good: andQuery.indexQuery.backendQuery.queries contains one JointIndexQuery and uses the prop1_idx:multiKSQ[1]@2005 index
    GraphCentricQuery andQuery = andQueryBuilder.constructQuery(ElementCategory.VERTEX);

    Iterable<JanusGraphVertex> resultAnd = andQueryBuilder.vertices();

    // Case 2: a top-level Or over two And sub-queries, each on an indexed property.
    GraphCentricQueryBuilder orQueryBuilder = (GraphCentricQueryBuilder) tx.query();

    GraphCentricQueryBuilder subQuery1 = (GraphCentricQueryBuilder) tx.query();
    GraphCentricQueryBuilder subQuery2 = (GraphCentricQueryBuilder) tx.query();

    subQuery1.has("prop1", "p1_9").has("~label", "file");
    subQuery2.has("prop2", "p2_9").has("~label", "file");

    orQueryBuilder.or(subQuery1).or(subQuery2);

    // this is not good: orQuery.indexQuery.backendQuery.queries contains nothing
    GraphCentricQuery orQuery = orQueryBuilder.constructQuery(ElementCategory.VERTEX);

    Iterable<JanusGraphVertex> resultOr = orQueryBuilder.vertices();
}

BO XUAN LI

Sep 21, 2020, 12:06:04 PM
to janusgra...@googlegroups.com
Hi Sylvain,

I think I got where your confusion came from.

Your understanding of GraphCentricQueryBuilder.java:261 is absolutely correct (and not stupid!). The problem is with the way you create your query.

Rather than building a GraphCentricQuery yourself (which is not recommended because it is an internal interface), you should use a Gremlin query:

g.V().hasLabel("file").or(__.has("prop1", "p1_9"), __.has("prop2", "p2_9")).toList();

By using the query above, JanusGraph should be able to use indexes.

FYI, the magic is in JanusGraphStep (see the usage of hasLocalContainers), where each condition in the “Or” clause fires an index query separately. This does not take effect if you are not using a Gremlin query (which explains why you got confused by my words! :P).
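
As a rough illustration of that idea only (this is not JanusGraphStep's implementation), the toy sketch below answers each “Or” branch from its own index and merges the results. The map-based index layout and the orQuery method are assumptions made for the example, and the label filter is omitted.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrBranchLookupSketch {

    // Toy model of composite indexes: property key -> (value -> vertex ids).
    // Each Or branch is a (property key, value) pair and is looked up separately;
    // the per-branch results are merged as a set union.
    static Set<Long> orQuery(Map<String, Map<String, Set<Long>>> indexes,
                             List<Map.Entry<String, String>> branches) {
        Set<Long> merged = new LinkedHashSet<>();
        for (Map.Entry<String, String> branch : branches) {
            Map<String, Set<Long>> index = indexes.get(branch.getKey());
            if (index == null) {
                // A branch without an index is where a full scan would be needed.
                throw new IllegalStateException("no index on '" + branch.getKey() + "'");
            }
            merged.addAll(index.getOrDefault(branch.getValue(), Set.of()));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> prop1Idx = Map.of("p1_9", Set.of(9L));
        Map<String, Set<Long>> prop2Idx = Map.of("p2_3", Set.of(3L), "p2_9", Set.of(9L));
        Map<String, Map<String, Set<Long>>> indexes = Map.of("prop1", prop1Idx, "prop2", prop2Idx);

        // One lookup per branch, then a union of the two result sets.
        Set<Long> result = orQuery(indexes,
                List.of(Map.entry("prop1", "p1_9"), Map.entry("prop2", "p2_3")));
        System.out.println(result); // prints [9, 3]
    }
}
```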

By the way, the following query seems to trigger a full scan:

g.V().or(__.and(__.hasLabel("file"), __.has("prop1", "p1_9")), __.and(__.hasLabel("file"), __.has("prop2", "p2_9"))).toList();

which is worth investigating. But anyway, you could use the first Gremlin query, which hopefully works as expected.

Hope this helps,
Boxuan


Sylvain Julmy

Sep 25, 2020, 1:17:32 AM
to JanusGraph users
Hi Boxuan,

Thank you very much for the clarification :) !

I've applied the fix (moving from the transaction API to the Gremlin one) and it worked perfectly.
I don't know why we used the transaction API instead of the Gremlin one; I'll have to ask my teammates...

Thanks for the time spent on this, kind regards!

Sylvain