Centric Indexes failing to support all conditions for better performance.

chrism

Dec 9, 2020, 10:01:56 PM
to JanusGraph users
The JanusGraph documentation describes the use of a vertex-centric index [edge=battled + properties=(rating, time)]:
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()

As I understand it, profile() on the query above reports \_isFitted=true,
which indicates that the backend query delivered exactly the results matching these conditions:
\_condition=(rating = 5.0 AND time > 10 AND time < 50 AND type[battled])

Two things are apparent from the above: a vertex-centric index supports multiple property keys, and it supports both equality and range/interval constraints.
However, isFitted is false for many conditions or combinations that do not really break those rules and are still range constraints:

a) g.V(h).outE('battled').has('rating',lt(5.0)).has('time', inside(10, 50)).inV()   // P.lt used for first key
b) g.V(h).outE('battled').has('rating',gt(5.0)) // P.gt used
c) g.V(h).outE('battled').or( hasNot('rating'), has('rating',eq(5.0)) ) // OrStep() used

Even b) can be made "fitted" by using has('rating', inside(5.0, Long.MAX_VALUE)).
All of this is very confusing and probably not working as expected. What am I doing wrong?
In my experience, only one property key can be used for query conditions served by the index; the second is ignored.

As I understand it, isFitted=false does not really improve performance when a single condition fetches most of my edges and the rest have to be filtered in memory, as the implementation of BasicVertexCentricQueryBuilder.java indicates.
Are there limitations that are not described in the JanusGraph documentation? Is it a glitch?

Can you explain how to take full advantage of vertex-centric indexes for edges?

Christopher

BO XUAN LI

Dec 13, 2020, 3:24:13 AM
to janusgra...@googlegroups.com
Hi Christopher,

isFitted = true basically means that no in-memory filtering is needed. isFitted = false does not necessarily mean that vertex-centric indexes are not used: some vertex-centric index may be used, with further in-memory filtering still needed.
Conversely, isFitted = false does not guarantee that any index is used; you could be fetching all edges of the given vertex.

I totally understand your confusion, because the documentation does not explain how the vertex-centric index is built. In JanusGraph, vertices and edges are stored in the “edgestore” store, while composite indexes are stored in the “graphindex” store. Mixed indexes are stored in an external index store such as Elasticsearch. This might be a bit counter-intuitive, but vertex-centric indexes are stored in the “edgestore” store. Recall how edges are stored (https://docs.janusgraph.org/advanced-topics/data-model/#individual-edge-layout).

 
Roughly speaking, if you don’t have any vertex-centric index, then your edge is stored once for one endpoint. If you have one vertex-centric index, then applicable edges are stored twice. If you have two vertex-centric indexes, then applicable edges are stored three times, and so on. These edges, although seemingly duplicates, have different “sort key”s that conform to the corresponding vertex-centric indexes. Let’s say you have built a “battlesByRating” vertex-centric index based on the property “rating”; then, apart from the ordinary edge, JanusGraph creates an additional edge whose “sort key” is the rating value. Because the “column” is sorted in the underlying data storage (e.g. “column” in the JanusGraph model is mapped to “clustering column” in Cassandra), you essentially gain the ability to search the index by “rating” value/range.
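
To make the sorted-column idea concrete, here is a minimal Java sketch that models the columns of one vertex as a sorted map keyed by rating. It is only a toy model and not JanusGraph's storage code: the class name `RatingSortedColumns`, the string edge ids, and the integer ratings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (an assumption, not JanusGraph code): the columns of one vertex row,
// kept sorted by a "sort key" that is simply the edge's rating.
class RatingSortedColumns {
    private final NavigableMap<Integer, List<String>> columns = new TreeMap<>();

    // Store an edge id under its rating; the TreeMap keeps the keys ordered.
    void addEdge(int rating, String edgeId) {
        columns.computeIfAbsent(rating, r -> new ArrayList<>()).add(edgeId);
    }

    // Range lookup for rating in [from, to): a single slice of the sorted keys,
    // so edges outside the range are never visited.
    List<String> edgesWithRatingBetween(int from, int to) {
        List<String> result = new ArrayList<>();
        for (List<String> ids : columns.subMap(from, true, to, false).values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        RatingSortedColumns row = new RatingSortedColumns();
        row.addEdge(4, "e1");
        row.addEdge(1, "e2");
        row.addEdge(2, "e3");
        row.addEdge(2, "e4");
        System.out.println(row.edgesWithRatingBetween(2, 4)); // prints [e3, e4]
    }
}
```

The point of the sketch is that a value or range condition on the sort key turns into one slice over already-sorted data, which is roughly what the underlying storage can serve without scanning every edge of the vertex.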

What happens when your vertex-centric index has two properties like the following?

mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.asc, rating, time)

Now your “sort key” is a combination of “rating” and “time” (note “rating” comes before “time”). Under this vertex-centric index, “sort key”s look like this:

(rating=1, time=2), (rating=1, time=3), (rating=2, time=1), (rating=2, time=5), (rating=4, time=2), …

This explains why isFitted = true when your query is has('rating', 5.0).has('time', inside(10, 50)) but not when your query is has('time', 5.0).has('rating', inside(10, 50)): with “rating” fixed, the matching “sort key”s form one contiguous range ordered by “time”, whereas fixing “time” and ranging over “rating” does not. Again, note that isFitted = false does not necessarily mean your query is not optimized by a vertex-centric index. I think the profiler should be improved to state whether and which vertex-centric index is used.
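
The ordering argument can be checked with a small Java sketch. It sorts (rating, time) pairs the way the composite “sort key” is described above and tests whether all keys matching a condition sit next to each other, so that one slice would return exactly the matches. This is a toy model under stated assumptions: `CompositeSortKeySketch`, `Key`, and `isOneContiguousSlice` are hypothetical names, and the real isFitted decision is made inside JanusGraph's query builder and may be stricter than simple contiguity (case b) above is an example this model does not explain).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Toy model (an assumption, not JanusGraph code) of a composite sort key (rating, time).
class CompositeSortKeySketch {
    static final class Key {
        final int rating;
        final int time;
        Key(int rating, int time) { this.rating = rating; this.time = time; }
    }

    // Order keys by rating first, then by time, as in the example sequence above.
    static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(Comparator.comparingInt((Key k) -> k.rating).thenComparingInt(k -> k.time));
        return copy;
    }

    // True if all matching keys are adjacent in sorted order, i.e. the slice from the
    // first match to the last match contains the matches and nothing else.
    static boolean isOneContiguousSlice(List<Key> sortedKeys, Predicate<Key> condition) {
        int first = -1, last = -1, matches = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (condition.test(sortedKeys.get(i))) {
                if (first < 0) first = i;
                last = i;
                matches++;
            }
        }
        return matches == 0 || last - first + 1 == matches;
    }

    public static void main(String[] args) {
        List<Key> keys = sorted(Arrays.asList(
                new Key(1, 2), new Key(1, 3), new Key(2, 1), new Key(2, 5), new Key(4, 2)));
        // Equality on rating, range on time: the matches form one slice.
        System.out.println(isOneContiguousSlice(keys, k -> k.rating == 2 && k.time > 0 && k.time < 9)); // prints true
        // Equality on time, range on rating: matches are interleaved with non-matches.
        System.out.println(isOneContiguousSlice(keys, k -> k.time == 2 && k.rating > 0 && k.rating < 9)); // prints false
    }
}
```

In this model, a range on “rating” combined with a range on “time” (as in case a) is also generally not one slice, because every rating value inside the range brings its own run of time values.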

I am not quite sure about the case b) you mentioned. Seems it’s a design consideration but right now I cannot tell why it is there.

“hasNot” almost never uses indexes because JanusGraph cannot index something that does not exist. (Note that a “null” value is not valid in JanusGraph.)

Hope this helps.

Best regards,
Boxuan

chrism

Dec 13, 2020, 10:56:57 PM
to JanusGraph users
Thank you, Boxuan Li.

It is obvious that you are an expert. Is there any way other than isFitted=true to tell whether an index is used?
(Even one that involves debugging JanusGraph Server or Cassandra.)

We need to construct Gremlin queries that always make full use of these indexes; the problem is simply what to type,
because our implementation has to match conditions more complicated than the ones above. Using the above as a sample, it would be:
(rating >= value AND time < value) OR hasNot(time), where hasNot(time) means that "time" was not specified.
What is visible from profile() is that we cannot use the coalesce() or or() steps, and the various workarounds we try
cannot be verified easily when isFitted=false and there is no other "good" indication that indexes are used.

Cheers, Christopher

Boxuan Li

Dec 14, 2020, 10:26:02 AM
to JanusGraph users
Hi Christopher,

I don't have any workaround in mind except testing and comparing query latencies.
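
One possible way to do such a comparison is a small timing harness like the Java sketch below. `LatencyComparison` and its methods are hypothetical helpers, not part of JanusGraph or TinkerPop, and the two placeholder workloads stand in for whatever code executes each query variant. Results on a real system will also depend on caches and JVM warm-up, so treat the numbers as a rough signal only.

```java
import java.util.Arrays;

// Hypothetical timing helper (not part of JanusGraph or TinkerPop).
class LatencyComparison {
    // Middle element of the sorted samples (upper middle for even counts).
    // The array is copied so the caller's data is left untouched.
    static long median(long[] samples) {
        long[] copy = samples.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2];
    }

    // Run the query a few times untimed as warm-up, then time `runs` executions
    // (runs must be at least 1) and return the median duration in nanoseconds.
    static long medianNanos(Runnable query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Placeholder workloads; in practice each Runnable would execute one
        // variant of the traversal and consume all of its results.
        Runnable variantA = () -> Math.sqrt(42.0);
        Runnable variantB = () -> Math.cbrt(42.0);
        System.out.println("variant A median: " + medianNanos(variantA, 5, 21) + " ns");
        System.out.println("variant B median: " + medianNanos(variantB, 5, 21) + " ns");
    }
}
```

Using the median rather than the mean keeps a single slow run (for example a cold cache) from dominating the comparison.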

I have created https://github.com/JanusGraph/janusgraph/issues/2283 which hopefully can be addressed before the next release. That being said, there is no planned date for the next release yet.

By the way, as I mentioned earlier, a query that uses "hasNot" almost never leverages an index, no matter whether it is a mixed, composite, or vertex-centric index.

Best regards,
Boxuan

chrism

Dec 14, 2020, 6:31:24 PM
to JanusGraph users
Thank you, looking forward to having profile() with such information added.
Cheers, CM