Hello,
I am trying to optimize a simple recommendation query, and am somewhat stuck.
*Our Setup*
- JanusGraph 0.5.1
- Storage backend: ScyllaDB 3.2.4
*The Graph*
Our graph contains millions of vertices and edges. The relevant part looks like this:
- Vertex: user (with several properties)
- Vertex: query (with several properties, one being "title")
- Edge: "searched", linking a user to a query
Each user can have multiple searches, and a user may have several searches with the same title (in that case the other properties differ).
*The Scenario*
I know that a user searched for something, let's say "Snowboard", and I want to show them related search terms with an "other users who searched for Snowboard also searched for" query.
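The "also searched" idea can be sketched without a graph at all. The minimal Java example below assumes a hypothetical in-memory map from each user to the titles of that user's searches (the real search vertices carry more properties). The class and method names are made up for illustration. For every user who searched the term, it keeps that user's other titles, matching the term case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for user -[searched]-> search; not JanusGraph code.
final class AlsoSearched {

    private AlsoSearched() {}

    // For each user who searched `term` (ignoring case), collect the titles of that
    // user's other searches. Each user is visited once, similar to dedup() on users.
    static List<String> relatedTitles(Map<String, List<String>> titlesByUser, String term) {
        List<String> related = new ArrayList<>();
        for (List<String> titles : titlesByUser.values()) {
            boolean searchedTerm = titles.stream().anyMatch(t -> t.equalsIgnoreCase(term));
            if (!searchedTerm) {
                continue; // this user never searched the term
            }
            for (String title : titles) {
                if (!title.equalsIgnoreCase(term)) {
                    related.add(title); // duplicates are kept so they can be counted later
                }
            }
        }
        return related;
    }
}
```

For example, with alice → [Snowboard, ski goggles] and bob → [snowboard, boots], relatedTitles(..., "snowboard") returns "ski goggles" and "boots" in map iteration order.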
Originally I started with the following query:
g.V().has('query', 'title', 'snowboard').in('searched').out('searched').has('query', 'title', neq('snowboard')).has('title').dedup().as("related").select("related").by('title').groupCount().order(Scope.local).by(Column.values, Order.desc).profile()
However, the query time was far from acceptable, so I decided to do the grouping and counting in the application code (Java) rather than in the Gremlin query. (I had first tried adding a timeLimit(), but it didn't bring the hoped-for improvement, or rather, it produced really bad results.)
The simplified Gremlin query now looks as follows:
g.V().has('search', 'title', within('snowboard', 'Snowboard')).in('searched').dedup().out('searched').values('title')
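Once the simplified query has returned the titles (for example collected with toList()), the grouping and ordering that groupCount() and order(Scope.local).by(Column.values, Order.desc) did in the original query can be reproduced in plain Java. The sketch below is one way to do it with streams; the class and method names are hypothetical. It assumes both spellings of the search term should be excluded, which is what within('snowboard', 'Snowboard') suggests (the original neq('snowboard') filter is case-sensitive).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical Java-side counterpart of groupCount() + order(local) by count, descending.
final class TitleCounts {

    private TitleCounts() {}

    // Counts each title and returns the counts ordered from most to least frequent.
    // Titles equal to `term` (ignoring case) are skipped, since the simplified query
    // no longer filters them out. Ties keep an unspecified order.
    static Map<String, Long> rank(List<String> titles, String term) {
        Map<String, Long> counts = titles.stream()
                .filter(t -> !t.equalsIgnoreCase(term))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,            // keys are already unique, so never called
                        LinkedHashMap::new));   // LinkedHashMap preserves the sorted order
    }
}
```

Returning a LinkedHashMap keeps the result shaped like the Map that groupCount() produced, so code that consumed the original query's output should need few changes.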
Profiling the simplified query shows that the values('title') step (JanusGraphPropertiesStep) takes most of the time, about 55% of the total. I already tried the query-fast option, but that didn't help at all.
The profile() step returns the following:
==>Traversal Metrics
Step                                                       Count  Traversers     Time (ms)   % Dur
==================================================================================================
JanusGraphStep([],[~label.eq(search), sear...              30246       30246       290.388    3.72
  \_condition=(~label = search AND (title = snowboard OR title = Snowboard))
  \_orders=[]
  \_isFitted=true
  \_isOrdered=true
  \_query=multiKSQ[2]@4000
  \_index=bySearchTitle
  optimization                                                                           0.030
  optimization                                                                          14.736
  backend-query                                                                          0.000
    \_query=bySearchTitle:multiKSQ[2]@4000
    \_limit=4000
JanusGraphVertexStep(IN,[searched],...                     30245       30245      1811.447   23.22
  \_condition=type[searched]
  \_orders=[]
  \_isFitted=true
  \_isOrdered=true
  \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@a4661abd
  \_multi=true
  \_vertices=30246
  optimization                                                                           4.640
  backend-query                                            30245                      1592.684
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@a4661abd
DedupGlobalStep                                            10296       10296        11.328    0.15
JanusGraphVertexStep(OUT,[searched]...                     79241       79241      1293.578   16.58
  \_condition=type[searched]
  \_orders=[]
  \_isFitted=true
  \_isOrdered=true
  \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@a46616dd
  \_multi=true
  \_vertices=10296
  optimization                                                                           0.174
  backend-query                                            79241                       557.447
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@a46616dd
NoOpBarrierStep(2500)                                      79241       79241        52.760    0.68
JanusGraphPropertiesStep([title],value)                    47709       47709      4322.742   55.42
  \_condition=type[title]
  \_orders=[]
  \_isFitted=true
  \_isOrdered=true
  \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8121f1dd
  \_multi=true
  \_vertices=79241
  optimization                                                                           2.133
  backend-query                                            47709                      3969.057
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8121f1dd
NoOpBarrierStep(2500)                                      47709        2293        18.347    0.24
>TOTAL                                                         -           -      7800.592       -
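One way to read these numbers is to divide each step's backend-query time by the number of vertices it processed (the \_vertices value). The hypothetical helpers below (not part of any library) do just that. With the profile above, all three vertex-touching steps come out at roughly 0.05 ms per vertex: in('searched') 1592.684 / 30246, out('searched') 557.447 / 10296, values('title') 3969.057 / 79241. Under that reading, values('title') is not slower per vertex than the other steps. It is expensive mainly because it has to load properties for the most vertices (79,241).

```java
// Hypothetical helpers for reading Gremlin profile() numbers; not part of any library.
final class ProfileMath {

    private ProfileMath() {}

    // Average backend-query time per input vertex, in milliseconds.
    static double msPerVertex(double backendMs, long vertices) {
        if (vertices <= 0) {
            throw new IllegalArgumentException("vertices must be positive");
        }
        return backendMs / vertices;
    }

    // Share of the total traversal time spent in one step, in percent.
    static double percentOfTotal(double stepMs, double totalMs) {
        return 100.0 * stepMs / totalMs;
    }
}
```

For example, percentOfTotal(4322.742, 7800.592) gives about 55.4, matching the % Dur column for JanusGraphPropertiesStep.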
What am I missing? Where is there room for improvement?
I'm looking forward to any hints.
Regards
Claire