Gremlin query: Get the top 10 vertices with the most number of incoming edges with edgeLabel "XXXXX"

1,736 views
Skip to first unread message

Kabira

unread,
Jul 16, 2015, 3:24:54 PM7/16/15
to aureliu...@googlegroups.com
Hello,

Titan Version: Titan 0.9.0-M2 - TP3

I'm trying to write a gremlin query to retrieve the top 10 vertices with most number of incoming edges of edge label 'xxxxxx'.

For eg: In TP2 and Gremlin older versions --> Top ten users with most number of 'votes_for' edges.

g.V.transform{[it,it.inE.count()]}.order{it.b[1] <=> it.a[1]}[0..9]

I've written a bunch of queries using the TP3 gremlin but with no success. Can some one help me out here please.

Thanks.

Kabira

unread,
Jul 16, 2015, 4:31:54 PM7/16/15
to aureliu...@googlegroups.com
Hello All,

I came up with the following queries which satisfy the purpose but I'm not sure about the performance. Please make any comments.

A) g.V().outE('votes_for').inV().order().by(inE('votes_for').count(), decr).dedup().limit(10)

    This query plan first fetches all the vertices that have incoming 'votes_for' edges and then orders them by the count of the same, which I think will filter out a good chunk of vertices in the initial step. And if we use Vertex-Centric indices it can be optimized even more.

B) g.V().order().by(inE('votes_for').count(), decr).limit(10)

    This query iterates through all the vertices which might be very slow. And, Here we are ordering all the vertices where as the in the above query we are ordering only a small chunk. Please correct me if I'm wrong.

Please kindly leave any useful comments on whether the above queries and my approaches are correct and efficient enough.

Daniel Kuppitz

unread,
Jul 16, 2015, 7:17:27 PM7/16/15
to aureliu...@googlegroups.com
Both traversals will iterate through all vertices. In fact, a solution that doesn't iterate all vertices, simply doesn't exist.

g.V().order().by(inE('votes_for').count(), decr).limit(10)

This one is pretty straightforward. Easy to read and write. However, note that inE('votes_for').count() will be evaluated a lot, not just 1 time per  vertex. The following query has the best execution plan:

g.E().inV().groupCount().order(local).by(valueDecr).limit(local, 10).mapKeys()  // next Titan release
g.E().inV().groupCount().order(local).by(valueDecr).limit(local, 10).flatMap {it.get().keySet().iterator()}  // current Titan release (0.9.0-M2)

Cheers,
Daniel

--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/8e37d3dd-a06a-4482-8b85-ad545987df19%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Kabira

unread,
Jul 17, 2015, 1:08:55 PM7/17/15
to aureliu...@googlegroups.com
Thanks Daniel for the useful comments.


-In fact, a solution that doesn't iterate all vertices, simply doesn't exist.
 
       In your solution, how are you iterating through all the vertices? Its just the edges right - g.E()? And can you please hint or comment on why it is the best execution plan?

Thanks for your time. 

Daniel Kuppitz

unread,
Jul 17, 2015, 1:46:54 PM7/17/15
to aureliu...@googlegroups.com
 In your solution, how are you iterating through all the vertices? Its just the edges right - g.E()?

Right. The traversal won't touch any vertices without incoming edges. But given that you're looking for the top 10 vertices (order by number of incoming edges), I thought it may be okay to skip vertices w/o incoming edges.

And can you please hint or comment on why it is the best execution plan?

I simply tried several different traversals and appended .profile().cap(TraversalMetrics.METRICS_KEY) to see how many vertices were touched, how many traversers generated, etc..

Cheers,
Daniel


Kabira

unread,
Jul 17, 2015, 2:13:44 PM7/17/15
to aureliu...@googlegroups.com
Thanks Daniel. Those are some useful tips/techniques.

Thanks for your time.
Reply all
Reply to author
Forward
0 new messages