find vertex with highest number of edges

1,327 views
Skip to first unread message

Topping Bowers

unread,
Apr 25, 2013, 8:08:16 AM4/25/13
to orient-...@googlegroups.com
Hi!

We're starting to use Orient and it's been a little bit of a road, but exciting stuff.  I'm starting to put a graph in that looks like:

user ---mentioned---> url

I'm trying to figure out the best way (using tinkerpop) to find the url "with the most in edges."  I was able, in 1.3.0 to do this pretty easy with SQL:  select url, in.size() as in_size from Links ORDER_BY in_size DESC

That seems a lot harder in 1.4.0 sql - where it seems even in_mentioned.size() doesn't work... but... more importantly - what's the best way with gremlin or pure tinkerpop to get the "highest."  - I was able to use groupCount to get *all* and then sort after that, but that doesn't sound like it'll scale...

That query is like:

graph.v("@class" => "User").out_e(:mentioned).in_v.groupCount(:url).sort {|a,b| a[1] <=> b[1] }

Forgive the weird syntax... we're using Pacer in Jruby, but i'm more interested in the philosophy of tinkerpop/gremlin rather than actual syntax.

So to sum it up:

"find the top 10 link vertexes with the highest count of in_edges, using Tinkerpop and/or gremlin"

Thanks for any help!  I'm sure it's a newbie question.

Topper

Marko Rodriguez

unread,
Apr 25, 2013, 9:35:38 AM4/25/13
to orient-...@googlegroups.com
Hi,

graph.v("@class" => "User").out_e(:mentioned).in_v.groupCount(:url).sort {|a,b| a[1] <=> b[1] }

Forgive the weird syntax... we're using Pacer in Jruby, but i'm more interested in the philosophy of tinkerpop/gremlin rather than actual syntax.

So to sum it up:

"find the top 10 link vertexes with the highest count of in_edges, using Tinkerpop and/or gremlin"

g.V('@class','User').transform{[it, it.inE.count()]}.order{it.b[1] <=> it.a[1]}.next(10)

Here is your answer using the toy TinkerGraph deployed with Gremlin. --- note I don't do @class='User' as that doesn't exist in the dataset.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V.transform{[it,it.inE.count()]}
==>[v[3], 3]
==>[v[2], 1]
==>[v[1], 0]
==>[v[6], 0]
==>[v[5], 1]
==>[v[4], 1]
gremlin> g.V.transform{[it,it.inE.count()]}.order{it.b[1] <=> it.a[1]}
==>[v[3], 3]
==>[v[2], 1]
==>[v[5], 1]
==>[v[4], 1]
==>[v[1], 0]
==>[v[6], 0]
gremlin> g.V.transform{[it,it.inE.count()]}.order{it.b[1] <=> it.a[1]}.next(3)
==>[v[3], 3]
==>[v[2], 1]
==>[v[5], 1]

I believe this is trivial to map over to Pacer syntax.

HTH,
Marko.

Yingshou Guo

unread,
Apr 25, 2013, 11:34:54 AM4/25/13
to orient-...@googlegroups.com
I'm doing something similar. IMHO the simplest and most scalable solution is add a count property to the vertex and increment it each time a new edge is added.


--
 
---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Luca Garulli

unread,
Apr 25, 2013, 11:40:36 AM4/25/13
to orient-database
Hi,
count() against edges is very cheap operation (OrientDB keeps track of such counter like you would do).

Lvc@

Yingshou Guo

unread,
Apr 25, 2013, 11:54:08 AM4/25/13
to orient-...@googlegroups.com
Hi Luca,

The problem with me using count() is that I don't know how to use it together with order by clause.

In the following example, Would you please tell me how should I order by the number of in_ edge?

orientdb> select in_ from V

---+---------
  #| RID     |
---+---------
  0|    #-2:1
  1|    #-2:2
  2|    #-2:3
  3|    #-2:4
  4|    #-2:5
  5|    #-2:6
  6|    #-2:7
  7|    #-2:8
  8|    #-2:9
  9|   #-2:10|#38:0              
 10|   #-2:11|[3]                
 11|   #-2:12|#38:2              
 12|   #-2:13|#38:3              
 13|   #-2:14|#38:4              
 14|   #-2:15|#38:5              
 15|   #-2:16|[2]                
 16|   #-2:17|#38:7              
 17|   #-2:18|[6]                
 18|   #-2:19|#37:0              
 19|   #-2:20|#37:1   

My Query does not work:

orientdb> select from V order by count(in_) desc

Error: com.orientechnologies.orient.core.sql.OCommandSQLParsingException: Error on parsing command at position #0: Error on parsing command at position #28: Ordering mode 'IN_' not supported. Valid is 'ASC', 'DESC' or nothing ('ASC' by default)
Command: select from V order by count(in_)

Topping Bowers

unread,
Apr 25, 2013, 12:13:07 PM4/25/13
to orient-...@googlegroups.com
I think this:

select *in_linked.size(AS out_size from ORDER BY out_size DESC

That's because I have an edge type "linked" - so it's in_#{edge.className}.size()


You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/0RgCgSE0yuQ/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.
 
 

Yingshou Guo

unread,
Apr 25, 2013, 12:20:50 PM4/25/13
to orient-...@googlegroups.com
Hi,

Thanks and I'll try it in my usecase.

Best,

Guo Yingshou

Luca Garulli

unread,
Apr 26, 2013, 2:47:34 AM4/26/13
to orient-database
Hi,
Topping says well. However you can do something similar to what you did with 1.3.0 by using the new in() function:

select url, in().size() as in_size from Links ORDER_BY in_size DESC

To count only the edges of type "linked" (label = linked) you can do:

select url, in('linked').size() as in_size from Links ORDER_BY in_size DESC

or again:

select url, edges('in', 'linked').size() as in_size from Links ORDER_BY in_size DESC

Lvc@

Ronnie

unread,
Jul 1, 2016, 6:18:14 AM7/1/16
to OrientDB
Hi,

I tried to use the below query to get the count of vertices with most edges:

select *, in().size() as size from FamilyMember ORDER_BY size DESC

I have about 33 million vertices and the query seems to be taking ages.
Please suggest how i can improve the query or suggest possible cause for such a long execution time... 

I am using 2.2.0 community edition on a stable dedicated machine with decent configuration (xeon e5 with 28G RAM)

Ron

Luca Garulli

unread,
Jul 1, 2016, 1:30:53 PM7/1/16
to OrientDB
Hi,
You're retrieving 33M of rows as result in this way + the order by consumes a lot of RAM:

select *, in().size() as size from FamilyMember ORDER_BY size DESC

Do you need 33M of results or can you se a limit? Furthermore please can you add the PARALLEL keyword at the end of the select?

This is the query:

select *, in().size() as size from FamilyMember ORDER_BY size DESC LIMIT 1000 PARALLEL


Best Regards,

Luca Garulli
Founder & CEO


For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages