find vertex with highest number of edges

Topping Bowers

unread,

Apr 25, 2013, 8:08:16 AM4/25/13

to orient-...@googlegroups.com

Hi!

We're starting to use Orient and it's been a little bit of a road, but exciting stuff. I'm starting to put a graph in that looks like:

user ---mentioned---> url

I'm trying to figure out the best way (using tinkerpop) to find the url "with the most in edges." I was able, in 1.3.0 to do this pretty easy with SQL: select url, in.size() as in_size from Links ORDER_BY in_size DESC

That seems a lot harder in 1.4.0 sql - where it seems even in_mentioned.size() doesn't work... but... more importantly - what's the best way with gremlin or pure tinkerpop to get the "highest." - I was able to use groupCount to get *all* and then sort after that, but that doesn't sound like it'll scale...

That query is like:

graph.v("@class" => "User").out_e(:mentioned).in_v.groupCount(:url).sort {|a,b| a[1] <=> b[1] }

Forgive the weird syntax... we're using Pacer in Jruby, but i'm more interested in the philosophy of tinkerpop/gremlin rather than actual syntax.

So to sum it up:

"find the top 10 link vertexes with the highest count of in_edges, using Tinkerpop and/or gremlin"

Thanks for any help! I'm sure it's a newbie question.

Topper

Marko Rodriguez

unread,

Apr 25, 2013, 9:35:38 AM4/25/13

to orient-...@googlegroups.com

Hi,

graph.v("@class" => "User").out_e(:mentioned).in_v.groupCount(:url).sort {|a,b| a[1] <=> b[1] }

Forgive the weird syntax... we're using Pacer in Jruby, but i'm more interested in the philosophy of tinkerpop/gremlin rather than actual syntax.

So to sum it up:

"find the top 10 link vertexes with the highest count of in_edges, using Tinkerpop and/or gremlin"

g.V('@class','User').transform{[it, it.inE.count()]}.order{it.b[1] <=> it.a[1]}.next(10)

Here is your answer using the toy TinkerGraph deployed with Gremlin. --- note I don't do @class='User' as that doesn't exist in the dataset.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V.transform{[it,it.inE.count()]}
==>[v[3], 3]
==>[v[2], 1]
==>[v[1], 0]
==>[v[6], 0]
==>[v[5], 1]
==>[v[4], 1]
gremlin> g.V.transform{[it,it.inE.count()]}.order{it.b[1] <=> it.a[1]}
==>[v[3], 3]
==>[v[2], 1]
==>[v[5], 1]
==>[v[4], 1]
==>[v[1], 0]
==>[v[6], 0]
gremlin> g.V.transform{[it,it.inE.count()]}.order{it.b[1] <=> it.a[1]}.next(3)
==>[v[3], 3]
==>[v[2], 1]
==>[v[5], 1]

I believe this is trivial to map over to Pacer syntax.

HTH,

Marko.

http://markorodriguez.com

Yingshou Guo

unread,

Apr 25, 2013, 11:34:54 AM4/25/13

to orient-...@googlegroups.com

I'm doing something similar. IMHO the simplest and most scalable solution is add a count property to the vertex and increment it each time a new edge is added.

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Luca Garulli

unread,

Apr 25, 2013, 11:40:36 AM4/25/13

to orient-database

Hi,

count() against edges is very cheap operation (OrientDB keeps track of such counter like you would do).

Lvc@

Yingshou Guo

unread,

Apr 25, 2013, 11:54:08 AM4/25/13

to orient-...@googlegroups.com

Hi Luca,

The problem with me using count() is that I don't know how to use it together with order by clause.

In the following example, Would you please tell me how should I order by the number of in_ edge?

orientdb> select in_ from V

---+---------
#| RID     |
---+---------
0|    #-2:1
1|    #-2:2
2|    #-2:3
3|    #-2:4
4|    #-2:5
5|    #-2:6
6|    #-2:7
7|    #-2:8
8|    #-2:9
9|   #-2:10|#38:0
10|   #-2:11|[3]
11|   #-2:12|#38:2
12|   #-2:13|#38:3
13|   #-2:14|#38:4
14|   #-2:15|#38:5
15|   #-2:16|[2]
16|   #-2:17|#38:7
17|   #-2:18|[6]
18|   #-2:19|#37:0
19|   #-2:20|#37:1

My Query does not work:

orientdb> select from V order by count(in_) desc

Error: com.orientechnologies.orient.core.sql.OCommandSQLParsingException: Error on parsing command at position #0: Error on parsing command at position #28: Ordering mode 'IN_' not supported. Valid is 'ASC', 'DESC' or nothing ('ASC' by default)
Command: select from V order by count(in_)

Topping Bowers

unread,

Apr 25, 2013, 12:13:07 PM4/25/13

to orient-...@googlegroups.com

I think this:

select *, in_linked.size() AS out_size from V ORDER BY out_size DESC

That's because I have an edge type "linked" - so it's in_#{edge.className}.size()

You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/0RgCgSE0yuQ/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--
Topper Bowers

top...@toppingdesign.com

http://amicushq.com

http://blog.toppingdesign.com

Yingshou Guo

unread,

Apr 25, 2013, 12:20:50 PM4/25/13

to orient-...@googlegroups.com

Hi,

Thanks and I'll try it in my usecase.

Best,

Guo Yingshou

Luca Garulli

unread,

Apr 26, 2013, 2:47:34 AM4/26/13

to orient-database

Hi,

Topping says well. However you can do something similar to what you did with 1.3.0 by using the new in() function:

select url, in().size() as in_size from Links ORDER_BY in_size DESC

To count only the edges of type "linked" (label = linked) you can do:

select url, in('linked').size() as in_size from Links ORDER_BY in_size DESC

or again:

select url, edges('in', 'linked').size() as in_size from Links ORDER_BY in_size DESC

Lvc@

Ronnie

unread,

Jul 1, 2016, 6:18:14 AM7/1/16

to OrientDB

Hi,

I tried to use the below query to get the count of vertices with most edges:

select *, in().size() as size from FamilyMember ORDER_BY size DESC

I have about 33 million vertices and the query seems to be taking ages.

Please suggest how i can improve the query or suggest possible cause for such a long execution time...

I am using 2.2.0 community edition on a stable dedicated machine with decent configuration (xeon e5 with 28G RAM)

Ron

Luca Garulli

unread,

Jul 1, 2016, 1:30:53 PM7/1/16

to OrientDB

Hi,

You're retrieving 33M of rows as result in this way + the order by consumes a lot of RAM:

select *, in().size() as size from FamilyMember ORDER_BY size DESC

Do you need 33M of results or can you se a limit? Furthermore please can you add the PARALLEL keyword at the end of the select?

This is the query:

select *, in().size() as size from FamilyMember ORDER_BY size DESC LIMIT 1000 PARALLEL

Best Regards,

Luca Garulli

Founder & CEO

OrientDB LTD

For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward