Re: [Neo4j] eigenvector centrality through the REST API?

352 views
Skip to first unread message

Marko Rodriguez

unread,
Jul 24, 2012, 11:37:13 AM7/24/12
to ne...@googlegroups.com
Hi,

Most graph databases on the market today are not optimized for global graph algorithms like eigenvector centrality. Why? -- given that you will be touching the entire graph (multiple times via each iteration), you will want to have the entire graph in-memory, else you take the hit of Neo4j running to disk. You might want to look into local iterative models that are resource sensitive like the Google Caffeine model.

HTH,
Marko.


On Jul 23, 2012, at 11:27 AM, Andrew Ross wrote:

We have a Ruby-based application on Heroku with a largely static neo4j graph (we connect to it via the neo4j heroku add-on) that we want to use for advanced searching/data exploration. For some of the features we want to implement, we'd like to know the (eigenvector) centrality of each node, and we'd like to cache that centrality on each node after we generate the graph.

It seems like neo4j has the graph centrality algorithms we're looking for built-in. However, we aren't sure how to use the REST API to ask neo4j to run any of those algorithms.

So, our questions:
  • Is it possible to use the REST API to ask neo4j to calculate the eigenvector centrality of each node (and store that as a property on every node)?
  • If not, would we have to generate the graph on a server we can actually access, then write some Java to get the information we want?
Thanks for any help.
Andrew

Andrew Ross

unread,
Jul 26, 2012, 2:58:41 PM7/26/12
to ne...@googlegroups.com
We wanted to use eigenvector centrality because in most cases it is a very helpful measure of how 'important' a node is, but given the particular data set we're working with, it turned out that calculating degree centrality was actually sufficient. And that was very easy to do via the REST interface, node by node. 

However, performance isn't a huge consideration for us, because our graph is relatively small and readonly -- we want to calculate centrality just once, when the graph is generated. If it takes half an hour, that's regrettable but not terrible. However, doing that over the REST interface seems tricky.

James Thornton

unread,
Jul 26, 2012, 3:22:16 PM7/26/12
to ne...@googlegroups.com


On Thursday, July 26, 2012 1:58:41 PM UTC-5, Andrew Ross wrote:

However, performance isn't a huge consideration for us, because our graph is relatively small and readonly -- we want to calculate centrality just once, when the graph is generated. If it takes half an hour, that's regrettable but not terrible. However, doing that over the REST interface seems tricky.

Use Gremlin (https://github.com/tinkerpop/gremlin/wiki) -- Neo4j Server has a Gremlin Plugin built in.

- James 

Jim Webber

unread,
Jul 27, 2012, 6:13:25 AM7/27/12
to ne...@googlegroups.com
> Most graph databases on the market today are not optimized for global graph algorithms like eigenvector centrality. Why? -- given that you will be touching the entire graph (multiple times via each iteration), you will want to have the entire graph in-memory, else you take the hit of Neo4j running to disk. You might want to look into local iterative models that are resource sensitive like the Google Caffeine model.

Though given sufficient memory allocated to caches, you pay the disk penalty relatively infrequently compared to the work undertaken. But you do have to prepare the database (and the box it runs on) to undertake this kind of work (aka give it RAM).

Jim

Michael Hunger

unread,
Jul 27, 2012, 6:31:41 AM7/27/12
to ne...@googlegroups.com
As your graph is mainly static you can also run a simple cypher script to store the degree centrality on each node.

start n=node(*)
match n--()
with n,count(*) as centrality
set n.centrality = centrality

Then you can use that property, you can also use the first 3 lines of the snippet in your actual query for the nodes you're looking at.

Michael

Andrew Ross

unread,
Jul 27, 2012, 9:28:22 AM7/27/12
to ne...@googlegroups.com
Michael: that's currently what we're doing (although we weren't using with/set, and that will be a nice improvement). After doing a bit more research though, as James suggested it definitely seems like Gremlin will be the way to go if we want to do more complex operations like finding eigenvector centrality.

Andrew

Peter Neubauer

unread,
Jul 27, 2012, 10:16:55 AM7/27/12
to ne...@googlegroups.com

Yes,
For complex stuff, Groovy/Gremlin is a full scripting language with all the pros and cons of sending arbitrary scripts to the server.

/peter

Send from mobile.

Reply all
Reply to author
Forward
0 new messages