Is There a Cypher (neo4j) Equivalent for NetworkX?

2,759 views
Skip to first unread message

Jim Rutt

unread,
Feb 2, 2015, 11:50:46 AM2/2/15
to networkx...@googlegroups.com
Is there a Cypher (neo4j) analog for NetworkX?  Or other method of "smart' querying the graph?

Jim Rutt

unread,
Feb 2, 2015, 12:34:16 PM2/2/15
to networkx...@googlegroups.com
As an example, a tool that would allow one to specify the following:
    
 Starting at Node "Frame_116" traverse Edges that have an attribute Type=Inheritance and collect Nodes that have attribute Species="dog".  Stop when done with distance of three links from Start.   Return the collection of Nodes.   

Daniel Schult

unread,
Feb 2, 2015, 1:52:06 PM2/2/15
to networkx...@googlegroups.com
 Starting at Node "Frame_116" traverse Edges that have an attribute Type=Inheritance and collect Nodes that have attribute Species="dog".  Stop when done with distance of three links from Start.   Return the collection of Nodes.   

something like(untested): [suppose G is the network with attributes]
H=nx.Graph()
H.add_nodes_from(G)
H.add_edges_from( (u,v) for (u,v,d) in G.edges(data=true) if d['Type']==Inheritance )
nodes=single_source_shortest_path_length(H, source="Frame_116", cutoff=3)
dog_nodes = set( n for n in nodes if G.node[n]['Species']=="dog" )

So you have to do the restrictions by hand, but it's straightforward.

Jim Rutt

unread,
Feb 2, 2015, 3:51:35 PM2/2/15
to networkx...@googlegroups.com
Daniel:  not exactly what I was looking for, but helpful none-the-less.  I neglected to specify it in the problem definition, but the cases i'm looking to address have Graphs with large numbers of Nodes and Edges ... let's say 100,000 nodes and 1,000,000 edges.  The target sets of collected nodes are at the most hundreds of nodes size.  Thus doing a brute force copy of the Graph would seem to be overkill and since this use case also has near-real-time requirements, it's probably not feasible from a time perspective.   

Query facilities such as Cypher in neo4j make exactly my kinds of queries fast, even on huge graphs. The query merely traverses the Node and Edge types as specified, collecting the target Node types, until the specified range is reached.   Using that style of traversal query, Cypher search time is proportional to the found set size, NOT the Graph size.  


This is my first day playing with NetworkX and I hadn't yet stumbled across 'single_source_shortest_path_length' that at least reduces the scope of the problem some.  

Closer to what i am looking for would be something like   'single_source_shortest_path_length' but with the additional filters: "only traverse an Edge if it has the attribute Type=Inheritance and only collect the Node on the other end of the Edge if it has the attribute Species=Dog".  

I could easily write that query by hand, and may well as a learning exercise, but  it would be handy to have a template style "Smart traverser and collector" like Cypher to be able to create them easily.

Here's a link that describes Neo4J's Cypher:  http://neo4j.com/developer/cypher-query-language/

Jim Rutt

unread,
Feb 2, 2015, 4:09:26 PM2/2/15
to networkx...@googlegroups.com
Also interesting to discover just now that 

collection =  G[node]

is like 35 times faster than 

collection = nx.single_source_shortest_path_length(G,node,cutoff=1)


On Monday, February 2, 2015 at 1:52:06 PM UTC-5, Daniel Schult wrote:

Moritz Beber

unread,
Feb 2, 2015, 7:31:28 PM2/2/15
to networkx...@googlegroups.com
Hey,

On Mon, Feb 2, 2015 at 9:51 PM, Jim Rutt <jim...@gmail.com> wrote:
Daniel:  not exactly what I was looking for, but helpful none-the-less.  I neglected to specify it in the problem definition, but the cases i'm looking to address have Graphs with large numbers of Nodes and Edges ... let's say 100,000 nodes and 1,000,000 edges.  The target sets of collected nodes are at the most hundreds of nodes size.  Thus doing a brute force copy of the Graph would seem to be overkill and since this use case also has near-real-time requirements, it's probably not feasible from a time perspective.   

 
Not that I want to discourage you from experimenting with networkx, a great library that I use and love a lot, but with networks of that size you might be better off using other implementations from the start. If speed and memory efficiency are your foremost concern and not ease of data integration, the programming style or neat tricks then I would go with something like Boost::Graph or igraph or whatever you may already know.

I hope you enjoy networkx, though.
Moritz

Jim Rutt

unread,
Feb 3, 2015, 8:14:38 AM2/3/15
to networkx...@googlegroups.com
i do use other graph engines for other purposes, such as neo4j and especially the custom hypergraph Db inside of http://opencog.org.

the reason I'm playing with networkX is to explore the domain of "computation that lives on a graph".  After just a day, it is clear that my conjecture that NetworkX would be good for that has been highly confirmed.   i love how easy it is to associate an Instance Object with a node, and actually invoke the methods of that Object thru the graph syntax ala:

class TestObj:
    flag = "wawa"
    def summit(self,a,b):
        return (a + b)

anObj = TestObj()

G.add_node("jetboy",myobj=anObj)

print  G.node["jetboy"]["myobj"].summit(2,3)
5

print  G.node["jetboy"]["myobj"].flag
wawa

I hadn't even anticipated that capability, I was originally exploring to use strings attached to nodes as input to Python's 'exec' statement.

Of course one could combine the two and use 'exec' to create new Classes on the fly! 

The performance for a lot of stuff ain't too shabby, better than i was expecting.  Node creation seems to be on the order of 1,000,000 per second.   Edges around 250,000 per second, not shabby.   The traversal stuff seems slow, so I'll probably have to write my own suh that query/search time scales with result set size.  That it will live in application space, rather than in a server space (as in neo4j) is probably the biggest limitation to networkx, but not a [problem within the scale of things I'm playing with.  Seems fine for 100K nodes and 1000K edges with a handful of attributes for each.    

Jim Rutt

unread,
Feb 3, 2015, 8:19:37 AM2/3/15
to networkx...@googlegroups.com
Of course the BIGGEST limitation is the damned Python GIL, making multi-threading useless for taking advantage of multicore processors.

Moritz Beber

unread,
Feb 3, 2015, 8:34:12 AM2/3/15
to networkx...@googlegroups.com
Cool, glad you like it :) A comment on your code below:

On Tue, Feb 3, 2015 at 2:19 PM, Jim Rutt <jim...@gmail.com> wrote:
Of course the BIGGEST limitation is the damned Python GIL, making multi-threading useless for taking advantage of multicore processors.


On Tuesday, February 3, 2015 at 8:14:38 AM UTC-5, Jim Rutt wrote:
i do use other graph engines for other purposes, such as neo4j and especially the custom hypergraph Db inside of http://opencog.org.

the reason I'm playing with networkX is to explore the domain of "computation that lives on a graph".  After just a day, it is clear that my conjecture that NetworkX would be good for that has been highly confirmed.   i love how easy it is to associate an Instance Object with a node, and actually invoke the methods of that Object thru the graph syntax ala:

class TestObj:
    flag = "wawa"
    def summit(self,a,b):
        return (a + b)

anObj = TestObj()

G.add_node("jetboy",myobj=anObj)

print  G.node["jetboy"]["myobj"].summit(2,3)
5

print  G.node["jetboy"]["myobj"].flag
wawa



You can turn this into:

I'd suggest inheriting from object, i.e., class TestObj(object), and then you can go further as class instances usually have a unique hash:

my_obj = TestObj()
G.add_node(my_obj)

Which, of course, isn't as useful when you need to keep a list of objects around to index into the graph but it would allow you to iterate over your graph and simply call the methods  on the nodes directly rather than via an attribute.

Cheers,
Moritz

Jim Rutt

unread,
Feb 3, 2015, 9:00:39 AM2/3/15
to networkx...@googlegroups.com
I actually started out with your proposed form, but ran into the problem you mentioned:  "Which, of course, isn't as useful when you need to keep a list of objects around to index into the graph".  I will keep it in the toolkit for when the nature of the problem fits that form.  Now that I'm getting a better "feel" for NetworkX,  it would seem likely I'll find such cases.  The whole area of "computation that lives on a graph" needs lots of exploring!

Simon Knight

unread,
Feb 4, 2015, 9:09:40 PM2/4/15
to networkx...@googlegroups.com
Yeah NetworkX is a fantastic library. My datasets are generally small
so I haven't pushed the efficiency limits of pure-Python vs Boost,
igraph, etc. But since the core of NetworkX is based on Python
dictionaries, the performance is generally pretty good.

Building a query language to traverse NetworkX graphs such as the
example you gave does sound feasible. I think that could work as a
function that takes a query and a graph, and returns the result. I
don't think it would require modifying the base NetworkX classes.

Building such an abstraction to allow these queries would be a benefit
to the larger NetworkX community - especially in the scientific/data
analyis space.
> --
> You received this message because you are subscribed to the Google Groups
> "networkx-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to networkx-discu...@googlegroups.com.
> To post to this group, send email to networkx...@googlegroups.com.
> Visit this group at http://groups.google.com/group/networkx-discuss.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages