Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Custom/persistent graph implementation?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 28 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
project2501  
View profile  
 More options Mar 20 2012, 10:55 am
From: project2501 <darreng5...@gmail.com>
Date: Tue, 20 Mar 2012 07:55:39 -0700 (PDT)
Local: Tues, Mar 20 2012 10:55 am
Subject: Custom/persistent graph implementation?

Hi,
  I'm interested in providing a persistent implementation of the underlying
graph representation for networkx.
I'm not talking about serializing a graph to and from storage. I'm talking
about storing all the node/edge/property information in
a storage implementation, transparent to networkx, such that when networkx
algorithms traverse nodes, edges, etc.
That information is retrieved from storage/cache using my dictionary
implementation.

Is it possible? Any tips appreciated!


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Aric Hagberg  
View profile  
 More options Mar 20 2012, 11:22 am
From: Aric Hagberg <aric.hagb...@gmail.com>
Date: Tue, 20 Mar 2012 09:22:13 -0600
Local: Tues, Mar 20 2012 11:22 am
Subject: Re: [networkx-discuss] Custom/persistent graph implementation?

On Tue, Mar 20, 2012 at 8:55 AM, project2501 <darreng5...@gmail.com> wrote:
> Hi,
>   I'm interested in providing a persistent implementation of the underlying
> graph representation for networkx.
> I'm not talking about serializing a graph to and from storage. I'm talking
> about storing all the node/edge/property information in
> a storage implementation, transparent to networkx, such that when networkx
> algorithms traverse nodes, edges, etc.
> That information is retrieved from storage/cache using my dictionary
> implementation.

> Is it possible? Any tips appreciated!

It's definitely possible and we have discussed it before.

There is a simple implementation using Python's shelve at
https://networkx.lanl.gov/trac/ticket/224
It hasn't been updated too recently.

More generally it would make sense to use one of the graph databases.
I've asked the community to suggest an appropriate one, but so far
nobody has replied.  I am willing to consider re-engineering parts the
NetworkX core to make it work with a persistent database solution if
that is necessary.

Aric


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Mar 20 2012, 12:59 pm
From: project2501 <darreng5...@gmail.com>
Date: Tue, 20 Mar 2012 09:59:18 -0700 (PDT)
Local: Tues, Mar 20 2012 12:59 pm
Subject: Re: [networkx-discuss] Custom/persistent graph implementation?

A couple thoughts I had to make this work.

- Tinkerpop has a general interface to graph databases and Bulbs (bulbflow)
is a python API to the REST interfaces of the graph server.
However, this is my last suggestion since it involves a lot of moving parts.
- Another option I thought was to use a memcached or memcachedb server with
python client as the basis for a persistent dictionary used by networkx as
its data structure.
The graph is represented using adjacency matrices instead of direct
references.
- Separate from or underneath memcached can be mongodb, which autoshards
across a network and autoroutes. wordnic has implemented graphing functions
over mongodb.
And mongodb scales horizontally.

It would be very interesting to see if any of these robust, auto-scaling
mechanisms can underly networkx structures transparently such that
retrieval from persistence and caching are handled automatically. Yet
avoids bottlenecked performance issues (both memcached and mongo scale
well).

This would be a relatively easy implementation I think. All of the above
have python bindings.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tim Howard  
View profile  
 More options Mar 20 2012, 1:21 pm
From: Tim Howard <tghow...@gmail.com>
Date: Tue, 20 Mar 2012 13:21:15 -0400
Local: Tues, Mar 20 2012 1:21 pm
Subject: Re: [networkx-discuss] Custom/persistent graph implementation?
I've had great luck with SQLite and using it as storage for NetworkX.
Obviously, I currently write the entire graph structure to and from
SQLite and RAM and all the algorithms occur in RAM. I think it would
be great to have a persistent implementation as discussed below.

Tim


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Mar 20 2012, 1:57 pm
From: project2501 <darreng5...@gmail.com>
Date: Tue, 20 Mar 2012 10:57:39 -0700 (PDT)
Local: Tues, Mar 20 2012 1:57 pm
Subject: Re: Custom/persistent graph implementation?
Yeah, because for some scales of graph, they won't fit in RAM anymore
or on a single machine.
That's the size I'm pursuing code for.

The general logic behind the data structure would go something like
this:

1. When a node/edge or whatever is to be de-referenced by its id, the
local memory cache is consulted.
2. If it doesn't reside in memory cache, then its queried from offline
storage (say mongodb). Mongo will take care of
identifying the server it actually resides on.
3. That node/edge is pulled into the cache where its properties are
available.

So consider a depth first search algorithm. It would only need the
starting node in the cache, along with
its out and in vertex information (edges). From there, as networkx
consults its internal matrix, the de-referencing
process causes lookups to occur transparently. This can mean
consulting a large DHT or routing to other machines,
i.e. crawling.

networkx only sees its usual internal data structure.

Which classes should I look at to implement this? I'm not all that
knowledgable on networkx code.

On Mar 20, 1:21 pm, Tim Howard <tghow...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Schult  
View profile  
 More options Mar 22 2012, 9:57 am
From: Dan Schult <dsch...@colgate.edu>
Date: Thu, 22 Mar 2012 09:57:49 -0400
Local: Thurs, Mar 22 2012 9:57 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

On Mar 20, 2012, at 1:57 PM, project2501 wrote:

> Which classes should I look at to implement this? I'm not all that
> knowledgable on networkx code.

The first place to look would be the Graph class:  networkx/classes/graph.py
Right now the API and the data storage are fairly interlaced for
improved performance.  We are experimenting with partial separations
of the graph interface from the data structure, e.g. using a scipy
sparse matrix representation instead of a dict-of-dict-of-dict.  

> The general logic behind the data structure would go something like
> this:

> 1. When a node/edge or whatever is to be de-referenced by its id, the
> local memory cache is consulted.
> 2. If it doesn't reside in memory cache, then its queried from offline
> storage (say mongodb). Mongo will take care of
> identifying the server it actually resides on.
> 3. That node/edge is pulled into the cache where its properties are
> available.

It sounds like you might want to replace the dicts used for adj lists
with a subclass of dict that overrides __getitem__ with a method that
checks the internal storage and if nothing is found goes to the database.
Something like a dict-of-DB_adj_dict-of-DB_egdedata_dict

Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tcb  
View profile  
 More options Mar 22 2012, 10:20 am
From: tcb <thecolourblu...@gmail.com>
Date: Thu, 22 Mar 2012 14:20:08 +0000
Local: Thurs, Mar 22 2012 10:20 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

Hi,

Its perhaps worth separating the two ideas...

The idea of a persistent storage for networkx graphs is a good one, and has
been brought up many times- for many people this will mean just writing a
large graph into some format which is easy to store and transport to other
machines or applications. SQLite would seem to be ideal for this job, and
I'm sure several people have already written their own readers and writers
for sqlite.

To change the backend of networkx from a dict of dicts to something else,
which implements persistent storage and is scalable across multiple
processors is another job altogether. It has been suggested to implement a
native backend in for eg. boost or igraph or something, but it seems that
you won't get much of a speedup doing that, unless you also rewrite the
algorithms. Since the beauty of networkx is the python algorithms- you may
as well then just use another package from the outset. I've been looking at
neo4j (http://neo4j.org/) for very large graphs, and while it certainly
scales very well, it is really missing good tools for network analysis.
There are python bindings for neo4j (
http://docs.neo4j.org/chunked/snapshot/python-embedded.html) which need
Jpype (http://jpype.sourceforge.net/) since neo4j is in java- it would be
an easy job to put together readers and writers which would load and store
networkx graphs in neo4j... probably a few have already done this?

I think it would also be possible to use neo4j as a backend for networkx-
neo4j has Nodes and Relationships which would map nicely to nodes and edges
in networkx. The Nodes and Relationships have property maps which could
store any attributes of nodes/edges. Neo4j would handle the memory caching
itself, so you would just send an access request (modify __getitem__ in
graph.py) for the nodes/edges and let neo4j consult its stores and figure
out how to return the data to you. While this might work for small graphs-
its not really how the scalable graph databases are designed to work. For
example: every write operation has to be wrapped in a transaction- this
will kill your performance if you do it for every individual write, but is
manageable if you batch the transactions together. This will likely require
most of the algorithms to be reworked. The advantages of neo4j probably
come from its indexes- you would have to figure out how to use these
efficiently in the networkx algorithms. Many algorithms in networkx start
with a list [] of nodes- this is not going to work either and would likely
have to be modified to use the Traversal framework (
http://docs.neo4j.org/chunked/stable/tutorial-traversal-concepts.html). You
might also look at Gremlin (https://github.com/tinkerpop/gremlin/wiki) to
get an idea for an efficient way of doing network analysis on distributed
graphs.

It would be nice to see how far you can get with a neo4j (or similar
backend). Modifying the graph class in networkx to use a different backend
should be straightforward enough, but ensuring all (or most) the algorithms
still work, and work efficiently will be the hard part.

regards,


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Mar 25 2012, 9:27 am
From: project2501 <darreng5...@gmail.com>
Date: Sun, 25 Mar 2012 06:27:43 -0700 (PDT)
Local: Sun, Mar 25 2012 9:27 am
Subject: Re: Custom/persistent graph implementation?
In fact, I came from the whole neo4j,orientdb graph realm before
considering this idea.

Couple observations:

1. They don't support native algorithms, so I would have to write my
own or patch a set of them over the database.
So that would be starting with the database and finding/writing the
algorithms (which I'm less adept at).
2.  Any of those databases could be used as the long-term storage
backend for what I'm describing above.
Some study of mapping algorithm code to efficient database queries
would need to be done (outside my scope).

So, the inverse of that approach is starting with the algorithm
package and layering the persistence underneath.
networkx not only has algorithms I need, but its python, and also has
export to javascript and SVG. two critical pieces for me.

Since I'm fairly decent with persistence, it _seems_ easier for me to
add that piece into networkx.

Of course, it won't perform the same as memory, but only on first
calls as I intend to cache the data in local memory,
backed by memcache (which is distributed and fast), backed by mongo
(long-term storage). If I implement the Graph.py
class correctly, networkx algorithms will solve the problem of what
queries to issue, in what order and when to solve
graph theoretic problems over the persistence layers. Both mongodb and
memcache scale horizontally and perform
exceptionally well. So even though it won't compare to memory, when
you're operating on graphs of enormous size
in "ad hoc" ways. Which is to say, user-driven request/response modes.
Its not possible to have ALL the graph
data in memory waiting for them.

Having said all that, I think a good first step (for study or general
use) is to see how this approach works or not.

:)

On Mar 22, 10:20 am, tcb <thecolourblu...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Moritz Emanuel Beber  
View profile  
 More options Mar 25 2012, 5:10 pm
From: Moritz Emanuel Beber <moritz.be...@googlemail.com>
Date: Sun, 25 Mar 2012 23:10:06 +0200
Local: Sun, Mar 25 2012 5:10 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?
Hi,

On 03/25/2012 03:27 PM, project2501 wrote:

I just want to highlight for you that, as Aric mentioned before, many of
the algorithms access and modify the dictionary structure of the Graph
class directly. This is done for performance gains but means that you
cannot simply define a class that exposes the same interface but you
actually need to mimic some of the internals as well.

To see where such access of internals occurs would actually be
interesting, imho, so that possibly those code sections could be altered
to use "public" methods and attributes only.

Good luck,
Moritz


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Mar 25 2012, 6:09 pm
From: project2501 <darreng5...@gmail.com>
Date: Sun, 25 Mar 2012 15:09:55 -0700 (PDT)
Local: Sun, Mar 25 2012 6:09 pm
Subject: Re: Custom/persistent graph implementation?
Thanks. Yeah, I understand the issue there. I will study this some and
see if there's an efficient way to loosely couple
the internal representation.

Just offhand, it would seem that if the Graph class deferred lookups
to another class using a notional syntax for
storing and retrieving adjacent or referential sets of dicts from the
other class, then it could be implemented various ways
with only the additive cost of a method call.

On Mar 25, 5:10 pm, Moritz Emanuel Beber <moritz.be...@googlemail.com>
wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pengkui Luo  
View profile  
 More options Mar 25 2012, 8:22 pm
From: Pengkui Luo <pengkui....@gmail.com>
Date: Sun, 25 Mar 2012 19:22:25 -0500
Local: Sun, Mar 25 2012 8:22 pm
Subject: Re: [networkx-discuss] Custom/persistent graph implementation?

Hi Aric,

I came across a library called CloudLight which extends NetworkX to support
in-disk graphs and uses SQLite3 for persistence.

http://www.assembla.com/spaces/cloudlight/wiki/Tutorial

I am not sure about the quality of this "wheel" though. What is your
opinion on it?

~Pengkui


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tcb  
View profile  
 More options Mar 26 2012, 9:57 am
From: tcb <thecolourblu...@gmail.com>
Date: Mon, 26 Mar 2012 14:57:31 +0100
Local: Mon, Mar 26 2012 9:57 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

That's about right.

You could also look at the previously mentioned cloudlight to see how a
persistent backend is done with SQLite.

Sure, but this is really the problem that the graph databases like neo4j
are trying to solve. If you can pull all your data from storage into
memory, then you have no problems running networkx algorithms directly. But
the graph databases are designed for those situations where you have more
data than you can fit in memory on a single machine. This is much more of a
problem for graph algorithms because of the non locality of the data. For
example, finding neighbours of a node might mean pulling data from several
different machines- there is usually no simple way of partitioning your
graph nicely among N processors so that each processor has just the nodes
and edges it needs.

Perhaps as first step, and to see how it might pan out, you could ignore
those issues and just implement the persistent storage. If the networkx
algorithms manipulate the Graph data through a standard api then they
should 'just work', but they will be dreadfully slow.

The next step then, would to take advantage of the ability to work with
large datasets, and perhaps to execute some algorithms in parallel using
the native abilities of the backend. A parallel search algorithm
implemented on a scalable graph database backend would be a tremendously
useful addition to networkx.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 3 2012, 9:12 am
From: project2501 <darreng5...@gmail.com>
Date: Tue, 3 Apr 2012 06:12:57 -0700 (PDT)
Local: Tues, Apr 3 2012 9:12 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

Yeah, I get that graph databases are optimized for graph storage and
retrieval.

I would certainly use one in an implementation underneath networkx. The
problem is,
as I mention, that the graph databases do not come with the algorithms
networkx comes with.
Also, reading the required nodes into memory then doing networkx algorithms
presents a problem.
Which nodes to read into memory. And also, perhaps not all the required
ones fit into memory.

Replacing the networkx in-memory data model with an interface to a memory
or persistent
model will solve both of these problems.

As with most engineering problems, there are tradeoffs. Using networkx to
navigate a persistent
graph database would not be 'dreadfully' slow in my opinion for many common
use cases. Slow
is relative. If done using a forward cache in memory or DHT type system, it
would perform
acceptably given the power it brings - scale - and access to much more
information.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 3 2012, 9:34 am
From: project2501 <darreng5...@gmail.com>
Date: Tue, 3 Apr 2012 06:34:27 -0700 (PDT)
Local: Tues, Apr 3 2012 9:34 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

For starters, inside Graph, instead of doing something like
neighbors[a][b][c],
make it a method call.

neighbors(a,b,c,...)

Let the default implementation use a dict of dicts inside of neighbors() if
it wants.
A graphdb implementation will construct the appropriate joined query and/or
consult its immediate cache.

This should be done. Vastly useful in today's big data world.

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tcb  
View profile  
 More options Apr 3 2012, 12:19 pm
From: tcb <thecolourblu...@gmail.com>
Date: Tue, 3 Apr 2012 17:19:08 +0100
Local: Tues, Apr 3 2012 12:19 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

hmm, I'm not so sure. I wonder if you are underestimating the issues (or
maybe I'm over estimating them?).

The reason graph databases don't come with the kind of algorithms networkx
has is because they just don't work at that kind of scale. Replacing the
storage model does not solve the in-memory data problem from the point of
view of the algorithm. What the graph db's can do with reasonable
efficiency is graph traversals. So any algorithm in networkx which relies
on a traversal (and there are quite a few) could (possibly) be made to use
the scalable graph traversal, and could (possibly) work reasonably well.

An example of the problems I would envisage are (at random) from
core.py:core_number(G) - the first line in this algorithm does this:

degrees=G.degree()

that's a list of degrees of all nodes, and it makes another similar list
for the cores which it proceeds to modify. If you have a large enough graph
which requires distributed memory this just wont work (you just can't make
a single list with size of number_of_nodes). You could try to make a new
property on the node and update that property through the graph db backend-
but in many cases to avoid this issue will involve substantial reworking of
the algorithms.

If you can fit all your data in memory, it seems pointless to use a
persistent storage backend- it will never compete with networkx's own Graph
(neither in memory usage or speed). In this case, just use networkx
directly, run the standard algorithms and at the end use the readwrite
functions to write your graph back to persistent storage. So networkx
should have the ability to read and write graphs from persistent storage,
but it uses its own data model and algorithms.

If your graphs are too big for a single machine, then you need the
distributed graph databases (neo4j as an example). These can do distributed
storage and distributed graph traversal. But distributed computation is
another field altogether and networkx's algorithms are just not designed to
work for those problems. Maybe you can make some of the algorithms work
acceptably well- I'd love to be proved wrong... ;)

regards

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 3 2012, 6:13 pm
From: project2501 <darreng5...@gmail.com>
Date: Tue, 3 Apr 2012 15:13:39 -0700 (PDT)
Local: Tues, Apr 3 2012 6:13 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

tcb,
   You make some valid points. However, I think this space of graphs,
graphdbs, algorithms and the like is accelerating quickly.
It has evolved quite a bite from just a few years ago. So my request to the
networkx team is to simply put a data access layer in the Graph class
to invite innovation in. Bundle with a default MemoryDictOfDicts
implementation, but make it easy for people in the graph world to push the
envelope.

Now,, I don't know the ultimate performance metrics, but I'm hoping to find
out what the tradeoffs are because, for me at least, its certain
that A) the nodes won't fit all into memory, and B) this capability
prevails over the performance for now. In other words, my customers would
rather
have the ability to perform insightful algorithms over vast quantities of
distributed data at whatever performance level can be achieved, than
to have nothing.

So let's open the doors on this! It seems a simple modification to Graph
class to add data access object or pattern (for one of the authors that
is). From there, I expect
we all will be surprised at what the community does after that!

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Aric Hagberg  
View profile  
 More options Apr 3 2012, 10:14 pm
From: Aric Hagberg <aric.hagb...@gmail.com>
Date: Tue, 3 Apr 2012 20:14:50 -0600
Local: Tues, Apr 3 2012 10:14 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

Do you have some suggestions on how this might be implemented?  There
are lots of engineering trade-offs that will come into play but if
there was a clear design strategy that would help figure those out.

Aric


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 4 2012, 7:50 am
From: project2501 <darreng5...@gmail.com>
Date: Wed, 4 Apr 2012 04:50:34 -0700 (PDT)
Local: Wed, Apr 4 2012 7:50 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

I need to study/trace the algorithm execution some more and will have some
ideas after that.

But generally speaking, it would be something like this. Rather than
loading all the nodes
of a graph into memory before the algorithm runs, let the algorithm request
the nodes from
the underlying graph implementation that results in the graph
implementation providing them.
The default implementation may very well pre-load them into memory, but
other implementations
might fetch them for persistence if necessary - resulting in more of a
crawling effect.

For example, to initiate a breadth first type search, I need a starting
node only, not all nodes.
Then, as the algorithm requests edges, adjacent nodes etc. from the graph,
the graph retrieves
them if they are not already loaded into memory.

If the algorithms operate on a list of lists residing in memory already,
then this will require
more work to separate and delegate to the Graph class.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
tcb  
View profile  
 More options Apr 4 2012, 8:52 am
From: tcb <thecolourblu...@gmail.com>
Date: Wed, 4 Apr 2012 13:52:12 +0100
Local: Wed, Apr 4 2012 8:52 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

that's a good place to start- most of the persistent graph databases
provide efficient traversal routines, and it should be possible to swap out
the networkx traversals and replace them with the graph db traversals.
There might still be some modifications to the algorithms, but hopefully
nothing too serious.

You might be interested to compare with the approach taken by boost bgl to
implement their library on top of mpi:

http://osl.iu.edu/research/pbgl/documentation/overview.html

They manage to provide a fairly transparent interface to a distributed
graph storage backend, but the algorithms still require fairly major
surgery (although their focus is to make the algorithms run efficiently).
Using an existing distributed graph db (like neo4j) means most of the
backend stuff is done for you. Another key piece of the parallel bgl is a
distributed map/list object. You could possibly design an object with a
similar interface to a list/dict which uses the underlying distributed
graph db for storage. There would then be minimal changes to the algorithms
which would use the distributed lists/dicts when running in 'distributed'
mode.

You might also look at other existing approaches to graph processing at
scale, neatly summarised here

http://stackoverflow.com/questions/2986010/any-open-source-pregel-lik...

regards


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 4 2012, 3:38 pm
From: project2501 <darreng5...@gmail.com>
Date: Wed, 4 Apr 2012 12:38:41 -0700 (PDT)
Local: Wed, Apr 4 2012 3:38 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

tcb,
  Thanks for the links. However, the one about boost bgl seems dated and
the page looks unsupported (images don't event load), but I will dig deeper.

I'm aware of Pregel, but its proprietary I believe. I use Neo4j already and
would probably pursue that as the engine, but would appeal to networkx
engineers to devise a good way to separate data access from the algorithms
making the coupling easier.

Perhaps down the road for networkx something as this emerges...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Aric Hagberg  
View profile  
 More options Apr 4 2012, 5:44 pm
From: Aric Hagberg <aric.hagb...@gmail.com>
Date: Wed, 4 Apr 2012 15:44:25 -0600
Local: Wed, Apr 4 2012 5:44 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

On Wed, Apr 4, 2012 at 1:38 PM, project2501 <darreng5...@gmail.com> wrote:
> tcb,
>   Thanks for the links. However, the one about boost bgl seems dated and the
> page looks unsupported (images don't event load), but I will dig deeper.

> I'm aware of Pregel, but its proprietary I believe. I use Neo4j already and
> would probably pursue that as the engine, but would appeal to networkx
> engineers to devise a good way to separate data access from the algorithms
> making the coupling easier.

Can you be more specific about what would be required "to separate
data access from the algorithms".
We do have some separation and a (almost complete?) functional
interface to get the data with object methods.
In some places (algorithms) we access the data using Python built-in
methods on the data dictionary because that is much faster.

Unfortunately the Neo4j licenses are not compatible with NetworkX's
BSD.  So we can't create a dependency on that package and keep the
BSD.  Are there other graph databases with BSD-compatible licenses?

Aric

Aric


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
project2501  
View profile  
 More options Apr 5 2012, 6:28 pm
From: project2501 <darreng5...@gmail.com>
Date: Thu, 5 Apr 2012 15:28:44 -0700 (PDT)
Local: Thurs, Apr 5 2012 6:28 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

Aric,
   The simple answer to this is to force the algorithms to use the Graph
class methods exclusively.
This would allow different Graph implementations to be used easily. Yes,
accessing a native
python dictionary with all the graph nodes will perform well, but of
course, not scale. This is one tradeoff
to consider.

However, it would be possible for a Graph implementation to return any kind
of native dictionary with
its node data. I thought also, this might work with a custom dictionary
implementation but I tend to prefer
doing this kind of abstraction at the Class/method level.

Yet another approach would be to associate the algorithms with a Graph
class such that a Graph object
can return instances of the algorithms. In this way, an algorithm
implementation can be returned from
a Graph object and both implementations can be encapsulated so they perform
best for that implementation.
This would allow a native python object/list implementation of a Graph to
return algorithm instances that
expect those data structures and performance would be the same as now.

For a Graph implementation that uses something like Neo4j, it can return an
algorithm implementation
that takes advantage of its implementation.

As for Neo4j, networkx need not bind to any of its code because the server
exposes REST/HTTP interfaces.
Other graphdb do this as well. There is a native/embedded python wrapper
for Neo4j.

As for license incompatibilities, its a good point. But if a class wrapped
neo4j and was interface compatible
with networkx, but didn't load networkx, then an end-user could load that
class and tell networkx to use it
as a Graph implementation. Thus keeping things separate in the libraries.

Other graph databases include orientdb, infinitegraph, dex. I've only used
neo4j and orientdb.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Aric Hagberg  
View profile  
 More options Apr 7 2012, 12:57 pm
From: Aric Hagberg <aric.hagb...@gmail.com>
Date: Sat, 7 Apr 2012 10:57:39 -0600
Local: Sat, Apr 7 2012 12:57 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

On Thu, Apr 5, 2012 at 4:28 PM, project2501 <darreng5...@gmail.com> wrote:
> Aric,
>    The simple answer to this is to force the algorithms to use the Graph
> class methods exclusively.
> This would allow different Graph implementations to be used easily. Yes,
> accessing a native
> python dictionary with all the graph nodes will perform well, but of course,
> not scale. This is one tradeoff
> to consider.

> However, it would be possible for a Graph implementation to return any kind
> of native dictionary with
> its node data. I thought also, this might work with a custom dictionary
> implementation but I tend to prefer
> doing this kind of abstraction at the Class/method level.

At this point its arguable whether the Python dict methods have become
the de facto public interface.   I don't think we've ever been very
clear on that one way or another, and we have exposed the dictionary
methods in some places to get (significant) performance gains.

So certainly one approach is to provide a dictionary-like alternative
data structure.  I have a SciPy sparse graph implementation (not ready
for release) that uses that technique.

> Yet another approach would be to associate the algorithms with a Graph class
> such that a Graph object
> can return instances of the algorithms. In this way, an algorithm
> implementation can be returned from
> a Graph object and both implementations can be encapsulated so they perform
> best for that implementation.
> This would allow a native python object/list implementation of a Graph to
> return algorithm instances that
> expect those data structures and performance would be the same as now.

> For a Graph implementation that uses something like Neo4j, it can return an
> algorithm implementation
> that takes advantage of its implementation.

I've generally been resisting that approach (e.g. adding more
algorithms as graph class methods) since there are so many varied uses
of NetworkX.  Some people only want the  base classes and a few
algorithms (or even none), etc.

But perhaps, considering distributed memory or database backends,
there is a subset of core algorithms that most other algorithms can
use and that provide high performance access, traversal, or other
functions?  If such a set existed then I can see including those as
base class methods.

> As for Neo4j, networkx need not bind to any of its code because the server
> exposes REST/HTTP interfaces.
> Other graphdb do this as well. There is a native/embedded python wrapper for
> Neo4j.

> As for license incompatibilities, its a good point. But if a class wrapped
> neo4j and was interface compatible
> with networkx, but didn't load networkx, then an end-user could load that
> class and tell networkx to use it
> as a Graph implementation. Thus keeping things separate in the libraries.

Yes, that would work.  But it wouldn't be part of NetworkX.

> Other graph databases include orientdb, infinitegraph, dex. I've only used
> neo4j and orientdb.

Thanks, I'll take a look.  Anyone else have suggestions?

Aric


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Schult  
View profile  
 More options Apr 7 2012, 2:07 pm
From: Dan Schult <dsch...@colgate.edu>
Date: Sat, 7 Apr 2012 14:07:49 -0400
Local: Sat, Apr 7 2012 2:07 pm
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?
Some algorithms don't need much from the Graph class.

Shortest_path uses G[n] or G.successors/G.predecessors
(bidirectional).  I count G[n] being used for iterating over neighbors
as "using the Graph class methods".

Algorithms that use the syntax G[u].items() or G[u][v] are counting
on G[n] returning an adjacency dict of edge attribute dicts.  That can
be a performance burden on data structures that don't use dicts.

Perhaps we could adjust the algorithms to use a new (and fast)
method __call__ like:    G(u, attr=None) to iterate over 2-tuples
(nbr, attr_dict)  if attr is None; or (nbr, attr_value)  otherwise.
This reduces time creating dicts for data strorage that doesn't use  
dicts.
If an algorithm need dicts, it can create one with dict(G
(u,attr=weight).

I think the change in the algorithms would be something like:

 >>> for nbr,wt in G(n,attr=weight):
 >>>    ....

Instead of

 >>> for nbr,edata in G[n].items():
 >>>    wt=edata.get(weight,1)
 >>>   ....

To directly replace G[u][v][weight] the algorithms would need G(u)[v]
[weight]

I guess what I'm trying to say is that your idea to "use the Graph  
class methods
exclusively" doesn't mean we have to give up the speed of G[u] or G[u]
[v] for
our base classes.  But there may be some corner cases I'm not taking  
into account.

Thoughts?
Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Aric Hagberg  
View profile  
 More options Apr 8 2012, 11:09 am
From: Aric Hagberg <aric.hagb...@gmail.com>
Date: Sun, 8 Apr 2012 09:09:59 -0600
Local: Sun, Apr 8 2012 11:09 am
Subject: Re: [networkx-discuss] Re: Custom/persistent graph implementation?

That could work.

But I think the missing piece is the design document that that
describes what the (Python language) public interface is for the Graph
class(es) and data objects.  We have discussed aspects of this many
times but right now the only documentation is the code.

If a clear specification what is holding us back from engineering
something that can use alternative/distributed data representations
then maybe it is time to address that.  (But in a different
thread....).

Aric


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 28   Newer >
« Back to Discussions « Newer topic     Older topic »