Which graph library is best suited for large graphs?

10 views
Skip to first unread message

Wolodja Wentland

unread,
Dec 11, 2009, 5:12:12 AM12/11/09
to pytho...@python.org
Hi all,

I am writing a library for accessing Wikipedia data and include a module
that generates graphs from the Link structure between articles and other
pages (like categories).

These graphs could easily contain some million nodes which are frequently
linked. The graphs I am building right now have around 300.000 nodes
with an average in/out degree of - say - 4 and already need around 1-2GB of
memory. I use networkx to model the graphs and serialise them to files on
the disk. (using adjacency list format, pickle and/or graphml).

The recent thread on including a graph library in the stdlib spurred my
interest and introduced me to a number of libraries I have not seen
before. I would like to reevaluate my choice of networkx and need some
help in doing so.

I really like the API of networkx but have no problem in switching to
another one (right now) .... I have the impression that graph-tool might
be faster and have a smaller memory footprint than networkx, but am
unsure about that.

Which library would you choose? This decision is quite important for me
as the choice will influence my libraries external interface. Or is
there something like WSGI for graph libraries?

kind regards

--
.''`. Wolodja Wentland <went...@cl.uni-heidelberg.de>
: :' :
`. `'` 4096R/CAF14EFC
`- 081C B7CD FF04 2BA9 94EA 36B2 8B7F 7D30 CAF1 4EFC

signature.asc

Bearophile

unread,
Dec 11, 2009, 6:03:03 AM12/11/09
to
Wolodja Wentland:

> Which library would you choose?

This one probably uses low memory, but I don't know if it works still:
http://osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile

Wolodja Wentland

unread,
Dec 11, 2009, 6:17:43 AM12/11/09
to pytho...@python.org

That project looks not that maintained and graph-tool [1] is based on
boost as well, so I don't see the advantage in choosing bgl-python over
graph-tool.

The point is, that I am not sure if using graph-tool has any advantages
over networkx at all. It looks like a great library, supports filtered
graphs which I find pretty useful, but have not used it yet.

[1] http://projects.forked.de/graph-tool/

signature.asc

Neal Becker

unread,
Dec 11, 2009, 8:55:19 AM12/11/09
to pytho...@python.org
Bearophile wrote:

How about python interface to igraph?

Wolodja Wentland

unread,
Dec 11, 2009, 9:07:41 AM12/11/09
to pytho...@python.org

> How about python interface to igraph?

Don't know :-) as I have not yet worked with it. Why do you recommend it?

signature.asc

IngoognI

unread,
Dec 11, 2009, 10:31:18 AM12/11/09
to
On Dec 11, 11:12 am, Wolodja Wentland <wentl...@cl.uni-heidelberg.de>
wrote:

>
> Which library would you choose?

looking at the galery at networx, it seems to be all balls 'n sticks,
how about writing the data to a file POV-Ray can read and render it
there?

Wolodja Wentland

unread,
Dec 11, 2009, 11:04:31 AM12/11/09
to pytho...@python.org

Huh? I am not really concerned about rendering the graphs but after a
library with a small memory footprint. Preferably one that contains a
number of typical algorithms.

signature.asc

Neal Becker

unread,
Dec 11, 2009, 11:57:38 AM12/11/09
to pytho...@python.org
Wolodja Wentland wrote:

> On Fri, Dec 11, 2009 at 08:55 -0500, Neal Becker wrote:
>> Bearophile wrote:

>> How about python interface to igraph?
>
> Don't know :-) as I have not yet worked with it. Why do you recommend it?

My understanding is that igraph is a high performance graph library (all
implemented in C). It seems to have a very active user community.

There is also boost graph library, which IIRC also has a python interface.

anand jeyahar

unread,
Dec 11, 2009, 3:59:30 PM12/11/09
to pytho...@python.org
On 12/11/2009 10:27 PM, Neal Becker wrote:
> Which library would you choose?
>
Hmm.... i have tried python-graph and was happy with it....but the most
use i did was for complete graphs of 60-65 nodes..

Also there is an experimental branch for faster implementations, which
is under development.

--
==============================================
Anand J
http://sites.google.com/a/cbcs.ac.in/students/anand
==============================================
The man who is really serious,
with the urge to find out what truth is,
has no style at all. He lives only in what is.
~Bruce Lee

Love is a trade with lousy accounting policies.
~Aang Jie

geremy condra

unread,
Dec 11, 2009, 9:48:34 PM12/11/09
to pytho...@python.org
On Fri, Dec 11, 2009 at 5:12 AM, Wolodja Wentland
<went...@cl.uni-heidelberg.de> wrote:
> Hi all,
>
> I am writing a library for accessing Wikipedia data and include a module
> that generates graphs from the Link structure between articles and other
> pages (like categories).
>
> These graphs could easily contain some million nodes which are frequently
> linked. The graphs I am building right now have around 300.000 nodes
> with an average in/out degree of - say - 4 and already need around 1-2GB of
> memory. I use networkx to model the graphs and serialise them to files on
> the disk. (using adjacency list format, pickle and/or graphml).

Huh. Using graphine- which should be somewhat more memory hungry
than networkx- I generated a naive million node 4-cycle graph and wound
up using something under 600 meg of ram. Can you post some code?

Geremy Condra

Tiago de Paula Peixoto

unread,
Dec 12, 2009, 7:01:36 AM12/12/09
to pytho...@python.org
Hi there,

On 12/11/2009 08:12 AM, Wolodja Wentland wrote:
> I really like the API of networkx but have no problem in switching to
> another one (right now) .... I have the impression that graph-tool might
> be faster and have a smaller memory footprint than networkx, but am
> unsure about that.

I'm the author of graph-tool, so my opinion may be biased. :-)

Nevertheless, I do think that graph-tool will be faster and have a
smaller memory footprint than networkx, since the graph data structures
and most algorithms are written in C++, using the Boost Graph Library,
whereas networkx is mostly pure python. I have made this library due to
my own requirements of being able to work with large graphs.

The only other library I can think of which may be comparable in
performance is igraph, which is implemented in C. But I think
graph-tool's interface is a bit more polished (my opinion
only). Moreover, since graph-tool uses template metaprograming to obtain
specialized versions of algorithms, it may be that it is even faster
than igraph, but I have at the moment no benchmarks to back this up.

Cheers,
Tiago

signature.asc
Reply all
Reply to author
Forward
0 new messages