[Python-ideas] Graph class

47 views
Skip to first unread message

Mark Adam

unread,
Dec 7, 2012, 4:45:25 PM12/7/12
to Python-Ideas
I have a decent semi-recursive Graph class that I think could be a
good addition to the Collections module. It probably needs some
refactoring, but I'm posting here to see if there's any interest.

For those who aren't too abreast of CS theory, a graph is one of the
most abstract data structures in computer science, encompassing trees,
and lists. I'm a bit surprised that no one's offered one up yet, so
I'll present mine.

The code is at http://github.com/theProphet/Social-Garden under the
pangaia directly called graph.py. It has a default dictionary
(defdict.py) dependency that I made before Python came up with it on
it's own (another place for refactoring).

Cheers,

MarkJ
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Thomas Kluyver

unread,
Dec 7, 2012, 5:22:42 PM12/7/12
to Mark Adam, Python-Ideas
On 7 December 2012 21:45, Mark Adam <dreamin...@gmail.com> wrote:
I have a decent semi-recursive Graph class that I think could be a
good addition to the Collections module.  It probably needs some
refactoring, but I'm posting here to see if there's any interest.

For reference, there was a previous idea to make some kind of standard Graph API:
http://wiki.python.org/moin/PythonGraphApi

When I had to implement a really simple DAG myself, I based it on this Graph ABC library:
http://www.linux.it/~della/GraphABC/

Best wishes,
Thomas

Terry Reedy

unread,
Dec 8, 2012, 2:17:17 AM12/8/12
to python...@python.org
On 12/7/2012 4:45 PM, Mark Adam wrote:
> I have a decent semi-recursive Graph class that I think could be a
> good addition to the Collections module. It probably needs some
> refactoring, but I'm posting here to see if there's any interest.
>
> For those who aren't too abreast of CS theory, a graph is one of the
> most abstract data structures in computer science, encompassing trees,
> and lists. I'm a bit surprised that no one's offered one up yet, so
> I'll present mine.

I believe there are are multiple graph modules and packages, but none is
really dominant. It is partly because there are multiple representations
and the best depends on the problem.

> The code is at http://github.com/theProphet/Social-Garden under the
> pangaia directly called graph.py. It has a default dictionary
> (defdict.py) dependency that I made before Python came up with it on
> it's own (another place for refactoring).

--
Terry Jan Reedy

Serhiy Storchaka

unread,
Dec 8, 2012, 3:07:40 AM12/8/12
to python...@python.org
On 07.12.12 23:45, Mark Adam wrote:
> I have a decent semi-recursive Graph class that I think could be a
> good addition to the Collections module. It probably needs some
> refactoring, but I'm posting here to see if there's any interest.
>
> For those who aren't too abreast of CS theory, a graph is one of the
> most abstract data structures in computer science, encompassing trees,
> and lists. I'm a bit surprised that no one's offered one up yet, so
> I'll present mine.
>
> The code is at http://github.com/theProphet/Social-Garden under the
> pangaia directly called graph.py. It has a default dictionary
> (defdict.py) dependency that I made before Python came up with it on
> it's own (another place for refactoring).

Graph is too abstract conception. There are a lot of implementations of
graphs. Every non-trivial program contains some (may be implicit) graphs.

See also for some implementations: Magnus Lie Hetland, "Python
Algorithms. Mastering Basic Algorithms in the Python Language".

Mark Adam

unread,
Dec 8, 2012, 8:29:56 PM12/8/12
to Thomas Kluyver, Python-Ideas
On Fri, Dec 7, 2012 at 4:22 PM, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
> On 7 December 2012 21:45, Mark Adam <dreamin...@gmail.com> wrote:
>>
>> I have a decent semi-recursive Graph class that I think could be a
>> good addition to the Collections module.  It probably needs some
>> refactoring, but I'm posting here to see if there's any interest.
>
> For reference, there was a previous idea to make some kind of standard Graph
> API:
> http://wiki.python.org/moin/PythonGraphApi

All very interesting.  I'm going to suggest a sort of "meta-discussion" about why -- despite the power of graphs as a data structure -- such a feature has not stabilized into a workable solution for inclusion in a high-level language like Python.

I identity the following points of "wavery":

1) the naming of methods (add_edge, vs add(1,2)):  aesthetic grounds,
2) what methods to include (degree + neighbors or the standard dict's __len__ + __getitem__):  API grounds
3) how much flexibility to be offered (directed, multi-graphs, edge weights with arbitrary labeling, etc.):  functionality grounds
3) what underlying data structure to use (sparse adjacency dicts, matrices, etc):  representation conflicts.

And upon further thought, it looks like only a killer application could ever settle the issue(s) to make it part of the standard library.

mark

Paul Moore

unread,
Dec 9, 2012, 6:40:05 AM12/9/12
to Mark Adam, Python-Ideas
On 9 December 2012 01:29, Mark Adam <dreamin...@gmail.com> wrote:
> All very interesting. I'm going to suggest a sort of "meta-discussion"
> about why -- despite the power of graphs as a data structure -- such a
> feature has not stabilized into a workable solution for inclusion in a
> high-level language like Python.
>
> I identity the following points of "wavery":
>
> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds,
> 2) what methods to include (degree + neighbors or the standard dict's
> __len__ + __getitem__): API grounds
> 3) how much flexibility to be offered (directed, multi-graphs, edge weights
> with arbitrary labeling, etc.): functionality grounds
> 3) what underlying data structure to use (sparse adjacency dicts, matrices,
> etc): representation conflicts.

4) Whether the library requires some sort of "Vertex" type, or works
with arbitrary values, similarly whether there is a defined "Edge"
class or edges can be labelled, weighted, etc with arbitrary Python
values.
5) Granularity - if all I want is a depth-first search algorithm, why
pull in a dependency on 100 graph algorithms I'm not interested in?

My feeling is that graphs are right on the borderline of a data
structure that is simple enough that people invent their own rather
than bother conforming to a "standard" model but complex enough that
it's worth using library functions rather than getting the details
wrong. In C, there are many examples of this type of "borderline"
stuff - linked lists, maps, sorting and searching algorithms, etc. In
Python, lists, dictionaries, sorting, etc are all "self evidently"
basic building blocks, but graphs hit that borderline area.

Paul

Mark Adam

unread,
Dec 9, 2012, 3:31:32 PM12/9/12
to Python-Ideas
Meant this to go to the whole list. Sorry.

On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore <p.f....@gmail.com> wrote:
> On 9 December 2012 01:29, Mark Adam <dreamin...@gmail.com> wrote:
>> All very interesting. I'm going to suggest a sort of "meta-discussion"
>> about why -- despite the power of graphs as a data structure -- such a
>> feature has not stabilized into a workable solution for inclusion in a
>> high-level language like Python.
>>
>> I identity the following points of "wavery":
>>
>> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds,
>> 2) what methods to include (degree + neighbors or the standard dict's
>> __len__ + __getitem__): API grounds
>> 3) how much flexibility to be offered (directed, multi-graphs, edge weights
>> with arbitrary labeling, etc.): functionality grounds
>> 4) what underlying data structure to use (sparse adjacency dicts, matrices,
>> etc): representation conflicts.
>
> 4) Whether the library requires some sort of "Vertex" type, or works
> with arbitrary values, similarly whether there is a defined "Edge"
> class or edges can be labelled, weighted, etc with arbitrary Python
> values.

This I put under #3 (functionality grounds) "edge weights with
arbitrary labeling", Vertex's with abitrary values i think would be
included.

> 5) Granularity - if all I want is a depth-first search algorithm, why
> pull in a dependency on 100 graph algorithms I'm not interested in?

Hmm, I would call this "5) comprehensiveness: whether to include every
graph algorithm known to mankind."

> My feeling is that graphs are right on the borderline of a data
> structure that is simple enough that people invent their own rather
> than bother conforming to a "standard" model but complex enough that
> it's worth using library functions rather than getting the details
> wrong.

But this is also why (on both counts) it would be good to include it
in the standard library. The *simplicity* of a graph makes everyone
re-implement it, or (worse) work with some cruder work-around (like a
dict of dicts but not at all clear you're dealing with an actual
graph). But imagine if, for example, Subversion used a python graph
class to track all branches and nodes in it's distributed revision
control system. Then how easy it would be for third parties to make
tools to view repos or other developers to come in and work with the
dev team: they're already familiar with the standard graph class
structure.

And for the *complex enough* case, obviously it helps to have a
standard library help you out and just provide the sophistication of a
graph class for you. There are a lot of obvious uses for a graph,
but if you don't know of it, a beginning programmer won't *think* of
it and make some crude work-around.

Mark

Antoine Pitrou

unread,
Dec 9, 2012, 3:53:45 PM12/9/12
to python...@python.org
On Fri, 7 Dec 2012 15:45:25 -0600
Mark Adam <dreamin...@gmail.com>
wrote:
> I have a decent semi-recursive Graph class that I think could be a
> good addition to the Collections module. It probably needs some
> refactoring, but I'm posting here to see if there's any interest.
>
> For those who aren't too abreast of CS theory, a graph is one of the
> most abstract data structures in computer science, encompassing trees,
> and lists. I'm a bit surprised that no one's offered one up yet, so
> I'll present mine.
>
> The code is at http://github.com/theProphet/Social-Garden under the
> pangaia directly called graph.py. It has a default dictionary
> (defdict.py) dependency that I made before Python came up with it on
> it's own (another place for refactoring).

Do you know networkx?
http://networkx.lanl.gov/

Regards

Antoine.

Stephen J. Turnbull

unread,
Dec 9, 2012, 9:08:00 PM12/9/12
to Mark Adam, Python-Ideas
Mark Adam writes:

> graph). But imagine if, for example, Subversion used a python graph
> class to track all branches and nodes in it's distributed revision
> control system. Then how easy it would be for third parties to make
> tools to view repos or other developers to come in and work with the
> dev team: they're already familiar with the standard graph class
> structure.

This is a fallacy. As has been pointed out, there is a variety of
graphs, a large variety of computations to be done on and in them, and
a huge variety in algorithms for dealing with those varied tasks. For
a "standard" graph class to be useful enough to become the OOWTDI, it
would need to deal with a large fraction of those aspects of graph
theory.

Even so, people would only really internalize the parts they need for
the present task, forgetting or (worse) misremembering functionality
that doesn't work for them right now. Corner cases would force many
tasks to be done outside of the standard class. Differences in taste
would surely result in a large number of API variants to reflect
users' preferred syntaxes for representing graphs, and so on. I think
making a "Graph" class that has a chance of becoming the OOWTDI is a
big task. Not as big as SciPy, say, but then, SciPy isn't being
proposed for stdlib inclusion, either.

As usual for stdlib additions, I think this discussion would best be
advanced not by "going all meta", but rather by proposing specific
packages (either already available, perhaps on PyPI, or new ones --
but with actual code) for inclusion. The "meta" discussion should be
conducted with specific reference to the advantages or shortcomings of
those specific packages. N.B. A reasonably comprehensive package that
has seen significant real-world use, and preferably has a primary
distribution point of PyPI, would be the shortest path to inclusion.

Mark Adam

unread,
Dec 15, 2012, 1:19:41 AM12/15/12
to Stephen J. Turnbull, Python-Ideas
On Sun, Dec 9, 2012 at 8:08 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:
> Mark Adam writes:
>
> > graph). But imagine if, for example, Subversion used a python graph
> > class to track all branches and nodes in it's distributed revision
> > control system. Then how easy it would be for third parties to make
> > tools to view repos or other developers to come in and work with the
> > dev team: they're already familiar with the standard graph class
> > structure.
>
> This is a fallacy. As has been pointed out, there is a variety of
> graphs, a large variety of computations to be done on and in them, and
> a huge variety in algorithms for dealing with those varied tasks.

Yes, but the basic data structure concept is NOT in development, it is
already well-developed. The remaining issues of API are wholly
secondary. The usefulness of a graph is unquestioned and creates
cross-functionality across a large number of possible interesting
domains. Really its like having a car when everyone else is walking
(...but should the car be a buick or a toyota? a four-door or two? --
is all besides the point)

But your issue of proposing an actual implementation is well-taken
rather than spend a lot of time arguing over it all. With any luck,
I'll try to distill networkx with my work and put it all together.
haha

mark

Vinay Sajip

unread,
Dec 16, 2012, 10:36:26 AM12/16/12
to python...@python.org
Mark Adam <dreamingforward@...> writes:

> But your issue of proposing an actual implementation is well-taken
> rather than spend a lot of time arguing over it all. With any luck,
> I'll try to distill networkx with my work and put it all together.

In terms of use cases, you might be interested in potential users of any stdlib
graph library. I'm working on distlib [1], which evolved out of distutils2 and
uses graphs in a couple of places:

1. A dependency graph for distributions. This came directly from distutils2,
though I've added a couple of bits to it such as topological sorting and
determination of strongly-connected components.
2. A lightweight sequencer for build steps, added to avoid the approach in
distutils/distutils2 which makes it harder than necessary to handle custom
build steps. I didn't use the graph system used in point 1, as it was too
specific, and I haven't had time to look at refactoring it.

There's another potential use case in the area of packaging, though perhaps not
in distlib itself: the idea of generating build artifacts based on their
dependencies. Ideally, this would consider not only build artifacts and their
dependencies, but also the builders themselves as part of the graph.

Regards,

Vinay Sajip

[1] https://distlib.readthedocs.org/en/latest/

Guido van Rossum

unread,
Dec 16, 2012, 10:41:07 AM12/16/12
to Vinay Sajip, python...@python.org
I think of graphs and trees as patterns, not data structures.



--
--Guido van Rossum (on iPad)

Oscar Benjamin

unread,
Dec 16, 2012, 5:41:28 PM12/16/12
to Mark Adam, Python-Ideas
On 9 December 2012 20:31, Mark Adam <dreamin...@gmail.com> wrote:
>
> On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore <p.f....@gmail.com> wrote:
> > On 9 December 2012 01:29, Mark Adam <dreamin...@gmail.com> wrote:
> >> All very interesting. I'm going to suggest a sort of "meta-discussion"
> >> about why -- despite the power of graphs as a data structure -- such a
> >> feature has not stabilized into a workable solution for inclusion in a
> >> high-level language like Python.
> >>
> >> I identity the following points of "wavery":
> >>
> >> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds,
> >> 2) what methods to include (degree + neighbors or the standard dict's
> >> __len__ + __getitem__): API grounds
> >> 3) how much flexibility to be offered (directed, multi-graphs, edge weights
> >> with arbitrary labeling, etc.): functionality grounds
> >> 4) what underlying data structure to use (sparse adjacency dicts, matrices,
> >> etc): representation conflicts.

For all the reasons above I don't much see the utility of implementing
some kind of standard graph class. There are too many possibilities
for any one implementation to be generally applicable. I have
implemented graphs in Python many times and I very often find that the
detail of what I want to do leads me to create a different
implementation. Another consideration that you've not mentioned is the
occasional need for a OrderedGraph that keeps track of some kind of
order for its vertices.

> > 4) Whether the library requires some sort of "Vertex" type, or works
> > with arbitrary values, similarly whether there is a defined "Edge"
> > class or edges can be labelled, weighted, etc with arbitrary Python
> > values.
>
> This I put under #3 (functionality grounds) "edge weights with
> arbitrary labeling", Vertex's with abitrary values i think would be
> included.

Having implemented graphs a few times now, I have come to the
conclusion that it is a good idea to make the restriction that the
vertices should be hashable. Otherwise, how would you get O(1)
behaviour for methods like has_edge()?

> > 5) Granularity - if all I want is a depth-first search algorithm, why
> > pull in a dependency on 100 graph algorithms I'm not interested in?
>
> Hmm, I would call this "5) comprehensiveness: whether to include every
> graph algorithm known to mankind."

This is the one part of a graph library that is really useful.
Creating a class or a data structure that represents a graph in some
way is trivially easy. Creating trustworthy implementations of all the
graph-theoretic algorithms with the right kind of big-O behaviour is
not.

> > My feeling is that graphs are right on the borderline of a data
> > structure that is simple enough that people invent their own rather
> > than bother conforming to a "standard" model but complex enough that
> > it's worth using library functions rather than getting the details
> > wrong.

What details would you get wrong? I contend that it is very easy to
implement a graph without getting any of the details wrong. It is the
graph algorithms that are hard, not the data structure. Here's a
couple of examples:

G = {
'A':{'B', 'C'},
'B':{'A'},
'C':{'B'}
}

M = [[0, 1, 1],
[1, 0, 0],
[0, 1, 0]]

You may want to wrap the above in some kind of class in which case
you'll end up with something like the following (from a private
project - modified a little before posting so it may not work now):

class Graph:

def __init__(self, nodes, edges):
self._nodes = frozenset(nodes)
self._edges = defaultdict(set)
for n1, n2 in edges:
self._edges[n1].add(n2)

@property
def nodes(self):
return iter(self._nodes)

@property
def edges(self):
for n1 in self._nodes:
for n2 in self.edges_node(n1):
yield (n1, n2)

def edges_node(self, node):
return iter(self._edges[node])

def has_edge(self, nfrom, nto):
return nto in self._edges[nfrom]

def __str__(self):
return '\n'.join(self._iterdot())

def _iterdot(self):
yield 'digraph G {'
for n in self.nodes:
yield ' %s;' % n
for nfrom, nto in self.edges:
yield ' %s -> %s;' % (nfrom, nto)
yield '}'

G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')])

The above class is unusual in the sense that it is a pure Graph class.
Normally I would simply be adding a few graphy methods onto a class
that represents a network of some kind.

The problem that I found with current support for graphs in Python is
not the lack of an appropriate data structure. Rather the problem is
that implementations of graph-theoretic algorithms (as in e.g.
pygraph) are tied to a specific Graph class that I didn't want to or
couldn't use in my own project. This means that to determine if you
have something that represents a strongly connected graph you first
need to create a separate redundant data structure and then pass that
into the algorithm.

What would be more useful than a new Graph class would be
implementations of graph algorithms that can easily be applied to any
representation of a graph. As an example, I can write a function for
determining if a graph is a DAG using a small subset of the possible
methods that a Graph class would have:

def is_dag(nodes, edges_node):
'''Determine if a directed graph is acyclic

nodes is an iterable yielding all vertices in the graph
edges_node(node) is an iterable giving all nodes that node connects to
'''
visited = set()
visiting = set()
for node in nodes:
if node not in visited:
if has_backedge(node, edges_node, visited, visiting):
return False
else:
return True

def has_backedge(node, edges_node, visited, visiting):
'''Helper for is_dag()'''
if node in visiting:
return True
visited.add(node)
visiting.add(node)
for childnode in edges_node(node):
if has_backedge(childnode, edges_node, visited, visiting):
return True
visiting.remove(node)
return False

This can be used with the Graph class or just as easily with the
dict-of-sets like so:

is_dag(G2.nodes, G2.edges_node)
is_dag(G, G.__getitem__)

It is possible to do something like this for all of the graph
algorithms and I think that a library like this would be more useful
than a new Graph type.

Hannu Krosing

unread,
Dec 16, 2012, 6:28:04 PM12/16/12
to Guido van Rossum, Vinay Sajip, python...@python.org
On 12/16/2012 04:41 PM, Guido van Rossum wrote:
I think of graphs and trees as patterns, not data structures.

How do you draw line between what is data structure and what is pattern ?

Do you have any ideas on how to represent "patterns" in
python standard library ?

By a set of samples ?

By (a set of) classes realising the patterns ?

By a set of functions working on existing structures which
implement the pattern ?

Duck-typing should lend itself well to this last approach.

Do we currently have any modules in standard library which are
more patterns and less data structures ?

-----------------------
Hannu



--
--Guido van Rossum (on iPad)


Nick Coghlan

unread,
Dec 17, 2012, 10:26:38 PM12/17/12
to Hannu Krosing, Vinay Sajip, python...@python.org
On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing <ha...@krosing.net> wrote:
On 12/16/2012 04:41 PM, Guido van Rossum wrote:
I think of graphs and trees as patterns, not data structures.

How do you draw line between what is data structure and what is pattern ?

A rough rule of thumb is that if it's harder to remember the configuration options in the API than it is to just write a purpose-specific function, it's probably better as a pattern that can be tweaked for a given use case than it is as an actual data structure.

More generally, ABCs and magic methods are used to express patterns (like iteration), which may be implemented by various data structures.

A graph library that focused on defining a good abstraction (and adapters) that allowed graph algorithms to be written that worked with multiple existing Python graph data structures could be quite interesting.

Cheers,
Nick.

--
Nick Coghlan   |   ncog...@gmail.com   |   Brisbane, Australia

Terry Reedy

unread,
Dec 18, 2012, 4:06:39 AM12/18/12
to python...@python.org
On 12/17/2012 10:26 PM, Nick Coghlan wrote:
> On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing
> <ha...@krosing.net
> <mailto:ha...@krosing.net>> wrote:
>
> On 12/16/2012 04:41 PM, Guido van Rossum wrote:
>> I think of graphs and trees as patterns, not data structures.
>
> How do you draw line between what is data structure and what is
> pattern ?
>
>
> A rough rule of thumb is that if it's harder to remember the
> configuration options in the API than it is to just write a
> purpose-specific function, it's probably better as a pattern that can be
> tweaked for a given use case than it is as an actual data structure.
>
> More generally, ABCs and magic methods are used to express patterns
> (like iteration), which may be implemented by various data structures.
>
> A graph library that focused on defining a good abstraction (and
> adapters) that allowed graph algorithms to be written that worked with
> multiple existing Python graph data structures could be quite interesting.

I was just thinking that what is needed, at least as a first step, is a
graph api, like the db api, that would allow the writing of algorithms
to one api and adapters to various implementations. I expect to be
writing some graph algorithms (in Python) in the next year and will try
to keep that idea in mind and see if it makes any sense, versus just
whipping up a implementation that fits the particular problem.

--
Terry Jan Reedy

Oscar Benjamin

unread,
Dec 18, 2012, 7:08:50 AM12/18/12
to Terry Reedy, python-ideas
On 18 December 2012 09:06, Terry Reedy <tjr...@udel.edu> wrote:
> On 12/17/2012 10:26 PM, Nick Coghlan wrote:
>> A graph library that focused on defining a good abstraction (and
>> adapters) that allowed graph algorithms to be written that worked with
>> multiple existing Python graph data structures could be quite interesting.
>
>
> I was just thinking that what is needed, at least as a first step, is a
> graph api, like the db api, that would allow the writing of algorithms to
> one api and adapters to various implementations. I expect to be writing some
> graph algorithms (in Python) in the next year and will try to keep that idea
> in mind and see if it makes any sense, versus just whipping up a
> implementation that fits the particular problem.

I'd be interested to use (and possibly to contribute to) a graph
library of this type on PyPI. I have some suggestions about the
appropriate level of abstraction below.

The graph algorithms that are the most useful can be written in terms
of two things:
1) An iterator over the nodes
2) A way to map each node into an iterator over its children (or partners)

It is also required to place some restriction on how the nodes can be
used. Descriptions of graph algorithms refer to marking/colouring the
nodes of a graph. If the nodes are instances of user defined classes,
then you can do this in a relatively literal sense by adding
attributes to the nodes, but this is fragile in the event of errors,
not thread-safe, etc.

Really though, the idea of marking the nodes just means that you need
an O(1) method for determining if a node has been marked or checking
the value that it was marked with. In Python this is easily done with
sets and dicts, which is not fragile and is thread-safe etc. (provided
the graph is not being mutated). This requires that the nodes be
hashable.

In the thread about deleting keys from a dict yesterday it occurred to
me (after MRAB's suggestion) that you can still apply the same methods
to non-hashable objects. That is, provided you have a situation where
node equality is determined by node identity you can just use id(node)
in each hash table. While this method works equally well for
user-defined class instances it does not work for immutable types
where, for example, two strings may be equal but have differing id()s.

One way to cover all cases is simply to provide a hashkey argument to
each algorithm that defaults to the identity function (lambda x: x),
but may be replaced by the id function in appropriate cases. This
means that all of the graph algorithms that I would want can be
implemented with a basic signature that goes like:

def strongly_connected(nodes, edgesfunc, hashkey=None):
'''
`nodes` is an iterable over the nodes of the graph.
`edgesfunc(node)` is an iterable over the children of node.
`hashkey` is an optional key function to apply when adding
nodes to a hash-table. For mutable objects where identity
is equality use `hashkey=id`.
'''
if hashkey is None:
# Would be great to have operator.identity here
hashkey = lambda x: x

There are some cases where optimisation is possible given additional
information. One example: I think it is possible to conclude that an
undirected graph contains at least one cycle if |E|>=|V|, so in this
case an optional hint parameter could give a shortcut for some graphs.
Generally, though, there are few algorithms where other quantities are
either required or are sufficient for all input graphs (exceptions to
the sufficient part of this rule are typically the relatively easy
algorithms like determining the mean degree).

Once you have algorithms that are implemented in this way it becomes
possible to piece them together as a concrete graph class, a mixin, an
ABC, a decorator that works like @functools.total_ordering or some
other class-based idiom. Crucially, though, unlike all of these class
based approaches, defining the algorithms firstly in a functional way
makes it easy to apply them to any data structure composed of
elementary types or of classes that you yourself cannot write or
subclass.


Oscar

Yuval Greenfield

unread,
Dec 18, 2012, 9:24:01 AM12/18/12
to Oscar Benjamin, python-ideas, Terry Reedy
On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin <oscar.j....@gmail.com> wrote:
On 18 December 2012 09:06, Terry Reedy <tjr...@udel.edu> wrote:
> On 12/17/2012 10:26 PM, Nick Coghlan wrote:
>> A graph library that focused on defining a good abstraction (and
>> adapters) that allowed graph algorithms to be written that worked with
>> multiple existing Python graph data structures could be quite interesting.
>
>
> I was just thinking that what is needed, at least as a first step, is a
> graph api, like the db api, that would allow the writing of algorithms to
> one api and adapters to various implementations. I expect to be writing some
> graph algorithms (in Python) in the next year and will try to keep that idea
> in mind and see if it makes any sense, versus just whipping up a
> implementation that fits the particular problem.

I'd be interested to use (and possibly to contribute to) a graph
library of this type on PyPI. I have some suggestions about the
appropriate level of abstraction below.

The graph algorithms that are the most useful can be written in terms
of two things:
1) An iterator over the nodes
2) A way to map each node into an iterator over its children (or partners)



Some graphs don't care for the nodes, all their information is in the edges. That's why most graph frameworks have iter_edges and iter_nodes functions. I'm not sure what's the clean way to represent the optional directionality of edges though.

Some example API's from networkx:


Terry Reedy

unread,
Dec 18, 2012, 11:06:29 AM12/18/12
to python...@python.org
On 12/18/2012 9:24 AM, Yuval Greenfield wrote:
>
> On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin
> <oscar.j....@gmail.com
> <mailto:oscar.j....@gmail.com>> wrote:

> I'd be interested to use (and possibly to contribute to) a graph
> library of this type on PyPI. I have some suggestions about the
> appropriate level of abstraction below.
>
> The graph algorithms that are the most useful can be written in terms
> of two things:
> 1) An iterator over the nodes

Or iterable if re-iteration is needed.

> 2) A way to map each node into an iterator over its children (or
> partners)

A callable could be either an iterator class or a generator function.

> Some graphs don't care for the nodes, all their information is in the
> edges. That's why most graph frameworks have iter_edges and iter_nodes
> functions. I'm not sure what's the clean way to represent the
> optional directionality of edges though.
>
> Some example API's from networkx:
>
> http://networkx.lanl.gov/reference/classes.html
> http://networkx.lanl.gov/reference/classes.digraph.html

Thank you both the the 'thought food'. Defining things in terms of
iterables and iterators instead of (for instance) sets is certainly the
Python3 way.

Oscar, I don't consider hashability an issue. General class instances
are hashable by default. One can even consider such instances as
hashable facades for unhashable dicts. Giving each instance a list
attribute does the same for lists.

The more important question, it seems to me, is whether to represent
nodes by counts and let the algorithm do its bookkeeping in private
structures, or to represent them by externally defined instances that
the algorithm mutates.

--
Terry Jan Reedy

Oscar Benjamin

unread,
Dec 18, 2012, 1:21:42 PM12/18/12
to Terry Reedy, python-ideas
On 18 December 2012 16:06, Terry Reedy <tjr...@udel.edu> wrote:
> On 12/18/2012 9:24 AM, Yuval Greenfield wrote:
>> On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin
>> <oscar.j....@gmail.com
>> <mailto:oscar.j....@gmail.com>> wrote:
>>
>> The graph algorithms that are the most useful can be written in terms
>> of two things:
>> 1) An iterator over the nodes
>
>
> Or iterable if re-iteration is needed.

True. Although there aren't many cases where re-iteration is needed.
The main exception would be if you wanted to instantiate a new Graph
as a result of the algorithm. For example a transitive pruning
function could be written to accept a factory like

def transitive_prune(nodes, childfunc, factory):
return factory(nodes, pruned_edges(nodes, childfunc))

in which case you need to be able to iterate once over the nodes for
the pruning algorithm and once to construct the new graph. In these
cases, the fact that you want to instantiate a new graph suggests that
your original graph was a concrete data structure so that it is
probably okay to require an iterable. To mutate the graph in place the
user would need to supply a function to remove edges:

def transitive_prune(nodes, childfunc, remove_edge):

>> 2) A way to map each node into an iterator over its children (or
>> partners)
>
> A callable could be either an iterator class or a generator function.
>
>> Some graphs don't care for the nodes, all their information is in the
>> edges. That's why most graph frameworks have iter_edges and iter_nodes
>> functions.

This is true. Some algorithms would rather have this information.
There are also a few that can proceed just from a particular node
rather than needing an iterable over all nodes.

>> I'm not sure what's the clean way to represent the
>> optional directionality of edges though.

I would have said that each API entry point should state how it will
interpret the edges. Are there algorithms that simultaneously make
sense for directed and undirected graphs while needing to behave
differently in the two cases (in which case is it really the same
algorithm)?

> Oscar, I don't consider hashability an issue. General class instances are
> hashable by default. One can even consider such instances as hashable
> facades for unhashable dicts. Giving each instance a list attribute does the
> same for lists.

True, I've not found hashability to be a problem in practice.

> The more important question, it seems to me, is whether to represent nodes
> by counts and let the algorithm do its bookkeeping in private structures, or
> to represent them by externally defined instances that the algorithm
> mutates.

I don't think I understand: How would the "externally defined instances" work?

Do you mean that the caller of a function would supply functions like
mark(), is_marked(), set_colour(), get_colour() and so on? If that's
the case what would the advantages be? I can think of one: if desired
the algorithm could be made to store all of its computations in say a
database so that it would be very scalable. Though to me that seems
like quite a specialised case that would probably merit from
reimplementing the desired algorithm anyway. Otherwise I guess it's a
lot simpler/safer to implement everything in private data structures.


Oscar
Reply all
Reply to author
Forward
0 new messages