On 9 December 2012 20:31, Mark Adam <
dreamin...@gmail.com> wrote:
>
> On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore <
p.f....@gmail.com> wrote:
> > On 9 December 2012 01:29, Mark Adam <
dreamin...@gmail.com> wrote:
> >> All very interesting. I'm going to suggest a sort of "meta-discussion"
> >> about why -- despite the power of graphs as a data structure -- such a
> >> feature has not stabilized into a workable solution for inclusion in a
> >> high-level language like Python.
> >>
> >> I identity the following points of "wavery":
> >>
> >> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds,
> >> 2) what methods to include (degree + neighbors or the standard dict's
> >> __len__ + __getitem__): API grounds
> >> 3) how much flexibility to be offered (directed, multi-graphs, edge weights
> >> with arbitrary labeling, etc.): functionality grounds
> >> 4) what underlying data structure to use (sparse adjacency dicts, matrices,
> >> etc): representation conflicts.
For all the reasons above I don't much see the utility of implementing
some kind of standard graph class. There are too many possibilities
for any one implementation to be generally applicable. I have
implemented graphs in Python many times and I very often find that the
detail of what I want to do leads me to create a different
implementation. Another consideration that you've not mentioned is the
occasional need for a OrderedGraph that keeps track of some kind of
order for its vertices.
> > 4) Whether the library requires some sort of "Vertex" type, or works
> > with arbitrary values, similarly whether there is a defined "Edge"
> > class or edges can be labelled, weighted, etc with arbitrary Python
> > values.
>
> This I put under #3 (functionality grounds) "edge weights with
> arbitrary labeling", Vertex's with abitrary values i think would be
> included.
Having implemented graphs a few times now, I have come to the
conclusion that it is a good idea to make the restriction that the
vertices should be hashable. Otherwise, how would you get O(1)
behaviour for methods like has_edge()?
> > 5) Granularity - if all I want is a depth-first search algorithm, why
> > pull in a dependency on 100 graph algorithms I'm not interested in?
>
> Hmm, I would call this "5) comprehensiveness: whether to include every
> graph algorithm known to mankind."
This is the one part of a graph library that is really useful.
Creating a class or a data structure that represents a graph in some
way is trivially easy. Creating trustworthy implementations of all the
graph-theoretic algorithms with the right kind of big-O behaviour is
not.
> > My feeling is that graphs are right on the borderline of a data
> > structure that is simple enough that people invent their own rather
> > than bother conforming to a "standard" model but complex enough that
> > it's worth using library functions rather than getting the details
> > wrong.
What details would you get wrong? I contend that it is very easy to
implement a graph without getting any of the details wrong. It is the
graph algorithms that are hard, not the data structure. Here's a
couple of examples:
G = {
'A':{'B', 'C'},
'B':{'A'},
'C':{'B'}
}
M = [[0, 1, 1],
[1, 0, 0],
[0, 1, 0]]
You may want to wrap the above in some kind of class in which case
you'll end up with something like the following (from a private
project - modified a little before posting so it may not work now):
class Graph:
def __init__(self, nodes, edges):
self._nodes = frozenset(nodes)
self._edges = defaultdict(set)
for n1, n2 in edges:
self._edges[n1].add(n2)
@property
def nodes(self):
return iter(self._nodes)
@property
def edges(self):
for n1 in self._nodes:
for n2 in self.edges_node(n1):
yield (n1, n2)
def edges_node(self, node):
return iter(self._edges[node])
def has_edge(self, nfrom, nto):
return nto in self._edges[nfrom]
def __str__(self):
return '\n'.join(self._iterdot())
def _iterdot(self):
yield 'digraph G {'
for n in self.nodes:
yield ' %s;' % n
for nfrom, nto in self.edges:
yield ' %s -> %s;' % (nfrom, nto)
yield '}'
G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')])
The above class is unusual in the sense that it is a pure Graph class.
Normally I would simply be adding a few graphy methods onto a class
that represents a network of some kind.
The problem that I found with current support for graphs in Python is
not the lack of an appropriate data structure. Rather the problem is
that implementations of graph-theoretic algorithms (as in e.g.
pygraph) are tied to a specific Graph class that I didn't want to or
couldn't use in my own project. This means that to determine if you
have something that represents a strongly connected graph you first
need to create a separate redundant data structure and then pass that
into the algorithm.
What would be more useful than a new Graph class would be
implementations of graph algorithms that can easily be applied to any
representation of a graph. As an example, I can write a function for
determining if a graph is a DAG using a small subset of the possible
methods that a Graph class would have:
def is_dag(nodes, edges_node):
'''Determine if a directed graph is acyclic
nodes is an iterable yielding all vertices in the graph
edges_node(node) is an iterable giving all nodes that node connects to
'''
visited = set()
visiting = set()
for node in nodes:
if node not in visited:
if has_backedge(node, edges_node, visited, visiting):
return False
else:
return True
def has_backedge(node, edges_node, visited, visiting):
'''Helper for is_dag()'''
if node in visiting:
return True
visited.add(node)
visiting.add(node)
for childnode in edges_node(node):
if has_backedge(childnode, edges_node, visited, visiting):
return True
visiting.remove(node)
return False
This can be used with the Graph class or just as easily with the
dict-of-sets like so:
is_dag(G2.nodes, G2.edges_node)
is_dag(G, G.__getitem__)
It is possible to do something like this for all of the graph
algorithms and I think that a library like this would be more useful
than a new Graph type.