Breadth-first search as generator-function

547 views
Skip to first unread message

Salim Fadhley

unread,
Dec 12, 2008, 8:23:48 PM12/12/08
to networkx-discuss
I need to be able to find the nodes that are less than a certain
distance away from a set of nodes. Does networkx provide an algorithm
that makes this sort of thing particularly easy?

If not, I'd write one myself: The obvious way to do this is a breadth-
first search - since such a search would start from the inside and
work outwards after a certain point you could guarantee that all of
the nodes found would be further than my maximum distance from a start-
point.

I'd write such a function as a generator so that when I've got enough
results the search would just end... something like this:

nodes = set()
for bfs( G, start, max_wt ):
nodes.add( node )

Can anybody think of a better way to to it when you know you have a
list of nodes - some of the nodes might be very close together so to
blindly bfs for each of them might waste a lot of time.

Thanks

Aric Hagberg

unread,
Dec 12, 2008, 8:35:33 PM12/12/08
to networkx...@googlegroups.com
On Fri, Dec 12, 2008 at 05:23:48PM -0800, Salim Fadhley wrote:
>
> I need to be able to find the nodes that are less than a certain
> distance away from a set of nodes. Does networkx provide an algorithm
> that makes this sort of thing particularly easy?
>
> If not, I'd write one myself: The obvious way to do this is a breadth-
> first search - since such a search would start from the inside and
> work outwards after a certain point you could guarantee that all of
> the nodes found would be further than my maximum distance from a start-
> point.

There are functions in the path module that will provide this data
(with a distance cutoff).
http://networkx.lanl.gov/reference/algorithms.traversal.html#module-networkx.algorithms.traversal.path
(They use BFS and Dijkstra's algorithm). But they might not be the
most efficient if you just want the nodes and not the paths or path
lengths.

On the other hand if you actually want the subgraph
in the 1.0 pre-release version there is "ego_graph"
http://networkx.lanl.gov/preview/reference/generated/networkx.ego_graph.html#networkx.ego_graph
that gives the induced subgraph for a node and neighbors at distance
1. It wouldn't be too hard to extend that an arbitrary size
neighborhood.

Aric

Salim Fadhley

unread,
Dec 13, 2008, 5:42:18 AM12/13/08
to networkx-discuss
Actually I just want the nodes:

I've noticed that there does not seem to be any traversal generator
functions in the library - surely for search algorithms generators are
a great benefit: If you find what you want part way through the search
then you don't have to waste time doing the rest of it.

Might a good starting point be a the most basic bfs and dfs
implemented as a generator?

Sal

On Dec 13, 1:35 am, Aric Hagberg <hagb...@lanl.gov> wrote:
> On Fri, Dec 12, 2008 at 05:23:48PM -0800, Salim Fadhley wrote:
>
> > I need to be able to find the nodes that are less than a certain
> > distance away from a set of nodes. Does networkx provide an algorithm
> > that makes this sort of thing particularly easy?
>
> > If not, I'd write one myself: The obvious way to do this is a breadth-
> > first search - since such a search would start from the inside and
> > work outwards after a certain point you could guarantee that all of
> > the nodes found would be further than my maximum distance from a start-
> > point.
>
> There are functions in the path module that will provide this data
> (with a distance cutoff).  http://networkx.lanl.gov/reference/algorithms.traversal.html#module-n...
> (They use BFS and Dijkstra's algorithm).  But they might not be the
> most efficient if you just want the nodes and not the paths or path
> lengths.  
>
> On the other hand if you actually want the subgraph
> in the 1.0 pre-release version there is "ego_graph"http://networkx.lanl.gov/preview/reference/generated/networkx.ego_gra...

Aric Hagberg

unread,
Dec 13, 2008, 10:10:36 AM12/13/08
to networkx...@googlegroups.com
On Sat, Dec 13, 2008 at 02:42:18AM -0800, Salim Fadhley wrote:
>
> Actually I just want the nodes:
>
> I've noticed that there does not seem to be any traversal generator
> functions in the library - surely for search algorithms generators are
> a great benefit: If you find what you want part way through the search
> then you don't have to waste time doing the rest of it.
>
> Might a good starting point be a the most basic bfs and dfs
> implemented as a generator?

If I understand what you mean correctly I think you want to implement
"visitor" methods. That is, graph traversal methods (say for BFS or
DFS) that provide optional callbacks at different stages of the
traversal. One kind of callback would be to terminate the traversal
when some condition is found.

For simple cases a specific function (shortest path) will be
fastest if implemented without callbacks. That is currently what the
shortest paths algorithms do. For example your case of finding all of
the nodes in G within d hops of node n can be done with
nodes=single_source_shortest_path(G,n,cutoff=d).keys().


Aric

Salim Fadhley

unread,
Dec 14, 2008, 2:21:23 PM12/14/08
to networkx-discuss
That would work if we have a generic object which could be tested by
the callback functions, so for example I could have something like
this:
http://python.pastebin.com/d74380944

The idea is that the test function needs more than just the node to go
on - there's always going to be some context-relevant information
(e.g. the total cost of reaching the currently testing node) which is
not so much a property of the node but a consequence of the kind of
search we are doing right now.

I can go implement this if you are happy with it.

Sal

Aric Hagberg

unread,
Dec 14, 2008, 7:43:56 PM12/14/08
to networkx...@googlegroups.com
On Sun, Dec 14, 2008 at 11:21:23AM -0800, Salim Fadhley wrote:
>
> That would work if we have a generic object which could be tested by
> the callback functions, so for example I could have something like
> this:
> http://python.pastebin.com/d74380944
>
> The idea is that the test function needs more than just the node to go
> on - there's always going to be some context-relevant information
> (e.g. the total cost of reaching the currently testing node) which is
> not so much a property of the node but a consequence of the kind of
> search we are doing right now.
>
> I can go implement this if you are happy with it.

I'm still a little uncertain about what you are aiming for there.
But if you implement something please post back here and show us
how it works!

Aric

Matteo Dell'Amico

unread,
Dec 15, 2008, 5:02:58 AM12/15/08
to networkx...@googlegroups.com
I think Salim wanted something like this:

def bfs(g, start):
    visited, queue, enqueued = set(), [start], set([start])
    i = 0
    while len(queue) > i:
        n = queue[i]
        yield n
        new = set(g[n]) - enqueued
        enqueued |= new
        queue.extend(new)
        i += 1

def dfs(g, start):
    visited, stack, enqueued = set(), [start], set([start])
    while stack:
        n = stack.pop()
        yield n
        new = set(g[n]) - enqueued
        enqueued |= new
        stack.extend(new)

This way you can write cycles like "for node in dfs(g, startnode)". Writing a generator is the Pythonic way of iterating through data structures, instead of passing callbacks around :)

--
Ciao,
Matteo

Salim Fadhley

unread,
Dec 15, 2008, 5:38:22 AM12/15/08
to networkx-discuss
I like generator functions like this - it put much more control into
the hands of the user.

My only suggestion is that instead of yeilding the n (the current
node), I'd want the functions to yield a node wrapped up in some kind
of object that gives me information about the status of the search
function.

One benefit of the callback method is that you can change the behavior
of the function - for example if you wanted to modify a bfs function
so that it ignores certain kinds of nodes. It would be far easier to
do that via a callback than simply filtering the output of a generator
bfs function. Ideally I'd like a combination of both: A generator
function that accepts a callback.

Sal

On Dec 15, 10:02 am, "Matteo Dell'Amico" <matteodellam...@gmail.com>
wrote:

Aric Hagberg

unread,
Dec 15, 2008, 9:12:47 AM12/15/08
to networkx...@googlegroups.com
On Mon, Dec 15, 2008 at 02:38:22AM -0800, Salim Fadhley wrote:
>
> I like generator functions like this - it put much more control into
> the hands of the user.
>
> My only suggestion is that instead of yeilding the n (the current
> node), I'd want the functions to yield a node wrapped up in some kind
> of object that gives me information about the status of the search
> function.
>
> One benefit of the callback method is that you can change the behavior
> of the function - for example if you wanted to modify a bfs function
> so that it ignores certain kinds of nodes. It would be far easier to
> do that via a callback than simply filtering the output of a generator
> bfs function. Ideally I'd like a combination of both: A generator
> function that accepts a callback.

Matteo definitely has the right idea with generators (and a very
compact and clear representation of bfs and dfs).

We do have quite a few methods, sometimes labeled with _iter(), that
return generators. For example degree_iter(). We made the decision
not to make degree() a generator since it seemed surprising to users
to have to write "list(degree())" or "dict(degree())" to get what they
expected.

But generators are great; I like the UNIX-like pipe and filter way of
processing things. If you haven't read David Beazley's presentation
slides on generators see
http://www.dabeaz.com/generators/Generators.pdf

And for bfs and dfs maybe a version with callbacks is important.
It will be slower but might allow implementing and testing algorithms
that you can't write easily with the simple generator functions.
For example, in dfs if you need to do something when you first see a
node and when you last process see a node then the generator won't work.
(Or something similar with processing edges).

Aric


Matteo Dell'Amico

unread,
Dec 15, 2008, 10:41:18 AM12/15/08
to networkx...@googlegroups.com
Depending on what you're doing, it may be good enough to just do

bfs(g.subgraph(n for n in g.nodes_iter() if filter(n)), start)

...or maybe extending the generator with a callback (also, I changed the bfs implementation to use collections.deque):

def bfs(g, start, filter=lambda x:True):
    visited, queue, enqueued = set(), deque([start]), set([start])
    while queue:
        n = queue.popleft()
        if not filter(n):
            continue

        yield n
        new = set(g[n]) - enqueued
        enqueued |= new
        queue.extend(new)
        i += 1

I actually have no idea which hooks would be the most useful anyway...

--
Ciao,
Matteo

Matteo Dell'Amico

unread,
Mar 4, 2009, 8:59:29 AM3/4/09
to networkx...@googlegroups.com
I revive this thread because I happened to comment on a BFS graph traversal recipe on the Python Cookbook (http://code.activestate.com/recipes/576675/), and it made me think again to graph visits.

I think that what the graph visit generators should not return just the child, but also the parent. This way, we can use them to implement various stuff. As a proof of concept, I implemented BFS, DFS and the Dijkstra and A* algorithms.

Please have a look at this and tell me if you think if such a visits module could be useful in NX.

--
Ciao,
Matteo
visits.py

Aric Hagberg

unread,
Mar 4, 2009, 10:41:00 AM3/4/09
to networkx...@googlegroups.com

Yes, definitely. There are many applications for this.
Aric

Matteo Dell'Amico

unread,
Mar 4, 2009, 1:01:17 PM3/4/09
to networkx...@googlegroups.com
2009/3/4 Aric Hagberg <aric.h...@gmail.com>

Yes, definitely.  There are many applications for this.

Yep, but I accidentally attached the wrong file :) This is the one I was meaning :)

--
Ciao,
Matteo
nx_visits.py

Arthur

unread,
Mar 10, 2009, 9:16:31 PM3/10/09
to networkx-discuss
Hello Matteo,

I was reading this thread and your comment in http://code.activestate.com/recipes/576675/
and I'm trying to modify the bfs to provide all paths with a certain
cutoff, given a XGraph.

since you said that the search algorithm should not visit nodes more
than once and I believe this is for sure, could you help me to find
the way to modify the bfs to provide all paths with a certain cutoff?

Many thanks,

Arthur

Matteo Dell'Amico

unread,
Mar 11, 2009, 8:28:37 AM3/11/09
to networkx...@googlegroups.com


2009/3/11 Arthur <arthur...@gmail.com>

What do you mean precisely as "cutoff"?
 
--
Ciao,
Matteo

Arthur

unread,
Mar 14, 2009, 2:47:01 PM3/14/09
to networkx-discuss
Hi,

I mean a distance between the source node and the target node in terms
of edges.
for example:

a - b - c -d
a - b - c
a - b- d - e- c

all the paths between a and c with cutoff >= 2 will be the sencond and
third one.

Regards,

Arthur

On Mar 11, 1:28 pm, "Matteo Dell'Amico" <matteodellam...@gmail.com>
wrote:
> 2009/3/11 Arthur <arthur.tej...@gmail.com>

Dan Schult

unread,
Mar 14, 2009, 5:09:55 PM3/14/09
to networkx...@googlegroups.com
> I mean a distance between the source node and the target node in terms
> of edges.
> for example:
>
> a - b - c -d
> a - b - c
> a - b- d - e- c
>
> all the paths between a and c with cutoff >= 2 will be the sencond and
> third one.

??? do you mean the don't all the paths have 2 or more edges? Did you
mean <=2 and first and second?

Anyway, take a look at the source for single_source_shortest_path()
here:
https://networkx.lanl.gov/trac/browser/networkx/trunk/networkx/
algorithms/traversal/path.py
it's about line 300 in the file. It has a cutoff, but doesn't keep
all paths (only
shortest ones) and doesn't check to see if a target node has been found.
It should be fairly straight-forward (though not necessarily easy) to
add
lines that do that.

First step is to specify exactly what you want the cutoff to mean--
then write that
as a python statement so you can use it as a criteria.
Dan

Reply all
Reply to author
Forward
0 new messages