Randomized BFS traversal

Kapil Agrawal

unread,

Nov 8, 2023, 8:37:12 AM11/8/23

to networkx-discuss

Is there a faster way to implement a randomized BFS traversal? Currently, I have the following:

The problem with below snippet is for large graphs the random.shuffle becomes the bottleneck.. Any ideas?

def perform_random_bfs(self, g, source):
        visited = set()
        queue = [source]
        random_dfs_list = [source]
        visited.add(source)
        while len(queue):
            curr = queue.pop(0)
            childs = []
            for child in g.neighbors(curr):
                if child not in visited:
                    visited.add(child)
                    childs.append(child)
                    queue.append(child)
            random.shuffle(childs)
            [random_dfs_list.append(child) for child in childs]
        return random_dfs_list

Islam Akef Ebeid

unread,

Nov 8, 2023, 9:35:03 AM11/8/23

to networkx-discuss

Don't know what is randomized BFS compared to regular BFS but this researcher (link below) does GPU based parallel connected component detection. Might be helpful in case of large graphs.

https://userweb.cs.txstate.edu/~burtscher/research.html

https://userweb.cs.txstate.edu/~mb92/papers/sc23c.pdf

David Menéndez Hurtado

unread,

Nov 8, 2023, 10:18:12 AM11/8/23

to networkx...@googlegroups.com

How big is the graph? What is the degree distribution?

There are a couple of things you can do to speed it up. The first is using a deque for the queue, since you are adding elements from the end (fast on a list) and removing them from the beginning (very slow for lists).

https://docs.python.org/3/library/collections.html#collections.deque

The other is to transform the list comprehension to a plain for loop, since you are now creating a list and throwing it away immediately. Or even better, replace it with random_dfs_list.extend(childs).

This does nothing to the timings of shuffle, but it does shave off some time (about 20% in my tests, depending on the size and degree distribution).

If you want to optimize the shuffle, you may want to implement your own shuffle using a faster but lower quality RNG.

--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/8d01607f-bfdb-4e42-a7c3-d144148866c8n%40googlegroups.com.

David Menéndez Hurtado

unread,

Nov 8, 2023, 5:20:28 PM11/8/23

to networkx...@googlegroups.com

Ah, I just realised, numpy's shuffle is much faster than the standard library, about four times across multiple shapes. And you can even squeeze almost a factor of 2 more using SFC64 instead.

Ross Barnowski

unread,

Nov 9, 2023, 1:54:20 AM11/9/23

to networkx...@googlegroups.com

The built-in NetworkX implementations of traversal functions often provide hooks for such features. For example, `nx.generic_bfs_edges` has a `neighbors` parameter that accepts a callable to allow you to control how neighboring nodes are accessed. You could use this feature to access nodes in a random order with something like:

```

>>> G = nx.star_graph(5) # Example input

>>> list(nx.generic_bfs_edges(G, 0)) # default neighbor order

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

>>> def randomized(n):

... neighbors = list(G[n])

... random.shuffle(neighbors)

... return neighbors

>>> list(nx.generic_bfs_edges(G, 0, neighbors=randomized)) # randomized node order

[(0, 3), (0, 5), (0, 4), (0, 1), (0, 2)]

```

This doesn't necessarily address your performance concerns, but I just thought it was worth drawing this feature to your attention!

~Ross

To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/CAJhcF%3D0YWHGvtHoNmMjzZ7BoGC5siuTHHxjGj9uWCUTfZiExhA%40mail.gmail.com.

Reply all

Reply to author

Forward