Randomized BFS traversal

84 views
Skip to first unread message

Kapil Agrawal

unread,
Nov 8, 2023, 8:37:12 AM11/8/23
to networkx-discuss
Is there a faster way to implement a randomized BFS traversal? Currently, I have the following:

The problem with below snippet is for large graphs the random.shuffle becomes the bottleneck.. Any ideas?
def perform_random_bfs(self, g, source):
visited = set()
queue = [source]
random_dfs_list = [source]
visited.add(source)
while len(queue):
curr = queue.pop(0)
childs = []
for child in g.neighbors(curr):
if child not in visited:
visited.add(child)
childs.append(child)
queue.append(child)
random.shuffle(childs)
[random_dfs_list.append(child) for child in childs]
return random_dfs_list

Islam Akef Ebeid

unread,
Nov 8, 2023, 9:35:03 AM11/8/23
to networkx-discuss
Don't know what is randomized BFS compared to regular BFS but this researcher (link below) does GPU based parallel connected component detection. Might be helpful in case of large graphs.


David Menéndez Hurtado

unread,
Nov 8, 2023, 10:18:12 AM11/8/23
to networkx...@googlegroups.com
How big is the graph? What is the degree distribution?

There are a couple of things you can do to speed it up. The first is using a deque for the queue, since you are adding elements from the end (fast on a list) and removing them from the beginning (very slow for lists).

The other is to transform the list comprehension to a plain for loop, since you are now creating a list and throwing it away immediately. Or even better, replace it with random_dfs_list.extend(childs).

This does nothing to the timings of shuffle, but it does shave off some time (about 20% in my tests, depending on the size and degree distribution).


If you want to optimize the shuffle, you may want to implement your own shuffle using a faster but lower quality RNG.

--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/8d01607f-bfdb-4e42-a7c3-d144148866c8n%40googlegroups.com.

David Menéndez Hurtado

unread,
Nov 8, 2023, 5:20:28 PM11/8/23
to networkx...@googlegroups.com
Ah, I just realised, numpy's shuffle is much faster than the standard library, about four times across multiple shapes. And you can even squeeze almost a factor of 2 more using SFC64 instead.

Ross Barnowski

unread,
Nov 9, 2023, 1:54:20 AM11/9/23
to networkx...@googlegroups.com
The built-in NetworkX implementations of traversal functions often provide hooks for such features. For example, `nx.generic_bfs_edges` has a `neighbors` parameter that accepts a callable to allow you to control how neighboring nodes are accessed. You could use this feature to access nodes in a random order with something like:

```
>>> G = nx.star_graph(5)  # Example input
>>> list(nx.generic_bfs_edges(G, 0))  # default neighbor order
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

>>> def randomized(n):
...     neighbors = list(G[n])
...     random.shuffle(neighbors)
...     return neighbors

>>> list(nx.generic_bfs_edges(G, 0, neighbors=randomized))  # randomized node order
[(0, 3), (0, 5), (0, 4), (0, 1), (0, 2)]
```

This doesn't necessarily address your performance concerns, but I just thought it was worth drawing this feature to your attention!

~Ross

Reply all
Reply to author
Forward
0 new messages