Average path length on large graphs (estimation)

1,602 views
Skip to first unread message

Conrad Lee

unread,
Jan 26, 2009, 5:30:52 AM1/26/09
to networkx-discuss
Does anybody know of the best way to find the average shortest path
length for all pairs of nodes in a graph?

So far I have been calculating this measurement by using the function
networkx.all_pairs_shortest_path_length()
and then simply taking the average of all the lengths.

The problem is that networkx.all_pairs_shortest_path_length() becomes
slow as the graph becomes large (in the case of my data, as the graph
size exceeds 300). I have noticed that in the literature there are
methods for quickly estimating this metric (e.g.,
http://www.citeulike.org/user/conradlee/article/3441458).

Has anyone implemented one of the estimation algorithms? Could the
recently-discussed A* algorithm be used for this? Anyone have ideas
about a nice solution?

Thanks,

Conrad Lee

Aric Hagberg

unread,
Jan 26, 2009, 11:43:11 AM1/26/09
to networkx...@googlegroups.com
On Mon, Jan 26, 2009 at 02:30:52AM -0800, Conrad Lee wrote:
> Does anybody know of the best way to find the average shortest path
> length for all pairs of nodes in a graph?
> So far I have been calculating this measurement by using the function
> networkx.all_pairs_shortest_path_length()
> and then simply taking the average of all the lengths.

You might get a little improvement using the
average_shortest_path_length() function we added recently.
I just tested it with

In [32]: G=networkx.gnp_random_graph(1000,0.1)

In [33]: G.number_of_edges()
Out[33]: 50383

In [34]: G.number_of_nodes()
Out[34]: 1000

In [35]: print networkx.average_shortest_path_length(G)
1.897266

and that took about 10 seconds.

> Has anyone implemented one of the estimation algorithms? Could the
> recently-discussed A* algorithm be used for this? Anyone have ideas
> about a nice solution?

I remember seeing some papers using sampling to estimate the diameter
and other properties. You might try searching for those.

Aric

Salim Fadhley

unread,
Jan 26, 2009, 1:40:24 PM1/26/09
to networkx-discuss
Yep - the "Chow" algorithm is intended for that sort of thing:

It lets you compute a good-enough guesstimate as to how far any node
is from any other node. If you look in the source code you can see a
link to Chow's original paper where he proposes the algorithm and
explains why it is a good-enough estimator for most heuristics.

Sal


Conrad Lee

unread,
Feb 18, 2009, 11:33:52 AM2/18/09
to networkx...@googlegroups.com
Salim,

Thanks for the reply.  I see that you've implemented the Chow algorithm as a heuristic for the A* search here.

I'm not quite sure how to use chow.py to find a quick estimate of the average shortest path length in a graph, but maybe you can tell me if what I'm doing is right.

In [1]: import chow

In [2]: import networkx

In [3]: G = networkx.gnp_random_graph(500, .1)

In [4]: random_centers = G.nodes()[:10]

In [5]: chow_object = chow.chow( random_centers )

In [6]: chow_object.optimize( G )

at this point I could create a list of all pairs of nodes, and then for each pair run

In [10]:for each pair in node_pairs:
           estimated_distance = chow_object.__call__( pair[0], pair[1] )

If I did this for each pair of nodes, and then found the average estimated distance, it should return something similar to the true average shortest path length, which can be calculated by simply running

In [12]: print networkx.average_shotest_path_length( G )

I have tried this, and the average shortest path length returned using the chow estimation above was 0.9315, whereas the path length returned by the second method was 1.90.  Am I doing something wrong?  Maybe I need more centers?  Fewer?

Are there other implemented heuristics for the recently-improved network.traversal.astar.astar_length() function that can estimate average shortest path length quickly?

Thanks,

Conrad
Reply all
Reply to author
Forward
0 new messages