Sparse way to compute the Google matrix



Jul 4, 2022, 8:59:00 AM
to networkx-discuss
Hi all,

I am trying to compute the row-normalized adjacency matrix P that appears inside the Google matrix (the matrix whose leading left eigenvector is the PageRank centrality). One option is to use nx.google_matrix(G, alpha=1). However, if the graph is big (tens of millions of nodes) this becomes infeasible, since the result is a dense matrix.

Another option is to compute P by hand: take the scipy sparse version of the adjacency matrix of G and apply a few scipy operations to it. This is efficient and works for huge networks, BUT it does not handle dangling nodes (those with no outlinks); it simply leaves their rows as all zeros. So unless the graph has no dangling nodes, this is not ideal.
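Concretely, the by-hand construction I mean looks roughly like this (a minimal sketch on a toy graph; for a real graph `A` would come from `nx.to_scipy_sparse_array(G, dtype=float)`):

```python
import numpy as np
import scipy.sparse as sp

# Sparse adjacency of a toy digraph 0 -> 1, 0 -> 2, 1 -> 2 (node 2 is dangling).
A = sp.csr_array(np.array([[0, 1, 1],
                           [0, 0, 1],
                           [0, 0, 0]], dtype=float))

out_deg = A.sum(axis=1)  # out-degree of every node
# Invert the degrees, leaving dangling nodes at 0 to avoid division by zero
inv_deg = np.divide(1.0, out_deg, out=np.zeros_like(out_deg), where=out_deg > 0)
P = sp.diags(inv_deg) @ A  # row-normalized and still sparse

# The dangling node's row stays all-zero, which is exactly the problem:
# P.toarray()[2] == [0, 0, 0]
```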

The only workaround I can think of is manually adding links from the dangling nodes to other nodes of the network before applying this second method, but that seems like a lot of work and a loss of efficiency for something that is (in principle) not that complicated.
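One alternative that avoids editing the graph at all: leave the dangling rows of P as zeros, and fold the missing probability mass back in at each power-iteration step. Both the dangling and teleportation terms are uniform rank-one corrections, so they reduce to scalars and P itself stays sparse. A minimal sketch (the function name and defaults are mine, not NetworkX's):

```python
import numpy as np
import scipy.sparse as sp

def pagerank_sparse(P, dangling, alpha=0.85, tol=1e-10, max_iter=200):
    """Power iteration x <- alpha * (x @ P + dangling_mass / n) + (1 - alpha) / n.

    P: sparse row-normalized adjacency with dangling rows left all-zero.
    dangling: boolean mask of nodes with no outlinks.
    The dangling and teleportation corrections are added as scalars, so the
    dense Google matrix is never materialized.
    """
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = alpha * (x @ P + x[dangling].sum() / n) + (1 - alpha) / n
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Toy digraph 0 -> 1, 0 -> 2, 1 -> 2; node 2 is dangling.
A = sp.csr_array(np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float))
deg = A.sum(axis=1)
P = sp.diags(np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)) @ A
r = pagerank_sparse(P, dangling=(deg == 0))
```

With alpha=1 and tol/max_iter adjusted this gives the stationary vector of the dangling-patched P directly, which is the matrix I'm after.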

Thank you in advance,

Ross Barnowski

Jul 20, 2022, 8:44:28 AM
Hi Gonzalo,

That's a great question... At first glance, I don't see anything that would prevent using scipy.sparse arrays under the hood. The implementation is relatively straightforward:

... and none of the individual steps seem to *necessarily* be densifying operations. In the short term, you might try re-implementing the above with `scipy.sparse.array` and evaluating it on your graph data. If that works and manages to preserve sparsity through all the steps of the function, then a PR switching to a sparse implementation would be a nice improvement for NetworkX!
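One caveat: with teleportation the full Google matrix is mathematically dense (every entry is at least (1 - alpha)/n), so a truly sparse variant would probably keep the teleportation and dangling terms implicit rather than materializing them, e.g. as a `scipy.sparse.linalg.LinearOperator`. A rough sketch of that idea (the names here are made up for illustration, not NetworkX API):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator

def google_operator(A, alpha=0.85):
    """LinearOperator computing x -> x @ G without forming the dense G.

    G = alpha * (P + dangling correction) + (1 - alpha) * uniform, where P is
    the sparse row-normalized adjacency. Both corrections are uniform
    rank-one terms, so they reduce to a scalar added to every component.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    P = sp.diags(np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)) @ A
    dangling = deg == 0

    def matvec(x):
        x = np.asarray(x).ravel()
        shift = (alpha * x[dangling].sum() + (1 - alpha) * x.sum()) / n
        return alpha * (x @ P) + shift

    # The operator represents G^T, since matvec(x) == x @ G == G.T @ x
    return LinearOperator((n, n), matvec=matvec, dtype=float)

# Toy digraph 0 -> 1, 0 -> 2, 1 -> 2; node 2 is dangling.
A = sp.csr_array(np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float))
op = google_operator(A)
y = op @ np.full(3, 1.0 / 3)  # one PageRank power-iteration step
```

Such an operator could then be fed to `scipy.sparse.linalg.eigs` or a hand-rolled power iteration; memory stays O(edges) regardless of n.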

Hopefully that helps!

