On Sun, Nov 4, 2012 at 11:19 AM, GNS <gabriel....@gmail.com> wrote:
> Hello all,
>
> We have a fairly large network: 5547 nodes and 20625 edges. To summarize, we
> are getting MemoryError failures with the NumPy-based eigenvector centrality,
> but the pure-Python eigenvector centrality algorithm works just fine, and
> runs much more quickly.
>
> Details:
>
> In calculating eigenvector centrality, our application bombed with the
> following error:
>
> Traceback (most recent call last):
>   File "comp_centrality.py", line 1567, in <module>
>     main()
>   File "comp_centrality.py", line 1560, in main
>     sanitise=p['sanitise'],
>   File "comp_centrality.py", line 1095, in do_comp_centrality
>     use_numpy=p['networkx_eigenvector_centrality_use_numpy'])
>   File "comp_centrality.py", line 779, in compute_eigenvector_nx
>     centrality=nx.eigenvector_centrality_numpy(G)
>   File "C:\Python27\lib\site-packages\networkx-1.7-py2.7.egg\networkx\algorithms\centrality\eigenvector.py", line 152, in eigenvector_centrality_numpy
>     eigenvalues,eigenvectors=np.linalg.eig(A)
>   File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 1056, in eig
>     v = array(vr, w.dtype)
> MemoryError
>
> Elapsed time for failed eigenvector centrality calculation ==> 0:17:47.750000
>
> Furthermore, the Python process was taking upwards of 700 MB of RAM.
>
> However, I reverted to the standard Python-based eigenvector calculation,
> which ran successfully:
>
> Elapsed time for eigenvector centrality calculation (non-NumPy) ==> 0:00:04.109000
>
> I assumed the NumPy version would be more efficient, but it seems to have a
> much higher memory and runtime footprint.
>
> Does anyone have any insights into this performance discrepancy between
> the NumPy and non-NumPy eigenvector centrality computations?

The pure-Python eigenvector centrality algorithm
(nx.eigenvector_centrality) uses the power method to find only the
single largest eigenvalue/eigenvector pair. This is pretty efficient
in cases like yours: the graph data is never copied into a matrix, and
computing a single eigenvalue/eigenvector pair is much cheaper than
computing all of them.
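
For concreteness, here is a minimal sketch of that idea (a
hypothetical helper, not the library's actual code; it assumes an
undirected NetworkX graph with optional 'weight' edge attributes):

import math

def power_method_sketch(G, max_iter=100, tol=1.0e-6):
    # Start from a uniform vector over the nodes; memory is O(N).
    x = dict((n, 1.0 / len(G)) for n in G)
    for _ in range(max_iter):
        x_last = x
        x = dict.fromkeys(x_last, 0.0)
        # One pass over the edges per iteration; the graph itself is
        # never copied, so each iteration costs O(N + E) time.
        for n in x:
            for nbr in G[n]:
                x[nbr] += x_last[n] * G[n][nbr].get('weight', 1.0)
        # Renormalize so the iterates neither blow up nor vanish.
        norm = math.sqrt(sum(v * v for v in x.values())) or 1.0
        x = dict((n, v / norm) for n, v in x.items())
        # Stop when successive iterates agree to within the tolerance.
        if sum(abs(x[n] - x_last[n]) for n in x) < len(G) * tol:
            return x
    raise RuntimeError('power method failed to converge')

Each iteration does work proportional to the number of edges (20625
here), which is why the pure-Python version finishes in seconds.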

The NumPy-based eigenvector centrality algorithm
(nx.eigenvector_centrality_numpy()) first copies the graph into a
dense NumPy array (in your case a 5547x5547 matrix) and then computes
*all* of the eigenvalues and eigenvectors using NumPy's interface to
LAPACK. The storage required grows as N**2, and the computation, I
think, grows as N**3.
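
Roughly, that path looks like this (a sketch of the idea, not the
exact library code):

import networkx as nx
import numpy as np

def dense_eig_sketch(G):
    # Dense adjacency copy: N*N floats.  For N = 5547 that is already
    # 5547 * 5547 * 8 bytes, about 240 MB, before LAPACK does anything.
    A = np.asarray(nx.to_numpy_matrix(G))
    # Compute ALL N eigenvalue/eigenvector pairs: roughly O(N**3) time,
    # and several more N x N arrays are allocated along the way, which
    # is consistent with the ~700 MB you observed before the failure.
    eigenvalues, eigenvectors = np.linalg.eig(A)
    # Keep only the eigenvector belonging to the largest eigenvalue.
    largest = eigenvectors[:, eigenvalues.argsort()[-1]].real
    norm = np.sign(largest.sum()) * np.linalg.norm(largest)
    return dict(zip(G, largest / norm))

So a single centrality vector on a sparse graph like yours is exactly
the case where the power method wins on both time and memory, and if
you are running a 32-bit Python, a handful of those ~240 MB
allocations is enough to exhaust the address space and raise
MemoryError.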
Aric