Re: [networkx-discuss] Eigenvector performance issues with NumPy

Aric Hagberg

Nov 4, 2012, 1:33:05 PM
to networkx...@googlegroups.com
On Sun, Nov 4, 2012 at 11:19 AM, GNS <gabriel....@gmail.com> wrote:
> Hello all,
>
> We have a fairly large network, 5547 nodes, 20625 edges. To summarize, we
> are getting MemoryError issues with the NumPy-based eigenvector centrality,
> but the python-based eigenvector centrality algorithm works just fine, and
> much more quickly.
>
> Details:
>
> In calculating eigenvector centrality, our application bombed with the
> following error:
>
> Traceback (most recent call last):
>   File "comp_centrality.py", line 1567, in <module>
>     main()
>   File "comp_centrality.py", line 1560, in main
>     sanitise=p['sanitise'],
>   File "comp_centrality.py", line 1095, in do_comp_centrality
>     use_numpy=p['networkx_eigenvector_centrality_use_numpy'])
>   File "comp_centrality.py", line 779, in compute_eigenvector_nx
>     centrality=nx.eigenvector_centrality_numpy(G)
>   File "C:\Python27\lib\site-packages\networkx-1.7-py2.7.egg\networkx\algorithms\centrality\eigenvector.py", line 152, in eigenvector_centrality_numpy
>     eigenvalues,eigenvectors=np.linalg.eig(A)
>   File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 1056, in eig
>     v = array(vr, w.dtype)
> MemoryError
>
> Elapsed time for failed eigenvector centrality calculation ==>
> 0:17:47.750000
> Furthermore, the python process was taking upwards of 700MB of RAM.
>
> However, I reverted to using the standard python-based eigenvector
> calculation, which was able to run successfully.
> Elapsed time for eigenvector centrality calculation (non-NumPy) ==>
> 0:00:04.109000
>
> I assumed the NumPy version would be more efficient, but it seems to have a
> much higher memory and runtime footprint.
>
> Does anyone have any insights into this discrepancy in performance, between
> the NumPy and non-NumPy eigenvector centrality computations?

The pure-Python eigenvector centrality algorithm
(nx.eigenvector_centrality) uses the power method to find the single
largest eigenvalue/eigenvector pair. This is pretty efficient in some
cases since you don't need to copy the graph data and computing a
single eigenvalue/eigenvector is cheaper than computing all of them.
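
Roughly, the power iteration does something like the following (a simplified
sketch, not the exact networkx implementation -- the real routine also handles
starting values, tolerances, and a few other details):

import networkx as nx

def power_iteration_sketch(G, max_iter=100, tol=1.0e-6):
    # start from a uniform vector over the nodes
    x = dict((n, 1.0 / len(G)) for n in G)
    for _ in range(max_iter):
        xlast = x
        x = dict((n, 0.0) for n in G)
        # one matrix-vector product, done directly on the graph structure
        for n in G:
            for nbr in G[n]:
                x[nbr] += xlast[n] * G[n][nbr].get('weight', 1)
        # normalize to unit Euclidean length
        norm = sum(v * v for v in x.values()) ** 0.5 or 1.0
        x = dict((n, v / norm) for n, v in x.items())
        # stop when successive iterates are close enough
        if sum(abs(x[n] - xlast[n]) for n in G) < len(G) * tol:
            return x
    raise RuntimeError("power iteration did not converge")

The key point is that nothing larger than the graph itself is ever stored.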

The NumPy-based eigenvector centrality algorithm
(nx.eigenvector_centrality_numpy()) first makes a copy of the graph to
a dense NumPy array (in your case a ~5000x5000 matrix) and then
computes *all* of the eigenvalues and eigenvectors using NumPy's
interface to LAPACK. The storage required increases with N as N**2,
and the computational cost, I think, increases as N**3.
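
In other words, the NumPy route is essentially the following (a sketch,
assuming an undirected graph G; the actual function also normalizes and
fixes the sign of the result):

import numpy as np
import networkx as nx

A = nx.to_numpy_matrix(G)                     # dense N x N copy -- this is the memory hog
eigenvalues, eigenvectors = np.linalg.eig(A)  # *all* N eigenpairs, roughly O(N**3) work
largest = eigenvalues.real.argmax()           # centrality comes from the largest eigenvalue
vec = np.abs(np.asarray(eigenvectors[:, largest]).flatten())
centrality = dict(zip(G, vec / vec.sum()))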

Aric

GNS

Nov 4, 2012, 1:39:33 PM
to networkx...@googlegroups.com
Hi Aric, 

Thanks for the quick reply!  Yeah, that sounds about right.  Memory footprint was huge.

Question: given this situation, is there a benefit to running the NumPy eigenvector alg vs the standard python one?

I chose to use the NumPy alg because, on smaller networks, the python algorithm often would not converge, whereas the NumPy alg always finished the computations, at least in our runs.

Is it fair to say that this is the trade-off I'm looking at when choosing between the NumPy and the python eigenvector centrality algorithms?

-=Gabriel

Daπid

Nov 4, 2012, 2:18:26 PM
to networkx...@googlegroups.com
On Sun, Nov 4, 2012 at 7:39 PM, GNS <gabriel....@gmail.com> wrote:
> Thanks for the quick reply!  Yeah, that sounds about right.  Memory footprint was huge.

I would say there is a problem with your installation. Recently, I have been computing eigenvalues for networks of 100 000 nodes and the whole process is quite fast: generating an Erdős–Rényi graph, converting it to a matrix (nx.to_numpy_matrix), and computing its eigenvalues takes 13 seconds on my computer (a five-year-old mid-range machine) under Windows (where things are slower). Plus, it only takes 20 MB of RAM for the graph, the matrix, and the Python interpreter itself.
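
For reference, the kind of test I mean is roughly this (just a sketch; the
graph size and edge probability here are only illustrative):

import time
import networkx as nx
import numpy as np

G = nx.gnp_random_graph(5000, 0.001)   # Erdos-Renyi random graph
A = nx.to_numpy_matrix(G)              # dense adjacency matrix
t0 = time.time()
w = np.linalg.eigvals(A)               # all eigenvalues via LAPACK
print("eigvals: %.1f s" % (time.time() - t0))

If your installation is properly linked, this sort of thing should finish in seconds, not minutes.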

It may be that your NumPy build is not properly linked against ATLAS and LAPACK, so it is falling back to the naïve default routines. This is how you can tell:

>>> np.show_config()
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['f77blas', 'cblas', 'atlas']
    library_dirs = ['C:\\local\\lib\\yop\\sse3']
    define_macros = [('NO_ATLAS_INFO', -1)]
    language = c
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    library_dirs = ['C:\\local\\lib\\yop\\sse3']
    define_macros = [('NO_ATLAS_INFO', -1)]
    language = f77
atlas_info:
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    library_dirs = ['C:\\local\\lib\\yop\\sse3']
    define_macros = [('NO_ATLAS_INFO', -1)]
    language = f77
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
    libraries = ['f77blas', 'cblas', 'atlas']
    library_dirs = ['C:\\local\\lib\\yop\\sse3']
    define_macros = [('NO_ATLAS_INFO', -1)]
    language = c
mkl_info:
  NOT AVAILABLE

I must add that if you are on a Unix machine, it is worth compiling ATLAS yourself, as it will try different strategies at compilation time and choose the one best adapted to your hardware. If you are on a Debian machine, you can take a look at how to compile ATLAS here: https://python.g-node.org/python-autumnschool-2010/materials/advanced_numpy/atlas_sse2

With this you can get a speed-up of several times over the precompiled binaries.


David.

Fernando Perez

Nov 5, 2012, 3:33:18 AM
to networkx...@googlegroups.com
On Sun, Nov 4, 2012 at 10:33 AM, Aric Hagberg <aric.h...@gmail.com> wrote:
> The pure-Python eigenvector centrality algorithm
> (nx.eigenvector_centrality) uses the power method to find the single
> largest eigenvalue/eigenvector pair. This is pretty efficient in some
> cases since you don't need to copy the graph data and computing a
> single eigenvalue/eigenvector is cheaper than computing all of them.
>
> The NumPy-based eigenvector centrality algorithm
> (nx.eigenvector_centrality_numpy()) first makes a copy of the graph to
> a dense NumPy array (in your case a ~5000x5000 matrix) and then
> computes *all* of the eigenvalues and eigenvectors using NumPy's
> interface to LAPACK. The storage required increases with N as N**2,
> and the computational cost, I think, increases as N**3.

Aric, the *scipy* linalg.eigh function wraps, if I recall correctly,
the ARPACK routines, which are also power-method based. It does allow
explicit specification of how many eigenvectors you want via
the 'eigvals' parameter.

I know that making scipy a hard dependency of nx is probably too much,
but perhaps offering a scipy-based method for users who have it, or
having the numpy method check whether scipy.linalg is available and use
its eigh routine when present, might not be a bad idea...
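
For instance, something along these lines (just a sketch, assuming an
undirected graph so the adjacency matrix is symmetric; the eigvals tuple
selects only the largest eigenvalue/eigenvector pair by index):

import numpy as np
import scipy.linalg
import networkx as nx

A = np.asarray(nx.to_numpy_matrix(G))   # still a dense matrix...
n = A.shape[0]
# ...but only the largest eigenpair is requested, not all n of them
w, v = scipy.linalg.eigh(A, eigvals=(n - 1, n - 1))
centrality = dict(zip(G, np.abs(v[:, 0])))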

Cheers,

f

Aric Hagberg

Nov 5, 2012, 5:57:01 AM
to networkx...@googlegroups.com
Thanks for pointing that out. The ARPACK sparse eigenvalue solver is
scipy.sparse.linalg.eigs, and it uses a much more sophisticated
algorithm than the basic power method (the implicitly restarted Arnoldi
method). We use it in nx.pagerank_scipy(), which computes a variant of
eigenvector centrality. So the OP might consider using that instead.
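
Something like this sketch keeps the matrix sparse and only computes the
leading eigenpair (not exactly what pagerank_scipy does, but the same
machinery):

import networkx as nx
import scipy.sparse.linalg

A = nx.to_scipy_sparse_matrix(G, dtype=float)   # sparse adjacency matrix, no dense copy
# k=1, which='LR': only the eigenpair with the largest real part
# (for an undirected graph; a directed graph needs a choice of A or A.T)
w, v = scipy.sparse.linalg.eigs(A, k=1, which='LR')
centrality = dict(zip(G, abs(v[:, 0].real)))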

Aric