[networkx-discuss] Betweenness centrality is zero (0) for all nodes


Austin

May 15, 2010, 3:49:03 PM
to networkx-discuss
I am working on a habitat modeling project and am using betweenness
centrality as a measure of node/patch importance. For all my test
data, NetworkX is giving me betweenness centralities of zero. I have
tested my code using a small (5 nodes) hypothetical graph and the
betweenness centrality algorithm seems to work properly. I have a
couple of questions whose answers would really help me out.

1. NetworkX's betweenness_centrality function spits out values with
only 1 decimal place as far as I can tell. However, I think for my
test data, betweenness centralities are going to be <<1 and >0, so 1
decimal place is probably not going to cut it. Is there something I'm
missing with the networkx package that will make it spit out more
decimal places, or am I completely wrong with the decimal place issue?

2. What are the chances that my betweenness centralities are exactly
zero if NetworkX gives me a MST with multiple paths that pass through
a single node?

I am obviously not a professional with graph theory and I feel like
I'm missing something obvious, so feel free to treat me like I have no
idea what I'm talking about.

Thanks!


Aric Hagberg

May 15, 2010, 7:53:42 PM
to networkx...@googlegroups.com
On Sat, May 15, 2010 at 1:49 PM, Austin <may.b...@gmail.com> wrote:
> I am working on a habitat modeling project and am using betweenness
> centrality as a measure of node/patch importance.  For all my test
> data, NetworkX is giving me betweenness centralities of zero. I have
> tested my code using a small (5 nodes) hypothetical graph and the
> betweenness centralities algorithm seems to work properly. I have a
> couple questions that will really help me out.
>
> 1. NetworkX's betweenness_centrality function spits out values with
> only 1 decimal place as far as I can tell. However, I think for  my
> test data, betweenness centralities are going to be <<1 and >0, so 1
> decimal place is probably not going to cut it. Is there something I'm
> missing with the networkx package that will make it spit out more
> decimal places, or am I completely wrong with the decimal place issue?
>
> 2. What are the chances that my betweenness centralities are exactly
> zero if NetworkX gives me a MST with multiple paths that pass through
> a single node?

The betweenness_centrality() function returns floating-point values.
The Python representation of a value like 0.5 shows only one digit after
the decimal point, but in general you'll get a longer representation, e.g.


In [1]: import networkx

In [2]: G=networkx.path_graph(5)

In [3]: networkx.betweenness_centrality(G)
Out[3]: {0: 0.0, 1: 0.5, 2: 0.66666666666666663, 3: 0.5, 4: 0.0}

It seems pretty unlikely that you'll get betweenness of zero unless
your graph has almost no edges. E.g. for a graph with 4 nodes and no
edges you'll get

In [4]: G=networkx.Graph()

In [5]: G.add_nodes_from([1,2,3,4])

In [6]: networkx.betweenness_centrality(G)
Out[6]: {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
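For completeness, here is a short sketch (using the current networkx API; the function name is unchanged) showing that the returned values carry full double precision and can be formatted to any number of places:

```python
import networkx as nx

# The dictionary values are ordinary Python floats, so any float
# formatting applies; here we ask for 15 decimal places.
G = nx.path_graph(5)
bc = nx.betweenness_centrality(G)
for node in sorted(bc):
    print("node %d: %.15f" % (node, bc[node]))
```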


Aric

Austin

May 15, 2010, 9:36:56 PM
to networkx-discuss
I uploaded my test data as a zip named test_data.zip. If it's not too
much trouble, can someone check to make sure I'm not just being an
idiot? The setup of the files is:

node_I node_J resistanceIJ

A resistance value of -1 means "do not create an edge."

Aric, in reply to your message: I format my output from Python to print
as many decimal places as possible (i.e. '%s' % centrality), and I
always get 0.0 for my test data. It's very confusing.

If there's any more info I can give, please let me know.

Austin

Richard Careaga

May 15, 2010, 11:15:42 PM
to networkx...@googlegroups.com
On the second point: the result of the betweenness_centrality function is a dictionary, as shown in Aric's example and elaborated a bit here:

>>> import networkx
>>> G=networkx.path_graph(5)
>>> networkx.betweenness_centrality(G)

{0: 0.0, 1: 0.5, 2: 0.66666666666666663, 3: 0.5, 4: 0.0}
>>> bc = networkx.betweenness_centrality(G)
>>> bc

{0: 0.0, 1: 0.5, 2: 0.66666666666666663, 3: 0.5, 4: 0.0}
>>> bc[2]
0.66666666666666663
>>> value = bc[2]
>>> value
0.66666666666666663
>>> print("The value is %s" % value)
The value is 0.666666666667
>>> bc[0]
0.0

So, if you are printing out the dictionary returned by the function and you get 0.0, it is because the value stored under that key is 0.0, not because of the format string. What does your dictionary look like?

Aric Hagberg

May 16, 2010, 9:13:01 AM
to networkx...@googlegroups.com
On Sat, May 15, 2010 at 7:36 PM, Austin <may.b...@gmail.com> wrote:
> I uploaded my test data as a zip named test_data.zip. If it's not too
> much trouble, can someone check to make sure I'm not just being an
> idiot? The setup of the files is:
>
> node_I node_J resistanceIJ
>
> resistances with a value of -1 indicate do not create an edge.
>
> In reply to your reply, Aric, I format my outputs from Python to print
> as many decimal places as possible (i.e. '%s' % centrality), and I
> always get outputs of 0.0 for my test data. It's very confusing.
>
> If there's any more info I can give, please let me know.

I couldn't read your file. But maybe the data isn't being loaded
correctly from your file? How many nodes and edges are there
(try G.number_of_nodes() and G.number_of_edges())? What does the
degree distribution look like (G.degree())?
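In case it helps, here is a minimal loader sketch for the format you described (node_I node_J resistanceIJ, with -1 meaning no edge). The function name and the exact column handling are my assumptions from your description:

```python
import networkx as nx

def load_resistance_graph(path):
    """Build a weighted Graph from lines of 'node_I node_J resistanceIJ'.

    Rows with a resistance of -1 create no edge, per the file-format
    description above; their endpoints are kept as isolated nodes.
    """
    G = nx.Graph()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            u, v = int(parts[0]), int(parts[1])
            w = float(parts[2])
            if w == -1:
                G.add_nodes_from([u, v])  # no edge, but keep the nodes
                continue
            G.add_edge(u, v, weight=w)
    return G
```

After loading, comparing G.number_of_edges() and G.degree() against the raw file is a quick sanity check that nothing was silently dropped.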

Austin

May 16, 2010, 9:53:37 AM
to networkx-discuss
My graphs are weighted and undirected, and the weights are the node-to-
node resistances I talked about earlier.

This is from sfs60_res.txt:
Number of edges: 703

Degree Distribution: {1: 37, 2: 37, 3: 37, 4: 37, 5: 37, 6: 37, 7: 37,
8: 37, 9: 37, 10: 37, 11: 37, 12: 37, 13: 37, 14: 37, 15: 37, 16: 37,
17: 37, 18: 37, 19: 37, 20: 37, 21: 37, 22: 37, 23: 37, 24: 37, 25:
37, 26: 37, 27: 37, 28: 37, 29: 37, 30: 37, 31: 37, 32: 37, 33: 37,
34: 37, 35: 37, 36: 37, 37: 0, 38: 37, 39: 37}
All nodes are connected to all others except for node 37, which is
isolated.

Centralities: {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0,
8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0, 14: 0.0, 15: 0.0,
16: 0.0, 17: 0.0, 18: 0.0, 19: 0.0, 20: 0.0, 21: 0.0, 22: 0.0, 23:
0.0, 24: 0.0, 25: 0.0, 26: 0.0, 27: 0.0, 28: 0.0, 29: 0.0, 30: 0.0,
31: 0.0, 32: 0.0, 33: 0.0, 34: 0.0, 35: 0.0, 36: 0.0, 37: 0.0, 38:
0.0, 39: 0.0}

This is from amph60_res.txt:
Number of edges: 903

Degree Distribution: {1: 42, 2: 42, 3: 42, 4: 42, 5: 42, 6: 42, 7: 42,
8: 42, 9: 42, 10: 42, 11: 0, 12: 42, 13: 42, 14: 42, 15: 42, 16: 42,
17: 42, 18: 42, 19: 42, 20: 0, 21: 0, 22: 42, 23: 42, 24: 42, 25: 42,
26: 42, 27: 0, 28: 42, 29: 42, 30: 42, 31: 42, 32: 42, 33: 42, 34: 42,
35: 42, 36: 42, 37: 42, 38: 42, 39: 42, 40: 42, 41: 42, 42: 42, 43:
42, 44: 42, 45: 42, 46: 42, 47: 42}
Again, most nodes are connected to all others except for a few which
are isolated.

Centralities: {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0,
8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0, 14: 0.0, 15: 0.0,
16: 0.0, 17: 0.0, 18: 0.0, 19: 0.0, 20: 0.0, 21: 0.0, 22: 0.0, 23:
0.0, 24: 0.0, 25: 0.0, 26: 0.0, 27: 0.0, 28: 0.0, 29: 0.0, 30: 0.0,
31: 0.0, 32: 0.0, 33: 0.0, 34: 0.0, 35: 0.0, 36: 0.0, 37: 0.0, 38:
0.0, 39: 0.0, 40: 0.0, 41: 0.0, 42: 0.0, 43: 0.0, 44: 0.0, 45: 0.0,
46: 0.0, 47: 0.0}

I am not including the graph from rcw60_res.txt because it is too
large and takes too long to generate the centralities.

It just occurred to me that maybe I am misunderstanding the way
NetworkX handles weighted graphs. Does it assume all edges have equal
weight, compute the shortest paths on that basis, and only afterwards
sum the weights along those shortest paths? That would explain why my
particular examples are giving me centralities of zero.

Thanks again
Austin


Aric Hagberg

May 16, 2010, 10:04:36 AM
to networkx...@googlegroups.com
On Sun, May 16, 2010 at 7:53 AM, Austin <may.b...@gmail.com> wrote:
> My graphs are weighted and undirected, and the weights are the node-to-
> node resistances I talked about earlier.
>
> This is from sfs60_res.txt:
> Number of edges: 703
>
> Degree Distribution: {1: 37, 2: 37, 3: 37, 4: 37, 5: 37, 6: 37, 7: 37,
> 8: 37, 9: 37, 10: 37, 11: 37, 12: 37, 13: 37, 14: 37, 15: 37, 16: 37,
> 17: 37, 18: 37, 19: 37, 20: 37, 21: 37, 22: 37, 23: 37, 24: 37, 25:
> 37, 26: 37, 27: 37, 28: 37, 29: 37, 30: 37, 31: 37, 32: 37, 33: 37,
> 34: 37, 35: 37, 36: 37, 37: 0, 38: 37, 39: 37}
> All nodes are connected to all others excepted for node 37, which is
> isolated
>

That explains it - the betweenness centrality of a complete graph (all
nodes connected to all others) is zero for every node: every pair of
nodes is joined by a direct edge, so no shortest path passes through a
third node.

An alternative definition that includes the endpoints of paths
in the betweenness calculation gives a constant value for all nodes.

e.g.

In [13]: G=networkx.complete_graph(5)

In [14]: networkx.betweenness_centrality(G)
Out[14]: {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}

In [15]: networkx.betweenness_centrality(G,endpoints=True)
Out[15]:
{0: 0.66666666666666663,
1: 0.66666666666666663,
2: 0.66666666666666663,
3: 0.66666666666666663,
4: 0.66666666666666663}

In [16]: networkx.betweenness_centrality(G,endpoints=True,normalized=False)
Out[16]: {0: 4.0, 1: 4.0, 2: 4.0, 3: 4.0, 4: 4.0}

Austin

May 16, 2010, 12:03:34 PM
to networkx-discuss
I'm not sure how that definition makes sense. In a geospatial sense,
the shortest path between two nodes is not necessarily a straight line
path connecting them (consider elevation changes, the presence of
barriers like rivers, etc). In other words, the shortest path is the
path that presents the least difficulty in getting from one node to
the other, even when the two nodes are directly connected. I would
think that the "weighted=True" option would take this into account,
even in a complete graph. Is there not an algorithm in NetworkX that
does what I'm asking?

Austin


Aric Hagberg

May 16, 2010, 12:55:33 PM
to networkx...@googlegroups.com
On Sun, May 16, 2010 at 10:03 AM, Austin <may.b...@gmail.com> wrote:
> I'm not sure how that definition makes sense. In a geospatial sense,
> the shortest path between two nodes is not necessarily a straight line
> path connecting them (consider elevation changes, the presence of
> barriers like rivers, etc). In other words, the shortest path is the
> path that presents the least difficulty in getting from one node to
> the other, even when the two nodes are directly connected. I would
> think that the "weighted=True" option would take this into account,
> even in a complete graph. Is there not an algorithm in NetworkX that
> does what I'm asking?

That algorithm definitely does consider edge weights when
you use the "weighted_edges=True" keyword. In that case
the shortest paths are computed using Dijkstra's algorithm:
the length of a path is the sum of its edge weights, and the
path with the lowest total weight is the shortest.

In [25]: G=networkx.Graph()

In [26]: G.add_edge(0,1,weight=100)

In [27]: G.add_edge(1,2,weight=1)

In [28]: G.add_edge(0,2,weight=1)

In [29]: networkx.betweenness_centrality(G)
Out[29]: {0: 0.0, 1: 0.0, 2: 0.0}

In [30]: networkx.betweenness_centrality(G,weighted_edges=True)
Out[30]: {0: 0.0, 1: 0.0, 2: 1.0}  # shortest weighted path from 0 to 1 goes through 2

Austin

May 16, 2010, 1:02:22 PM
to networkx-discuss
Ok, then my problem persists. As I said above, my graph IS weighted,
so why am I still getting centralities of zero?


Aric Hagberg

May 16, 2010, 1:06:51 PM
to networkx...@googlegroups.com
On Sun, May 16, 2010 at 11:02 AM, Austin <may.b...@gmail.com> wrote:
> Ok, then my problem persists. As I said above, my graph IS weighted,
> so why am I still getting centralities of zero?
>

Are your weights almost all the same?

It's hard to tell without seeing an example. I've showed you
some where the centrality is not zero and I'm fairly confident
the algorithm is correct. Are your weights almost all the same?

Can you post a small example? Maybe a small subgraph
of your data?

Aric Hagberg

May 16, 2010, 1:08:19 PM
to networkx...@googlegroups.com
On Sun, May 16, 2010 at 11:06 AM, Aric Hagberg <ahag...@gmail.com> wrote:
> On Sun, May 16, 2010 at 11:02 AM, Austin <may.b...@gmail.com> wrote:
>> Ok, then my problem persists. As I said above, my graph IS weighted,
>> so why am I still getting centralities of zero?
>>
>
> Are your weights almost all the same?
>
> It's hard to tell without seeing an example.  I've showed you
> some where the centrality is not zero and I'm fairly confident
> the algorithm is correct.  Are your weights almost all the same?

I meant to say (instead of just repeating myself) that if your weights
are almost all the same the shortest weighted paths might be the
same as shortest unweighted paths...
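To illustrate that point (note: in current networkx releases the keyword is weight="weight" rather than weighted_edges=True): on a complete graph, as long as each direct edge is cheaper than any detour, weighted betweenness is zero everywhere; make one edge expensive enough and nonzero values appear.

```python
import networkx as nx

# Complete graph on 4 nodes with equal weights: every direct edge
# is the cheapest route, so all weighted betweenness values are zero.
G = nx.complete_graph(4)
for u, v in G.edges():
    G[u][v]["weight"] = 1.0
print(nx.betweenness_centrality(G, weight="weight"))  # all zeros

# Make one edge much more expensive than the two-hop detour:
# nodes 2 and 3 now lie on the cheap routes between 0 and 1.
G[0][1]["weight"] = 100.0
print(nx.betweenness_centrality(G, weight="weight"))
```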

Austin

May 16, 2010, 3:09:42 PM
to networkx-discuss
I think you may have hit the nail on the head with that last one. I
took this subset of my test data below and changed the weights for
node pairs with node 1 to be very small. I then ran my code and got a
betweenness centrality of 21 for node 1. Based on these results, I may
try computing the MST first and then doing the betweenness centrality
for each node. Thanks for your help.

1 2 3.577302864566731400e+003
1 3 5.693912765124735400e+003
1 4 3.829598119110462900e+003
1 5 3.848185661157436700e+003
1 6 3.434911405911566800e+003
1 7 5.232916420624997000e+003
1 8 1.450093581261496400e+004
2 3 4.973356332167905700e+003
2 4 3.109120828528973100e+003
2 5 3.127717789905367700e+003
2 6 2.714443956215488900e+003
2 7 4.512448719376208800e+003
2 8 1.378046820404796400e+004
3 4 5.165572467711900600e+003
3 5 5.183851937247747600e+003
3 6 4.770563951945267000e+003
3 7 6.568572925480633800e+003
3 8 1.583659295099239900e+004
4 5 3.239615193122653200e+003
4 6 2.826354143694640200e+003
4 7 4.624355539742107800e+003
4 8 1.389237447662237500e+004
5 6 2.756404198735972800e+003
5 7 4.550624200622975000e+003
5 8 1.381826612132493400e+004
6 7 4.100464277671884700e+003
6 8 1.336706466407540200e+004
7 8 1.511612258874454300e+004
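A sketch of that MST idea (minimum_spanning_tree and the weight="weight" keyword are from current networkx releases; the weights below are rounded from the listing above):

```python
import networkx as nx

# The 8-node example above, weights rounded to one decimal place.
edges = [
    (1, 2, 3577.3), (1, 3, 5693.9), (1, 4, 3829.6), (1, 5, 3848.2),
    (1, 6, 3434.9), (1, 7, 5232.9), (1, 8, 14500.9),
    (2, 3, 4973.4), (2, 4, 3109.1), (2, 5, 3127.7), (2, 6, 2714.4),
    (2, 7, 4512.4), (2, 8, 13780.5),
    (3, 4, 5165.6), (3, 5, 5183.9), (3, 6, 4770.6), (3, 7, 6568.6),
    (3, 8, 15836.6),
    (4, 5, 3239.6), (4, 6, 2826.4), (4, 7, 4624.4), (4, 8, 13892.4),
    (5, 6, 2756.4), (5, 7, 4550.6), (5, 8, 13818.3),
    (6, 7, 4100.5), (6, 8, 13367.1),
    (7, 8, 15116.1),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Reduce to the minimum-total-resistance spanning tree, then score the
# tree's nodes. With these weights every node's cheapest edge goes to
# node 6, so the MST is a star centered on node 6, which then collects
# all of the betweenness while the leaves score zero.
T = nx.minimum_spanning_tree(G, weight="weight")
bc = nx.betweenness_centrality(T, weight="weight")
print(sorted(bc.items()))
```

Whether collapsing to the MST first is the right model for habitat connectivity is a separate question, but this is the mechanical recipe.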

