Interpreting clustering coefficients?

ayesha

Dec 1, 2009, 1:14:50 AM
to networkx-discuss
Hi Guys,

I have calculated the clustering coefficients of nodes in my gene/protein
interaction network (actually I just did it for a small number of nodes
to get some preliminary results). These are my results:
{'K08E4.5': 0.0, 'ZK1025.6': 0.0, 'dhc-1': 1.0, 'GO:0007154': 0.0,
'ZK177.2': 0.0, 'GO:0007275': 1.0, 'GO:0016185': 1.0,
'GO:0006541': 0.0, 'sqt-1': 0.0, 'ZK632.4': 0.0, 'ubc-12': 1.0,
'col-139': 0.0, 'GO:0006508': 0.16666666666666666, 'GO:0007186': 0.0,
'GO:0006118': 0.0, 'ubl-1': 0.0, 'GO:0040011': 1.0, 'GO:0006468': 0.0,
'Y20F4.3': 0.0, 'cng-1': 0.0,....}

My data set looks something like this:
cng-1 GO:0006811|
K08E4.5 GO:0006508|
F30F8.2 GO:0006541|
ZK177.2 GO:0006468|
dhc-1 GO:0006508|GO:0007018|
ZK632.4 GO:0005975|
dsl-1 GO:0007154|
sre-50 GO:0007606|
sqt-1 GO:0006817|
Y20F4.3 GO:0007242|
sra-34 GO:0007606|
ZK1025.6 GO:0006355|
Y52B11A.3 GO:0006118|
toh-2 GO:0006508|
ubl-1 GO:0006412|

However, I don't quite understand the significance of these results.
E.g. for K08E4.5 GO:0006508|
the clustering coefficient is 0.0. Does that mean that for every node in the
network that is connected to only one other node, the clustering
coefficient would be 0? I manually calculated the Ci for a little
graph and I got a value of 1.66 for one of my nodes using the same
formula that NetworkX uses in its clustering coefficient module. But in
my results here no value exceeds 1. Why is that?

P.S.: Moreover, the results are not in order, i.e. not in the same order as
in the data set.

Sudarshan Iyengar

Dec 1, 2009, 1:21:05 AM
to networkx...@googlegroups.com
On Tue, Dec 1, 2009 at 11:44 AM, ayesha <ayesha.d...@gmail.com> wrote:
> My data set looks something like;

What do you mean by data set here? The graph?



> However, I don't quite understand the significance of these results.
> E.g. for K08E4.5 GO:0006508|
> the clustering coefficient is 0.0. Does that mean that for every node in the
> network that is connected to only one other node, the clustering
> coefficient would be 0? I manually calculated the Ci for a little
> graph and I got a value of 1.66 for one of my nodes using the same

The clustering coefficient always lies between 0 and 1, so we cannot get 1.66.

> formula that NetworkX uses in its clustering coefficient module. But in
> my results here no value exceeds 1. Why is that?

ayesha

Dec 1, 2009, 1:46:06 AM
to networkx-discuss
Well yeah, the data set contains proteins and genes as nodes connected
by edges, so each line represents proteins and genes that are
connected in the network. For example, in
dhc-1 GO:0006508|GO:0007018|
dhc-1 is connected to GO:0006508 AND GO:0007018.
So you mean the clustering coefficient is always between zero and one?
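
For reference, a minimal sketch of how lines in this format could be turned into an edge list. It assumes each line is a name, some whitespace, and then `|`-terminated GO terms; the function name `parse_edges` is made up for illustration.

```python
def parse_edges(lines):
    """Turn 'name  GO:a|GO:b|' lines into (name, GO term) edge pairs."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, terms = parts
        # The trailing '|' leaves an empty string after split; drop it.
        for term in terms.split("|"):
            if term:
                edges.append((name, term))
    return edges


sample = [
    "cng-1 GO:0006811|",
    "dhc-1 GO:0006508|GO:0007018|",
]
print(parse_edges(sample))
```

The resulting pairs could then be handed to `G.add_edges_from(...)` on a NetworkX graph.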



Sudarshan Iyengar

Dec 1, 2009, 2:35:40 AM
to networkx...@googlegroups.com
On Tue, Dec 1, 2009 at 12:16 PM, ayesha <ayesha.d...@gmail.com> wrote:
> So you mean the clustering coefficient is always between zero and one?

Yes, the clustering coefficient is between zero and one. For better
understanding, let us consider a friendship network of, say, 100
friends. The clustering coefficient of a person in the network is the
number of friendships among his immediate friends, divided by the
maximum possible number of such friendships.

For example, suppose I have 20 friends (my friends needn't know each
other), and amongst these 20 friends of mine there are 30
friendships (while the maximum possible is (20*19)/2 = 190). Then my
clustering coefficient is 30 divided by the maximum possible number of
friendships, that is, 30/190.
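
The arithmetic above can be sketched in plain Python. This is an illustrative helper only: the name `local_clustering` and the adjacency-dict representation are assumptions, not NetworkX code. It counts the links among a node's neighbours and divides by k(k-1)/2, so the result can never exceed one for a graph without self-loops.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of possible links among node's neighbours that exist.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    """
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect, taken as 0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return links / (k * (k - 1) / 2)


# The numbers from the example: 20 friends, 30 friendships among them.
print(30 / (20 * 19 / 2))  # 30/190, roughly 0.158

# A tiny graph: 'me' knows a, b, c; only a and b know each other.
adj = {
    "me": {"a", "b", "c"},
    "a": {"me", "b"},
    "b": {"me", "a"},
    "c": {"me"},
}
print(local_clustering(adj, "me"))  # 1 link out of 3 possible
print(local_clustering(adj, "c"))   # only one neighbour, so 0.0
```

Under this convention, a node connected to only one other node always gets 0.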

>>> import networkx as nx
>>> G = nx.erdos_renyi_graph(100, 0.3)  # a random graph with 100 nodes
>>> c = nx.clustering(G)

c will contain the clustering coefficient of every vertex of G.
It will be a Python dictionary keyed by node.
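
Because the result is a dictionary, the order in which it prints need not match the input file. A small sketch of reading the values back in a chosen order, using a few values from the results posted at the top of the thread (the `order` list is an assumed stand-in for the order of names in the data set):

```python
# A few entries copied from the results posted earlier in the thread.
c = {"K08E4.5": 0.0, "ZK1025.6": 0.0, "dhc-1": 1.0, "cng-1": 0.0}

# Assumed: the node names in the order they appear in the data set.
order = ["cng-1", "K08E4.5", "dhc-1", "ZK1025.6"]

for node in order:
    print(node, c[node])

# Or rank nodes from most to least clustered.
ranked = sorted(c.items(), key=lambda item: item[1], reverse=True)
print(ranked[0])
```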

Hope that helps.

regards,
Sudarshan

Todd A. Gibson

Dec 1, 2009, 4:40:37 AM
to networkx...@googlegroups.com
> So you mean the clustering coefficient is always between zero and one?

The clustering coefficient is a measure which assumes there are no
self-loops in the network. If I recall correctly, the clustering
coefficient calculation in NetworkX does not check for self-loops. If
you have self-loops when you calculate the clustering coefficient, it
will generate an incorrect result, including numbers greater than one
in some cases.
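
To illustrate how self-loops can push a careless calculation above one, here is a hypothetical naive implementation in plain Python. It is not the NetworkX 0.99 code, only a sketch of the failure mode, together with a helper (`strip_self_loops`, also invented here) that discards self-loops first.

```python
def naive_clustering(adj, node):
    """Counts neighbour-to-neighbour links without guarding against self-loops."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each link among neighbours is seen from both ends, hence the division by 2.
    # A self-loop on a neighbour is wrongly counted here as well.
    seen = sum(len(adj[u] & nbrs) for u in nbrs)
    return (seen / 2) / (k * (k - 1) / 2)


def strip_self_loops(adj):
    """Return a copy of the adjacency dict with every self-loop removed."""
    return {u: {w for w in nbrs if w != u} for u, nbrs in adj.items()}


# Triangle a-b-c with self-loops on b and c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "b", "c"},
    "c": {"a", "b", "c"},
}
print(naive_clustering(adj, "a"))                    # 2.0, which is not a valid value
print(naive_clustering(strip_self_loops(adj), "a"))  # 1.0
```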

Sudarshan Iyengar

Dec 1, 2009, 4:46:33 AM
to networkx...@googlegroups.com
I think the self loops are ignored.

>>> import networkx as nx
>>> G = nx.erdos_renyi_graph(30, 0.7)
>>> X = nx.clustering(G)
>>> for i in range(G.order()):
...     G.add_edge(i, i)  # add a self-loop to every node
...
>>> Y = nx.clustering(G)


There is no difference between X and Y values.

Sudarshan

Todd A. Gibson

Dec 1, 2009, 5:12:35 AM
to networkx...@googlegroups.com
> I think the self loops are ignored.
> There is no difference between X and Y values.

Ah. I'm still using 0.99, which produces different results, so it must
have been changed/fixed in 1.0rc1.

Marcel Blattner

Dec 1, 2009, 2:59:48 AM
to networkx...@googlegroups.com
--

You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To post to this group, send email to networkx...@googlegroups.com.
To unsubscribe from this group, send email to networkx-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/networkx-discuss?hl=en.


...the clustering coefficient is a basic measure of local density (connections) in a network. The number gives the 'probability' that two friends of a node are also connected (if A is connected to B and to C, how probable is a connection between B and C?). An alternative view: it counts the number of triangles around a node, divided by the possible number of triangles around that node. All these numbers are local (node-related). To obtain the overall clustering coefficient of a network, one takes the mean of all the node-related clustering coefficients.
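
As a rough sketch of the triangle view and of the network-wide mean, in plain Python. The helper names and the adjacency-dict representation are invented for illustration; for real graphs NetworkX offers `nx.clustering` and `nx.average_clustering`.

```python
from itertools import combinations


def triangles_at(adj, node):
    """Number of triangles that include node."""
    return sum(1 for u, w in combinations(adj[node], 2) if w in adj[u])


def clustering(adj, node):
    """Triangles around node divided by the number of possible triangles."""
    k = len(adj[node])
    possible = k * (k - 1) // 2
    return triangles_at(adj, node) / possible if possible else 0.0


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering(adj, v) for v in adj) / len(adj)


# A triangle A-B-C with one extra node D hanging off A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
for v in sorted(adj):
    print(v, clustering(adj, v))
print("average:", average_clustering(adj))
```

Here A has one triangle out of three possible, B and C each have one out of one, and D has no pair of neighbours at all, so the mean is (1/3 + 1 + 1 + 0) / 4.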

Cheers
Marcel

Rohan Dixit

Dec 1, 2009, 2:17:48 AM
to networkx...@googlegroups.com