error in topological coefficient calculation ?

Hervé Diedie

unread,

Jul 25, 2016, 5:50:01 PM7/25/16

to cytoscape-discuss

Hi I think there's an error in topological coefficient calculation formula used by networkanalyzer

when reading the online help

Topological coefficients

The topological coefficient [15] T_n of a node n with k_n neighbors is computed as follows:
T_n = avg ( J(n,m) ) / k_n.
Here, J(n,m) is defined for all nodes m that share at least one neighbor with n. The value J(n,m) is the number of neighbors shared between the nodes n and m, plus one if there is a direct link between n and m

whereas the paper cited in reference [15] use this formula:

T_n = avg ( J(n,m) ) / k_n.

Can someone help me ?

Barry Demchak

unread,

Jul 25, 2016, 5:51:23 PM7/25/16

to cytoscap...@googlegroups.com

Forwarded to https://groups.google.com/forum/#!forum/cytoscape-helpdesk

--
You received this message because you are subscribed to the Google Groups "cytoscape-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cytoscape-disc...@googlegroups.com.
To post to this group, send email to cytoscap...@googlegroups.com.
Visit this group at https://groups.google.com/group/cytoscape-discuss.
For more options, visit https://groups.google.com/d/optout.

Matthias König

unread,

Jul 26, 2016, 4:12:57 AM7/26/16

to cytoscape-discuss

Stelzl, U., et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 122 (2005) 957-968.

"The topological coefficient was calculated for every protein in the network and plotted against the number of links (TCp = average(J(p,j)/kp), where J(p,j) denotes the number of nodes to which both p and j are linked, kp is the number of links of node p"
...
"Besides the clustering coefficient, the topological coefficient was used (Goldberg and Roth, 2003; Ravasz et al., 2002) to study the characteristics of the interaction network. The topological coefficient, TC(k), is a relative measure for the extent to which a protein in the network shares interaction partners with other proteins."

Goldberg2003 does not give a own definition, but just references Ravasz2002.

Ravasz et al., 2002
https://www.ncbi.nlm.nih.gov/pubmed/12202830
finally gives the original definition of topological overlap.
"For each pair of nodes, i and j, we define the topological overlap OT(i, j) = Jn(i, j)/[min (ki, kj)], where Jn(i, j) denotes the number of nodes to which both i and j are linked ( plus 1 if there is a direct link between i and j) and [min (ki, kj)] is the smaller of the ki and kj degrees. (see Figure 3 for a nice example).

Interestingly, the topological overlap is defined in Ravasz as an edge property. The topological coefficient is than calculated as the average topological overlap of all edges of a node, making it a node propery. So the topological coefficient can be understood as the mean topological overlap of the edges of the node.
So Stelzl only cites papers which don't really give the definition of topological coefficent, but only deal with the topological overlap.

For me the definition in the NetworkAnalyzer and Stelz look identical.

So here what the network analyzer is actually calculating:
https://github.com/cytoscape/network-analyzer.git

/**
 * Computes the topological coefficient of the given node.
 * @param node The node's index.
 * @param numNodes Number of nodes in the graph.
 * @param edges Array with every node's neighbor indices.
 * @param edgeOffsets Array with the indices of each node's first neighbor in <code>edges</code>.
 * @return The node's topological coefficient in the range [0; 1];
 *          <code>NaN</code> in case the topological coefficient is not defined for this case.
 */
public static double computeTC(int node, int numNodes, int[] edges, int[] edgeOffsets)
{
   boolean[] commNNodes = new boolean[numNodes];
   int commNNodesSize = 0;
   boolean[] isNeighbor = new boolean[numNodes];
   int tc = 0;
   
   int firstEdge = edgeOffsets[node], lastEdge = edgeOffsets[node + 1];
   
   for (int ni = firstEdge; ni < lastEdge; ni++)
      isNeighbor[edges[ni]] = true;
   
   for (int ni = firstEdge; ni < lastEdge; ni++)
   {
      int neighbor = edges[ni];
      int firstNEdge = edgeOffsets[neighbor], lastNEdge = edgeOffsets[neighbor + 1];
      for (int nni = firstNEdge; nni < lastNEdge; nni++)
      {
         int nneighbor = edges[nni];
         if (nneighbor == node)
            continue;
         tc++;
         if (!commNNodes[nneighbor])
         {
            commNNodes[nneighbor] = true;
            commNNodesSize++;
            if (isNeighbor[nneighbor])
               tc++;
         }
      }
   }
   
   return (double)tc / ((double)commNNodesSize * (double)(lastEdge - firstEdge));
}

For me the

     if (nneighbor == node)
            continue;
         tc++;

is looking like there is a bracket missing (because tc++ should be incremented by 1 only once if direct connection.
And also the normalization looks a bit strange

   return (double)tc / ((double)commNNodesSize * (double)(lastEdge - firstEdge));

For me it is not clear if this really calculates the topological coefficient.

I build a small test network to see what happens (three nodes, two of them sharing a neighbor)
n1 - n2 - n3
where the J(i, j) matrix should be (J(i,j) denotes the number of nodes to which both i and j are linked)
J = [
0 0 1
0 0 0
1 0 0]
I.e. n1 and n3 share exactly one neighbor, the node n2.
The averaging should therefore give a nonzero value for n1 and n3.
But calculating the TopologicalCoefficient for the network gives 0 for all nodes (see screenshot attached).
=> So there seems definitely to be some problem/bug with the topological coefficent in the Cytoscape NetworkAnalyzer.

In summary: For me it looks like the definition is identical everywhere, but what is actually calculated in NetworkAnalyzer is not what the formula says.

Matthias

Screenshot from 2016-07-26 10:09:17.png

Reply all

Reply to author

Forward