4e - Binomial probability

27 views
Skip to first unread message

Matt

unread,
Apr 14, 2011, 3:02:09 PM4/14/11
to Computational Genomics 2011Spring
I get the idea of using the binomial distribution to compute the
likelihood of a subgraph with A nodes on the left, B nodes on the
right, and E edges between them - find the likelihood of seeing E
edges out of AxB possible edges, each placed with random probability
p. The question I have is how to find this binomial probability of
success p. The relative sparsity of arcs in the bipartite graph means
that the global probability of having an arc is very low, so if we use
this value, the likelihood of a subgraph of even a few arcs is almost
0. The frequency of arcs in the subgraph can't be used, since that
would mean the subgraph has the expected number of edges. Does anyone
have any suggestions on how to find this parameter of the binomial
distribution?

Lu Xie

unread,
Apr 14, 2011, 3:17:34 PM4/14/11
to computational-ge...@googlegroups.com
I used the background probability p = E / (A * B).
Here we are not supposed to compute the likelihood, but the p-value,
since parameter p comes from the background model which we would like
to reject.

Lu

Matt

unread,
Apr 14, 2011, 4:08:53 PM4/14/11
to Computational Genomics 2011Spring
The likelihood that we would see a graph with AxB nodes and at least E
edges under the random model should be equivalent to the p-value. It's
the probability of having that dense a subgraph under the random
model. The background probability of an arc is very low (<7%), so the
p-value of all my connected subgraphs are indistinguishable at 0. Even
a subgraph with 2 nodes in A and 2 nodes in B with 3 out of 4 arcs
placed gets a p-value of nearly 0 (command 1-binocdf(3,4,0.07)). How
did you compare subgraphs when they're all equally unlikely?

Lu Xie

unread,
Apr 14, 2011, 4:11:18 PM4/14/11
to computational-ge...@googlegroups.com
Forget binocdf; use binopdf instead, and use a for-loop to integrate
binopdf to get the cdf.

Lu

Lu Xie

unread,
Apr 15, 2011, 12:08:08 PM4/15/11
to computational-ge...@googlegroups.com
I found a better way to bypass the underflow of binocdf. Instead of
using 1-binocdf(k,n,p) which has the risk of giving all 0, using
binocdf(n-k,n,1-p) is much safer.

Another thing is about the p-value. Go back to the assumption, a
subgraph with 2 nodes in A and 2 nodes in B with 3 out of 4 arcs,
p-value here should be 1-binocdf(2,4,0.07) = binopdf(3,4,0.07) +
binopdf(4,4,0.07) because p-value is defined as the probability of
obtaining a test statistic at least as extreme as the one that was
actually observed, so 3-arc should be included.

Lu

Lu Xie

unread,
Apr 15, 2011, 12:22:47 PM4/15/11
to computational-ge...@googlegroups.com
Quick correction:

1-binocdf(k,n,p) = binocdf(n-k-1,n,1-p)

Reply all
Reply to author
Forward
0 new messages