> I am a bit confused about the binomial model. Is the idea that there is a certain probability for overlap (heads) and non-overlap (tails) and that the significance is derived from a binomial based on this? If so, this is interesting, as one important downside of Monte Carlo trials is that they do not respect the relative distances of B features in the genome. In some cases, these distances/densities may be biologically relevant.
>
> Could you elaborate a bit on the proposed binomial model? Perhaps a toy example?
Mmm... I'm not sure it is a proper model, but a colleague of mine
suggested that, if you want to estimate the p-value of k features in A
overlapping features in B, one can take
p = coverage of B (the fraction of the genome covered by B)
n = number of features in B
and then build the binomial distribution Binomial(n, p) and calculate
the probability of observing k or more overlapping features, i.e. the
upper tail 1 - CDF(k - 1).
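Here is a toy example of the colleague's suggestion in Python. All numbers are made up. I take n as the number of features in A, because in this reading each A feature is one trial whose "success" is landing on B, with probability equal to B's genome coverage. That is my interpretation, not necessarily the colleague's exact formulation. The helper binom_upper_tail is hypothetical and uses only the standard library.

```python
from math import comb

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), i.e. 1 - CDF(k - 1)."""
    if k <= 0:
        return 1.0
    # Exact sum of the probability mass from k up to n.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Toy numbers (hypothetical): B covers 5% of a 1 Mb genome,
# A has 200 features, and 25 of them overlap B.
genome_size = 1_000_000
b_covered_bp = 50_000
p = b_covered_bp / genome_size   # "coverage of B"
n = 200                          # trials: one per A feature (my assumption)
k = 25                           # observed number of overlapping A features

expected = n * p                 # overlaps expected by chance
pval = binom_upper_tail(k, n, p)
print(f"expected {expected:.1f}, observed {k}, P(X >= {k}) = {pval:.2e}")
```

If SciPy is available, scipy.stats.binom.sf(k - 1, n, p) should give the same upper-tail value.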
Besides not being sure this is theoretically correct, I also suspect
that this model does not properly account for feature size (relative
to the genome, or between the two feature sets).
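The following shows why feature size matters. A randomly placed A feature of length L overlaps a B feature of length b whenever its start falls in a window of b + L - 1 positions, so longer A features are more likely to hit B. The sketch below gives each A feature its own hit probability. It uses hypothetical helpers (hit_probability, poisson_binomial_upper_tail) and toy numbers, and it rests on strong assumptions: B features do not overlap each other, they are spaced further apart than L, and chromosome ends are ignored. With unequal per-feature probabilities the overlap count is no longer binomial but Poisson-binomial, which is computed here with a simple dynamic program.

```python
def hit_probability(length: int, b_covered_bp: int, b_count: int,
                    genome_size: int) -> float:
    """Approximate chance that one randomly placed A feature of `length` bp
    overlaps at least one B feature.

    Assumes B intervals do not overlap and are far enough apart that their
    hit windows (b + length - 1 start positions each) do not overlap;
    chromosome ends are ignored.
    """
    window = b_covered_bp + b_count * (length - 1)
    return min(1.0, window / genome_size)

def poisson_binomial_upper_tail(k: int, probs: list[float]) -> float:
    """P(X >= k) where X is a sum of independent Bernoulli(probs[i])."""
    dist = [1.0]  # dist[j] = P(j hits among the features processed so far)
    for q in probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1 - q)   # this feature misses B
            new[j + 1] += mass * q     # this feature hits B
        dist = new
    return sum(dist[max(k, 0):])

# Hypothetical toy data: 500 B features of 100 bp each (5% of 1 Mb),
# and 200 A features, half of them 10 bp long and half 1 kb long.
genome_size = 1_000_000
b_covered_bp, b_count = 50_000, 500
a_lengths = [10] * 100 + [1_000] * 100
probs = [hit_probability(L, b_covered_bp, b_count, genome_size) for L in a_lengths]
k = 25
print(f"expected {sum(probs):.1f} overlaps; "
      f"P(X >= {k}) = {poisson_binomial_upper_tail(k, probs):.3f}")
```

With these made-up sizes about 60 overlaps are expected by chance, rather than the 10 predicted by a plain Binomial(200, 0.05). So 25 observed overlaps go from looking highly significant to unremarkable.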
I'm using the bootstrap method, although I'm not sure whether the
bootstrapped intersection counts follow a normal, a Poisson or some
other distribution (they look normal, though).
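The next sketch assumes "bootstrap" here means re-placing the A features at random positions while keeping their lengths. It uses a single toy chromosome, hypothetical function names and made-up intervals. A real analysis would also need to handle multiple chromosomes and excluded regions. Because lengths are kept, feature size is respected automatically. The empirical p-value (1 + #{null >= observed}) / (N + 1) does not require the null to be normal or Poisson. Comparing the null's mean and variance is a quick sanity check: a Poisson has variance equal to its mean, while a binomial's variance np(1 - p) is smaller than its mean np.

```python
import bisect
import random
import statistics

def count_overlaps(a_intervals, b_starts, b_ends):
    """Number of A intervals overlapping at least one B interval.
    Intervals are half-open [start, end); B must be sorted and non-overlapping."""
    hits = 0
    for start, end in a_intervals:
        # First B interval whose end lies after this A feature's start.
        i = bisect.bisect_right(b_ends, start)
        if i < len(b_starts) and b_starts[i] < end:
            hits += 1
    return hits

def shuffle_a(lengths, genome_size, rng):
    """Place each A feature uniformly at random, keeping its length."""
    placed = []
    for length in lengths:
        start = rng.randrange(genome_size - length + 1)
        placed.append((start, start + length))
    return placed

def shuffle_test(a_intervals, b_intervals, genome_size, n_iter=1000, seed=0):
    """Observed overlap count, null counts from shuffles, empirical p-value."""
    rng = random.Random(seed)
    b_sorted = sorted(b_intervals)
    b_starts = [s for s, _ in b_sorted]
    b_ends = [e for _, e in b_sorted]
    observed = count_overlaps(a_intervals, b_starts, b_ends)
    lengths = [e - s for s, e in a_intervals]
    null = [count_overlaps(shuffle_a(lengths, genome_size, rng), b_starts, b_ends)
            for _ in range(n_iter)]
    # One-sided empirical p-value with the usual +1 correction.
    pval = (1 + sum(x >= observed for x in null)) / (n_iter + 1)
    return observed, null, pval

# Made-up toy data: B is 100 features of 50 bp (5% of 100 kb);
# 30 of the 50 A features sit on B, 20 sit between B features.
genome_size = 100_000
b_toy = [(i * 1_000, i * 1_000 + 50) for i in range(100)]
a_toy = ([(i * 1_000 + 10, i * 1_000 + 30) for i in range(30)]
         + [(i * 1_000 + 500, i * 1_000 + 520) for i in range(30, 50)])
observed, null, pval = shuffle_test(a_toy, b_toy, genome_size, n_iter=1000, seed=1)
print(observed, statistics.mean(null), statistics.variance(null), pval)
```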
d