Score usage

dawe

Feb 13, 2012, 8:32:25 AM
to RSEG Users
Hi there, after using RSEG for IP/input analysis (1 tail), I have a
list of ENRICHED/BACKGROUND domains.
Selecting the ENRICHED domains gives me a picture of the
underlying phenomenon, but how do I use / filter those features? How do
I compare outcomes from different files? I've found that the score
increases linearly with domain length, but different experiments have
different slopes, so a score of, say, 300 may mean different things in
different experiments.
Any hint/help is appreciated.

d

Song, Qiang

Feb 15, 2012, 12:23:12 AM
to rseg-s...@googlegroups.com
Hi Davide,

The score of an enriched domain is defined as follows:

For each bin, we compute the posterior probability that it is in the
foreground (ENRICHED) state. The score of an enriched domain is
the sum of the posterior probabilities of all bins within that
domain. That is why the score has an approximately linear
relationship with domain length. For a domain to have a high score,
it must be quite large (because we are interested in dispersed domains), and
each of the underlying bins must have a high posterior probability. By keeping
only the domains with higher scores, we are more confident that we obtain dispersed
regions.
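
As a minimal sketch of the definition above (this is not RSEG's own code; the function names and the posterior values are made up for illustration), the score is a plain sum over bins, which is why it grows with domain length. The per-bin average shown next to it is only one possible length-independent summary, not something RSEG is stated to report:

```python
def domain_score(posteriors):
    """Score as described above: sum of the per-bin posterior
    probabilities of the ENRICHED state within one domain."""
    return sum(posteriors)


def mean_posterior(posteriors):
    """Hypothetical length-independent summary: average posterior per bin."""
    return domain_score(posteriors) / len(posteriors)


# Two illustrative domains with the same per-bin confidence but different lengths.
short_domain = [0.9] * 10
long_domain = [0.9] * 100

# The summed score scales with the number of bins...
print(domain_score(short_domain), domain_score(long_domain))
# ...while the per-bin average stays the same.
print(mean_posterior(short_domain), mean_posterior(long_domain))
```

Since each posterior is at most 1, the score of a domain can never exceed its number of bins.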

Such enriched domains can be used to obtain information about the underlying
genomic region. For example, if a gene overlaps an H3K36me3 domain,
we would expect it to be actively expressed. On the other hand, if the enriched
domain disappears in another tissue, this suggests the region may have
a regulatory role in tissue differentiation.
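
A rough sketch of the gene/domain overlap check described above, assuming domains and genes are available as simple tuples with half-open coordinates. The tuple layout, the function names, the `min_score` threshold and the example coordinates are all hypothetical and are not taken from RSEG's output format:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_in_domains(genes, domains, min_score=0.0):
    """Return names of genes overlapping an enriched domain whose score
    is at least min_score.

    genes:   iterable of (name, chrom, start, end)
    domains: iterable of (chrom, start, end, score)
    """
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        for d_chrom, d_start, d_end, score in domains:
            if (d_chrom == g_chrom and score >= min_score
                    and overlaps(g_start, g_end, d_start, d_end)):
                hits.append(name)
                break  # one overlapping domain is enough
    return hits


# Made-up example data.
domains = [("chr1", 1000, 50000, 300.0), ("chr1", 80000, 82000, 12.0)]
genes = [
    ("geneA", "chr1", 40000, 45000),
    ("geneB", "chr1", 81000, 90000),
    ("geneC", "chr2", 1000, 2000),
]

print(genes_in_domains(genes, domains, min_score=100.0))
```

The nested loop is fine for small lists; for genome-wide data a dedicated interval tool would be the usual choice.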

Additionally, as you said, such domains give us a "picture of the underlying phenomenon",
which is particularly useful for studying histone marks that are not fully characterized.

Regards,
Song Qiang

Davide Cittaro

Feb 15, 2012, 4:08:21 AM
to rseg-s...@googlegroups.com
On Feb 15, 2012, at 6:23 AM, Song, Qiang wrote:
>
> For each bin, we computed the posterior probability that it is in the
> foreground (ENRICHED) state. The score of an enriched domain is
> therefore the sum of the posterior probabilities of all bins within that
> domain. That is the reason this score has an approximate linear
> relationship with domain length. In order for a domain to have a high score,
> it should be quite big (because we are interested in dispersed domains), and
> each of the underlying bins should have a high posterior probability. By filtering
> those domains with higher scores, we are more confident we obtain dispersed
> regions.

Just a quick additional question: is the scale linear or what? i.e.
a) S = p1 + p2 + p3
b) S = -10log10(p1) -10log10(p2) -10log10(p3)
c) S = -10log10(p1 + p2 + p3)
d) ??

TY

d

---
Davide Cittaro
daweo...@gmail.com
http://sites.google.com/site/davidecittaro/

Song, Qiang

Feb 16, 2012, 1:07:18 AM
to rseg-s...@googlegroups.com
On Wed, Feb 15, 2012 at 1:08 AM, Davide Cittaro <daweo...@gmail.com> wrote:
Just a quick additional question: is the scale linear or what? i.e.
a) S = p1 + p2 + p3

This one: S = p1 + p2 + p3
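
To make the difference between the candidates concrete, here is a small sketch that evaluates (a), (b) and (c) on the same bins. The function names and the posterior values are invented for illustration; only (a) is the definition confirmed in this thread:

```python
import math


def score_sum(ps):
    """Candidate (a): S = p1 + p2 + p3 (the confirmed definition)."""
    return sum(ps)


def score_phred_sum(ps):
    """Candidate (b): sum of -10*log10(p) per bin (not what is used)."""
    return sum(-10 * math.log10(p) for p in ps)


def score_phred_of_sum(ps):
    """Candidate (c): -10*log10 of the summed posteriors (not what is used)."""
    return -10 * math.log10(sum(ps))


bins = [0.9, 0.8, 0.95]  # made-up posteriors for a three-bin domain

print(score_sum(bins))           # grows with both confidence and length
print(score_phred_sum(bins))     # shrinks as the posteriors approach 1
print(score_phred_of_sum(bins))  # negative once the sum exceeds 1
```

Only (a) increases with both the number of bins and their posterior probabilities, which matches the roughly linear relationship with domain length.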