Score usage

dawe

Feb 13, 2012, 8:32:25 AM

to RSEG Users

Hi there, after using RSEG for IP/input analysis (1 tail), I have a
list of ENRICHED/BACKGROUND domains.
Selecting the ENRICHED domains will give me a picture of the
underlying phenomenon, but how do I use / filter those features? How do
I compare outcomes from different files? I've found that the score
increases linearly with domain length, but different experiments have
different slopes, so a score of, say, 300 may mean different things in
different experiments.
Any hint/help is appreciated.

d

Song, Qiang

Feb 15, 2012, 12:23:12 AM

to rseg-s...@googlegroups.com

Hi Davide,

The score of an enriched domain is defined as follows.

For each bin, we compute the posterior probability that it is in the
foreground (ENRICHED) state. The score of an enriched domain is
the sum of the posterior probabilities of all bins within that
domain. That is why the score has an approximately linear
relationship with domain length. For a domain to have a high score,
it must be quite large (because we are interested in dispersed domains), and
each of the underlying bins must have a high posterior probability. By keeping only
the domains with higher scores, we are more confident that we obtain dispersed
regions.
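
The definition above can be illustrated with a minimal Python sketch. It is not RSEG code: the function names and the posterior values are made up for illustration. The length-normalized value (score divided by number of bins) is an assumption added here as one possible way to compare domains of different lengths or from different experiments; it is not something RSEG reports in this thread.

```python
from typing import Sequence


def domain_score(posteriors: Sequence[float]) -> float:
    """Sum of per-bin foreground posteriors, as the score is described above."""
    return sum(posteriors)


def mean_posterior(posteriors: Sequence[float]) -> float:
    """Score divided by the number of bins (hypothetical normalization)."""
    if not posteriors:
        raise ValueError("domain has no bins")
    return domain_score(posteriors) / len(posteriors)


# Made-up posteriors: the long domain repeats the short one four times.
short_domain = [0.9, 0.95, 0.85]
long_domain = short_domain * 4

# The score grows with the number of bins; the per-bin mean does not.
print(domain_score(short_domain), mean_posterior(short_domain))
print(domain_score(long_domain), mean_posterior(long_domain))
```

Because every posterior is at most 1, the score can never exceed the number of bins in the domain, which is consistent with the roughly linear dependence on length.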

Such enriched domains can be used to obtain information about the underlying
genomic region. For example, if a gene overlaps an H3K36me3 domain,
we would expect it to be actively expressed. On the other hand, if the enriched
domain disappears in another tissue, this may suggest that the region has a
regulatory role in tissue differentiation.
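
The gene/domain overlap mentioned above could be checked along these lines. This is only a sketch: it assumes BED-like, 0-based half-open coordinates, and the `Interval` type, the gene names, and all coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed BED-like convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_domains(genes: List[Interval], domains: List[Interval]) -> List[str]:
    """Names of genes overlapping at least one enriched domain."""
    return [g.name for g in genes if any(overlaps(g, d) for d in domains)]


# Made-up genes and a single made-up enriched domain.
genes = [
    Interval("chr1", 1000, 5000, "geneA"),
    Interval("chr1", 20000, 25000, "geneB"),
    Interval("chr2", 1000, 5000, "geneC"),
]
domains = [Interval("chr1", 4000, 12000, "domain1")]

marked = genes_in_domains(genes, domains)
print(marked)
```

The nested loop is quadratic; for genome-wide lists a sorted sweep or a dedicated interval tool would be a better fit.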

Additionally, as you said, such domains give us a "picture of the underlying phenomenon",
which is particularly useful for studying histone marks that are not yet fully characterized.

Regards,

Song Qiang

Davide Cittaro

Feb 15, 2012, 4:08:21 AM

to rseg-s...@googlegroups.com

On Feb 15, 2012, at 6:23 AM, Song, Qiang wrote:
>
> For each bin, we computed the posterior probability that it is in the
> foreground (ENRICHED) state. The score of a enriched domain is
> therefore the sum of the posterior probabilities of all bins within that
> domain. That is the reason this score has an approximate linear
> relationship with domain length. In order for a domain to have a high score,
> it should be quite big (because we are interested in dispersed domains), and
> each of the underlying bins should have a high posterior probability. By filtering
> those domains with higher score,we are more confident we obtain dispersed
> regions.

Just a quick additional question: is the scale linear or what? i.e.
a) S = p1 + p2 + p3
b) S = -10log10(p1) -10log10(p2) -10log10(p3)
c) S = -10log10(p1 + p2 + p3)
d) ??

TY

d

---
Davide Cittaro
daweo...@gmail.com
http://sites.google.com/site/davidecittaro/

Song, Qiang

Feb 16, 2012, 1:07:18 AM

to rseg-s...@googlegroups.com

On Wed, Feb 15, 2012 at 1:08 AM, Davide Cittaro <daweo...@gmail.com> wrote:

> Just a quick additional question: is the scale linear or what? i.e.
> a) S = p1 + p2 + p3

This one: S = p1 + p2 + p3
