How do the site distributions (oops|zoops|anr) influence the E-value?

kelian.ha...@gmail.com

Jun 2, 2015, 10:12:18 AM

to meme-...@googlegroups.com

Hi,

I am studying the output of MEME using crp0.s, an example sequence set provided with the MEME Suite.

When I change the site distribution from oops to anr, the E-value of the motif found changes: it is higher with anr (1.9) than with oops (0.011).

Could you explain why there is such a difference between the two results?

Also, you said in another message on this forum that "it is unusual to consider a motif with an E-value larger than 0.05 significant". Thus, depending on the distribution, a motif could be significant or not when analysing the same set of sequences?

Kélian

CharlesEGrant

Jun 2, 2015, 12:51:18 PM

to meme-...@googlegroups.com

The statistical confidence (E-value) assigned to a discovered motif depends on the observed evidence for the motif and on the underlying statistical model of motif occurrences. MEME can use one of three statistical models. OOPS (one occurrence per sequence) assumes that each sequence contains exactly one occurrence of the motif. ZOOPS (zero or one occurrence per sequence) assumes that each sequence contains either one occurrence or none. ANR (any number of repetitions) assumes that each sequence can contain any number of motif occurrences.
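To make the difference in constraints concrete, here is a small Python sketch that counts how many ways motif sites can be laid out in a single sequence under each model. It is only an illustration, not MEME's code: the function name site_configurations and the example numbers (length 20, width 6) are made up, and the ANR count assumes that sites on the same sequence may not overlap.

```python
from math import comb


def site_configurations(seq_len, width, model):
    """Count the ways motif sites can be placed in ONE sequence under a model.

    Hypothetical helper for illustration only; this is not how MEME
    computes its E-value.
    """
    positions = seq_len - width + 1  # possible start positions for one site
    if positions < 1:
        raise ValueError("motif is wider than the sequence")
    if model == "oops":
        # exactly one site: one choice per start position
        return positions
    if model == "zoops":
        # one site, or no site at all
        return positions + 1
    if model == "anr":
        # any number k of non-overlapping sites (assumption: no overlaps);
        # k non-overlapping sites can be placed in C(seq_len - k*(width-1), k) ways
        max_sites = seq_len // width
        return sum(comb(seq_len - k * (width - 1), k) for k in range(max_sites + 1))
    raise ValueError(f"unknown model: {model}")


for model in ("oops", "zoops", "anr"):
    print(model, site_configurations(20, 6, model))
```

Even for one short sequence the ANR count is several times larger than the OOPS count, and the gap widens quickly for longer sequences. A less constrained model simply has many more ways to explain the data.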

For example, if you select the OOPS model, MEME will consider exactly one candidate motif site from each sequence as evidence for the motif. On the other hand, if you select the ANR model, MEME may decide that some sequences contain no candidate motif sites, and it may identify a dozen in another sequence. Increasing the number of good candidate sites will tend to move the E-value lower, but removing constraints from the model means that more of the candidates will be simply chance matches, which drives the E-value higher.
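As rough intuition for the "chance matches" part, the sketch below computes the expected number of exact matches to one fixed word in uniformly random sequence. This is a back-of-the-envelope calculation under simplifying assumptions (uniform letter frequencies, exact matches only, a hypothetical function name), not the calculation MEME performs.

```python
def expected_chance_matches(num_positions, width, alphabet_size=4):
    """Expected number of exact matches to one fixed word of the given width
    when num_positions start positions are examined in uniform random sequence.

    Illustrative assumption: every letter is equally likely and independent.
    """
    return num_positions * (1.0 / alphabet_size) ** width


# Illustrative numbers only: a 6-letter DNA word, examined at more and more positions.
for num_positions in (100, 1000, 10000):
    print(num_positions, expected_chance_matches(num_positions, 6))
```

The expected number of chance matches grows in direct proportion to the number of positions the model is allowed to count as sites, which is why loosening the model tends to push the E-value up unless the extra sites are genuinely good ones.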

> Thus, depending on the distribution, a motif could be significant or not when analyzing the same set of sequences?

Yes, absolutely. You have to use your understanding of the sequence data to pick an appropriate model.
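If it helps to compare the models side by side on the same input, a minimal Python sketch along these lines builds one MEME command per model. It assumes the meme executable is on your PATH and that crp0.s is in the current directory; the helper name build_meme_command and the output directory names are made up for illustration. The options used (-dna, -mod, -oc) are standard MEME command-line options.

```python
import shlex


def build_meme_command(fasta_path, model, out_dir):
    """Return the argument list for one MEME run (hypothetical helper)."""
    if model not in {"oops", "zoops", "anr"}:
        raise ValueError(f"unknown site distribution: {model}")
    # -dna: DNA alphabet; -mod: site distribution; -oc: output dir (overwritten if present)
    return ["meme", fasta_path, "-dna", "-mod", model, "-oc", out_dir]


for model in ("oops", "zoops", "anr"):
    cmd = build_meme_command("crp0.s", model, f"meme_out_{model}")
    # Print the command; to actually run it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Comparing the E-values reported in each output directory shows how much the result depends on the model, but the choice between them should still come from what you know about the sequences, not from which run gives the smallest E-value.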