E-value of predicted motifs


Kaamini Raithatha

Dec 20, 2016, 6:59:49 AM
to MEME Suite Q&A
I am identifying motifs with both the online MEME server and the offline stand-alone tool, using the same set of sequences. Both report the same motifs, but the E-value for a given motif differs between the online server and the offline tool. Can anyone explain why? I have no idea how these E-values are calculated. Does the method of calculation differ between the offline and online versions of MEME?

CharlesEGrant

Dec 20, 2016, 5:52:10 PM
to meme-...@googlegroups.com
The online version of MEME is just a web interface to the command line version of MEME. However, the online version specifies some command line options that differ from the default values. For example, the command line version of MEME defaults to the amino acid alphabet. If you are actually analyzing DNA sequences, you need to specify the '-dna' option. The online interface automatically determines the alphabet of the sequence file and starts MEME with the '-dna' option if needed.

The exact command line used by the web interface is given at the bottom of the MEME HTML output in the section labeled "Command line" under "Inputs and Settings". As long as you are using the same version of MEME, and give it the same command line options, the offline version should match the online version exactly.

minz

Dec 21, 2016, 5:34:06 AM
to MEME Suite Q&A
Thank you for your reply and for highlighting the critical parameters.

The online and offline versions of MEME I used are the same, and I did specify the -dna option on the command line. Here are the commands from the command line and online versions:

Command_line: meme input.fasta -dna -o otput_dir -nmotifs 5 -maxsize 1000000

Online:
meme input.fasta -dna -oc . -nostatus -time 18000 -maxsize 60000 -mod zoops -nmotifs 5 -minw 6 -maxw 50 -revcomp

where -minw, -maxw, and -revcomp are default parameters in the online version, and I am sure they don't make any difference to the E-value. Correct me if I am wrong.


CharlesEGrant

Dec 29, 2016, 7:12:12 PM
to meme-...@googlegroups.com
> where -minw, -maxw, and -revcomp are default parameters in the online version, and I am sure they don't make any difference to the E-value. Correct me if I am wrong.

All those parameters can affect which motifs MEME discovers, and the E-value reported for those motifs.

Let's step back a couple of steps to see if I can clarify this. MEME performs de novo motif discovery by identifying short sub-sequences that are statistically over-represented in your sequence data. This is a tricky problem because MEME has to figure out both what the underlying motif is, and which sites in the sequences are instances of that motif. It turns out that doing this exhaustively is computationally impractical (i.e. it would take far too long). Instead, MEME uses heuristics to make initial guesses for the motif and its instances. It evaluates those guesses by measuring how well the hypothesized motif matches the hypothesized sites. MEME then makes adjustments to the guessed motif and the sites to improve the match scores. Once it has enough information to evaluate the statistical significance of the motif and the sites, it reports that motif, masks over those sites, guesses a new motif and set of instances, and repeats the process. This continues until MEME has reported the number of motifs you've requested, or it runs out of time.
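
To make the "report a motif, mask its sites, repeat" loop concrete, here is a deliberately simplified Python sketch. It is not MEME's algorithm: MEME fits a probabilistic motif model and scores statistical significance, whereas this toy only counts exact k-mers. The function names (most_common_kmer, mask, discover) and the toy sequences are made up for illustration.

```python
from collections import Counter

def most_common_kmer(seqs, k):
    """Return (kmer, count) for the most frequent unmasked k-mer, or None."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:  # skip windows that overlap masked positions
                counts[kmer] += 1
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def mask(seqs, kmer):
    """Replace every occurrence of kmer with N's so it is not reported again."""
    return [s.replace(kmer, "N" * len(kmer)) for s in seqs]

def discover(seqs, k, nmotifs):
    """Toy discovery loop: find the best k-mer, record it, mask it, repeat."""
    found = []
    for _ in range(nmotifs):
        best = most_common_kmer(seqs, k)
        if best is None:  # everything is masked, nothing left to report
            break
        found.append(best)
        seqs = mask(seqs, best[0])
    return found

toy = ["ACGTACGTTT", "GGACGTAAAC", "TTTACGTCCC"]
for kmer, count in discover(toy, 4, 2):
    print(kmer, count)
```

Even in this toy, what gets reported second depends on what was masked first, which is one reason different settings can lead to different sites and different scores.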

If your data contains a strong motif signal, then the minw and maxw options shouldn't affect the results too much, at least as long as the motif's width is between minw and maxw. However, the revcomp option can have a huge effect. By default, MEME only considers positions on the forward strand as possible motif sites. If the revcomp option is given, MEME will consider positions on both strands as possible motif sites. This will have a huge effect on which sub-sequences are guessed as instances of the motif, which in turn affects the E-value. If you click on the downward pointing arrow in the "More" column of the "Discovered Motifs" section, you'll see a list of the sites that MEME decided were instances of the motif. I think you'll see that different options result in different sites being selected, which affects the E-value.
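
One quick way to see exactly which options differ between two runs is to compare the command lines themselves. The Python sketch below does this for the two commands quoted earlier in the thread. It is only an illustration: the helper names (parse_options, diff_options) are made up, and the set of options treated as value-less flags is an assumption that covers only the options appearing in these two commands.

```python
import shlex

# Assumption for this sketch: these options take no value.
FLAGS = {"-dna", "-revcomp", "-nostatus"}

def parse_options(cmd):
    """Turn a MEME command line into a dict of option -> value (True for flags)."""
    tokens = shlex.split(cmd)[2:]  # drop the program name and the input file
    opts = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if name in FLAGS:
            opts[name] = True
            i += 1
        else:
            opts[name] = tokens[i + 1]
            i += 2
    return opts

def diff_options(cmd_a, cmd_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    a, b = parse_options(cmd_a), parse_options(cmd_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}

offline = "meme input.fasta -dna -o otput_dir -nmotifs 5 -maxsize 1000000"
online = ("meme input.fasta -dna -oc . -nostatus -time 18000 -maxsize 60000 "
          "-mod zoops -nmotifs 5 -minw 6 -maxw 50 -revcomp")

for name, (off, on) in diff_options(offline, online).items():
    print(f"{name}: offline={off} online={on}")
```

Options reported as None on one side were not given there, so MEME falls back to its default for that run. Copying the full online command line into the offline run removes all of these differences at once.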


Arun Prasanna

Dec 30, 2016, 7:38:42 AM
to MEME Suite Q&A
Hi,
I have a related question. I ran MEME on a set of 100 co-regulated genes, expecting to find one or more common TFBSs. The command I used was:
./meme coreg_promoters.fasta -dna -mod zoops -minw 6 -maxw 14 -nmotifs 10 -maxsize 65000

MEME did find 10 motifs in my sequence set, but all of them were shown semi-transparent, implying an E-value greater than 0.05. Next, I checked with:

./meme StipeUp_promoters.fasta -dna -mod zoops -minw 6 -maxw 14 -nmotifs 10 -maxsize 65000 -evt 0.05

As expected, the output said "No significant motifs found!"

Should I consider those semi-transparent motifs or discard them? How can one decide in this case?

Thanks,
AP

CharlesEGrant

Dec 30, 2016, 3:53:44 PM
to meme-...@googlegroups.com
You can treat the E-value as a p-value corrected for multiple testing. An E-value of 0.01 or 0.05 is traditionally used as a threshold for statistical significance. This is only a convention though, not a promise that all motifs with an E-value less than the threshold are "real", or that all motifs with E-values greater than the threshold are due to chance. You have to exercise your judgement.
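
As a rough illustration of what "a p-value corrected for multiple testing" means, the Python sketch below applies a plain Bonferroni-style correction, where a p-value is multiplied by the number of tests. This shows the general idea only. MEME's own E-value calculation is more involved and is not reproduced here, and the function names and numbers are made up.

```python
def corrected_e_value(p_value, n_tests):
    """Bonferroni-style correction: expected number of chance hits at least this good."""
    return p_value * n_tests

def passes_threshold(e_value, threshold=0.05):
    """Apply the conventional cutoff; this is a convention, not proof of biology."""
    return e_value <= threshold

# Made-up numbers: the same p-value looks very different after correction,
# depending on how many candidates were effectively tested.
for n_tests in (100, 10_000, 1_000_000):
    e = corrected_e_value(1e-6, n_tests)
    print(n_tests, e, passes_threshold(e))
```

The point of the sketch is that the larger the search space, the stronger the raw evidence has to be before a result clears the same threshold.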

Also, the '-evt' option may not have the effect you hope it does. The '-evt' option tells MEME to stop looking for motifs as soon as it discovers a motif with an E-value higher than the given threshold. The problem is that MEME does not guarantee that it will find motifs in strict order of increasing E-value. It usually does, but sometimes MEME will find a motif with a higher E-value before it finds one with a lesser E-value. We rarely recommend using the '-evt' option, because it can result in statistically significant motifs not being reported. The better strategy is to specify '-nmotifs' high enough so that MEME reports five or six motifs with E-values above your desired threshold.
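
The difference between stopping at the first motif above the threshold and filtering afterwards can be shown with a few lines of Python. The E-values below are invented and listed in the order a hypothetical run reported them. The function names are made up too, and the first function only mimics the stopping behavior described above, not MEME's internals.

```python
def stop_at_first_above(e_values, threshold):
    """Mimic the '-evt' behavior: stop at the first E-value above the threshold."""
    kept = []
    for e in e_values:
        if e > threshold:
            break
        kept.append(e)
    return kept

def filter_after(e_values, threshold):
    """Let the run report everything, then keep the E-values at or below the threshold."""
    return [e for e in e_values if e <= threshold]

# Invented E-values in discovery order; note they are not strictly increasing.
found = [1e-8, 0.3, 0.002, 4.5, 12.0]

print(stop_at_first_above(found, 0.05))  # misses the motif found third
print(filter_after(found, 0.05))
```

With these invented numbers, the early-stopping version drops the motif with E-value 0.002 because a weaker motif was reported before it, while filtering after the run keeps it.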

Arun Prasanna

Jan 16, 2017, 5:22:08 AM
to MEME Suite Q&A
Hi Charles,
Thanks for your insights. I tried increasing -nmotifs to 100, but I couldn't recover a single motif with a significant E-value (<= 0.05). I am not sure whether this is a valid result, so I chose an alternative approach:
1. I found 50 motifs using -nmotifs 50.
2. I generated random sequences of the same length and size, and tested the motifs for enrichment with AME. About 30 of the 50 motifs were enriched in my set.
3. I filtered these 30 motifs against GOMo (to leave out known motifs) and found 26 to be novel motifs.

Thanks,
Arun


CharlesEGrant

Jan 16, 2017, 1:03:11 PM
to MEME Suite Q&A
Please see my response to your other question.

Arun Prasanna

Jan 16, 2017, 2:27:10 PM
to MEME Suite Q&A
Hi Charles,
Sorry, I can't see any response to the other question.
Thanks,
AP

CharlesEGrant

Jan 16, 2017, 6:32:58 PM
to MEME Suite Q&A
Sorry, I was interrupted between messages, and just finished the other message now.