Using FIMO for short protein sequences

HaimA

Mar 16, 2016, 4:08:53 PM
to MEME Suite Q&A
Hi,

I am trying to use the command-line version of FIMO, with motifs inferred from short protein sequences (6-14 amino acids long), to scan short peptides.

As a test, I used FIMO to scan the peptide dataset that contributed to the inferred motifs. I observed that peptides shorter than 10 amino acids were never reported as hits.

I've tried to change the e-value threshold in order to increase the number of hits, but still shorter peptides were not discovered.

In addition, I see cases where a sequence was not associated with its correct motif (i.e., a sequence contributing to motif X is not a hit of motif X). The score and e-value of such hits were indeed much lower; still, the sequence was not associated with its true motif.

Therefore, I have the following questions:
1. Is there some limitation in FIMO on hit length?
2. Do you have any advice on how I can better use FIMO for my dataset (short protein sequences and short motifs)?

Thanks in advance!
Haim

CharlesEGrant

Mar 16, 2016, 5:04:06 PM
to meme-...@googlegroups.com
Hi Haim,

> I've tried to change the e-value threshold in order to increase the number of hits, but still shorter peptides were not discovered.

I'm not sure what you mean here. FIMO doesn't calculate E-values. Are you referring to the MEME E-value for the motif? That is a measure of how statistically significant the motif is in the context of the motif discovery algorithm. It depends on the number of contributing sites, how similar those sites are to each other, and how similar they are to the background frequencies. It is only indirectly related to how well the motif matches in any particular instance.

FIMO computes a p-value for each match. By default, matches with a p-value larger than 0.0001 are not reported. Matches to short motifs will inevitably have larger p-values than matches to longer motifs. You can change FIMO's p-value threshold using the --thresh command line option, which is described in the FIMO command line documentation. Try setting the p-value threshold to something higher. If you set it to 1.0, FIMO will report a match for every position in your sequence data.

Charles
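
A minimal sketch of the suggestion above, written in Python so the threshold can be sanity-checked before FIMO is run. Only the --thresh option comes from the post; the helper name and the file names (motifs.meme, peptides.fasta) are hypothetical, and the "motif file, then sequence file" argument order is an assumption to confirm against the FIMO command line documentation.

```python
import shlex


def build_fimo_command(motif_file, sequence_file, p_threshold=1e-4):
    """Return a FIMO argument list with an explicit p-value threshold.

    Hypothetical helper: it only assembles the arguments, it does not run FIMO.
    """
    # A p-value is a probability, so anything outside (0, 1] is rejected here.
    if not 0.0 < p_threshold <= 1.0:
        raise ValueError("a p-value threshold must lie in (0, 1]")
    # Assumed argument order: options, then motif file, then sequence file.
    return ["fimo", "--thresh", str(p_threshold), motif_file, sequence_file]


# With a threshold of 1.0 every scanned position should be reported.
command = build_fimo_command("motifs.meme", "peptides.fasta", p_threshold=1.0)
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run on a machine where FIMO is installed.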

HaimA

Mar 16, 2016, 5:19:39 PM
to MEME Suite Q&A
Hi Charles,

Thanks for the prompt reply!

Sorry for the confusion. I meant to write that I changed the p-value threshold to 10 (using the --thresh option), exactly as you suggested. Yet I still don't get short peptides as hits.

I really appreciate your help.

Thanks!
Haim

CharlesEGrant

Mar 16, 2016, 5:29:49 PM
to MEME Suite Q&A
Hi Haim,

Hmm, by definition, a p-value can't be larger than 1.0. FIMO should be reporting a match score for each position in your sequence. How wide is your motif, and how long are your peptide sequences? FIMO won't perform partial matches if the motif is wider than the sequence.

Could you post the exact command line that you used, and attach an example motif and sequence files? That would help us troubleshoot the problem.

Charles
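
One way to check for the situation described above (a motif wider than the sequence) is to list the peptides that are too short before scanning. This is only a sketch: the function names, the example records, and the motif width of 10 are hypothetical, and the motif width would have to be read from your own motif file.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def too_short_for_motif(records, motif_width):
    """Return the names of sequences shorter than the motif width."""
    return [name for name, seq in records if len(seq) < motif_width]


# Hypothetical peptides: pep1 has 9 residues, pep2 has 14.
example = """>pep1
ACDEFGHIK
>pep2
ACDEFGHIKLMNPQ
""".splitlines()

print(too_short_for_motif(read_fasta(example), motif_width=10))  # ['pep1']
```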

HaimA

Mar 17, 2016, 8:21:04 AM
to MEME Suite Q&A
Hi Charles,

Thanks a lot. Indeed, all the missing hits are shorter than the motif length.
Is there an option to force partial matches with FIMO (e.g. by artificially extending the sequence)? Is there another tool in the MEME Suite that does support partial matches?

Thanks,
Haim

CharlesEGrant

Mar 17, 2016, 2:13:17 PM
to MEME Suite Q&A
Hi Haim,

No, there aren't any tools in the MEME Suite that support partial matches. You might try embedding your peptides in longer sequences. Note, though, that FIMO will also skip intervals containing ambiguity codes, so you can't just pad your peptide sequences with 'X'.

HaimA

Mar 17, 2016, 2:36:37 PM
to MEME Suite Q&A
Hi Charles,

Thanks!

Does it make sense to use MAST to scan sequences padded with 'X', running it with a single motif at a time (the -m parameter) and using the -hit_list parameter to get the list of all significant hits?
It seems to find significant matches even for sequences that were originally shorter than the motif width. I'd be happy to know if I'm missing something :)

Thanks for all the replies!
Haim

CharlesEGrant

Mar 18, 2016, 5:40:57 PM
to MEME Suite Q&A
Hi Haim,

I hadn't thought of that! The internals of MAST are different from those of FIMO. If you pad your short sequences on the right with 'X', the '-hit_list' option of MAST will work.

Charles
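
The right-padding step discussed above could be sketched as follows. The helper names and the example peptides are hypothetical, and the target width (here 10) is assumed to be the width of the motif being scanned. The padded records would then be written to a FASTA file and scanned with MAST using the -m and -hit_list options mentioned in the thread; the exact MAST invocation should be checked against its documentation.

```python
def pad_right(sequence, width, pad_char="X"):
    """Pad a sequence on the right with pad_char up to the given width.

    Sequences already at least `width` long are returned unchanged.
    """
    return sequence.ljust(width, pad_char)


def padded_fasta(records, width):
    """Format (name, sequence) pairs as FASTA text with right-padded sequences."""
    return "\n".join(f">{name}\n{pad_right(seq, width)}" for name, seq in records)


# Hypothetical peptides: one shorter than the motif width, one longer.
peptides = [("pep1", "ACDEFG"), ("pep2", "ACDEFGHIKLMNPQ")]

print(padded_fasta(peptides, width=10))
```

Padding only on the right follows the reply above; whether padding on the left as well is useful was not discussed in the thread.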