FIMO short motif returning no hits

Stefan Stark

Aug 16, 2016, 10:48:06 AM
to MEME Suite Q&A
I am searching for genes with a short motif (e.g. AGAGAG). FIMO runs without error but produces no results.

My call looks like this:

iupac2meme AGAGAG > $motif_meme
fimo --oc $outdir $motif_meme $genomeFile

Any ideas what the problem is?

Note that I do not get an error message.

Thanks!

CharlesEGrant

Aug 16, 2016, 4:09:59 PM
to meme-...@googlegroups.com
We fixed the problem that was causing the error message some time ago, so you shouldn't see that now. However, the underlying problem is that on the scale of a eukaryotic genome you are going to find millions of perfect matches to a 6-bp motif entirely by chance. Storing the details of millions of matches would use all available memory, and it would be useless, since almost all of the matches would simply be chance coincidences. FIMO tries to cope by gradually applying a more stringent p-value threshold, but even a perfect match to a 6-bp motif doesn't have a very significant p-value, so FIMO ends up discarding all the matches.
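A rough calculation shows where "millions" comes from. The sketch below is only a back-of-the-envelope model, not FIMO's scoring: it assumes a uniform, independent background (each base has probability 0.25), counts only exact matches, scans both strands, and uses a hypothetical 3 Gb genome size that does not come from the thread. The function names are illustrative.

```python
# Back-of-the-envelope estimate of chance matches to a short exact word.
# Assumptions (illustrative, not FIMO's internals): uniform i.i.d. background,
# both strands scanned, and only the exact word counts as a match.


def perfect_match_probability(width: int, base_prob: float = 0.25) -> float:
    """Probability that one random window of `width` bases equals a fixed word."""
    return base_prob ** width


def expected_chance_matches(genome_length: int, width: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact matches in a random sequence of this length."""
    windows = genome_length - width + 1      # start positions on one strand
    strands = 2 if both_strands else 1
    return windows * strands * perfect_match_probability(width)


if __name__ == "__main__":
    genome_length = 3_000_000_000            # hypothetical ~3 Gb eukaryotic genome
    p = perfect_match_probability(6)
    n = expected_chance_matches(genome_length, 6)
    print(f"P(perfect match to a 6-mer) = {p:.3g}")
    print(f"Expected chance matches     = {n:,.0f}")
```

With those assumptions it reports an expected count of roughly 1.46 million chance matches to a single 6-mer. Real genomes are not uniform, so the actual count for a given word can be considerably higher or lower.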

FIMO's default p-value threshold is 0.0001. If your sequence file is only a few Mb, you could try a less stringent p-value threshold, which can be set with the --thresh option.
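To see why a perfect 6-bp match has trouble passing the default threshold, here is a second sketch under simplifying assumptions: a uniform background, and a motif under which only the exact word reaches the top score, so a perfect match's p-value is simply that word's probability, 4^-6. FIMO's actual p-values also depend on the background model and on any pseudocounts in the motif written by iupac2meme, so treat these numbers as approximate. The helper names are hypothetical.

```python
# Compare the p-value of a perfect match with a p-value threshold.
# Assumptions (illustrative, not FIMO's exact computation): uniform background,
# and only the exact word attains the top score, so P(score >= best) is just
# the background probability of that word.


def perfect_match_pvalue(width: int, base_prob: float = 0.25) -> float:
    """Approximate p-value of a perfect match to an exact word of `width` bases."""
    return base_prob ** width


def min_width_to_pass(thresh: float, base_prob: float = 0.25) -> int:
    """Smallest exact-word width whose perfect-match p-value is below `thresh`."""
    if thresh <= 0:
        raise ValueError("thresh must be positive")
    width = 1
    while perfect_match_pvalue(width, base_prob) >= thresh:
        width += 1
    return width


if __name__ == "__main__":
    p6 = perfect_match_pvalue(6)
    for thresh in (1e-4, 1e-3):              # FIMO's default, then a looser value
        verdict = "passes" if p6 < thresh else "fails"
        print(f"thresh={thresh:g}: perfect 6-mer p={p6:.3g} {verdict}; "
              f"minimum exact width to pass = {min_width_to_pass(thresh)}")
```

Under these assumptions a perfect 6-mer match has a p-value of about 2.4e-4, which is above the default 1e-4 cutoff but below a looser --thresh value such as 1e-3; an exact word would need to be at least 7 bp to pass the default.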

That won't help if your sequence file is a eukaryotic genome: even if you set a permissive p-value threshold, FIMO will start lowering it as the flood of matches uses up available memory. You could try the --text option, which directs FIMO to report matches as they are found. This keeps millions of matches from using up all available memory, but it also prevents FIMO from calculating q-values for the matches. And then what are you going to do with millions of identically scoring perfect matches, almost all of which are simply due to chance? This is jokingly called "The Futility Theorem" (Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004;5:276-87).
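The two reporting modes described above can be mimicked with a toy model. This is a hypothetical sketch of the behavior, not FIMO's implementation: `buffered_scan` keeps matches in memory and halves the p-value threshold whenever the store exceeds a fixed capacity (the halving factor and the capacity are arbitrary choices), while `streaming_scan` emits each match immediately, as --text does. Because every perfect match to the same word has the same p-value, tightening the threshold past that value discards all of them at once.

```python
# Toy model of two reporting strategies for a scanner that finds many matches.
# NOT FIMO's code; it only mimics the behavior described in the thread:
#  - buffered: store matches; when the store exceeds `capacity`, tighten the
#    p-value threshold and drop stored matches that no longer pass;
#  - streaming (like --text): report each passing match as soon as it is found.
from typing import Iterable, Iterator, List, Tuple

Match = Tuple[int, float]  # (position, p-value)


def buffered_scan(matches: Iterable[Match], thresh: float, capacity: int,
                  shrink: float = 0.5) -> Tuple[List[Match], float]:
    """Return the surviving matches and the final (possibly tightened) threshold."""
    kept: List[Match] = []
    for pos, p in matches:
        if p >= thresh:
            continue
        kept.append((pos, p))
        while len(kept) > capacity:
            thresh *= shrink                   # make the threshold more stringent
            kept = [m for m in kept if m[1] < thresh]
    return kept, thresh


def streaming_scan(matches: Iterable[Match], thresh: float) -> Iterator[Match]:
    """Yield each passing match immediately; nothing is stored."""
    for pos, p in matches:
        if p < thresh:
            yield pos, p


if __name__ == "__main__":
    # Hypothetical input: 10,000 perfect matches that all share one p-value.
    same_p = 0.25 ** 6
    hits = [(i * 1000, same_p) for i in range(10_000)]
    kept, final = buffered_scan(hits, thresh=1e-3, capacity=1_000)
    print("buffered:", len(kept), "matches kept, final threshold", final)
    print("streaming:", sum(1 for _ in streaming_scan(hits, thresh=1e-3)), "matches")
```

Run on 10,000 identically scoring hits, the buffered version ends with an empty result and a threshold below the shared p-value, while the streaming version reports all 10,000. The streaming version also hints at why q-values are lost: a q-value for one match depends on the p-values of all the other matches, which a streaming scanner never holds at once.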


The only solution is to use some other source of information to help distinguish the biologically significant matches from the chance matches. FIMO supports position-specific priors (PSPs), described in:

Gabriel Cuellar-Partida, Fabian A. Buske, Robert C. McLeay, Tom Whitington, William Stafford Noble, and Timothy L. Bailey, "Epigenetic priors for identifying active transcription factor binding sites", Bioinformatics 28(1):56-62, 2012.