q-values in FIMO


Alejandro Montenegro-Montero

Feb 4, 2014, 5:22:40 PM
to meme-...@googlegroups.com
Hi all,

I'm using FIMO (from the command line) to scan the promoter regions of all the genes of my model organism (~10k genes) with PWMs that we have obtained from protein binding microarray experiments.
I calculated a background file using a 4th-order Markov model.

I'm getting several hits, with p values ranging from 1E-05 to 9.7E-05. The thing that made me curious is that every hit had a q-value of 1. What does that mean?

I tested the same sequences in the online version of FIMO (for which I couldn't use the background file), and now q-values are being computed, but they are all between 0.728 and 0.812. I was expecting small numbers. The p-values from the web version were between 9.9E-06 and 9.77E-05.

What should I interpret from this?

Thanks!

-A

CharlesEGrant

Feb 9, 2014, 6:03:11 PM
to meme-...@googlegroups.com
Unfortunately, FIMO only uses a 0th-order Markov model for the background. Of the MEME Suite tools, only MEME can use higher-order background models. A higher-order model file also includes the lower-order models, so FIMO is just picking the 0th-order model out of the input background file.

You may find this posting on the MEME Q&A site helpful:

The q-value is a modification of the p-value to account for the problem of multiple testing.

What should I use as a threshold of significance for q-value?
https://groups.google.com/forum/#!searchin/meme-suite/q-value/meme-suite/WX-4zL-1kXs/CTotOH4_2EYJ

The problem is that when you scan a large sequence with a PWM you are performing the p-value significance test over and over again. If your sequence is long enough you are guaranteed to get matches that pass the p-value threshold entirely by chance. If you are scanning an entire genome the chance matches may completely overwhelm the 'true' matches. Multiple testing corrections attempt to account for this.
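
To make the multiple-testing point concrete, here is a minimal Python sketch of a Benjamini-Hochberg style q-value calculation. It is not FIMO's own code, and the function name `bh_qvalues` and the example numbers (roughly 10,000 promoters of 1 kb scanned on both strands, so about 20 million tests) are assumptions for illustration. Because it only sees the reported hits, the result is a conservative upper bound on the q-value.

```python
def bh_qvalues(pvalues, num_tests):
    """Benjamini-Hochberg style q-values for a list of reported p-values.

    num_tests is the total number of positions tested, which is usually
    far larger than the number of hits that passed the p-value threshold.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    qvalues = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # never increase as p-values decrease.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        q = min(1.0, pvalues[i] * num_tests / rank)
        running_min = min(running_min, q)
        qvalues[i] = running_min
    return qvalues


# Hypothetical scan: three hits in the p-value range from the question,
# out of an assumed 20 million tested positions.
hits = [1e-5, 5e-5, 9.7e-5]
print(bh_qvalues(hits, num_tests=20_000_000))
```

With this many tests, p-values around 1E-05 are simply not small enough, so every hit ends up with a q-value of 1, which matches what was reported from the command line.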

You could try making your p-value threshold more stringent, but since your motif is relatively short, that may be of limited help. Your motif is only 8 bp long. Using equal nucleotide frequencies of 0.25, the chance of a random 8 bp segment being a perfect match to your motif is 0.25^8 ≈ 0.0000153. If you are scanning a 10 Mb sequence, on average you will find about 153 perfect matches to your motif entirely by chance. If you are scanning both strands of the human genome you'd get ~91,000 perfect matches entirely by chance.
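
The arithmetic above can be checked with a short Python sketch. The function name `expected_chance_matches` is made up for this example, the human genome size of 3 billion bp is an assumed round figure, and edge effects (the last few positions where the motif no longer fits) are ignored.

```python
def expected_chance_matches(motif_width, num_positions, base_prob=0.25, strands=1):
    """Expected number of perfect matches to one specific motif by chance.

    Assumes independent positions with equal nucleotide frequencies
    (base_prob per base) and ignores sequence-end effects.
    """
    p_match = base_prob ** motif_width
    return p_match * num_positions * strands


# 8 bp motif, 10 Mb sequence, one strand: about 152.6
print(expected_chance_matches(8, 10_000_000))

# 8 bp motif, both strands of an assumed 3 Gb genome: about 91,550
print(expected_chance_matches(8, 3_000_000_000, strands=2))
```

Each extra motif position divides the expected count by 4 under this model, which is why short motifs produce so many chance matches.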

atla goutham

Nov 17, 2015, 4:27:39 AM
to MEME Suite Q&A
This paper states that they used a 5th-order background model for FIMO: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3432.html

Has FIMO changed since then?

CharlesEGrant

Nov 17, 2015, 1:34:35 PM
to MEME Suite Q&A
No, FIMO and most of the rest of the MEME Suite only support 0th order models. The only tool in the suite that supports higher order models is MEME. The MEME Suite includes the utility program 'fasta-get-markov'. If that program is directed to produce a 5th order model, the output will contain the 0th through 4th order models as well. They may have created a fifth order background model, but FIMO only used the included 0th order model.
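
As an illustration of "picking the 0th order model out of the file", here is a minimal Python sketch. It assumes the background file is plain text with one `tuple frequency` pair per line and `#` comment lines, and the function name `zero_order_from_background` and the example frequencies are invented for this sketch. It is not FIMO's parser.

```python
def zero_order_from_background(text):
    """Return only the single-nucleotide (0th-order) frequencies.

    Lines for longer tuples (higher-order sections) are skipped.
    """
    freqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "tuple frequency" line
        tuple_, value = parts
        if len(tuple_) == 1:
            freqs[tuple_.upper()] = float(value)
    return freqs


# Made-up file contents with a 0th-order and a truncated 1st-order section.
example = """# order 0
A 0.30
C 0.20
G 0.20
T 0.30
# order 1
AA 0.10
AC 0.05
"""
print(zero_order_from_background(example))
```

Everything after the single-letter lines is ignored, so supplying a 4th- or 5th-order file gives the same result as supplying a 0th-order file built from the same sequences.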

Prashant Kumar

Mar 9, 2016, 11:45:27 AM
to MEME Suite Q&A
Hi,

I have recently started to use FIMO to find occurrences of a DNA sequence. The typical length of the sequence is 15 bp.

I got about 2000 hits across the genome, but only 6 have q-value<=0.05. However, it is known from experiments that the sequence has more than 100 significant hits. Is there something that could be tweaked in order to increase the number of significant hits?

Thanks.
PK

CharlesEGrant

Mar 9, 2016, 5:23:56 PM
to meme-...@googlegroups.com
Hi Prashant,

> I got about 2000 hits across the genome but only 6 have q-value<=0.05. But, it is known from experiments that the sequence has more than 100 significant hits.

Keep in mind that the 0.05 limit for significance is just a convention. A q-value less than 0.05 doesn't guarantee that a match is biologically significant, and a q-value greater than 0.05 doesn't absolutely exclude a match from being biologically significant.

> Is there something that could be tweaked in order to increase the number of significant hits?

The single most important factor in getting good results from FIMO is choosing an appropriate background model. Did you provide a custom background model or just use the default? The default background model uses nucleotide frequencies from the NR database. Ideally you'd create a background model from sequence data that is biologically similar to the sequences you are analyzing with FIMO, but that doesn't contain any instances of the motifs you are trying to match. If you are scanning a full genome then using the nucleotide frequencies for that genome as a background model is a reasonable approach.
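
For the full-genome case, a 0th-order background is just the four nucleotide frequencies. The Python sketch below shows the idea. The function names and the output layout (one `letter frequency` pair per line under a comment line) are assumptions for illustration, and it counts only the given strand. The MEME Suite's own `fasta-get-markov` utility is the supported way to build a background file.

```python
from collections import Counter


def nucleotide_frequencies(sequences):
    """0th-order background: fraction of A, C, G and T in the sequences.

    Ambiguous characters such as N are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    return {base: counts[base] / total for base in "ACGT"}


def format_background(freqs):
    """Render the frequencies as simple 'letter frequency' lines."""
    lines = ["# order 0"]
    lines += [f"{base} {freqs[base]:.6f}" for base in "ACGT"]
    return "\n".join(lines) + "\n"


# Toy input; in practice the sequences would come from the genome FASTA.
freqs = nucleotide_frequencies(["AACG", "ttnn"])
print(format_background(freqs), end="")
```

A background that reflects the GC content of the scanned genome changes the p-value of every match, which is why it matters more than most other settings.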

The other tactic that can help is using epigenetic information. If you are looking for transcription factor binding sites, then you may want to provide FIMO with position-specific priors (PSP) derived from DNase I hypersensitivity data. This is described in the FIMO documentation and in the paper "Epigenetic priors for identifying active transcription factor binding sites". The FIMO web application provides DNase I PSPs online for a limited number of cell types for human and mouse.