Advice on filtering FIMO output


Amira Kramdi

Jun 28, 2024, 11:45:05 AM
to MEME Suite Q&A
Hello everyone,

I am using FIMO to scan for a motif in 200 bp regions centered on ChIP-seq peak summits (npeaks = 500). The peaks correspond to binding sites of a protein that binds DNA via a partner. The scanned motif therefore belongs to the partner, so I expect FIMO to return positive hits in a substantial fraction of the input sequences.

Here's the command I used initially:

fimo --norc --oc $fimoDIR/$motifID --verbosity 1 --qv-thresh --thresh 0.01 $motifFile $fimoDIR/$motifID/sequencesToScan.fa

First, I noticed that when setting --qv-thresh --thresh 0.01, the program does not threshold on the q-value, which was misleading at first. Am I using these options correctly?
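One way to check whether the q-value threshold was actually applied is to inspect the q-value column of the tabular output directly. Below is a minimal Python sketch, assuming the MEME Suite 5.x `fimo.tsv` layout (a tab-separated header row with a `q-value` column, followed by `#` comment lines at the end of the file). The toy rows are made up for illustration.

```python
import csv
import io

# Toy FIMO-style TSV (made-up rows, for illustration only).
TOY_TSV = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "M1\tPARTNER\tpeak1\t96\t105\t+\t15.2\t1e-06\t0.004\tACGTACGTAC\n"
    "M1\tPARTNER\tpeak2\t10\t19\t+\t9.1\t2e-04\t0.3\tACGTTCGTAC\n"
    "\n"
    "# trailing comment lines are skipped\n"
)


def filter_fimo_tsv(handle, q_thresh=0.01):
    """Return FIMO TSV rows (as dicts) whose q-value is <= q_thresh.

    Assumes a header row with a 'q-value' column; blank lines and lines
    starting with '#' are dropped before parsing.
    """
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        q = (row.get("q-value") or "").strip()
        if not q:  # no q-value reported for this row
            continue
        if float(q) <= q_thresh:
            kept.append(row)
    return kept


if __name__ == "__main__":
    hits = filter_fimo_tsv(io.StringIO(TOY_TSV), q_thresh=0.01)
    print([h["sequence_name"] for h in hits])
```

On a real run you would open `fimo.tsv` from the `--oc` directory instead of the toy string. If rows with q > 0.01 remain there, that would suggest the threshold was not applied.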

Using this command and then applying the q-value filter myself on the best_site.narrowPeak file, I was surprised to get very few significant hits (6 hits). Relaxing the q-value threshold to 0.05 only gets me up to 100 best hits.
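Because one sequence can carry several matches, it may help to count how many of the 500 input sequences have at least one passing hit at each threshold, rather than counting hits. Here is a small sketch, assuming the hits have already been reduced to hypothetical `(sequence_name, q_value)` pairs from whichever FIMO output file you use:

```python
def fraction_with_hit(hits, n_sequences, thresholds=(0.01, 0.05)):
    """Map each q-value threshold to the fraction of sequences with >= 1 passing hit.

    hits: iterable of (sequence_name, q_value) pairs.
    n_sequences: number of sequences scanned (500 peaks in this thread).
    """
    hits = list(hits)
    result = {}
    for t in thresholds:
        passing = {name for name, q in hits if q <= t}  # distinct sequences only
        result[t] = len(passing) / n_sequences
    return result


if __name__ == "__main__":
    toy_hits = [("peak1", 0.004), ("peak1", 0.02), ("peak2", 0.03), ("peak3", 0.2)]
    print(fraction_with_hit(toy_hits, n_sequences=500))
```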

At this point, I considered the hypothesis that the ChIPed protein may have different DNA-binding partners, so I ran MEME-ChIP on the sequences with no hits (including those with no significant hits under the q-value threshold). To my surprise, the top motif detected by STREME was the one I initially scanned, and CentriMo showed a nice central enrichment around the summit. This made me wonder whether I was missing likely true occurrences because of the p-value/q-value filters.

While I am aware that it is important to account for multiple testing due to the sequence length, and that these thresholds are arbitrary (this discussion was very helpful in this regard, by the way), I am tempted not to filter on the q-value and to use a p-value threshold of 1e-3 instead in this case.
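To put rough numbers on the multiple-testing burden: with `--norc`, each 200 bp sequence offers about (200 - w + 1) candidate positions for a motif of width w, and FIMO scores each one. The motif width is not given in the post, so the sketch below uses a hypothetical w = 12. It also treats positions as independent tests, which overlapping windows are not, so the output is an order-of-magnitude guide only.

```python
def expected_null_hits(n_seqs, seq_len, motif_width, p_thresh, both_strands=False):
    """Return (number of position tests, expected chance matches at p_thresh).

    Each sequence contributes (seq_len - motif_width + 1) positions per strand;
    with --norc only the given strand is scanned. Positions are treated as
    independent tests, which is only approximately true.
    """
    positions = max(seq_len - motif_width + 1, 0)
    n_tests = n_seqs * positions * (2 if both_strands else 1)
    return n_tests, n_tests * p_thresh


if __name__ == "__main__":
    # 500 peaks, 200 bp windows, hypothetical 12 bp motif, p < 1e-3, one strand.
    print(expected_null_hits(500, 200, 12, 1e-3))
```

Under these assumptions that is 94,500 tests and roughly 95 matches expected by chance alone at p < 1e-3, which is why a 1e-3 p-value cutoff on its own is permissive for a set this size.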

Any thoughts on this? Do FIMO users always use the q-value to report hits? I've seen papers that use only the p-value (maybe because the reported motifs checked out in terms of central enrichment, ChIP signal, and such).
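For p-value-only hits, the central-enrichment check mentioned above can be approximated crudely by asking what fraction of hit centers fall near the summit. This is not CentriMo's statistical test. It is a rough sketch that assumes each scanned sequence is a 200 bp window with the summit at its midpoint, and that hit coordinates are hypothetical 1-based, inclusive `(start, stop)` pairs within the window.

```python
def central_fraction(hits, window=200, flank=25):
    """Return (observed, expected) fraction of hit centers within +/-flank bp of the summit.

    hits: (start, stop) pairs, 1-based inclusive, relative to each window.
    The expected value 2*flank/window is what a uniform spread of centers
    would give, ignoring edge effects from the motif width.
    """
    hits = list(hits)
    expected = 2 * flank / window
    if not hits:
        return 0.0, expected
    summit = (window + 1) / 2  # midpoint of positions 1..window
    near = sum(1 for start, stop in hits if abs((start + stop) / 2 - summit) <= flank)
    return near / len(hits), expected


if __name__ == "__main__":
    toy = [(96, 105), (10, 19), (181, 190), (90, 99)]
    print(central_fraction(toy))
```

An observed fraction well above the expected one would be in line with a CentriMo-style central enrichment, but it is not a substitute for a proper test.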

Many thanks in advance for the help!
Best,
Amira

cegrant

Jun 30, 2024, 12:54:13 AM
to MEME Suite Q&A
> First, I noticed that when setting --qv-thresh --thresh 0.01, the program does not threshold on the q-value, which was misleading at first. Am I using these options correctly?

I just double-checked, and that is the correct usage. When I run it, it does correctly threshold the results on the q-value. Could you forward us a copy of the input file you used and the FIMO HTML output? That would help us troubleshoot the problem.

> At this point, I considered the hypothesis that the ChIPed protein may have different DNA-binding partners, so I ran MEME-ChIP on the sequences with no hits (including those with no significant hits under the q-value threshold). To my surprise, the top motif detected by STREME was the one I initially scanned, and CentriMo showed a nice central enrichment around the summit. This made me wonder whether I was missing likely true occurrences because of the p-value/q-value filters.

This is entirely possible! Keep in mind, though, that FIMO has no biological insight. It performs a purely statistical test of whether a short sequence is a "good" match to a motif. In many cases, truly functional sequences may not have a statistically significant match to the motif, while other, non-functional sequences are highly significant matches. The larger your sequence set, the bigger the problem, because of the multiple-testing issue you noted. This is discussed briefly in the Example section of the FIMO paper. If you can provide priors for which segments of your sequences are more likely to be biologically active (say, from epigenetic marks), then you can take advantage of FIMO's ability to include position-specific priors in its scoring (see the FIMO documentation on the --psp option; see also Gabriel Cuellar-Partida, Fabian A. Buske, Robert C. McLeay, Tom Whitington, William Stafford Noble, and Timothy L. Bailey, "Epigenetic priors for identifying active transcription factor binding sites", Bioinformatics 28(1): 56-62, 2012).
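To illustrate why a larger sequence set makes the problem worse, here is a plain Benjamini-Hochberg q-value computation in Python. FIMO's own q-value estimation may differ in detail, so this shows the principle rather than reproducing FIMO's numbers. The same match p-value of 1e-5 is placed among a toy set of evenly spaced null p-values, first 999 of them and then 99,999.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (often reported as q-values).

    With p-values sorted ascending, q at rank i is the minimum over ranks
    j >= i of p_(j) * m / j, capped at 1. Results are in the input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum over higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q


def q_of_strong_hit(p_strong, n_null):
    """q-value of one fixed p-value after adding n_null evenly spaced toy null p-values."""
    nulls = [(k + 0.5) / n_null for k in range(n_null)]
    return bh_qvalues([p_strong] + nulls)[0]


if __name__ == "__main__":
    print(q_of_strong_hit(1e-5, 999))     # 1,000 tests in total
    print(q_of_strong_hit(1e-5, 99_999))  # 100,000 tests in total
```

Under these toy assumptions, the same match moves from q ≈ 0.01 to q ≈ 0.5 when the number of tests grows a hundredfold.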

If you want to use FIMO as an exploratory tool to set up your later work, then you are free to choose the filters and thresholds you find useful. However, if you are going to present FIMO output as actual evidence for the locations of motif binding sites, it's best to be rigorous: use q-values and a significance threshold that other researchers will find credible.