Clarification regarding best practices for FIMO background selection

GKunzelman

Feb 15, 2022, 6:06:16 PM
to MEME Suite Q&A
I am using FIMO to identify motifs present in H3K27ac ChIP-seq peaks that are differential between experimental groups. In some cases I have as few as 1 peak (~200 nt), and in other cases a few thousand sequences of variable length. I am unsure of the best practices for establishing the background model for FIMO.

My understanding is that the background should be biologically similar to the peaks I am asking about but should not contain instances of the motif of interest. I have run 'fasta-get-markov' on each of the following:

1. The sequences in which I am trying to identify motifs
2. Sequences (peaks) that are common to all my experimental groups
3. The entire genome

Each yields a different background model, and the FIMO results vary greatly depending on which I use. It is my understanding that this is expected. Considering that this was a histone modification IP and that in some cases I am only asking about a single sequence, is one of these methods best, or is there an alternative approach I should be taking to generate the background? I am simply trying to make sure the motifs identified are as accurate as possible, and I am struggling to find a clear answer on which approach is best.

Thanks in advance for any assistance.  

cegrant

Feb 18, 2022, 9:53:03 PM
to MEME Suite Q&A
Ideally the background file should be generated from a collection of sequences that is similar biologically to the sequences you are scanning, but which doesn’t contain any instances of the motifs you are looking for. Unfortunately, unless you have an all-knowing oracle, that information is impossible to find. In most cases motif instances make up only a small fraction of the total length of your input sequences, so a reasonable substitute is to simply generate the background model from the input sequences (your choice 1). This is what most of the MEME Suite tools do by default.
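
To put a rough number on "a small fraction", here is a minimal Python sketch (not part of the MEME Suite). It relies on simple arithmetic: if motif sites cover a fraction f of the input, no single nucleotide frequency can be shifted by more than f relative to a motif-free background. The function name and the site counts and widths in the example are hypothetical, chosen only for illustration.

```python
def max_background_shift(n_sites, site_len, total_len):
    """Upper bound on how far motif sites can move any nucleotide frequency.

    If sites cover a fraction f of the input, the observed frequency is
    (1 - f) * background + f * site_composition, so the shift is at most f.
    """
    if total_len <= 0:
        raise ValueError("total_len must be positive")
    covered = min(n_sites * site_len, total_len)
    return covered / total_len


# Hypothetical cases: a single 200-nt peak containing one 10-nt site,
# and 2000 peaks of 300 nt containing 200 such sites in total.
single_peak = max_background_shift(n_sites=1, site_len=10, total_len=200)
many_peaks = max_background_shift(n_sites=200, site_len=10, total_len=2000 * 300)
print(f"single peak: {single_peak:.4f}  many peaks: {many_peaks:.4f}")
```

Under these assumed numbers the bound is about 5% for the single peak and about 0.3% for the large set, so the approximation is weakest when only one short sequence is scanned.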

FIMO can only use a 0th order background model, which is just the set of nucleotide frequencies. Unless you expect each of your sequences to contain many instances of the motifs you are scanning for, the best you can do is generate the background model from the sequences you are scanning.
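
To make "the set of nucleotide frequencies" concrete, here is a minimal Python sketch of a 0th-order background computed from the sequences being scanned. It is an illustration, not a replacement for 'fasta-get-markov'. The helper names are hypothetical, the option to count both strands is an assumption about what is sensible for double-stranded DNA, and the output layout (a comment line followed by one "letter frequency" pair per line) only approximates the background file format as I understand it; check the MEME Suite documentation before passing such a file to FIMO.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_fasta(lines):
    """Yield one upper-cased sequence per FASTA record."""
    seq = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
                seq = []
        elif line:
            seq.append(line.upper())
    if seq:
        yield "".join(seq)


def zero_order_background(seqs, both_strands=True):
    """Return A/C/G/T frequencies; other characters (e.g. N) are skipped."""
    counts = Counter()
    for s in seqs:
        for base in s:
            if base in COMPLEMENT:
                counts[base] += 1
                if both_strands:
                    # Also count the base on the opposite strand.
                    counts[COMPLEMENT[base]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    return {b: counts[b] / total for b in "ACGT"}


def format_background(freqs):
    """Render frequencies as text, one 'letter frequency' pair per line."""
    lines = ["# order 0 (illustrative layout)"]
    lines += [f"{b} {freqs[b]:.3e}" for b in "ACGT"]
    return "\n".join(lines) + "\n"


# Toy input standing in for the peaks being scanned.
fasta = [">peak1", "ACGTAC", ">peak2", "GGGA"]
background = zero_order_background(read_fasta(fasta))
print(format_background(background), end="")
```

With both strands counted, A and T end up with equal frequencies, as do C and G, so the model does not depend on which strand a peak happens to be reported on.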