I am using FIMO to identify motifs in H3K27ac ChIP-seq peaks that are differential between experimental groups. In some cases I have as few as one peak (~200 nt), and in other cases a few thousand sequences of variable length. I am unsure of the best practice for establishing the background model for FIMO.
My understanding is that the background should be biologically similar to the peaks I am asking about but should not contain instances of the motif of interest. I have run 'fasta-get-markov' on each of the following:
1. The sequences in which I am trying to identify motifs
2. Sequences (peaks) that are common to all my experimental groups
3. The entire genome
Each yields a different background model, and the FIMO results vary greatly depending on which one I use; it is my understanding that this is expected. Given that this was a histone modification IP and that in some cases I am only asking about a single sequence, is one of these methods best, or is there an alternative approach I should take to generate the background? I simply want to make sure the identified motifs are as accurate as possible, and I am struggling to find a clear answer on which approach is best.
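To make concrete what I think is changing between the three options, here is a minimal Python sketch of a zero-order background as I understand it: pooled A/C/G/T frequencies, averaged across both strands. This is my own illustration and not the 'fasta-get-markov' implementation; the function name and the toy sequences are hypothetical, and it ignores pseudocounts and higher-order models.

```python
from collections import Counter


def zero_order_background(seqs):
    """Estimate zero-order nucleotide frequencies from a list of DNA strings.

    Illustrative only: frequencies are averaged over both strands
    (A with T, C with G), which is an assumption of this sketch.
    Non-ACGT characters such as N are skipped.
    """
    counts = Counter()
    for seq in seqs:
        for base in seq.upper():
            if base in "ACGT":
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    # Strand-symmetric estimate: A and T share one frequency, C and G another.
    at = (counts["A"] + counts["T"]) / (2 * total)
    cg = (counts["C"] + counts["G"]) / (2 * total)
    return {"A": at, "C": cg, "G": cg, "T": at}


# Hypothetical toy inputs: one short GC-rich "peak" versus a larger pooled set.
single_peak = ["GGCGCCGGGCGCAGCG"]
pooled_peaks = ["ATATTAGCGC", "TTAACCGGAT", "GCGCATATAA", "AATTCCGGTA"]

print(zero_order_background(single_peak))
print(zero_order_background(pooled_peaks))
```

With a single ~200 nt sequence, an estimate like this rests on very few counts and mostly mirrors the composition of that one peak, which may be part of why my FIMO results shift so much depending on which background file I supply.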
Thanks in advance for any assistance.