Hello,
I'd like to know if there is a way to refine motifs found with MEME. I have a large set of fixed-length sequences derived from peaks in the signal. Depending on the dataset and the threshold used, this set comprises between 50k and 100k sequences, explained by 1 to 30 motifs ranging from 4 to 13 bp. The motifs are expected to have some degeneracy (ambiguous nucleotides).
I have already tried DREME, MEME-ChIP, and MEME. The first two handle these large datasets easily, but motifs are often merged (short or overlapping motifs), and DREME does not handle gapped motifs, which I have. So far I have gotten the best results by running MEME iteratively on the sequence set, ~2000 sequences at a time: I remove the sequences containing the motifs found in the previous run and re-run MEME until no more motifs are detected.
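A minimal sketch of that loop, to make the procedure concrete. It is only an illustration under assumptions: `iterative_discovery` and the `discover` callback are hypothetical names, `discover` stands in for a MEME run on one batch, motifs are treated as plain strings matched by substring (a real run would scan with the PWM), and matching sequences are assumed to be removed from the whole remaining set rather than just from the batch.

```python
import random
from typing import Callable, Dict, List, Tuple


def iterative_discovery(
    sequences: Dict[str, str],
    discover: Callable[[Dict[str, str]], List[str]],
    batch_size: int = 2000,
    max_rounds: int = 100,
    seed: int = 0,
) -> Tuple[List[str], Dict[str, str]]:
    """Repeat: sample a batch, find motifs, drop every sequence they explain.

    `discover` is a hypothetical stand-in for one motif-discovery run
    (e.g. MEME on the batch); it returns the motifs found as strings.
    """
    rng = random.Random(seed)
    remaining = dict(sequences)
    found: List[str] = []

    for _ in range(max_rounds):
        if not remaining:
            break
        # Draw at most `batch_size` sequence ids from what is still unexplained.
        ids = rng.sample(sorted(remaining), min(batch_size, len(remaining)))
        batch = {seq_id: remaining[seq_id] for seq_id in ids}

        motifs = discover(batch)
        if not motifs:
            # Stop as soon as a batch yields no motif.
            break
        found.extend(motifs)

        # Toy matching rule (assumption): a sequence is "explained" if it
        # contains the motif as an exact substring.
        remaining = {
            seq_id: seq
            for seq_id, seq in remaining.items()
            if not any(m in seq for m in motifs)
        }

    return found, remaining
```

The second return value is the set of sequences that no motif explained, which is what the stopping criterion leaves behind.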
Unfortunately, I often get motifs slightly longer than expected, with 1 or 2 positions of low-confidence (low-probability) bases, which makes them difficult to distinguish from genuinely ambiguous bases. Is there a robust way to refine motifs found this way? I tried increasing the number of sequences given to MEME, but it quickly becomes too computationally intensive for my application. Could I take the PWMs found by MEME and refine them against the complete set of sequences? I did not find other software in your suite to do that, but I may be missing something (AME?). Or are there other parameters in MEME that I can tweak?
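To make the "low-confidence position" problem concrete, here is a rough sketch of trimming motif flanks by per-column information content. It is not a feature of MEME, just an illustration under assumptions: `column_ic` and `trim_flanks` are hypothetical helpers, the PWM is a list of `[A, C, G, T]` probability columns, the background is uniform, and the 0.5-bit cutoff is arbitrary.

```python
import math
from typing import List


def column_ic(col: List[float]) -> float:
    """Information content in bits of one PWM column, uniform background.

    IC = 2 + sum(p * log2(p)); 0 bits for a uniform column, 2 bits for a
    fully conserved one.
    """
    return 2.0 + sum(p * math.log2(p) for p in col if p > 0)


def trim_flanks(pwm: List[List[float]], min_ic: float = 0.5) -> List[List[float]]:
    """Drop leading and trailing columns whose IC is below `min_ic`.

    Interior low-information columns are kept on purpose, so that gapped
    motifs are not split.
    """
    ics = [column_ic(col) for col in pwm]
    start, end = 0, len(pwm)
    while start < end and ics[start] < min_ic:
        start += 1
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pwm[start:end]
```

With this cutoff, a two-base degenerate position (for example 0.5/0.5, which carries 1 bit) at a flank would be kept while a near-uniform one would be dropped, but the threshold would need tuning on real data.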
Regards,
Alan