Hello,
I have three questions about using MEME Suite tools for de novo motif discovery and enrichment analysis.
1. I am working on discovering de novo motifs in 1.5 kb promoter sequences from several sets of co-expressed genes. The number of primary sequences varies considerably between sets, from ~20 to ~1200.
That is the first reason why I am unsure whether to use MEME or STREME. The second reason is that I want only the given strand to be scanned, not the reverse complement.
Is it possible to force STREME to only use the given strand?
If so, would it be reasonable to use MEME for modules with fewer than 50 sequences and STREME for modules with more than 50, both with differential enrichment as the objective function? Or would you suggest using the same algorithm for all sets of primary sequences, regardless of their size?
2. I am also running enrichment analyses with SEA on the same primary sequences. As controls, I provide sequences sampled at random from a pool of sequences that should not contain the motifs of interest. I am not sure how many control sequences to use relative to the number of primary sequences, or how that ratio affects the p-value.
In general, is there a minimum number of sequences that should be used for SEA to get meaningful results?
3. I am aware of XSTREME, which combines all the tools I would like to use.
What is the advantage of using both MEME and STREME for motif discovery?
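
To make question 2 more concrete, the control sampling can be sketched roughly as below. This is only an illustrative Python sketch, not an exact script: the function name `sample_controls`, the `ratio` parameter (controls per primary sequence) and the fixed seed are assumptions made for the example, and nothing here is part of the MEME Suite itself.

```python
import random


def sample_controls(background_ids, primary_ids, ratio=1.0, seed=0):
    """Draw control sequence IDs at random from a background pool.

    background_ids: IDs of promoters assumed not to contain the motifs of interest
    primary_ids:    IDs of the co-expressed (primary) set
    ratio:          desired number of controls per primary sequence (hypothetical)
    seed:           fixed seed so the same control set can be drawn again
    """
    primary = set(primary_ids)
    # Exclude primary sequences so the two sets never overlap.
    pool = sorted(set(background_ids) - primary)
    # Cap the request at the pool size; random.sample raises ValueError otherwise.
    n = min(len(pool), round(ratio * len(primary)))
    rng = random.Random(seed)
    return rng.sample(pool, n)


# Example: a module of 20 promoters with 5 controls per primary sequence.
background = [f"gene{i}" for i in range(1000)]
primary = background[:20]
controls = sample_controls(background, primary, ratio=5)
```

The open point in my question is what value of `ratio` is sensible, and whether it should differ between the small (~20) and large (~1200) sets.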
Thanks a lot for your help, I really appreciate this forum and the support you provide to MEME Suite users!
Best,
Zoe
P.S.: I have another question on FIMO that I will put into another post, because this one is already rather long. If this is not appropriate, please tell me.