MEME or STREME: scanning reverse complement + SEA: number of control sequences

Zoe Wessely

Jun 24, 2023, 9:49:42 AM

to MEME Suite Q&A

Hello,

I have 3 questions concerning the use of MEME Suite tools for de novo motif discovery and enrichment analysis.

1. I am working on discovering de novo motifs in 1.5 kb long promoter sequences from several sets of co-expressed genes. The number of primary sequences varies considerably between sets, ranging from ~20 to ~1200 sequences.

That is the first reason why I am not sure whether to use MEME or STREME. The second reason is that I only want the given strand to be scanned, but not the reverse complement.

Is it possible to force STREME to only use the given strand?

If so, would it be reasonable to use MEME for modules with fewer than 50 sequences and STREME for modules with more than 50 sequences, both with differential enrichment as the objective function? Or would you suggest using the same algorithm for all sets of primary sequences regardless of their size?

2. Further, I am doing enrichment analyses using SEA (with the same primary sequences). For this, I am providing control sequences created by random sampling of sequences that should not contain the motifs of interest. Here, I am not sure how many control sequences to use relative to the number of primary sequences, or how that number affects the p-value.

In general, is there a minimum number of sequences that should be used for SEA to get meaningful results?

3. I am aware of the existence of XSTREME which would combine all tools that I would like to use.

What is the advantage of using both MEME and STREME for motif discovery?

Thanks a lot for your help, I really appreciate this forum and the support you provide to MEME Suite users!

Best,

Zoe

P.S.: I have another question on FIMO that I will put into another post, because this one is already rather long. If this is not appropriate, please tell me.

cegrant

Jul 15, 2023, 10:03:23 PM

to MEME Suite Q&A

Hi Zoe,

For your first question, the only way to force STREME to scan only one strand is to use a custom alphabet. The MEME Suite format for a custom alphabet is described here. You can copy the alphabet for DNA, but specify that the symbols are not complementable. For example, change

A ~ T

The custom alphabet will also force MEME to only scan the forward strand, so you can drop the '-revcomp' option.

For your second question, the key property is that the target and control sequences have to be drawn from the same length distribution. That is, if the sequences in your primary set range in size from ~20 to ~1200, then the control sequences should too, with about the same number of sequences near 20, and so on. The number of sequences in the control set isn't as important, though if there is a large mismatch between the number of target and control sequences, it will be harder to constrain them to have the same length distribution.
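One simple way to build a control set with a matching length distribution is to bin sequences by length and draw controls bin by bin. The Python sketch below is only an illustration of that idea, not something SEA does for you: the function names, the `bin_width` and `ratio` parameters, and the choice of fixed-width bins are all assumptions made for the example.

```python
import random
from collections import defaultdict


def length_bin(length, bin_width):
    """Map a sequence length to a fixed-width bin index (hypothetical helper)."""
    return length // bin_width


def sample_length_matched_controls(primary, pool, bin_width=50, ratio=1, seed=0):
    """Draw controls from `pool` so their length bins mirror those of `primary`.

    primary, pool: dicts mapping sequence name -> sequence string.
    ratio: number of control sequences to draw per primary sequence (assumption).
    Returns a dict of the selected control sequences.
    """
    rng = random.Random(seed)

    # Group candidate control names by length bin.
    pool_by_bin = defaultdict(list)
    for name, seq in pool.items():
        pool_by_bin[length_bin(len(seq), bin_width)].append(name)

    # Count how many controls are needed in each bin.
    wanted = defaultdict(int)
    for seq in primary.values():
        wanted[length_bin(len(seq), bin_width)] += ratio

    controls = {}
    for b, n in wanted.items():
        candidates = pool_by_bin.get(b, [])
        if len(candidates) < n:
            raise ValueError(
                f"bin {b}: need {n} control sequences, only {len(candidates)} available"
            )
        # Sample without replacement within the bin.
        for name in rng.sample(candidates, n):
            controls[name] = pool[name]
    return controls
```

The selected sequences would then be written to a FASTA file and passed to SEA as the control set. If a bin runs out of candidates, widening `bin_width` or lowering `ratio` are the obvious adjustments, at the cost of a looser match or a smaller control set.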

For your final question: MEME is somewhat better at identifying short motifs than STREME, and STREME is much better at identifying long motifs than MEME. If you have no idea of the size of the motif you are looking for, it's best to try both. XSTREME does that, and also runs several other useful analyses, particularly Tomtom, which compares the identified motifs to databases of known motifs.
