Differentially enriched motifs with different promoter sizes

Arthur Roque Justino

Apr 29, 2025, 8:20:48 PM

to MEME Suite Q&A

Hi everyone,

I'm planning to run a differential motif enrichment analysis on the promoter regions (~1 kb upstream of the TSS) of a set of genes across different species clades. However, the sequences are not well conserved, even in length: some contain insertions or deletions ranging from a few nucleotides to several dozen bases.

I would like to ask: is there a recommended way to account for such differences before running the analysis? Should I restrict the comparison to conserved regions only?

Additionally, would STREME be the most suitable tool for this type of analysis, or would AME be more appropriate?

Thank you!

Best regards,

Arthur

cegrant

May 10, 2025, 8:30:01 PM

to MEME Suite Q&A

The purpose of STREME is de novo motif discovery. Motif enrichment analysis with known motifs is handled by AME or SEA. SEA replaces AME, much as STREME replaces MEME.
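To make the distinction concrete, here is a minimal sketch of how the two tools might be invoked on one clade's promoters against another's. It only builds the command lines and does not run anything. The file names (clade_A_promoters.fa, clade_B_promoters.fa, known_motifs.meme) and the output directory names are hypothetical placeholders, and the options shown should be checked against the documentation for your installed MEME Suite version.

```python
import shlex


def streme_command(targets, controls, out_dir):
    """Build a STREME call: de novo discovery of motifs enriched in
    `targets` relative to `controls` (both DNA FASTA files)."""
    return ["streme", "--dna",
            "--p", str(targets),    # primary (target) sequences
            "--n", str(controls),   # control sequences
            "--oc", str(out_dir)]   # output directory, overwritten if present


def sea_command(targets, controls, motifs, out_dir):
    """Build a SEA call: enrichment test of known motifs (MEME-format
    file `motifs`) in `targets` relative to `controls`."""
    return ["sea",
            "--p", str(targets),
            "--n", str(controls),
            "--m", str(motifs),     # known motifs to test
            "--oc", str(out_dir)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    targets = "clade_A_promoters.fa"
    controls = "clade_B_promoters.fa"
    print(shlex.join(streme_command(targets, controls, "streme_out")))
    print(shlex.join(sea_command(targets, controls, "known_motifs.meme", "sea_out")))
```

Once the MEME Suite is installed, either list can be passed to subprocess.run(cmd, check=True) to actually run the analysis.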

Both SEA and STREME assume that the target and control sequences are all drawn from the same distribution. Ideally, they would all be roughly the same length, but a certain amount of variation won't spoil the statistics. I don't think a variation of several dozen bases in ~1 kb sequences will be a problem at all. We do occasionally encounter users submitting databases that contain sequences varying in size from 5-10 bases to tens of kb! We have to warn them that this sort of extreme variation in size makes the statistical confidence measures unreliable. If the average length of the control sequences exceeds that of the target sequences, the control sequences will automatically be trimmed to the average length of the target sequences unless the