Question about Motif enrichment analysis and AME

lhc...@gmail.com

Jun 8, 2016, 3:09:53 PM

to MEME Suite Q&A, meil...@usc.edu, shou...@usc.edu

Question about Motif enrichment analysis

Hi there,

I want to check whether any JASPAR motifs are enriched in my sequences. One approach is to do what MEME-ChIP does:

1) discover novel DNA-binding motifs (with MEME and DREME)

2) analyze them for similarity to known binding motifs in the JASPAR motif database with Tomtom

But if there are not enough sequences for MEME or DREME to discover de novo motifs, there is nothing for Tomtom to compare against the known JASPAR motifs.

So I think AME is the right tool for my case, since it tests whether a set of sequences contains more or better matches to the motifs than would be expected by chance.

However, according to its manual, AME assumes that the sequences differ in importance:

Input Sequence File

A collection of (primary) sequences in FASTA format. The sequences must be sorted by increasing value of some secondary criterion (e.g., expression level, peak height, fluorescence score). In this documentation, we refer to this secondary criterion as the "FASTA score". This score can optionally be placed in the FASTA ID line. If present, the FASTA score must come immediately after the sequence ID. For example, if the FASTA ID line is

>seq_1 0.123

then 0.123 is the FASTA score for that sequence.
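To make the FASTA-score convention quoted above concrete, here is a minimal Python sketch that reads the optional score from each ID line and checks whether the file is sorted by increasing score. It assumes, as the quoted manual states, that the score is the first whitespace-separated token after the sequence ID; the function names are hypothetical and are not part of the MEME Suite.

```python
def read_scored_fasta(text):
    """Parse FASTA text into (seq_id, score, sequence) tuples.

    Assumption: the optional "FASTA score" is the first whitespace-separated
    token after the sequence ID on the '>' line. Records without a numeric
    token there get a score of None.
    """
    records = []
    seq_id, score, chunks = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:  # close out the previous record
                records.append((seq_id, score, "".join(chunks)))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            score = None
            if len(fields) > 1:
                try:
                    score = float(fields[1])
                except ValueError:
                    score = None  # second token is a description, not a score
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if seq_id is not None:
        records.append((seq_id, score, "".join(chunks)))
    return records


def is_sorted_by_increasing_score(records):
    """True only if every record has a score and scores never decrease."""
    scores = [score for _, score, _ in records]
    if any(s is None for s in scores):
        return False
    return all(a <= b for a, b in zip(scores, scores[1:]))


example = """>seq_1 0.123
ACGTACGT
>seq_2 0.456
TTGACA
"""
records = read_scored_fasta(example)
print(records)
print(is_sorted_by_increasing_score(records))
```

A file where no ID line carries a score simply yields `None` for every record, which is the situation the question below asks about.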

My question is: if it's hard to say which sequences are more important than others, so that the FASTA file can't be sorted and no scores can be assigned, can I assume all the sequences are equally important? Can AME perform motif enrichment analysis in this situation? Should I assign the same score, such as 1, to every sequence in the FASTA file? Or is there another tool better suited to this kind of analysis?

Thank you very much, and I look forward to your reply.

CharlesEGrant

Jun 16, 2016, 8:20:03 PM

to MEME Suite Q&A, meil...@usc.edu, shou...@usc.edu

I believe you can. AME will certainly run and should detect enrichment. My only concern is that the p-values might not be entirely accurate. I'm checking with Tim Bailey, one of the authors of AME, and should have a definitive answer for you shortly.

CharlesEGrant

Jun 17, 2016, 7:10:14 PM

to MEME Suite Q&A, meil...@usc.edu, shou...@usc.edu

Yes, you can definitely do this, and there is no need to assign scores to the sequences.

AME allows you to provide a set of 'control sequences' that are not expected to be enriched for the motifs. AME can then determine motif enrichment by comparing the primary and control sequences. Neither the primary nor the control sequences have to be in any particular order. The AME web application can automatically generate a control set by shuffling the primary sequence set. All you need to do is select the "Shuffled input sequences" radio button at the top of the form.

If you are using the command-line version of AME, you will need to generate your own control set and pass it with the --control command-line option. You can shuffle your sequences using the fasta-shuffle-letters utility included in the MEME Suite distribution.
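fasta-shuffle-letters is the supported way to build the control set. Purely to illustrate the idea, here is a minimal Python sketch that makes a shuffled control set with a plain mononucleotide shuffle. This is an assumption-laden stand-in, not the utility's algorithm: it keeps each sequence's length and base composition, but unlike a k-mer-preserving shuffle (which fasta-shuffle-letters can do) it destroys dinucleotide structure. The function names and the `_shuf` ID suffix are hypothetical.

```python
import random


def shuffled_control(records, seed=0):
    """Build a control set by shuffling the letters of each primary sequence.

    Simplified stand-in for fasta-shuffle-letters: a plain mononucleotide
    shuffle preserves length and base composition only.
    """
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    control = []
    for seq_id, seq in records:
        letters = list(seq)
        rng.shuffle(letters)
        control.append((seq_id + "_shuf", "".join(letters)))  # hypothetical ID suffix
    return control


def to_fasta(records):
    """Serialize (seq_id, sequence) pairs as FASTA text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


primary = [("seq_1", "ACGTTGACAAGT"), ("seq_2", "TTGACATATAAT")]
control = shuffled_control(primary, seed=1)
print(to_fasta(control))
```

The resulting FASTA text would be written to a file and passed to AME with --control, alongside the unsorted primary sequences.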

Thanks for making us aware of this problem. The AME documentation was very unclear. We're revising it for the next release.

lhc...@gmail.com

Jun 21, 2016, 12:07:58 PM

to MEME Suite Q&A, meil...@usc.edu, shou...@usc.edu

Thank you very much, CharlesEGrant. Your explanation is very clear!

Hai Li

Sep 7, 2016, 5:02:18 AM

to MEME Suite Q&A, meil...@usc.edu, shou...@usc.edu

What is the role of the Markov background file in AME? Does it help reduce the probability of finding motifs by chance, or does it just function like a control file?

Thank you so much!

CharlesEGrant

Sep 7, 2016, 7:11:04 PM

to meme-...@googlegroups.com, meil...@usc.edu, shou...@usc.edu

The details of the AME algorithms can be found in this paper:

"Motif Enrichment Analysis: A unified framework and method evaluation", BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165.

At its lowest level, AME has to count the matches to the motifs found in the input and control sequences. But what constitutes a match to a motif? Motif instances are highly variable, so you can't simply look for an exact match to the consensus sequence, or even a regular expression approximating the motif.

Motif matches are identified by calculating the likelihood of a given subsequence under the motif's Position Weight Matrix (PWM), calculating the likelihood of the same subsequence under the background model, taking the log of the ratio of those two likelihoods, estimating the distribution of these log-likelihood ratios, and then applying a p-value threshold. In short, the background model is used to identify matches to the motifs in the input sequences.
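The scoring steps just described can be sketched in Python as follows. This is a toy illustration under stated assumptions, not AME's code: it uses a 0-order background (a Markov background file can also describe higher-order models), a hypothetical 4-position PWM built by `toy_pwm`, sequences containing only uppercase A/C/G/T, and it computes the null distribution of scores exactly by enumerating every possible site, which is only feasible for short motifs (real tools estimate this distribution instead). Note that the background probabilities appear in the denominator of every score, which is how the background model shapes what counts as a match.

```python
import itertools
import math

ALPHABET = "ACGT"


def toy_pwm(consensus, p_match=0.7):
    """Hypothetical toy PWM: p_match at the consensus base, the rest split evenly."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in ALPHABET} for c in consensus]


def log_likelihood_ratio(site, pwm, background):
    """log2( P(site | PWM) / P(site | background) ) with a 0-order background."""
    return sum(math.log2(pwm[i][b] / background[b]) for i, b in enumerate(site))


def null_distribution(pwm, background):
    """(score, probability) for every possible site of the motif's width.

    Exact enumeration of all 4**w sites, each weighted by its probability
    under the background model.
    """
    dist = []
    for site in itertools.product(ALPHABET, repeat=len(pwm)):
        prob = math.prod(background[b] for b in site)
        dist.append((log_likelihood_ratio(site, pwm, background), prob))
    return dist


def pvalue(score, dist, eps=1e-9):
    """P(a random background site scores >= score); eps absorbs float round-off."""
    return sum(prob for s, prob in dist if s >= score - eps)


def find_matches(seq, pwm, background, p_threshold=1e-4):
    """Return (start, site, score, p-value) for every window passing the threshold."""
    w = len(pwm)
    dist = null_distribution(pwm, background)
    hits = []
    for start in range(len(seq) - w + 1):
        site = seq[start:start + w]
        score = log_likelihood_ratio(site, pwm, background)
        p = pvalue(score, dist)
        if p <= p_threshold:
            hits.append((start, site, score, p))
    return hits


uniform = {b: 0.25 for b in ALPHABET}
pwm = toy_pwm("TGAC")
print(find_matches("AATGACTT", pwm, uniform, p_threshold=0.01))
```

With a uniform background and a threshold of 0.01, only the exact consensus window TGAC (starting at offset 2) passes: its p-value is 1/256, while any site with one mismatch has a p-value of 13/256.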

If no control sequence set is provided, AME can generate one by shuffling the input sequences. However, this doesn't make any use of the background model.

cnzq...@gmail.com

May 29, 2021, 7:20:01 AM

to MEME Suite Q&A

Hi CharlesEGrant,

How can I tell whether a motif is over-represented or under-represented?

Thanks,

QZ

cegrant

Jun 10, 2021, 8:49:29 PM

to MEME Suite Q&A

AME only reports motifs that are over-represented in the target sequences compared to the control sequences, or in the low-scoring sequences when sequence score annotation is used. In the command-line documentation it's described this way:

"AME identifies known user-provided motifs that are either relatively enriched in your sequences compared with control sequences, that are enriched in the first sequences in your input file, or that are enriched in sequences with small values of scores that you can specify with your input sequences (sample output from sequences, control sequences and motifs). "
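Why a test of this kind only flags over-representation can be seen with a one-sided Fisher's exact test, a standard way to compare primary against control sequences. The Python sketch below is illustrative only and assumes each sequence is simply classified as containing at least one motif match or not; AME's actual test options, match counting, and multiple-testing correction are described in its documentation and may differ. The function name `fisher_greater` is hypothetical.

```python
from math import comb


def fisher_greater(primary_hits, primary_total, control_hits, control_total):
    """One-sided Fisher's exact test p-value for over-representation.

    Counts are numbers of sequences with at least one motif match. Returns
    P(X >= primary_hits), where X is hypergeometric: primary_total sequences
    drawn from all sequences, of which primary_hits + control_hits match.
    """
    n_all = primary_total + control_total
    n_match = primary_hits + control_hits
    upper = min(n_match, primary_total)
    tail = sum(
        comb(n_match, x) * comb(n_all - n_match, primary_total - x)
        for x in range(primary_hits, upper + 1)
    )
    return tail / comb(n_all, primary_total)


# Enriched in the primary set: small p-value.
print(fisher_greater(8, 10, 2, 10))
# Depleted in the primary set: p-value close to 1, so it would not be reported.
print(fisher_greater(2, 10, 8, 10))
```

The first call gives about 0.0115 and the second gives a value close to 1. A depleted motif is therefore never significant under this one-sided test; running the same test with the roles of the primary and control sets swapped would ask the opposite question.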
