a lot of questions about AME


y yang

Jul 26, 2023, 7:11:45 AM
to MEME Suite Q&A

Hello,

 

I have three primary FASTA sequences, all of which are important, and one control FASTA sequence. I want to compare the three primary sequences with the control sequence to get AME enrichment results. However, I have several questions about the output.

 

1.     All three of my primary FASTA sequences are important, and I do not want to rank them. To avoid ranking, I split them into three input files and ran AME three times separately. However, I still get a FASTA_score in each sequences.tsv, and I don't know where it comes from. Does it rank my primary FASTA sequences or the motifs in the database? How should I interpret the FASTA_score in sequences.tsv?

 

2.     I have two motif databases containing different motifs; call them database1 and database2. I get significant enrichment results with database1 but none with database2. However, after merging database1 and database2, the significant results from database1 no longer appear when I use the merged database. Does the size of the motif database influence the result?

 

3.     In the ame.html file, I see that the background source is built from my primary sequences. So what is the role of my control sequences?

 

 

Thanks a lot!

 

Best,

YY

y yang

Jul 26, 2023, 7:19:34 AM
to MEME Suite Q&A
Also, these are my commands:
For question 1:
1. ame --oc /input/AME_test_motif_xac --control /input/control.fa /input/xac.fa /data/motif_database1_test.meme
2. ame --oc /input/AME_test_motif_xab --control /input/control.fa /input/xab.fa /data/motif_database1_test.meme
3. ame --oc /input/AME_test_motif_xaa --control /input/control.fa /input/xaa.fa /data/motif_database1_test.meme
For question 2:
1. ame --oc /input/AME_test_motif_xac --control /input/control.fa /input/xac.fa /data/motif_database1_test.meme (gives enrichment results)
2. ame --oc /input/AME_test_motif_xac --control /input/control.fa /input/xac.fa /data/motif_database2_test.meme (gives no enrichment results)
3. ame --oc /input/AME_test_motif_xac --control /input/control.fa /input/xac.fa /data/motif_combined_test.meme (gives no enrichment results)

cegrant

Aug 2, 2023, 7:33:59 PM
to MEME Suite Q&A
Hi YY,

 All of my three primary fasta sequences are important, I do not want to order them. In order to avoid the order, I separate them into three input files and run AME three times separately.

 However, I also get the FASTA_score in each sequences.tsv. I don’t know where the Fasta score comes from. Is it order my primary fasta sequences

This doesn't sound like the sort of analysis that AME is designed for. AME expects multiple ranked sequences in the input set. If you don't provide an explicit FASTA score, the FASTA score is derived from the order of the sequences in the input sequence file, even if you have only one sequence. Running AME on a single sequence will have very little statistical power and is probably not useful. SEA might be closer to what you need, as it doesn't expect the sequences to be ranked, and you could include all three sequences in a single file. However, SEA considers only the single best match to a motif in each sequence, whereas AME can use a scoring system that considers multiple matches to a motif in a sequence. Which is more appropriate depends on what you expect for your sequences.
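As a rough sketch of the single-file approach, the snippet below concatenates the three primary files and builds (without running) a SEA command line. The sequences are made-up stand-ins for xaa.fa, xab.fa and xac.fa, the merged file name and the output directory /input/SEA_test_motif_all are hypothetical, and the flags assumed here are SEA's --p (primary), --n (control), --m (motifs) and --oc (output directory); check them against `sea --help` for your installed version.

```python
import shlex
import tempfile
from pathlib import Path


def merge_fasta(inputs, output):
    """Concatenate FASTA files, making sure each one ends with a newline."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text if text.endswith("\n") else text + "\n")


def count_records(path):
    """Count FASTA records by counting header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def sea_command(primary, control, motifs, outdir):
    """Build, but do not run, a SEA command line as a list of arguments."""
    return ["sea", "--oc", str(outdir), "--p", str(primary),
            "--n", str(control), "--m", str(motifs)]


# Toy stand-ins for xaa.fa, xab.fa and xac.fa (made-up sequences).
workdir = Path(tempfile.mkdtemp())
parts = []
for name, seq in [("xaa", "ACGTACGTAA"), ("xab", "TTGACGTCAA"), ("xac", "GGCATGCATG")]:
    part = workdir / f"{name}.fa"
    part.write_text(f">{name}\n{seq}")  # deliberately no trailing newline
    parts.append(part)

# Hypothetical merged file name and output directory.
merged = workdir / "primary_all.fa"
merge_fasta(parts, merged)
print(count_records(merged), "records in", merged.name)

cmd = sea_command(merged, "/input/control.fa", "/data/motif_database1_test.meme",
                  "/input/SEA_test_motif_all")
print(shlex.join(cmd))
```

Adding the missing newline between files matters: without it, the header of the next file would be glued onto the last sequence line of the previous one.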

 However, after I merge database1 and database2 together, the significant enrichment results from database1 cannot be found from this merged database. My question is whether the size of motif database will influence the result.

The enrichment of each motif is calculated independently, so the size of the motif file should not affect the scores. You'd have to forward us copies of the input sequence and motif files if you want us to troubleshoot this.
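One related point worth checking in the output: as the AME documentation describes it, the reported E-value is the adjusted p-value multiplied by the number of motifs in the motif file(s), and motifs are only reported when they pass an E-value threshold. Under that assumption, a motif's own score and p-value stay the same in the merged file, but its E-value grows with the number of motifs. The sketch below uses a made-up p-value, made-up database sizes and a made-up cutoff purely to show the arithmetic.

```python
def e_value(adjusted_p, n_motifs):
    """E-value as adjusted p-value times the number of motifs tested (assumed definition)."""
    return adjusted_p * n_motifs


def is_reported(adjusted_p, n_motifs, threshold):
    """True if the motif's E-value is at or below the reporting threshold."""
    return e_value(adjusted_p, n_motifs) <= threshold


# Hypothetical numbers: the same motif, with the same adjusted p-value,
# tested as part of a small database and then as part of a larger merged one.
adjusted_p = 0.0004
threshold = 0.05
for label, n_motifs in [("database1", 100), ("merged", 2000)]:
    print(label, e_value(adjusted_p, n_motifs), is_reported(adjusted_p, n_motifs, threshold))
```

Comparing the p-value and adjusted p-value columns for the same motif in the two ame.tsv outputs would show whether this is what happened, or whether something else differs between the runs.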

 I found the background source are built from my primary sequences. So what is the function of my control sequences?

The background is the model of the nucleotide frequencies outside instances of a motif. For AME to provide good statistics, the target and control sequence sets should have close to the same background, i.e. the same nucleotide frequencies. The control sequences are assumed NOT to be enriched for the motifs. AME calculates a score based on the motif matches in each sequence. If the sequences in the control set tend to score higher than the sequences in the target set, that is evidence that the target set is not enriched for the motifs. Depending on how many sequences in the target set outscore sequences in the control set, quantitative statistics can be generated for the enrichment of the target set. This is why having AME analyze only one target sequence can provide only very coarse statistics for the enrichment.
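To illustrate why a single target sequence gives coarse statistics, here is a minimal sketch of a generic rank-based comparison. It is not AME's actual test and the scores are made up; it only counts how often target sequences outscore control sequences and computes the smallest one-sided p-value an exact rank test could ever reach for given set sizes (assuming no ties).

```python
import math


def count_wins(target_scores, control_scores):
    """Count target/control pairs where the target scores higher (ties count 0.5)."""
    wins = 0.0
    for t in target_scores:
        for c in control_scores:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins


def smallest_p_value(n_target, n_control):
    """Smallest one-sided p-value an exact rank test can reach without ties.

    It occurs when every target outscores every control; only one of the
    C(n_target + n_control, n_target) equally likely rank assignments does that.
    """
    return 1 / math.comb(n_target + n_control, n_target)


# Made-up scores: one target sequence against three control sequences.
print(count_wins([3.0], [1.0, 2.0, 4.0]))

# Best attainable p-value for 1, 3 and 10 target sequences against 99 controls.
for n_target in (1, 3, 10):
    print(n_target, smallest_p_value(n_target, 99))
```

With one target and 99 controls, the best case is that the target ranks first out of 100, so the p-value can never be smaller than 1 in 100 no matter how strong the motif match is. Adding target sequences shrinks that floor quickly.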