Interpreting AME output

Susan Luong

Feb 14, 2018, 12:29:37 PM
to MEME Suite Q&A
Hello,

I am trying to determine whether specific motifs are enriched in the 3' UTRs of genes that increased or decreased in an RNA-Seq experiment. I extracted the 3' UTR sequences of these genes from UCSC and submitted them to AME. I also used the sequences to build a separate Markov background model for each group (increased and decreased). I provided AME with a list of 10 motifs. When I run the program, most of the motifs come out highly enriched in both groups, with extremely small p-values. However, the corrected p-values for the decreased genes are higher than those for the increased genes. Can someone help me interpret these data? Can I say whether a motif is more or less enriched in the increased genes than in the decreased genes?
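For readers unfamiliar with the term: a Markov background model summarizes the composition of the sequences, single-base frequencies for order 0 and short k-mer frequencies for higher orders. Below is a minimal Python sketch of that idea, assuming plain A/C/G/T strings. `markov_background` is a hypothetical helper, not the MEME Suite's own `fasta-get-markov` utility, and its pseudocount and output layout are illustrative assumptions rather than that tool's format.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def markov_background(seqs, order=1, pseudocount=1.0):
    """Hypothetical helper: estimate k-mer frequencies for k = 1 .. order + 1.

    Overlapping k-mers are counted across all sequences; k-mers containing
    N or any other non-ACGT symbol are skipped. The pseudocount keeps unseen
    k-mers from getting probability 0. Returns {k: {kmer: frequency}}.
    """
    model = {}
    for k in range(1, order + 2):
        counts = Counter()
        for s in seqs:
            s = s.upper()
            for i in range(len(s) - k + 1):
                kmer = s[i:i + k]
                if all(ch in ALPHABET for ch in kmer):
                    counts[kmer] += 1
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        denom = sum(counts.values()) + pseudocount * len(kmers)
        model[k] = {km: (counts[km] + pseudocount) / denom for km in kmers}
    return model

if __name__ == "__main__":
    utrs = ["TTTTATTTTATGCA", "ACGTNNACGT"]  # toy sequences, not real UTRs
    bg = markov_background(utrs, order=1)
    for base, freq in bg[1].items():
        print(base, round(freq, 3))
```

For an order-1 model, the transition probabilities P(b | a) follow by normalizing the dinucleotide frequencies f(ab) over b. Because the model is estimated from the very UTRs being tested, any compositional bias they share (for example, AT-richness) is built into it.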

Thanks in advance for your help!

Example:

Increased genes: 2. Ranksum p-values of motif CPE (TTTTAT) top 1334 seqs (left,right,twotailed): 1 2.179e-63 4.359e-63 U-value: 5.562e+05 (Corrected p-values: 1 1.308e-62 1.308e-62)

Decreased genes: 6. Ranksum p-values of motif CPE (TTTTAT) top 982 seqs (left,right,twotailed): 1 1.529e-21 3.058e-21 U-value: 7.016e+05 (Corrected p-values: 1 9.173e-21 9.173e-21)
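To compare the two runs side by side, it can help to pull the numbers out of lines like the ones above. This is a minimal Python sketch, assuming the line layout shown in these two examples only (other AME versions format their output differently). `parse_ame_line` is a hypothetical helper, not part of the MEME Suite.

```python
import re

# Layout assumed from the two example lines in this thread only.
AME_RANKSUM = re.compile(
    r"Ranksum p-values of motif (?P<motif>.+?) top (?P<top>\d+) seqs "
    r"\(left,right,twotailed\): (?P<left>\S+) (?P<right>\S+) (?P<two>\S+) "
    r"U-value: (?P<u>\S+) "
    r"\(Corrected p-values: (?P<c_left>\S+) (?P<c_right>\S+) (?P<c_two>\S+)\)"
)

def parse_ame_line(line):
    """Hypothetical helper: extract motif name, partition size and p-values."""
    m = AME_RANKSUM.search(line)
    if m is None:
        raise ValueError(f"unrecognized AME line: {line!r}")
    g = m.groupdict()
    out = {"motif": g["motif"], "top": int(g["top"])}
    for key in ("left", "right", "two", "u", "c_left", "c_right", "c_two"):
        out[key] = float(g[key])
    return out

if __name__ == "__main__":
    runs = {
        "increased": "2. Ranksum p-values of motif CPE (TTTTAT) top 1334 seqs "
                     "(left,right,twotailed): 1 2.179e-63 4.359e-63 "
                     "U-value: 5.562e+05 (Corrected p-values: 1 1.308e-62 1.308e-62)",
        "decreased": "6. Ranksum p-values of motif CPE (TTTTAT) top 982 seqs "
                     "(left,right,twotailed): 1 1.529e-21 3.058e-21 "
                     "U-value: 7.016e+05 (Corrected p-values: 1 9.173e-21 9.173e-21)",
    }
    for name, line in runs.items():
        r = parse_ame_line(line)
        print(f"{name}: {r['motif']}, top {r['top']} seqs, "
              f"right-tailed p = {r['right']:.3e}, corrected = {r['c_right']:.3e}")
```

Keep in mind that these p-values come from two separate runs on different sequence sets of different sizes, so a smaller p-value for the increased genes does not by itself show that the motif is more strongly enriched there.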

CharlesEGrant

Feb 22, 2018, 8:41:06 PM
to MEME Suite Q&A

The first thing that occurs to me, looking at the motif TTTTAT, is that it's short and repetitive, and, as you say, the enrichment p-values are astronomical. Is it possible that your sequences contain low-complexity regions with runs of repeated nucleotides? Keep in mind that AME doesn't actually know much about biology; it's just looking at the statistics of good matches to your motif. If your sequences contain long runs of 'A' or 'T', those could be just as significant to AME as a bunch of transcription factor binding sites. For that reason, we suggest masking out low-complexity regions and repeats before running AME.
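To see why a repeat can dominate, consider a TTTTA tandem repeat: every period adds another overlapping TTTTAT match. The sketch below assumes a simple sliding-window Shannon-entropy filter whose window size and threshold are arbitrary, hypothetical values. It is not DUST or NCBI's `dustmasker`, which are the usual tools for this job. It masks low-complexity windows to N and counts motif matches before and after.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits) of the base composition of a window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())

def mask_low_complexity(seq, window=12, min_entropy=1.5):
    """Replace every window whose entropy is below min_entropy with N.

    window and min_entropy are illustrative assumptions, not recommended
    settings. Sequences shorter than window are returned unchanged.
    """
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        # Entropy is computed on the original sequence, not the masked copy.
        if shannon_entropy(seq[i:i + window]) < min_entropy:
            masked[i:i + window] = "N" * window
    return "".join(masked)

def count_matches(seq, motif):
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

if __name__ == "__main__":
    repeat = "TTTTA" * 8 + "T"  # a TTTTA tandem repeat, 41 bases
    print(count_matches(repeat, "TTTTAT"))                       # 8
    print(count_matches(mask_low_complexity(repeat), "TTTTAT"))  # 0
```

A single 41-base repeat contributes eight motif matches here, which is the kind of signal AME cannot distinguish from genuine sites; after masking it contributes none.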

It sounds like you ran AME separately on each set of sequences, using a shuffled version of that set as the control. Have you considered using one set of sequences as the control and the other as the target? For example, use the up-regulated genes as the target and the down-regulated genes as the control. This might be better at identifying motifs that are enriched in one set relative to the other.
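One simple way to frame the target-versus-control question is to ask whether the motif occurs in a larger fraction of up-regulated than down-regulated UTRs. The sketch below swaps in a one-sided Fisher's exact test on motif presence/absence. This is not what AME computes (the output above reports a rank-sum test on motif matches). It also uses exact string matching instead of a scoring matrix and ignores UTR length, even though longer UTRs are more likely to contain any given 6-mer by chance. All function names are hypothetical. Within AME itself, the equivalent would be to supply the down-regulated sequences as the control set; the option is `--control` in the versions I know of, but check `ame --help` for your installation.

```python
from math import comb

def contains_motif(seq, motif):
    """Presence/absence of an exact motif match (case-insensitive)."""
    return motif in seq.upper()

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: target, control. Columns: motif present, motif absent.
    Returns P(X >= a) under the hypergeometric null.
    """
    n = a + b + c + d
    row1 = a + b  # number of target sequences
    col1 = a + c  # number of sequences containing the motif
    upper = min(row1, col1)
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return hits / comb(n, col1)

def compare_sets(target, control, motif):
    """Build the contingency table and return it with the one-sided p-value."""
    a = sum(contains_motif(s, motif) for s in target)
    c = sum(contains_motif(s, motif) for s in control)
    table = (a, len(target) - a, c, len(control) - c)
    return table, fisher_right_tail(*table)

if __name__ == "__main__":
    up = ["GGTTTTATCC", "ATTTTATA", "CCCGGG"]    # toy "up-regulated" UTRs
    down = ["CCCGGG", "ACGTACGT", "TTTTATGG"]    # toy "down-regulated" UTRs
    table, p = compare_sets(up, down, "TTTTAT")
    print(table, p)
```

With real data you would run this per motif and correct for the number of motifs tested, just as AME reports corrected p-values.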