Order of input motifs affects MAST output

Laura Pineda

Jun 4, 2020, 2:04:28 PM
to MEME Suite Q&A
Hi, I'm using MAST to scan 6 upstream sequences with 8 known motifs as input. On my first attempt I entered the 8 motifs in a random order and, in the advanced options, selected "Remove redundant motifs from query", "Scale the motif display threshold", and "Use individual sequence composition". In the first output, 3 of the 8 motifs were removed because of their similarity to other motifs. On my second attempt I kept the same advanced options but entered the same motifs in a different order. Again 3 of the 8 motifs were removed, but this time different motifs were found in my sequences. The E-value changed for all my sequences. In some cases additional motifs appeared in the block diagram, and others that had been there before no longer appeared. Some motifs stayed in place, but their p-values changed.

I'm aware that MAST output depends on sequence length, but why am I getting different results if the motifs are the same?

HTML Output for the first attempt
 
HTML Output for the second attempt
 

cegrant

Jun 4, 2020, 6:56:38 PM
to MEME Suite Q&A
Could you attach the full HTML output files? The screen captures are too small to read. Use the "Attach a file" link in the edit box.

cegrant

Jun 5, 2020, 1:50:50 PM
to MEME Suite Q&A
Several of the motifs you used are highly correlated (in fact nearly identical), and you specified that highly correlated motifs should not be considered in the search. Notice that some of the motifs are "greyed out", and the text at the top of the list of motifs: "Motifs which are grayed-out were very similar to other earlier specified motifs and were removed from the scan as you requested."

When you change the order of the motifs, MAST makes different choices about which motifs to remove from the scan. Suppose you have motifs A, B, and C, and motifs A and C are nearly identical. If the order of the motifs in your input file is A, B, C, then motif C will be dropped as redundant to A. If the order of the motifs in your input file is C, B, A, then A will be dropped as redundant to C.
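The A, B, C example can be sketched in a few lines of Python. This is only an illustration of an order-dependent "keep the first, drop later near-duplicates" filter, not MAST's actual code: the function `remove_redundant`, the `similar` helper, and the hard-coded pair of similar motifs are all assumptions made for the example.

```python
def remove_redundant(motifs, similar):
    """Keep each motif unless it is similar to one already kept (input order matters)."""
    kept, removed = [], []
    for motif in motifs:
        if any(similar(motif, earlier) for earlier in kept):
            removed.append(motif)  # redundant with an earlier motif
        else:
            kept.append(motif)
    return kept, removed


# Hypothetical similarity relation: only A and C are treated as nearly identical.
SIMILAR_PAIRS = {frozenset(("A", "C"))}


def similar(x, y):
    return frozenset((x, y)) in SIMILAR_PAIRS


kept_1, removed_1 = remove_redundant(["A", "B", "C"], similar)
kept_2, removed_2 = remove_redundant(["C", "B", "A"], similar)

print(kept_1, removed_1)  # C is dropped as redundant to A
print(kept_2, removed_2)  # A is dropped as redundant to C
```

The same three motifs go in both times, but the set that survives depends on which member of a similar pair comes first.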

Note that MAST actually identified the same five top-scoring sequences (although with slightly different E-values, because different motifs were dropped).

Laura Pineda

Jun 5, 2020, 6:57:04 PM
to MEME Suite Q&A
So, is it an error to specify that highly correlated motifs should not be considered in the search? I used those motifs, which should be highly conserved, because my intention is to find if some of my sequences have a similar one, so I can't really change those motif sequences.

cegrant

Jun 6, 2020, 3:38:50 PM
to meme-...@googlegroups.com
Remember that MAST's basic task is to score sequences for their combined match to all of the motifs in your input motif file. MAST identifies the best match to each motif in a given sequence, and then multiplies the p-values of those best matches to get an overall score for the sequence. The idea is to identify those sequences that contain a good match to all the motifs in your file. This is typically used to identify promoter sequences that contain binding sites for known co-regulating transcription factors. If you have motifs A, B, and C, MAST is only going to report those sequences that have good matches to A and B and C. If you have a sequence with a perfect match to A, but no matches to B and C, then MAST is not going to report that match to A.
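A rough sketch of that scoring idea in Python, using made-up p-values: the function `sequence_score` and both example sequences are hypothetical, and the statistics MAST actually reports (such as E-values) involve further steps that are not shown here.

```python
import math


def sequence_score(best_match_pvalues):
    """Multiply the best-match p-value of every motif; smaller means a better combined match."""
    return math.prod(best_match_pvalues)


# Made-up best-match p-values for motifs A, B, and C in two sequences.
matches_all_three = {"A": 1e-5, "B": 1e-4, "C": 1e-6}
matches_only_a = {"A": 1e-9, "B": 0.8, "C": 0.9}

score_all = sequence_score(matches_all_three.values())
score_only_a = sequence_score(matches_only_a.values())

# The sequence with decent matches to all three motifs gets the smaller product,
# even though the other sequence has a far better match to A alone.
print(score_all < score_only_a)
```

The poor matches to B and C pull the second sequence's product up, which is why a strong hit to a single motif is not enough for a sequence to be reported.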

If your motifs are highly similar to each other (highly correlated), this becomes a problem for MAST. If you have motifs A and B that are very similar to each other, then the best match to A in a given sequence might also be the best match to B in that sequence, but they aren't distinct sites. That's why MAST's default behavior is to not consider motifs that are almost identical to motifs that have already been considered. You can turn this off by unchecking the checkbox "Remove redundant motifs from query?", but given that your motifs are nearly identical, the MAST results wouldn't be reliable.

> because my intention is to find if some of my sequences have a similar one

I'm not clear what your overall goal is, but this makes me think that MAST is not the appropriate tool for you, and that you should probably be using FIMO. It really sounds like you want to find individual matches to your motifs.

Laura Pineda

Jun 9, 2020, 9:15:07 PM
to MEME Suite Q&A
Thank you for the advice, I'll try to use FIMO.