How does MAST differ from FIMO?

CharlesEGrant

Jul 11, 2017, 4:08:40 PM
to meme-...@googlegroups.com

FIMO is looking for the best individual matches to motifs. MAST is looking for the sequences that have the best overall match to a collection of motifs. 

FIMO's task is simple: given a set of motifs and a database of sequences, compute the match score to each motif at each position in each sequence, and report all the motif matches that pass the p-value/q-value threshold. 
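To make the FIMO-style scan concrete, here is a minimal sketch in Python. It is not FIMO's actual code: the 3-column motif is made up, the background is assumed uniform, only one strand is scanned, p-values are computed by brute-force enumeration of all k-mers (feasible only for very short motifs), and q-values are left out. The names `MOTIF`, `log_odds`, `score`, `make_pvalue`, and `scan` are all hypothetical.

```python
from itertools import product
from math import log2

ALPHABET = "ACGT"

# Hypothetical 3-column motif (per-position base probabilities), consensus AGC.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

def log_odds(motif, background=0.25):
    """Convert base probabilities to log-odds scores (uniform background assumed)."""
    return [{b: log2(col[b] / background) for b in ALPHABET} for col in motif]

def score(pssm, site):
    """Sum the per-column log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(pssm, site))

def make_pvalue(pssm):
    """Return a function mapping a score to its exact p-value.

    The p-value is the fraction of all possible k-mers that score at least
    as well, which is exact under a uniform background.
    """
    all_scores = [score(pssm, kmer) for kmer in product(ALPHABET, repeat=len(pssm))]
    return lambda s: sum(1 for x in all_scores if x >= s - 1e-9) / len(all_scores)

def scan(sequences, motif, threshold=0.05):
    """Score every position in every sequence; keep matches passing the threshold."""
    pssm = log_odds(motif)
    pvalue = make_pvalue(pssm)
    width = len(pssm)
    hits = []
    for name, seq in sequences.items():
        for start in range(len(seq) - width + 1):
            site = seq[start:start + width]
            p = pvalue(score(pssm, site))
            if p <= threshold:
                hits.append((name, start, site, p))
    return hits

hits = scan({"s1": "TTAGCTT", "s2": "TTTTTT"}, MOTIF)
print(hits)  # [('s1', 2, 'AGC', 0.015625)]
```

Every reported hit stands on its own: nothing about the rest of the sequence affects whether a match is reported.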

MAST's algorithm is more complex. For each sequence it carries out an initial scoring that is quite similar to FIMO's. MAST then picks the best match for each motif in the sequence. The p-values of these top matches are multiplied together to create an overall score for the full sequence. MAST reports the sequences that have the most significant overall scores. Typically MAST would be used to look for regulatory regions in DNA, or structures in proteins, where several motifs might occur near each other.
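The "best match per motif, then multiply" idea can be sketched in a few lines of Python. This is only an illustration of the description above, not MAST's implementation: the position p-values are invented numbers, the function names are hypothetical, and the raw product is used directly for ranking, whereas MAST goes on to convert it into a combined p-value and an E-value.

```python
from math import prod

def sequence_score(position_pvalues):
    """Multiply together the best (smallest) position p-value of each motif.

    position_pvalues maps motif name -> list of p-values, one per position.
    """
    best = {motif: min(pvals) for motif, pvals in position_pvalues.items()}
    return prod(best.values())

def rank_sequences(all_pvalues):
    """Return sequence names ordered from best (smallest product) to worst."""
    return sorted(all_pvalues, key=lambda name: sequence_score(all_pvalues[name]))

# Made-up position p-values for three motifs in two sequences.
pvalues = {
    # One excellent match to motif A, but nothing good for B or C.
    "seqX": {"A": [0.3, 1e-6, 0.8], "B": [0.5, 0.9, 0.7], "C": [0.6, 0.4, 0.9]},
    # Decent matches to all three motifs.
    "seqY": {"A": [1e-3, 0.5, 0.7], "B": [0.8, 1e-3, 0.6], "C": [0.9, 0.4, 1e-3]},
}

ranking = rank_sequences(pvalues)
print(ranking)  # ['seqY', 'seqX']
```

In this toy example seqX contains the single strongest match, yet seqY ranks first because it matches all three motifs reasonably well. A per-match report in the style of FIMO would put the seqX match at the top instead.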


Note that this means that MAST is not suitable for genome scale analyses. If you try to scan a full eukaryotic genome with MAST, it will almost certainly run out of memory. Conceptually, it makes no sense to score the similarity of an entire chromosome to a handful of motifs.

Alejandro Montenegro-Montero

Feb 12, 2016, 6:09:18 PM
to MEME Suite Q&A
Hi,

If you scanned the same sequences with a single PWM, using both FIMO and MAST, would you expect to get the same results?

Thanks!

-A

cegrant

Feb 12, 2016, 9:11:39 PM
to MEME Suite Q&A
Sometimes, but not necessarily! It's very important to remember that MAST is scoring sequences while FIMO is scoring individual motif matches. If you look at the section titled 'p-values' in the MAST documentation you'll see that MAST distinguishes between 'position p-values', 'sequence p-values', and 'combined p-values'. The combined p-value is the basis of the MAST sequence score. It depends not only on the motif matches, but also on the length of the sequence! You could have two sequences containing the exact same match to a motif, but if one sequence is much longer than the other, its E-value might be so high (that is, its significance so low) that MAST simply won't report it.
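The length effect is easy to see with a little arithmetic. The sketch below assumes a single motif, independent positions, a single strand, and the usual definitions: the sequence p-value is the chance that a random sequence of the same length contains at least one match that good, and the E-value is that p-value times the number of sequences searched. The function names and all the numbers are made up for illustration; MAST's own calculation may differ in detail.

```python
def sequence_pvalue(position_pvalue, seq_len, motif_width):
    """Chance that at least one of the possible positions scores this well."""
    n_positions = seq_len - motif_width + 1
    return 1.0 - (1.0 - position_pvalue) ** n_positions

def evalue(seq_pvalue, n_sequences):
    """Expected number of sequences in the database scoring this well by chance."""
    return seq_pvalue * n_sequences

# The same match (position p-value 1e-4, motif width 10) in two sequences.
short_p = sequence_pvalue(1e-4, seq_len=100, motif_width=10)      # roughly 0.009
long_p = sequence_pvalue(1e-4, seq_len=100_000, motif_width=10)   # above 0.9999

# With a hypothetical database of 100 sequences:
print(evalue(short_p, 100))  # below 1
print(evalue(long_p, 100))   # close to 100
```

The match itself is identical in both cases, but in the long sequence a match that good is almost certain to turn up by chance, so the sequence as a whole is not significant.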

Internally, MAST does score all the motif matches, just like FIMO does. It's just that MAST then uses those motif match scores to calculate an overall sequence score, and only reports the motif matches if the sequence score passes the chosen significance threshold. 

If you are using the command line version of MAST you can use the '-hit_list' option. This will direct MAST to simply print a list of all motif matches as plain text to the standard output, without calculating the sequence scores.

Laura Pineda

Jun 1, 2020, 10:35:26 PM
to MEME Suite Q&A
Hi, I don't fully understand in which situations MAST or FIMO turns out to be the better choice. For example, if I want to search for a known motif in a couple of upstream sequences (200-300 bp) and I have some motifs from the literature or a database, how do I know which tool is more suitable? I mean, both tools will give me the location of the possible motif, which should be related to the known motifs I gave as input.

cegrant

Jun 2, 2020, 4:11:50 PM
to meme-...@googlegroups.com
Behind the scenes, MAST is doing the same thing as FIMO, finding all the significant matches to all the motifs. However, MAST goes a step further and combines the best match to each motif over an entire sequence, to score the sequence as a whole. Suppose you have motifs A, B, and C corresponding to transcription factor binding sites. You would only use MAST when you expect that A and B and C have to occur together in a sequence for them to be biologically functional, say for cooperating TFs. You can't rely on the MAST output to show you all the significant matches to your motifs. It just shows the matches in the sequences that have the best overall match to all three motifs. Use FIMO when you just want to know the location of the best individual matches to A or B or C, and you don't care whether a particular sequence contains a good match to all three.
