Does the number of query or target motifs affect the Tomtom p-value?


ara...@bu.edu

Jul 6, 2018, 10:38:34 AM
to MEME Suite Q&A
Hi,

I'm running Tomtom to compare de novo motifs to a set of known/established motifs from a motif database. The general goal is to identify the best match for each de novo motif. I have a de novo motif, FOX_denovo, and known motifs for FOXD1 and FOXI1. I ran Tomtom two ways, but I'm getting different p-values for the same pairwise motif comparisons.

1st run of Tomtom:
<query motif file>: set of 6 de novo motifs
<target motif file>: set of 2 motifs (FOXD1 and FOXI1)
Result: The p-values for the FOX_denovo motif were 4.25E-02 for FOXD1 and 5.99E-02 for FOXI1

2nd run of Tomtom:
<query motif file>: set of 6 de novo motifs
<target motif file>: set of 97 motifs (full list of motifs in my database)
Result: The p-values for the FOX_denovo motif were 2.24E-03 for FOXD1  and 1.73E-04 for FOXI1

I was expecting the Tomtom p-value to be exactly the same between these two analyses. Is it correct to say that the Tomtom p-value represents a p-value for motif similarity? Is the Tomtom p-value a function of how many motifs are in the <target motif file>?

What is the best way to get a similarity score for comparing two motifs?  I was hoping to use Tomtom for this purpose.

Andy

CharlesEGrant

Jul 10, 2018, 5:21:14 PM
to MEME Suite Q&A
If you hover your mouse pointer over "p-value" you'll see a red question mark appear. This is the link to the on-line help for "p-value":

The probability that a random motif of the same width as the target would have an optimal alignment with a match score as good or better than the target's.
Tomtom estimates the p-value using a null model consisting of sampling motif columns from all the columns in the set of target motifs.

So the null model used to estimate the p-value depends on the motifs in the target database, which means that the choice of target database will affect the reported p-values.
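To make the dependence on the target set concrete, here is a toy sketch in Python. It is not Tomtom's actual algorithm: the scoring function (negative Euclidean distance), the fixed one-to-one column alignment, and the tiny column pools are all assumptions chosen for illustration. It only shows that when random motifs are built from the columns of the target set, the same query/target pair can receive a different p-value once the pool of columns changes.

```python
from itertools import product
from math import dist

def match_score(query, candidate):
    """Toy similarity (assumption): negative summed Euclidean distance,
    with columns aligned one-to-one and no offsets considered."""
    return -sum(dist(q, c) for q, c in zip(query, candidate))

def toy_p_value(query, target, column_pool):
    """Fraction of all same-width motifs built from column_pool that score
    at least as well against the query as the real target does."""
    observed = match_score(query, target)
    random_motifs = list(product(column_pool, repeat=len(target)))
    hits = sum(match_score(query, m) >= observed for m in random_motifs)
    return hits / len(random_motifs)

# Each column is a tuple of (A, C, G, T) probabilities; values are made up.
A = (0.7, 0.1, 0.1, 0.1)
C = (0.1, 0.7, 0.1, 0.1)
G = (0.1, 0.1, 0.7, 0.1)
T = (0.1, 0.1, 0.1, 0.7)

query = [A, C]
target = [A, C]

small_pool = [A, C]        # columns from a small, homogeneous target set
large_pool = [A, C, G, T]  # columns from a more diverse target set

p_small = toy_p_value(query, target, small_pool)
p_large = toy_p_value(query, target, large_pool)
print(p_small, p_large)  # 0.25 0.0625
```

In this toy case the more diverse pool gives the smaller p-value, but the direction and size of the change depend entirely on which columns the pool contains.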

In some sense, the larger the target database, the better the null model; however, a larger target database also means a larger multiple testing problem. Tomtom provides an E-value and a q-value along with the p-value for motif matches. Like the p-value, the E-value and q-value are measures of statistical significance, but they are corrected for multiple testing. My recommendation would be to use the larger target database, and use the q-value or the E-value as the similarity score. The E-value is somewhat more conservative than the q-value. In either case, though, the smaller the value, the more statistically significant the match.
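As a rough illustration of how the multiple-testing corrections relate to the p-value, the sketch below assumes that the E-value is the p-value multiplied by the number of target motifs and that q-values follow a Benjamini-Hochberg style procedure. Both are assumptions about how Tomtom derives these values, and the function names are hypothetical. The p-values are the ones reported in the question above; q-values are computed only for the 2-target run, because for the 97-target run only two of the p-values are known.

```python
def e_values(p_values, n_targets):
    """Assumed E-value: p-value times the number of target motifs."""
    return [p * n_targets for p in p_values]

def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, computed over ALL p-values
    of one query (assumed to correspond to Tomtom's q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# FOX_denovo vs. (FOXD1, FOXI1), p-values taken from the two runs above.
p_run1 = [4.25e-2, 5.99e-2]   # 2 target motifs
p_run2 = [2.24e-3, 1.73e-4]   # 97 target motifs (only these two are known)

e_run1 = e_values(p_run1, n_targets=2)
e_run2 = e_values(p_run2, n_targets=97)
q_run1 = bh_q_values(p_run1)

for name, values in [("E run 1", e_run1), ("E run 2", e_run2), ("q run 1", q_run1)]:
    print(name, [f"{v:.3g}" for v in values])
```

The values Tomtom actually reports should be taken from its output; this is only meant to show why the corrected values, unlike raw p-values, account for the number of targets searched.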

Ariana Treat

Jul 13, 2025, 4:50:38 PM
to MEME Suite Q&A
Is there a recommended target database size or size threshold, e.g., >100 motifs? I am looking to compare motifs detected by STREME between two datasets, but the number of significant motifs is around 20 for both. I can redo the STREME runs and set -nmotifs to get a larger dataset, but I don't know what number of motifs I should specify.

tlawb...@gmail.com

Jul 30, 2025, 8:39:44 PM
to MEME Suite Q&A
If the scientific question is "which of two motifs better matches motif X", then you would want to 
search each of the two motifs (the queries) against a large set of targets containing motif X.
You want the target motifs to represent a wide range of diverse motifs, so I would recommend
that you use a large reference motif database from the organism you are interested in.  You can download
such databases easily from the MEME Suite site: https://meme-suite.org/meme/doc/download.html

If motif X is not in the reference database, add a file containing it to the list of <target file>+ in the
Tomtom command line.

If the two query motifs are not in the same file, make sure you run Tomtom twice using the same
(list of) target motif databases each time.  Then the p-values of the two query motifs will be comparable.
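A minimal sketch of that setup in Python, assuming the usual "tomtom [options] <query file> <target file>+" argument order and the -oc output-directory option. All file and directory names are hypothetical placeholders. The point is only that both runs receive the identical list of target files.

```python
import shlex

def tomtom_command(query_file, target_files, out_dir):
    """Build a Tomtom command line: options, then the query file,
    then one or more target files."""
    return ["tomtom", "-oc", out_dir, query_file, *target_files]

# Hypothetical file names: a reference database plus a file holding motif X.
targets = ["reference_db.meme", "motif_x.meme"]

cmd_a = tomtom_command("query_a.meme", targets, "tomtom_query_a")
cmd_b = tomtom_command("query_b.meme", targets, "tomtom_query_b")

for cmd in (cmd_a, cmd_b):
    print(shlex.join(cmd))
```

Each command list could then be passed to subprocess.run(cmd, check=True) on a machine where Tomtom is installed.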

-Tim