STREME motif significance


Ariana Treat

Feb 13, 2026, 5:26:58 PM
to MEME Suite Q&A
From the STREME paper,
  1. Motif significance computation. STREME computes the statistical significance of the motif by using the motif and the optimal discriminative score threshold (based on the primary and control sequences) to classify the hold-out set sequences, and then applying the statistical test to the classification. Classification is based on the best match to the motif in each sequence (on either strand when the alphabet is complementable).

This is unclear to me: how should we interpret the ordering of motifs by significance? Some motifs that STREME considers more significant show up much less frequently in my list of sequences than less significant motifs (I determined this by searching the sequences with the STREME motifs using FIMO and tallying motif occurrences). Are more significant motifs also more significant biologically?

Thank you,
Ariana Treat

cegrant

Mar 19, 2026, 6:35:56 PM
to MEME Suite Q&A
As you are probably aware, STREME orders motifs by increasing p-value.
Smaller p-values are more statistically significant.  A basic tenet of motif discovery is that /statistical/
significance does NOT guarantee /biological/ significance, but they are often correlated.

Your confusion may be caused by how you are tallying motif occurrences in your FIMO output.
If you are counting all of the occurrences of a motif in each sequence, then you are doing
something very different than what STREME does.  As stated in the text you quoted above,
STREME only pays attention to the single strongest match to the motif in each sequence.
It automatically picks a motif match threshold, and compares how many positive and negative
hold-out sequences have (at least one) match above that threshold. The Fisher Exact test
is used to compute the p-value of the motif.
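
The classification step described above can be sketched in a few lines of Python. This is only an illustration, not STREME's actual code: the function names and the best-match scores are hypothetical, the only value taken from this thread is the match threshold, and the p-value STREME reports may include adjustments that are not modeled here.

```python
from math import comb

def best_match_hits(best_scores, threshold):
    """Count sequences whose single best motif match reaches the threshold."""
    return sum(1 for score in best_scores if score >= threshold)

def fisher_exact_greater(pos_hits, n_pos, neg_hits, n_neg):
    """One-sided Fisher exact p-value: the chance of seeing at least
    pos_hits positives among all hits if hits were spread at random."""
    total = n_pos + n_neg
    hits = pos_hits + neg_hits
    upper = min(n_pos, hits)
    tail = sum(comb(n_pos, k) * comb(n_neg, hits - k)
               for k in range(pos_hits, upper + 1))
    return tail / comb(total, hits)

# Hypothetical best-match score per hold-out sequence (one score per sequence).
pos_best = [12.1, 3.0, 14.2, 11.9, 8.8, 13.0]
neg_best = [2.0, 5.5, 12.0, 1.1, 4.0, 6.3]
threshold = 11.5654  # match threshold from the sample output discussed in this thread

pos_hits = best_match_hits(pos_best, threshold)
neg_hits = best_match_hits(neg_best, threshold)
p_value = fisher_exact_greater(pos_hits, len(pos_best), neg_hits, len(neg_best))
print(pos_hits, neg_hits, p_value)
```

Note that each sequence contributes at most one hit, no matter how many motif occurrences it contains.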

To visualize this a bit better, you can click on the "More" button in the sample STREME output
provided at the MEME Suite website.  The results for the top motif look like this:


The "Test" results show the values inserted into the Fisher Exact test and the resulting p-value.
The number of matching Test positives (e.g., 34) depends on the Match Threshold of 11.5654.
FIMO will undoubtedly be using a different threshold when it reports its results.  (FIMO's
threshold is defined on the p-value of each individual match, not on the match score.)
This makes it difficult to reproduce the STREME p-value using FIMO.
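
To get a FIMO tally that is closer in spirit to what STREME counts, one option is to collapse the FIMO hits to the best score per sequence and then count sequences rather than occurrences. The sketch below assumes the tab-separated FIMO output with `motif_id`, `sequence_name` and `score` columns; the function name and the sample rows are made up for illustration. It still will not reproduce STREME's numbers, since FIMO only reports matches that pass its own p-value threshold.

```python
import csv
from collections import defaultdict

def tally_fimo(tsv_text, score_threshold):
    """Return (occurrences per motif, sequences per motif whose best match
    score reaches score_threshold) from FIMO-style TSV text."""
    # Drop blank lines and the '#' comment lines at the end of FIMO output.
    lines = [ln for ln in tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    occurrences = defaultdict(int)
    best = defaultdict(dict)  # motif -> {sequence -> best score}
    for row in csv.DictReader(lines, delimiter="\t"):
        motif, seq = row["motif_id"], row["sequence_name"]
        score = float(row["score"])
        occurrences[motif] += 1
        if score > best[motif].get(seq, float("-inf")):
            best[motif][seq] = score
    seqs_hit = {m: sum(1 for s in d.values() if s >= score_threshold)
                for m, d in best.items()}
    return dict(occurrences), seqs_hit

# Hypothetical FIMO rows (not real output).
header = ["motif_id", "motif_alt_id", "sequence_name", "start", "stop",
          "strand", "score", "p-value", "q-value", "matched_sequence"]
rows = [
    ["m1", "", "seqA", 5, 12, "+", 13.2, "1e-5", "0.01", "ACGTACGT"],
    ["m1", "", "seqA", 40, 47, "-", 12.0, "2e-5", "0.01", "ACGTACGA"],
    ["m1", "", "seqA", 80, 87, "+", 9.5, "8e-5", "0.03", "ACGTTCGA"],
    ["m1", "", "seqB", 3, 10, "+", 10.1, "6e-5", "0.02", "ACGAACGT"],
    ["m2", "", "seqA", 20, 27, "+", 12.5, "1e-5", "0.01", "TTGACCAA"],
    ["m2", "", "seqB", 30, 37, "-", 11.9, "3e-5", "0.01", "TTGACCAT"],
]
sample = "\n".join("\t".join(str(v) for v in r) for r in [header] + rows)

occurrences, seqs_hit = tally_fimo(sample, score_threshold=11.5654)
print(occurrences)  # total occurrences per motif
print(seqs_hit)     # sequences with a best match at or above the threshold
```

In this toy input, motif `m1` has more total occurrences than `m2` but reaches the threshold in fewer sequences, which is the kind of mismatch described in the question.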

The best way to state the STREME significance is that it approximates the probability of a random motif
achieving a classification result on the hold-out set as good as or better than the one observed.
In the example above, the motif classifies 37.8% of the positive sequences as positives, while
only 3.3% of the negative sequences are (falsely) classified as positive.

I hope this explanation helps clarify the meaning of motif significance in STREME.

(Posted by cegrant, written by Tim Bailey)