motif, e-value and number of sequences

Nicolas Descostes

Oct 7, 2016, 4:54:40 PM
to MEME Suite Q&A

Hello,


I have performed a motif enrichment analysis with MEME-ChIP.


I am getting a beautiful motif with an E-value of 1.3e-357. And luckily I also get a known motif (in the "show more" section below the logo) with an E-value of 1.1e-264.


When I click on "MEME" under the "Discovery/Enrichment Program" section, I can see that 306 sites contributed to build the motif. Knowing that I submitted 2359 sequences, I would tend to say that this motif is not significant.


Do you agree on that?


The known motif, which was found with CentriMo, shows 1658 in the "Region Matches" column. Does this mean that 1658 of my sequences have the motif?


Thanks a lot

CharlesEGrant

Oct 7, 2016, 5:52:27 PM
to meme-...@googlegroups.com
> When I click on "MEME" under the "Discovery/Enrichment Program" section, I can see that 306 sites contributed to build the motif. Knowing that I submitted 2359 sequences, I would tend to say that this motif is not significant.
> Do you agree on that?

No! MEME uses a greedy algorithm. As soon as it has enough evidence to estimate the statistical significance of a motif candidate, it stops looking for further evidence for that motif and starts looking for other motifs. The statistical significance of a motif discovered by MEME is given by the E-value, not by the number of contributing sites. An E-value of 1.3e-357 indicates that the motif is highly statistically significant in your sequence data.

Keep in mind that MEME is performing de novo motif discovery without any reference to databases of known motifs, while CENTRIMO is looking for enrichment of motifs both from your MEME results and from databases of known motifs. You'll want to assess whether the highly significant motif reported by MEME is really some variant of the known motif reported by CENTRIMO. You can compare the logograms by eye, and then look at the TomTom results to see if they really represent the same motif.

> The known motif, which was found with CentriMo, shows 1658 in the "Region Matches" column. Does this mean that 1658 of my sequences have the motif?

Clicking on the help button (the red question mark) for the "Region Matches" column of the CENTRIMO results provides the following text.

The number of (positive) sequences whose best match to the motif falls in the reported region.

Note: This number may be less than the number of (positive) sequences that have a best match in the region. The reason for this is that a sequence may have many matches that score equally best. If n matches have the best score in a sequence, 1/n is added to the appropriate bin for each match.
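To make the 1/n rule in that help text concrete, here is a toy sketch in Python. It is not CENTRIMO's code, only an illustration of the counting rule as the help text words it. The function name `best_match_bins`, the input layout (a dict of sequence IDs to `(position, score)` pairs) and the example scores are all made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def best_match_bins(matches_by_seq):
    """Distribute one count per sequence over its best-scoring match positions.

    matches_by_seq: dict mapping a sequence ID to a list of
    (position, score) tuples (hypothetical layout, not a CENTRIMO format).
    """
    bins = defaultdict(Fraction)
    for matches in matches_by_seq.values():
        if not matches:
            continue  # a sequence with no match contributes nothing
        best = max(score for _, score in matches)
        tied = [pos for pos, score in matches if score == best]
        # If n matches tie for the best score, each one adds 1/n to its bin.
        for pos in tied:
            bins[pos] += Fraction(1, len(tied))
    return dict(bins)


example = {
    "seq1": [(0, 12.5), (40, 9.0)],    # one best match, at position 0
    "seq2": [(0, 11.0), (-30, 11.0)],  # two equally good best matches
}
bins = best_match_bins(example)
print(bins)
```

Each sequence still contributes a total of exactly one count, but a tie splits that count across bins, which is why a bin total can be fractional and lower than the number of sequences that have a best match there.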


Furthermore, CENTRIMO only considers motif matches that pass a score threshold. So, strictly speaking, CENTRIMO reporting 1658 region matches means that 1658 of your sequences contain at least one match to the motif in their central region that passes the score threshold. Other sequences might also contain instances of the motif outside the central region, or matches to the motif position weight matrix that score below the threshold.

Look at the FIMO output included in the MEME-ChIP results to get a list of all motif matches in your sequence data.
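If you want a quick count of how many distinct sequences have at least one FIMO match, something like the following Python sketch may help. It assumes a tab-separated FIMO table with a header row containing a `sequence_name` column and with comment lines starting with `#`; header names differ between MEME Suite releases, so check your own file and adjust `seq_column`. The function name and the sample rows are invented for illustration.

```python
import csv


def sequences_with_match(fimo_text, seq_column="sequence_name"):
    """Return the set of sequence names that appear in a FIMO-style table.

    Assumes tab-separated text whose first non-comment line is a header
    containing `seq_column` (an assumption; verify against your file).
    """
    lines = [
        ln for ln in fimo_text.splitlines()
        if ln.strip() and not ln.startswith("#")  # drop blanks and comments
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[seq_column] for row in reader}


# Made-up rows standing in for a real FIMO output file.
sample = (
    "motif_id\tsequence_name\tstart\tstop\tscore\n"
    "M1\tpeak_1\t10\t20\t14.2\n"
    "M1\tpeak_1\t55\t65\t11.0\n"
    "M1\tpeak_2\t30\t40\t12.7\n"
    "# trailing comment lines are skipped\n"
)
hits = sequences_with_match(sample)
print(len(hits), "sequences have at least one match")
```

For a real run you would read the file contents with `open(path).read()` and pass them in. Note that this counts matches anywhere in the sequence, so the number can differ from CENTRIMO's region matches, which only count best matches in the reported region.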

Nicolas Descostes

Oct 10, 2016, 10:10:25 AM
to MEME Suite Q&A
Thank you very much Charles and sorry for cross posting.