MEME de-novo motif discovery

gatla

May 16, 2017, 12:45:33 AM

to MEME Suite Q&A

Hi All,

I am using the command-line version of MEME to find de novo motifs, and then TOMTOM to compare them to known motifs. I have a couple of questions:

1. I am centering the peaks of interest and calling de novo motifs with different flanking lengths from the centre of the peak, for both query peaks and control peaks. Currently I am using +/-50 bp, +/-100 bp, +/-150 bp, +/-200 bp and +/-250 bp. I get a different motif every time, and I am not sure which one to believe.

2. I also get repetitive motifs like TATATATATATA or GTGTGTGTGTG as my top hits in the de novo motif analysis. Are these biologically meaningful?

3. I am using fasta-get-markov to create the background nucleotide frequencies. Which order should I use? Currently I am using 5, but I am not sure how to choose it.

gatla

May 16, 2017, 12:47:21 AM

to MEME Suite Q&A

I forgot to mention that these are K27Ac peaks, and each peak is centered on the ATAC-seq signal.

CharlesEGrant

May 16, 2017, 5:30:39 PM

to meme-...@googlegroups.com

> 1. I am centering the peaks of interest and calling de novo motifs with different flanking lengths from the centre of the peak, for both query peaks and control peaks. Currently I am using +/-50 bp, +/-100 bp, +/-150 bp, +/-200 bp and +/-250 bp. I get a different motif every time, and I am not sure which one to believe.
> 2. I also get repetitive motifs like TATATATATATA or GTGTGTGTGTG as my top hits in the de novo motif analysis. Are these biologically meaningful?

MEME identifies motifs by finding short sub-sequences that are statistically over-represented in your sequence database. It applies a statistical algorithm, not some encapsulation of biological knowledge, so it can't distinguish tandem repeats or low-complexity regions from more biologically relevant motifs. For this reason you may want to mask repeats and low-complexity sequences out of your input using tools like DUST and RepeatMasker.
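To illustrate the idea of masking, here is a rough Python sketch that replaces short tandem repeats such as TATATA... or GTGTGT... with N before motif discovery. It is not DUST or RepeatMasker and is far cruder than either; the function name `mask_tandem_repeats` and the `max_unit` and `min_copies` parameters are made up for this example.

```python
import re

def mask_tandem_repeats(seq, max_unit=3, min_copies=5):
    """Replace runs of short tandem repeats with N.

    A run is a unit of 1..max_unit bases repeated at least min_copies
    times in a row. This is a toy stand-in for a real masking tool.
    """
    # ([ACGT]{1,k}?) captures the shortest repeat unit; \1{n,} requires
    # at least n further copies of that same unit directly after it.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)

if __name__ == "__main__":
    example = "CCGGTC" + "TA" * 6 + "GCATGC"
    print(mask_tandem_repeats(example))
```

Masked positions stay in the sequence as N, so coordinates relative to the peak centre are preserved.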

MEME doesn't perform an exhaustive search for motifs. Even for very small sequence databases, an exhaustive search is computationally infeasible. MEME combines a greedy algorithm with heuristics that allow it to make good guesses about the best motifs. MEME isn't guaranteed to find motifs in order of their statistical significance, but it is usually able to spot the most significant motifs early on. If you direct MEME to find 10-20 motifs, you should see a fairly clear drop-off in statistical significance (E-value). What are the E-values of the motifs in your results? If the E-values are much greater than 0.02, then you may just be looking at 'noise', and those results will vary widely as you change the details of the search, say by including more sequence data.
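One way to check the E-values across runs is to pull them out of MEME's text output and apply the cutoff mentioned above. The sketch below assumes each motif has a summary line that starts with `MOTIF` and contains `E-value = <number>`; the exact layout may differ between MEME versions, so treat the pattern as an assumption and adjust it to your own output. The function names and the sample text are hypothetical.

```python
import re

# Assumed summary-line layout, e.g.:
# MOTIF TATATATA MEME-1  width = 8  sites = 40  llr = 300  E-value = 1.2e-010
MOTIF_LINE = re.compile(r"^MOTIF\s+(\S+).*?E-value\s*=\s*(\S+)")

def motif_evalues(meme_text):
    """Return (motif_name, e_value) pairs found in MEME text output."""
    results = []
    for line in meme_text.splitlines():
        match = MOTIF_LINE.match(line)
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results

def significant(motifs, threshold=0.02):
    """Keep motifs whose E-value is at or below the threshold."""
    return [(name, e) for name, e in motifs if e <= threshold]

if __name__ == "__main__":
    # Hypothetical output fragment, not real results.
    sample = (
        "MOTIF GGGCGGGG MEME-1  width = 8  sites = 50  llr = 500  E-value = 1.2e-010\n"
        "MOTIF ACGTTGCA MEME-2  width = 8  sites = 12  llr = 90  E-value = 3.4e+002\n"
    )
    for name, e in significant(motif_evalues(sample)):
        print(name, e)
```

Running this on the output for each flanking length shows at a glance which runs produced any motif below the cutoff.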

If you are using ChIP-Seq or some related technology, you may want to use MEME-ChIP rather than MEME. MEME-ChIP will sample 600 sequences from your dataset and trim them to their central 100 bp before analyzing them with MEME. A detailed protocol for analyzing ChIP-Seq data with MEME-ChIP is available:

"Motif-based analysis of large nucleotide datasets using MEME-ChIP", Nature Protocols, 9(6):1428-1450, 2014.
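The sampling and trimming step described above can be sketched in a few lines of Python. This only mirrors the behaviour as stated in this reply (600 sequences, central 100 bp); it is not MEME-ChIP's actual code, and the function names and the `seed` parameter are hypothetical.

```python
import random

def central_window(seq, width=100):
    """Return the central `width` bases of seq (all of seq if it is shorter)."""
    if len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]

def sample_and_trim(seqs, n=600, width=100, seed=0):
    """Randomly pick up to n sequences and trim each to its central window."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    chosen = list(seqs) if len(seqs) <= n else rng.sample(list(seqs), n)
    return [central_window(s, width) for s in chosen]

if __name__ == "__main__":
    peaks = ["A" * 200 + "C" * 100 + "G" * 200 for _ in range(1000)]
    trimmed = sample_and_trim(peaks)
    print(len(trimmed), len(trimmed[0]))
```

Fixing the window in this way, rather than comparing +/-50 bp through +/-250 bp runs, removes one source of variation between searches.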

> 3. I am using fasta-get-markov to create the background nucleotide frequencies. Which order should I use? Currently I am using 5, but I am not sure how to choose it.
