Finding location of DREME motifs in input sequence


Ian Sudbery

Mar 5, 2015, 6:34:20 AM
to meme-...@googlegroups.com
Hi,

I have run discriminative motif finding using DREME on a positive and a negative set of sequences. I would now like to find the locations of these motifs in the positive sequence set.

My immediate instinct was to turn to FIMO, but FIMO only reports significant motif matches, and I want all perfect matches (if I just relax the significance threshold all the way to 1, I believe I'll get non-perfect matches as well?).

I suppose I could write my own parser to do this, but I wondered if there was an existing way to do this within the MEME Suite.

Yours,

Ian Sudbery
--------------------

James Johnson

Mar 5, 2015, 6:19:28 PM
to meme-...@googlegroups.com
We do indeed have a script that can do that.
http://research.imb.uq.edu.au/~j.johnson/tools/fasta-re-match

Usage:
    fasta-re-match [options] <IUPAC DNA Motif>

     Options:
      -norc                     Only find matches to motifs in the given strand
      -erase <IUPAC DNA Motif>  erases this motif before finding matches;
                                repeatable; order matters!
      -help                     prints this help message

     Reads sequences from standard input.

     Writes to standard output tab separated (space padded) columns:
     <matching sequence> <strand +-> <line number> <column number> <sequence offset> <sequence name>

     If you are trying to recreate DREME motif sites note that DREME erases
     previously found motifs so you will have to use the -erase option for any but
     the first motif, like:
     fasta-re-match -erase CCMCRCCC TTATCW < sample-dna-Klf1.fa

     If you want a count of sites try piping the output to "wc -l" like:
     fasta-re-match CCMCRCCC < sample-dna-Klf1.fa | wc -l

     If you want only one of the columns try piping the output to "cut -f <num>" like:
     fasta-re-match CCMCRCCC < sample-dna-Klf1.fa | cut -f 1
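
For readers who cannot fetch the script, the perfect-match search described in the usage text can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions, not the fasta-re-match implementation: the names `find_sites`, `motif_regex` and `revcomp` are hypothetical, it takes a plain sequence string rather than FASTA on standard input, it reports 0-based offsets, and it does not implement `-erase`.

```python
import re

# IUPAC DNA codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G complements to Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(motif):
    """Reverse complement of an IUPAC motif (hypothetical helper)."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping sites match."""
    body = "".join(IUPAC[c] for c in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(motif, seq, both_strands=True):
    """Return sorted (offset, strand, matched_text) tuples.

    Offsets are 0-based on the given strand; for '-' hits the matched
    text is shown as it appears on the given strand.
    """
    seq = seq.upper()
    sites = [(m.start(), "+", m.group(1))
             for m in motif_regex(motif).finditer(seq)]
    rc = revcomp(motif)
    # Skip the reverse strand for palindromic motifs to avoid duplicates
    # (an assumption of this sketch, not documented script behaviour).
    if both_strands and rc != motif.upper():
        sites += [(m.start(), "-", m.group(1))
                  for m in motif_regex(rc).finditer(seq)]
    return sorted(sites)
```

For example, `find_sites("TTATCW", "GGTTATCAGG")` returns `[(2, '+', 'TTATCA')]`. Passing `both_strands=False` restricts the search to the given strand, in the spirit of `-norc`.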



Shweta Bhandare

Oct 28, 2016, 4:55:46 PM
to MEME Suite Q&A
I have used DREME, then used TAMO to get the k-mers, and then used Python to find which of the k-mers were identified in each sequence.

However, what I am trying to work out is this: given two sets of sequences (positive and negative), how do I compute sensitivity and PPV for the identified motifs?

I guess I could use counts of how often each motif was found in the positive and negative sets?
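
One way to read the counting idea above is at the sequence level, using the standard definitions sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below assumes (this is not stated in the thread) that a positive sequence containing the motif is a true positive, a positive sequence without it is a false negative, and a negative sequence containing it is a false positive. The function name `sensitivity_ppv` and its parameters are hypothetical.

```python
def sensitivity_ppv(pos_hits, pos_total, neg_hits):
    """Sequence-level sensitivity and PPV for one motif.

    pos_hits:  positive sequences containing the motif (TP)
    pos_total: all positive sequences (TP + FN)
    neg_hits:  negative sequences containing the motif (FP)
    """
    tp = pos_hits
    fn = pos_total - pos_hits
    fp = neg_hits
    # Return NaN rather than raising when a denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

For a motif found in 30 of 100 positive sequences and 10 negative sequences, this gives a sensitivity of 0.3 and a PPV of 0.75. Note that PPV computed this way depends on the relative sizes of the two sets.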

CharlesEGrant

Oct 28, 2016, 5:03:27 PM
to MEME Suite Q&A
Hi Shweta,

As I remarked in your other question, we're happy to answer questions about installing and using the MEME Suite, but this seems to be more of a general bioinformatics/statistics question. You should take this up with your local mentor.

Charles