Which software should I use?

av...@ualberta.ca

Dec 4, 2015, 3:05:37 AM
to MEME Suite Q&A, marian...@na.icar.cnr.it
Hi there,
I am new to this area of research, so I am looking for some help in using the MEME Suite to solve my problem.
I have a murine data set and a homologous human data set, each with circa 1000 sequences of the 5000 bp regions upstream of genes that are known, from ChIP-chip experiments, to be regulated by the same gene. We know that this gene most probably recognizes certain motifs, which are not stored in any of the published motif databases, and we also know that the regulation happens upstream (that is why I collected the upstream regions). I "manually" found the positions of my motifs in all of the sequences of the two organisms (using the fuzznuc command of EMBOSS) and "merely" compared the positional information between mouse and human. Surprisingly (or not), I collected a bunch of motifs that have very similar positions (measured as distance from the transcription start site of each gene) in homologous genes in the two species' data sets, focusing my attention on positions -1 to -1000 of my upstream regions. Could you suggest a tool that could statistically validate the occurrences I found, keeping in mind that my motifs are not in any published motif database?
Thanks a lot in advance.
Mariano

CharlesEGrant

Dec 21, 2015, 6:47:47 PM
to MEME Suite Q&A, marian...@na.icar.cnr.it
Hi Mariano,

I'm not sure what you mean by "statistically validate". You could take the motifs you've found, put them into the MEME motif format, and use AME to see if your sequences are enriched for those motifs. Alternatively, you could perform de novo motif discovery on the sequences you've selected, using MEME or DREME, which should confirm the motifs you've already found.
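
As a rough illustration of the first suggestion, here is a minimal Python sketch that turns a set of aligned, equal-length motif occurrences (for example, the sites matched by fuzznuc) into text in the MEME minimal motif format. The function name `sites_to_meme`, the motif name, and the uniform background frequencies are assumptions made for this example, not part of the MEME Suite; check the output against the MEME motif format documentation before using it.

```python
from collections import Counter

ALPHABET = "ACGT"

def sites_to_meme(name, sites):
    """Build a minimal MEME-format motif from aligned, equal-length DNA sites.

    `name` and the uniform background below are illustrative assumptions.
    """
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    if any(ch not in ALPHABET for s in sites for ch in s):
        raise ValueError("sites may only contain A, C, G, T")

    # One row per motif position: observed frequency of A, C, G, T.
    rows = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        rows.append(" ".join(f"{counts[b] / len(sites):.6f}" for b in ALPHABET))

    header = [
        "MEME version 4",
        "",
        f"ALPHABET= {ALPHABET}",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # assumed uniform background
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {width} "
        f"nsites= {len(sites)} E= 0",
    ]
    return "\n".join(header + rows) + "\n"

if __name__ == "__main__":
    # Hypothetical sites, purely for demonstration.
    print(sites_to_meme("my_motif", ["ACGT", "ACGA", "ACGT"]))
```

The resulting text can be saved to a file and given to AME together with a FASTA file of your sequences; the exact command-line options depend on the MEME Suite version you have installed.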

Both MEME and DREME perform de novo motif discovery and don't rely on published motif databases. Both programs provide an estimate of statistical significance for the motifs they discover. DREME is more suitable for short motifs, but is limited to motifs less than 8 bp wide. The public web application for MEME is limited to 60 kb of input sequence, so your raw sequence data set (5 Mb) is far too large for that. In fact, that data set is probably too big to be handled by a local installation of the command-line version of MEME unless you have a fairly powerful computing cluster at your disposal. You might be able to work around this by trimming your sequences to just the regions containing the motifs you've already identified.
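
To make the trimming idea concrete: 1000 sequences of 5000 bp each come to 5,000,000 bp, far above the 60 kb limit mentioned above. The sketch below cuts each sequence down to windows around the motif hits you already have. The function name `trim_to_hits`, the 0-based hit positions, and the 50 bp flank are assumptions chosen for illustration; pick a flank that suits your data.

```python
def trim_to_hits(seq, hits, motif_len, flank=50):
    """Return merged windows of `seq` covering each hit plus `flank` bp per side.

    `hits` are 0-based start positions of motif occurrences in `seq`.
    Each result is a (start, end, subsequence) tuple with `end` exclusive.
    """
    # Window around every hit, clipped to the sequence boundaries.
    intervals = sorted(
        (max(0, h - flank), min(len(seq), h + motif_len + flank)) for h in hits
    )

    # Merge overlapping or touching windows so no base is emitted twice.
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    return [(s, e, seq[s:e]) for s, e in merged]

if __name__ == "__main__":
    # Hypothetical 5000 bp sequence with three hits of an 8 bp motif.
    windows = trim_to_hits("A" * 5000, [100, 120, 4000], motif_len=8)
    total = sum(e - s for s, e, _ in windows)
    print(f"{len(windows)} windows, {total} bp kept out of 5000")
```

Summing the window lengths over all sequences shows whether the trimmed data set fits within the size limit of the tool you plan to use.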