mining motif data by gene lists


Michael Shapira

Jul 14, 2014, 1:00:11 PM
to gen...@soe.ucsc.edu
Hi.
I am trying to use the database to find the prevalence of a certain motif in different gene lists (to get numbers and, possibly, information on preferred localization within the promoter). Is it doable? If so, I'd appreciate some guidance.
Thanks.
=========================
Michael Shapira
UC Berkeley
Department of Integrative Biology
Valley Life Sciences Bldg room 5155A
Berkeley, CA 94720-3140
(510) 643-2579

Brian Lee

Jul 17, 2014, 5:48:53 PM
to Michael Shapira, gen...@soe.ucsc.edu
Dear Michael,

Thank you for using the UCSC Genome Browser and your question about finding motifs in different genes.

You could employ a command line utility, such as tacg or findMotif, to isolate your motifs:

If you wish to use the findMotif utility, review this archived mailing list question: 
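
As a rough illustration of what such a scan does, here is a minimal Python sketch (not the findMotif or tacg source; the function names are hypothetical) that reports exact matches of a motif on both strands of one sequence as BED-style records, with 0-based starts and exclusive ends as in UCSC BED files. For a whole genome, the command-line tools above are the practical choice.

```python
# Minimal sketch: exact motif matches on both strands of one sequence,
# reported as BED-style (chrom, start, end, strand) with 0-based, end-exclusive
# coordinates. Hypothetical helper names; not the findMotif implementation.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(chrom: str, seq: str, motif: str):
    """Yield every exact (possibly overlapping) match on the + and - strands."""
    seq = seq.upper()
    k = len(motif)
    fwd = motif.upper()
    rev = reverse_complement(fwd)  # what the motif looks like when read on the - strand
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window == fwd:
            yield (chrom, start, start + k, "+")
        if window == rev:
            yield (chrom, start, start + k, "-")


if __name__ == "__main__":
    for rec in find_motif("chrTest", "ttCACGTGaaGATCcc", "cacgtg"):
        print("\t".join(map(str, rec)))
```

Running it prints the same site (chrTest, 2, 8) twice, once per strand, because CACGTG is its own reverse complement.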

You could then create a custom track of your motifs and use the Table Browser to intersect those regions with a gene track.

For example, after building a BED file of motif positions with findMotif -motif=cacgtg /gbdb/hg19/hg19.2bit >cacgtgMotifHg19.bed, you could add it as a custom track at http://genome.ucsc.edu/cgi-bin/hgCustom. Once it is added, you can edit it to have the name cacgtgMotifHg19 to make things simpler. (Please note that this motif is its own reverse complement, so each site will match twice, once on each strand: C-G, A-T, C-G, G-C, T-A, G-C.)
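
Because CACGTG is its own reverse complement, a strand-aware scan reports each site once per strand, so a raw line count of the BED file double-counts it. A minimal Python sketch (hypothetical helper names) for checking whether a motif is palindromic and collapsing +/- hits at identical coordinates before counting:

```python
# Minimal sketch: detect reverse-complement palindromes and count each
# genomic site once, regardless of how many strands it was reported on.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same on both strands (e.g. CACGTG)."""
    m = motif.upper()
    return m == reverse_complement(m)


def unique_sites(records):
    """Drop the strand column, keeping one (chrom, start, end) per site."""
    return sorted({(chrom, start, end) for chrom, start, end, _strand in records})


if __name__ == "__main__":
    hits = [("chr1", 100, 106, "+"), ("chr1", 100, 106, "-"), ("chr2", 5, 11, "+")]
    print(is_palindromic("cacgtg"))  # True
    print(len(unique_sites(hits)))   # 2
```

For a non-palindromic motif the two strands never match the same window, so collapsing is harmless there too.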

Then navigate to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables, select "group:" Custom Tracks and "track:" cacgtgMotifHg19, and click the "create" button next to the "Intersection" option. With "All cacgtgMotifHg19 track records that have any overlap with UCSC Genes" selected, click "submit" and change "output format" to "custom track". Then give this new custom track a name like "cacgtg UCSC gene exons" and click "get custom track in genome browser". The result will be only the motifs that fall into UCSC gene exons, in this case 28,162.
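
The Table Browser does this intersection for you; the sketch below only illustrates the "any overlap" rule offline, assuming you have motif and exon intervals as (chrom, start, end) tuples in 0-based, half-open BED coordinates. It merges the gene intervals per chromosome and uses binary search, so each motif lookup is logarithmic. All names are hypothetical.

```python
# Minimal sketch of an "any overlap" intersection on half-open intervals.
from bisect import bisect_left
from collections import defaultdict


def build_index(regions):
    """Merge (chrom, start, end) regions per chromosome into sorted, disjoint intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:  # overlapping or touching: extend
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) shares at least one base with an indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1  # last merged region starting before `end`
    return i >= 0 and ends[i] > start


def intersect(motifs, regions):
    """Keep motif records (chrom, start, end, ...) overlapping any region."""
    index = build_index(regions)
    return [m for m in motifs if overlaps_any(index, m[0], m[1], m[2])]


if __name__ == "__main__":
    exons = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
    motifs = [("chr1", 95, 101), ("chr1", 300, 306), ("chr1", 596, 602), ("chr2", 100, 106)]
    print(intersect(motifs, exons))  # [('chr1', 95, 101), ('chr1', 596, 602)]
```

The motif at 300-306 is excluded because an interval ending at 300 does not contain base 300 under half-open coordinates.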

By first creating a file of entire gene regions, you could do another intersection to find where the motifs fall across all regions of a gene. From the Table Browser select "group:" Genes... and "track:" UCSC Genes (or whatever gene track you wish), set "output format:" to selected fields..., click "get output", and select only "chrom", "txStart" and "txEnd". Then upload the results as a custom track and intersect the motif track with it as above. The intersection will contain many more motifs, 279,538 in this case, as it now includes non-coding regions of UCSC genes such as introns.
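
Since the original question also asked about preferred localization within the promoter, here is a hedged Python sketch that, given gene regions and motif sites, lists each motif's offset from the transcription start site. It assumes you also export the "strand" field and a gene "name" in addition to chrom, txStart and txEnd, because txStart is the TSS only for plus-strand genes; on the minus strand the TSS is at txEnd. The upstream parameter, which extends each region 5' of the TSS to cover promoter sequence, and all function names are illustrative assumptions. It scans every motif for every gene, which is fine for a short gene list but slow genome-wide.

```python
# Minimal sketch: strand-aware motif offsets relative to each gene's TSS.
from collections import defaultdict


def motif_offsets(genes, motifs, upstream=0):
    """Map gene name -> sorted offsets of motifs from the TSS.

    genes:  (name, chrom, strand, txStart, txEnd), 0-based half-open (assumed layout)
    motifs: (chrom, start, end)
    upstream: bases 5' of the TSS to include as promoter (hypothetical parameter)
    Negative offsets are upstream of the TSS; 0 means the motif's 5' end is the TSS.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in motifs:
        by_chrom[chrom].append((start, end))
    result = {}
    for name, chrom, strand, tx_start, tx_end in genes:
        # Extend the window on the 5' side, which depends on strand.
        if strand == "+":
            lo, hi = tx_start - upstream, tx_end
        else:
            lo, hi = tx_start, tx_end + upstream
        offsets = []
        for start, end in by_chrom.get(chrom, []):
            if lo <= start and end <= hi:  # motif lies entirely inside the window
                if strand == "+":
                    offsets.append(start - tx_start)
                else:
                    offsets.append(tx_end - end)
        result[name] = sorted(offsets)
    return result


if __name__ == "__main__":
    genes = [("geneA", "chr1", "+", 1000, 2000), ("geneB", "chr1", "-", 5000, 6000)]
    motifs = [("chr1", 950, 956), ("chr1", 1500, 1506), ("chr1", 6010, 6016), ("chr1", 3000, 3006)]
    print(motif_offsets(genes, motifs, upstream=100))  # {'geneA': [-50, 500], 'geneB': [-16]}
```

The length of each list gives a per-gene count, and pooling the offsets over a gene list gives a distribution you could histogram to look for positional preference.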

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group



Brian Lee

Jul 17, 2014, 7:37:14 PM
to Michael Shapira, gen...@soe.ucsc.edu

Dear Michael,

You may also be interested in our Improbizer tool: http://users.soe.ucsc.edu/~kent/improbizer/improbizer.html

It allows you to submit sequences, such as those for a selected set of genes, and search for motifs that occur more often than would be expected by chance, using a variation of the expectation maximization (EM) algorithm.
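
Improbizer's EM approach discovers motifs in the submitted sequences. For the simpler question of whether a known motif is over-represented in a gene list, a back-of-the-envelope check is to compare the observed count with the count expected in random sequence. The Python sketch below is that simpler technique, not EM and not Improbizer: it assumes independent bases with a given GC fraction and a Poisson approximation for the count, which ignores the clumping of self-overlapping words, and all names are hypothetical.

```python
# Minimal sketch: expected motif count under an i.i.d. base model and a
# Poisson tail probability for the observed count. Not Improbizer's EM.
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_probability(motif: str, gc: float = 0.5) -> float:
    """Probability that a random k-mer equals `motif` (A/C/G/T only)."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for base in motif.upper():
        p *= base_p[base]
    return p


def expected_count(seq_lengths, motif: str, gc: float = 0.5) -> float:
    """Expected matches on both strands over sequences of the given lengths."""
    m = motif.upper()
    k = len(m)
    # A palindrome matches both strands at the same site, so count it once.
    strands = 1 if m == m.translate(COMPLEMENT)[::-1] else 2
    positions = sum(max(0, length - k + 1) for length in seq_lengths)
    return positions * strands * motif_probability(m, gc)


def poisson_sf(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(observed):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # e.g. 100 hypothetical promoter sequences of 1 kb each
    lam = expected_count([1000] * 100, "cacgtg")
    print(lam, poisson_sf(40, lam))
```

For very small p-values, 1 - cdf loses precision; a library survival function such as scipy.stats.poisson.sf is preferable there.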

All the best,

Brian Lee
UCSC Genome Bioinformatics Group
