Transcription factor motifs - wgEncodeRegTfbsClusteredV3.bed.gz


James Studd

Feb 27, 2015, 12:29:22 PM
to gen...@soe.ucsc.edu

Hi,

 

I was hoping to find a BED file that contains the mapped motifs for transcription factors. It's not immediately clear from looking at the BED file above or the parent directory http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/ which file has this information. Is it contained within the wgEncodeRegTfbsClusteredV3.bed.gz file? If so, how can you distinguish ChIP-seq peaks from motifs?

 

Many thanks

 

James Studd | Postdoctoral Research Fellow

The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG

T +44 208 722 4113 | E James...@icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London

Facebook www.facebook.com/theinstituteofcancerresearch

Making the discoveries that defeat cancer


Brian Lee

Feb 27, 2015, 6:20:48 PM
to James Studd, gen...@soe.ucsc.edu

Dear James,

Thank you for using the UCSC Genome Browser and for your question about obtaining a BED file of the mapped transcription factor motifs shown in the TFBS Clusters track: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3

Usually the directory you looked in would be the right place to find the downloadable files. However, the motifs are not universal to all clusters, so to avoid the kind of confusion you are hinting at, there was discussion of releasing them to a downloads location for a separate track devoted to these Factorbook-generated motifs. That hasn't happened yet. You can obtain this information now, however, from either the Table Browser or the public MySQL server.

To use the MySQL server, please see the resource page, http://genome.ucsc.edu/goldenPath/help/mysql.html. You could then use a command like the following, or a variation of it, to obtain part or all of the tables of interest:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'select * from factorbookMotifPos;' hg19 > factorbookMotifPos
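
If a BED file is the end goal, the dump produced by the command above could be converted with a few lines of Python. This is only a minimal sketch: it assumes the table's columns are bin, chrom, chromStart, chromEnd, name, score, strand (please confirm with `describe factorbookMotifPos;` before relying on it), and the function name `dump_to_bed` is made up for illustration. The score is passed through unchanged, so tools that expect an integer BED score may need it rescaled.

```python
import io


def dump_to_bed(lines):
    """Yield BED6 lines from tab-separated factorbookMotifPos rows.

    ASSUMPTION: each row is bin, chrom, chromStart, chromEnd, name,
    score, strand. Verify the real schema before using this.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed rows
        # Drop the leading bin column, keep the six BED fields.
        _bin, chrom, start, end, name, score, strand = fields[:7]
        yield "\t".join([chrom, start, end, name, score, strand])


# Made-up sample row standing in for the real dump file.
sample = io.StringIO("585\tchr1\t10000\t10015\tCTCF\t2.5\t+\n\n")
bed_lines = list(dump_to_bed(sample))
print(bed_lines[0])
```

With the real dump you would pass an open file handle instead of the sample, for example `dump_to_bed(open("factorbookMotifPos"))`, and write each yielded line to a .bed file.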

The other tables of interest would be the factorbookMotifPwm table and the two tables that help coordinate the Factorbook TFBS terms with the names used at UCSC: factorbookMotifCanonical and factorbookGeneAlias. Those tables help translate terms. For example, both UCSC and Factorbook use BRCA1, but they differ with CTBP2 and CtBP2, or EP300 and P300.
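
To illustrate the kind of translation those tables make possible, here is a small Python sketch. The pairs below are typed in by hand from the examples in this message, with the assumption that the first name of each pair is the UCSC term and the second is the Factorbook term. In practice you would load the pairs from the alias tables, whose column layout is not shown here. The helper names are hypothetical.

```python
def build_alias_map(pairs):
    """Map lower-cased names to the UCSC name.

    `pairs` is an iterable of (ucsc_name, factorbook_name) tuples.
    ASSUMPTION: this pairing is hand-made for illustration and is not
    read from factorbookGeneAlias or factorbookMotifCanonical.
    """
    alias = {}
    for ucsc_name, factorbook_name in pairs:
        alias[ucsc_name.lower()] = ucsc_name
        alias[factorbook_name.lower()] = ucsc_name
    return alias


def to_ucsc_name(name, alias):
    """Return the UCSC name for `name`, or `name` itself if unknown."""
    return alias.get(name.lower(), name)


# Hypothetical pairs taken from the examples in the message.
pairs = [("BRCA1", "BRCA1"), ("CTBP2", "CtBP2"), ("EP300", "P300")]
alias = build_alias_map(pairs)
print(to_ucsc_name("P300", alias))
```

Lower-casing the keys handles differences that are only in capitalization (CTBP2 versus CtBP2), while the explicit pairs cover true renames (EP300 versus P300).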

You can also access all these tables from the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables

1. Select the hg19 human assembly.
2. Set "group:" to "All Tables" 
3. From "table:" select the factorbookMotifPos table.
*At this point you could use the filter or intersection tools to limit the output to factors or locations of interest, for example via a BED file of coordinates. See more about the Table Browser here: http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28
4. Set "output format:" to either "custom track" or "BED" and click "get output".

If you want only the motifs displayed in the wgEncodeRegTfbsClusteredV3 track, it becomes a little more complicated, because algorithms are applied to limit the display of motifs from the original factorbookMotifPos table to the highest-scoring one per region. Here is a session link that illustrates this, where multiple motifs for NFY exist (NFYB at UCSC) and only one is mapped to the cluster: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.factorbookMotifPos
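
A rough approximation of that "highest score per region" selection could look like the Python sketch below. This is not the code the track actually uses. It simply assumes that a motif belongs to a cluster when it has the same factor name and lies entirely inside the cluster's coordinates, and that the motif names have already been translated to the UCSC names. The tuple layouts and the function name are made up for illustration.

```python
def best_motif_per_cluster(clusters, motifs):
    """Pick the highest-scoring matching motif for each cluster.

    clusters: list of (chrom, start, end, factor) tuples.
    motifs:   list of (chrom, start, end, name, score, strand) tuples.
    ASSUMPTION: "matching" means same chromosome, same name, and the
    motif fully contained in the cluster. The real track logic may differ.
    """
    best = {}
    for cluster in clusters:
        c_chrom, c_start, c_end, factor = cluster
        candidates = [
            m for m in motifs
            if m[0] == c_chrom and m[3] == factor
            and m[1] >= c_start and m[2] <= c_end
        ]
        # None when the cluster has no motif at all.
        best[cluster] = max(candidates, key=lambda m: m[4], default=None)
    return best


# Made-up coordinates and scores, for illustration only.
clusters = [("chr1", 100, 500, "NFYB"), ("chr1", 900, 1200, "CTCF")]
motifs = [
    ("chr1", 120, 135, "NFYB", 1.2, "+"),
    ("chr1", 300, 315, "NFYB", 3.4, "-"),
    ("chr1", 600, 615, "NFYB", 9.9, "+"),  # outside every cluster
]
result = best_motif_per_cluster(clusters, motifs)
print(result[("chr1", 100, 500, "NFYB")])
```

The nested scan compares every motif with every cluster, which is fine for a handful of regions but slow genome-wide. For the full tables, an interval tool or the Table Browser intersection would be a better fit.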

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

