TFBS sites


mrigaya mehra

Aug 12, 2014, 12:18:10 PM
to gen...@soe.ucsc.edu
Dear UCSC team,
I wish to find all the TFBS identified in the human genome. How can I find them?
I tried downloading the conserved TFBS data, but I could not find any annotation for the file explaining which column holds which information. Further, I wish to find all TFBS, not only the conserved sites, and also the binding sites of different RBPs, if that data is available at UCSC. Kindly help.

--
Mrigaya Mehra
SRF, 
IHBT (CSIR)
email: mrigay...@gmail.com

Jonathan Casper

Aug 13, 2014, 7:50:32 PM
to mrigaya mehra, gen...@soe.ucsc.edu

Hello Mrigaya,

Thank you for your question about finding TFBS data for the human genome. You can find information about what each column of a table represents by opening the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables, selecting your track and table, and then clicking on the "describe table schema" button. Often, this information is also available by opening the track description page and looking for the "schema" link on that page.

As an alternative to the TFBS Conserved track, you can look at the ENCODE data available on the UCSC Genome Browser for the hg19 human assembly. The ENCODE project data includes two large supertracks with binding site information. One is the parent track for all Transcription Factor Binding tracks. The description page for this supertrack is available at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeTfBindingSuper. The other track is the parent track for all ENCODE RNA Binding Proteins tracks, and the description page is available at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRbpSuper. Each of these parent tracks contains many subtracks of data from experiments that examine the binding sites in detail. If you would like to download the data for your own analysis, links to our download server can be found at http://genome.ucsc.edu/ENCODE/downloads.html.

You may also be interested in the Integrated Regulation ENCODE supertrack page, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeReg, where you will find a link to our clustered TFBS track, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3. This clustered track contains a summary of the TFBS data from the subtracks that I suggested above. Click the "schema" link to see a description of each column in the table. Click the "metadata" link and you can download the file directly by clicking the fileName link. If you are not interested in knowing which experiments and cell types these transcription factors were observed in, you can separate out just the first few columns of this bed file (e.g., with the Unix command "awk '{print $1,$2,$3,$4}' wgEncodeRegTfbsClusteredV3.bed").
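If awk is not available, the same column extraction can be sketched in Python. This is only an illustration, not part of any UCSC tooling: the function name keep_first_columns and the output file name are made up for the example, and it assumes the downloaded file is tab-separated with chromosome, start, end and name in the first four columns, as in standard BED. Unlike the awk command above, which separates the printed fields with spaces, this sketch keeps tabs so the output remains a BED file.

```python
def keep_first_columns(src_path, dst_path, n_columns=4):
    """Copy the first n_columns tab-separated fields of every data line.

    Blank lines and lines starting with '#' are skipped.
    Returns the number of lines written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Keep the output tab-separated so it is still valid BED.
            dst.write("\t".join(fields[:n_columns]) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # File names are assumptions: the input is the file downloaded via the
    # "metadata" link, the output name is arbitrary.
    count = keep_first_columns(
        "wgEncodeRegTfbsClusteredV3.bed",
        "tfbs_first4.bed",
    )
    print(f"wrote {count} lines")
```

Passing a different n_columns (for example 5, to keep the score as well) works the same way; check the "schema" link first to confirm what each column holds.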

I also highly recommend examining the ENCODE FAQ at http://genome.ucsc.edu/ENCODE/FAQ/index.html, which includes links to tutorials and other resources. NHGRI, in particular, provides some tutorials at http://www.genome.gov/27553900 that may be useful for you.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group
