downloading all ENCODE transcription factor binding data from a certain region of the genome

Eric Foss

Feb 6, 2015, 5:32:08 PM

to gen...@soe.ucsc.edu

Dear UCSC Genome Browser,

I would like to download all of the ENCODE transcription factor binding data in the vicinity of a gene I’m interested in. Is this possible with the Table Browser or some other UCSC Genome Browser tool?

Thank you.

Eric

Brian Lee

Feb 6, 2015, 7:44:03 PM

to Eric Foss, gen...@soe.ucsc.edu

Dear Eric,

Thank you for using the UCSC Genome Browser and for your question about downloading all of the ENCODE transcription factor binding data in the vicinity of a gene of interest.

You can use the Table Browser to access this information. For example, here is a session in which an example region of interest near the start of the SIRT1 gene is highlighted. Below are the steps to retrieve the transcription factor data for this region, chr10:69,637,000-69,639,000.

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.SIRT1.TFBS.NFYB
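The session link above is an ordinary CGI query string. As a minimal sketch, assuming you want to build similar links to other saved sessions, the three hgS_* parameters can be assembled in Python. The parameter names are copied from the link above; the function name shared_session_url is hypothetical. quote_via=quote makes spaces come out as %20, as in the original link, rather than '+'.

```python
from urllib.parse import quote, urlencode

# Base CGI for the Genome Browser track display (taken from the link above).
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def shared_session_url(user_name: str, session_name: str) -> str:
    """Hypothetical helper: build a link that loads another user's saved session."""
    params = {
        "hgS_doOtherUser": "submit",
        "hgS_otherUserName": user_name,
        "hgS_otherUserSessionName": session_name,
    }
    # quote (not the default quote_plus) encodes a space as %20.
    return HGTRACKS + "?" + urlencode(params, quote_via=quote)


print(shared_session_url("Brian Lee", "hg19.SIRT1.TFBS.NFYB"))
```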

First, go to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables, and set "Group:" to "Regulation", "track:" to "Txn Factor ChIP", and "table:" to "wgEncodeRegTfbsClusteredV3". Then select "position" and enter the coordinates of interest: chr10:69,637,000-69,639,000. Clicking "get output" will show the following output:

#bin  chrom  chromStart  chromEnd  name   score  expCount  expNums          expScores
1116  chr10  69637728    69638048  NFYB   154    1         517              154
1116  chr10  69637926    69638250  CEBPB  430    4         212,343,426,477  275,262,218,430
1116  chr10  69638035    69638275  USF1   139    1         161              139
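The first column, bin, is an index that UCSC tables use internally to speed up range queries. As a sketch, assuming the standard UCSC binning scheme (bins of 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb), the following Python computes the smallest bin that fully contains a 0-based, half-open range; it reproduces the value 1116 for all three rows above. The function name bin_from_range is illustrative, not a UCSC-provided Python API.

```python
# Offsets for the standard UCSC binning scheme, smallest bins first.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131,072 bp
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times wider


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin containing the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a bin boundary; try the next, coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is too large for the standard scheme")


# chromStart/chromEnd pairs from the Table Browser output above.
for start, end in [(69637728, 69638048), (69637926, 69638250), (69638035, 69638275)]:
    print(bin_from_range(start, end))
```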

The chrom, chromStart, and chromEnd fields give the regions where the named transcription factors, such as NFYB, have been seen to bind, and score gives a relative indication of the strength of the signal seen in experiments. expCount is the number of experiments in which binding was observed, expNums lists the numbers of those experiments, and expScores gives the score from each of them.
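If you save this output and want to work with it programmatically, a minimal Python sketch can parse each row and pair the comma-separated expNums with their expScores. It assumes the output has the nine columns shown above; the ClusterHit and parse_clusters names are hypothetical. In these three rows, score happens to equal the highest per-experiment score, which the example lets you check; treat that as an observation about this output, not a documented rule.

```python
from dataclasses import dataclass


@dataclass
class ClusterHit:
    chrom: str
    start: int                   # chromStart: 0-based, as stored in the table
    end: int                     # chromEnd: exclusive
    name: str                    # transcription factor, e.g. NFYB
    score: int
    experiments: dict[int, int]  # experiment number (expNums) -> score (expScores)


def parse_clusters(text: str) -> list[ClusterHit]:
    """Parse wgEncodeRegTfbsClusteredV3 rows (hypothetical helper)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and blank lines
        # No field contains spaces, so splitting on any whitespace handles tabs too.
        _bin, chrom, start, end, name, score, exp_count, exp_nums, exp_scores = line.split()
        nums = [int(n) for n in exp_nums.split(",") if n]
        scores = [int(s) for s in exp_scores.split(",") if s]
        if len(nums) != int(exp_count) or len(scores) != len(nums):
            raise ValueError(f"inconsistent experiment lists for {name}")
        hits.append(ClusterHit(chrom, int(start), int(end), name, int(score),
                               dict(zip(nums, scores))))
    return hits


SAMPLE = """\
#bin chrom chromStart chromEnd name score expCount expNums expScores
1116 chr10 69637728 69638048 NFYB 154 1 517 154
1116 chr10 69637926 69638250 CEBPB 430 4 212,343,426,477 275,262,218,430
1116 chr10 69638035 69638275 USF1 139 1 161 139
"""

for hit in parse_clusters(SAMPLE):
    print(hit.name, len(hit.experiments), hit.end - hit.start,
          hit.score == max(hit.experiments.values()))
```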

Note that the wgEncodeRegTfbsClusteredV3 table is a processed summary that condenses hundreds of ChIP-seq experiments. If you are interested in looking deeper into the underlying files that produced the clustered summary, you can click an item, such as the one for NFYB, and then click the "metadata" link under "more info". There you will see the lab, antibody, and cell type, as well as the uniformly processed peak track, wgEncodeAwgTfbsSydhK562NfybUniPk, that was used as input to the clustering that generated the clusters track.

You will also see a UCSC accession, such as wgEncodeEH002024, in the metadata details of cluster items. You can use an accession like this to find the underlying raw signal track, if desired. For example, open the Track Search tool, http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hgt_tSearch=1&tsCurTab=simpleTab&tsSimple=wgEncodeEH002024, click "search", and then click the blue metadata arrow next to the "K562 NF-YB Standard ChIP-seq Signal from ENCODE/SYDH" line. This displays a "fileName", such as "wgEncodeSydhTfbsK562NfybStdSig.bigWig", which you can download for the entire genome. Alternatively, if you want this signal data only for your region, return to the Table Browser, set "Group" to "All Tables", "table:" to "wgEncodeSydhTfbsK562NfybStdSig", and "position:" to chr10:69635813-69645132, and click "get output" with "data points" selected as the output format.
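As a command-line alternative to the Table Browser, you can extract one region from the bigWig file with UCSC's bigWigToBedGraph utility, which has -chrom, -start and -end options. Browser positions like chr10:69635813-69645132 are 1-based and inclusive, whereas bigWig files use 0-based, half-open coordinates, so the start is shifted down by one. The sketch below only builds the command; the download URL is an assumption based on the usual hg19 encodeDCC directory layout, and the helper names parse_position and bigwig_region_command are hypothetical.

```python
import re
import shlex

# ASSUMPTION: download location inferred from the usual hg19 encodeDCC layout;
# confirm the path on the download server before relying on it.
BIGWIG_URL = ("http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/"
              "wgEncodeSydhTfbs/wgEncodeSydhTfbsK562NfybStdSig.bigWig")


def parse_position(position: str) -> tuple[str, int, int]:
    """Convert a 1-based inclusive browser position to 0-based half-open coordinates."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")) - 1, int(end.replace(",", ""))


def bigwig_region_command(bigwig: str, position: str, out_path: str) -> list[str]:
    """Hypothetical helper: bigWigToBedGraph restricted to one region."""
    chrom, start, end = parse_position(position)
    return ["bigWigToBedGraph", bigwig, out_path,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}"]


cmd = bigwig_region_command(BIGWIG_URL, "chr10:69635813-69645132", "nfyb_region.bedGraph")
print(shlex.join(cmd))
# To run it (requires the bigWigToBedGraph binary on your PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```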

In summary, the Table Browser output from wgEncodeRegTfbsClusteredV3 (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3) provides processed, clustered coordinates condensed from hundreds of uniformly processed ChIP-seq files (see http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform for details), which were in turn generated by separate laboratories for various cell lines (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeTfBindingSuper). To read more about the background of these data sources, please see the track description pages linked in this paragraph.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group


Eric Foss

Feb 9, 2015, 3:16:38 PM

to Brian Lee, gen...@soe.ucsc.edu

Dear Brian,

Thank you so much for this prompt and very helpful email. I really appreciate the help.