Dear Eric,
Thank you for using the UCSC Genome Browser and for your question about downloading all of the ENCODE transcription factor binding data in the vicinity of a gene of interest.
You can use the Table Browser to access this information. For example, here is a session where an example region of interest is highlighted near the start of a gene, SIRT1. Below are steps to acquire the Transcription Factor data for this region, chr10:69,637,000-69,639,000.
First go to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables and set the "Group:" to "Regulation", the "track:" to "Txn Factor ChIP" and the "table:" to "wgEncodeRegTfbsClusteredV3". Then select "position" and enter coordinates of interest: chr10:69,637,000-69,639,000. By clicking "get output" you will see the following output:
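If you later compare the coordinates you typed with the chromStart and chromEnd columns in the output, keep in mind that the position box uses 1-based, fully closed coordinates, while the tables store 0-based, half-open ones. Here is a minimal Python sketch of that conversion; the function name parse_position is only for illustration and is not part of any UCSC tool.

```python
def parse_position(position):
    """Convert a browser position such as 'chr10:69,637,000-69,639,000'
    into (chrom, chromStart, chromEnd) in 0-based, half-open coordinates."""
    chrom, _, span = position.partition(":")
    start_text, _, end_text = span.replace(",", "").partition("-")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid position: {position!r}")
    # The 1-based closed start becomes a 0-based start; the end is unchanged.
    return chrom, start - 1, end


print(parse_position("chr10:69,637,000-69,639,000"))
# ('chr10', 69636999, 69639000)
```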
#bin  chrom  chromStart  chromEnd  name   score  expCount  expNums          expScores
1116  chr10  69637728    69638048  NFYB   154    1         517              154
1116  chr10  69637926    69638250  CEBPB  430    4         212,343,426,477  275,262,218,430
1116  chr10  69638035    69638275  USF1   139    1         161              139
The chrom, chromStart, and chromEnd fields give the regions where named transcription factors like NFYB have been seen. The score gives a relative indication of the strength of the signal seen in experiments, while expCount indicates the number of experiments in which binding has been observed.
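If you save the Table Browser output to a file, a short script can pull these fields apart. The following is only a sketch in Python: it assumes the output is tab-separated with the header line shown above, and the names SAMPLE and parse_clusters are made up for this example. It also assumes the values in expNums and expScores line up one-to-one, so that each experiment number can be paired with its score.

```python
import csv
import io

# The three rows from the example region, written as tab-separated text.
SAMPLE = (
    "#bin\tchrom\tchromStart\tchromEnd\tname\tscore\texpCount\texpNums\texpScores\n"
    "1116\tchr10\t69637728\t69638048\tNFYB\t154\t1\t517\t154\n"
    "1116\tchr10\t69637926\t69638250\tCEBPB\t430\t4\t212,343,426,477\t275,262,218,430\n"
    "1116\tchr10\t69638035\t69638275\tUSF1\t139\t1\t161\t139\n"
)


def parse_clusters(text):
    """Return one dict per cluster, pairing each experiment number with its score."""
    # Drop the leading '#' so the first column is simply named 'bin'.
    reader = csv.DictReader(io.StringIO(text.lstrip("#")), delimiter="\t")
    clusters = []
    for row in reader:
        # 'if n' skips empty pieces in case a list ends with a trailing comma.
        nums = [int(n) for n in row["expNums"].split(",") if n]
        scores = [int(s) for s in row["expScores"].split(",") if s]
        clusters.append({
            "name": row["name"],
            "chrom": row["chrom"],
            "start": int(row["chromStart"]),
            "end": int(row["chromEnd"]),
            "score": int(row["score"]),
            "expCount": int(row["expCount"]),
            "experiments": list(zip(nums, scores)),
        })
    return clusters


for cluster in parse_clusters(SAMPLE):
    print(cluster["name"], cluster["chrom"], cluster["start"], cluster["end"],
          cluster["score"], cluster["experiments"])
```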
What you will notice in this session is that the wgEncodeRegTfbsClusteredV3 track is a processed summary that condenses hundreds of ChIP-seq experiments. If you are interested in looking deeper into the underlying files that produced the clustered summary, you can click one of the item boxes, such as the one for NFYB, and then click the "metadata" link for "more info". There you will see the lab, antibody, and cell type, as well as the uniformly processed peak track, wgEncodeAwgTfbsSydhK562NfybUniPk, that was used by the clustering algorithm to generate the clusters track.
You will also see a UCSC Accession, wgEncodeEH002024, when looking at the metadata details of cluster items. You can use accessions like wgEncodeEH002024 to find the underlying raw signal track as well, if desired. For example, with the Track Search tool, http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hgt_tSearch=1&tsCurTab=simpleTab&tsSimple=wgEncodeEH002024, you click search and then click the similar blue metadata arrow next to the "K562 NF-YB Standard ChIP-seq Signal from ENCODE/SYDH" line to see a displayed "fileName" such as "wgEncodeSydhTfbsK562NfybStdSig.bigWig", which you can download for the entire genome. Alternatively, if you want this signal data for only your region, you can return to the Table Browser, set the "Group:" to "All Tables", the "table:" to "wgEncodeSydhTfbsK562NfybStdSig", and the "position:" to chr10:69635813-69645132, and then "get output" as "data points".
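If you have several accessions to look up, the Track Search link can be built for each one by changing only the tsSimple value. This is a small Python sketch that simply reproduces the query parameters of the link given in this message; the function name track_search_url is made up for the example.

```python
from urllib.parse import urlencode


def track_search_url(accession, db="hg19"):
    """Build a Track Search link for one UCSC accession."""
    # Same parameters, in the same order, as the example link in this message.
    query = urlencode({
        "db": db,
        "hgt_tSearch": 1,
        "tsCurTab": "simpleTab",
        "tsSimple": accession,
    })
    return "http://genome.ucsc.edu/cgi-bin/hgTracks?" + query


print(track_search_url("wgEncodeEH002024"))
```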
In summary, the Table Browser output from wgEncodeRegTfbsClusteredV3 (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3) provides processed, clustered coordinates that condense hundreds of uniformly processed ChIP-seq files (see http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform for details), which were in turn generated from data produced by separate laboratories for various cell lines (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeTfBindingSuper). To read more about the background of these data sources, please see the Track Description Pages linked in this paragraph.
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group