Hello Jenifer,
Thank you for your interest in the Genome Browser.
As you have said, there are many CTCF tables due to the different cell lines. There is no quick way to download a large number of tables through the Table Browser; each one must be downloaded individually. The following track, however, contains an aggregate of all the TF data (factorbook): http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3
The complete data can be found at the following download link: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/wgEncodeRegTfbsClustered.bed.gz
Using your terminal, you can extract only the CTCF factors, e.g.:
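A minimal sketch of such an extraction, assuming the download is a tab-separated BED file with the factor name in the fourth (name) column. The helper name extract_ctcf is hypothetical and not part of any UCSC tool. It matches the name exactly, so similarly named factors (CTCFL, if present) are excluded, and the number of rows it returns may differ from a count obtained with a looser pattern match:

```shell
# Hypothetical helper (illustration only): print the rows of a gzipped BED
# file whose 4th tab-separated column (the factor name) is exactly "CTCF".
extract_ctcf() {
    # gzip -dc decompresses to stdout without modifying the downloaded file
    gzip -dc "$1" | awk -F'\t' '$4 == "CTCF"'
}

# Usage, from the directory that holds the download:
#   extract_ctcf wgEncodeRegTfbsClustered.bed.gz | wc -l
```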
Keep in mind that you must run that in the same directory where you downloaded the file. That query will return 218370 entries. If you would like to extract only the first four fields of the BED file (chrom, start, stop, factor name), you can use the cut command, and then save the output to a file with the greater-than (>) symbol:
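A minimal sketch of that step, assuming the file is tab-separated with the factor name in the fourth column. The helper name ctcf_first_four is hypothetical, and the exact-match filter on "CTCF" is an assumption about how the rows are selected:

```shell
# Hypothetical helper (illustration only): keep rows named exactly "CTCF"
# in a gzipped BED file and print only the first four fields.
ctcf_first_four() {
    # cut splits on tabs by default; -f1-4 keeps chrom, start, stop, name
    gzip -dc "$1" | awk -F'\t' '$4 == "CTCF"' | cut -f1-4
}

# Usage: ">" redirects the output into a new file.
#   ctcf_first_four wgEncodeRegTfbsClustered.bed.gz > ctcfMotifs.bed
```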
The result will be a file, ctcfMotifs.bed, with all the CTCF motif coordinates.
For more information on this file and data, you can refer to the track description page.
For the V3/V4 releases, a new track table format, 'factorSource' was used to represent the primary clusters table and downloads file, wgEncodeRegTfbsClusteredV3. This format consists of standard BED5 fields (see File Formats) followed by an experiment count field (expCount) and finally two fields containing comma-separated lists. The first list field (expNums) contains numeric identifiers for experiments, keyed to the wgEncodeRegTfbsClusteredInputsV3 table, which includes such information as the experiment's underlying Uniform TFBS table name, factor targeted, antibody used, cell type, treatment (if any), and laboratory source. The second list field (expScores) contains the scores for the corresponding experiments. For convenience, the file downloads directory for this track also contains a BED file, wgEncodeRegTfbsClusteredWithCellsV3, that lists each cluster with the cluster score followed by a comma-separated list of cell types.
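To illustrate the layout described above, here is a minimal sketch that pairs each entry of expNums with its entry in expScores. It assumes the eight tab-separated columns are in the order given in the description (BED5, expCount, expNums, expScores). The helper name explode_experiments is hypothetical:

```shell
# Hypothetical helper (illustration only): read factorSource-style rows on
# stdin and print one line per experiment:
#   chrom, start, end, name, expNum, expScore
explode_experiments() {
    awk -F'\t' -v OFS='\t' '{
        n = split($7, nums, ",")      # column 7: expNums list
        split($8, scores, ",")        # column 8: expScores list
        for (i = 1; i <= n; i++)
            if (nums[i] != "")        # skip an empty entry from a trailing comma
                print $1, $2, $3, $4, nums[i], scores[i]
    }'
}

# Usage:
#   gzip -dc wgEncodeRegTfbsClusteredV3.bed.gz | explode_experiments | head
```

The experiment numbers printed this way can then be looked up in the wgEncodeRegTfbsClusteredInputsV3 table to find the cell type, antibody, and lab for each one.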
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.
Lou Nassar
UCSC Genomics Institute