Genome Browser query

brea iglesias jenifer

Jun 15, 2020, 9:36:29 AM
to gen...@soe.ucsc.edu
Dear Sir or Madam,

I am a PhD student at the University of Santiago de Compostela, and I am trying to download the coordinates of all CTCF motifs described in the UCSC Genome Browser, regardless of cell line. I tried using the Table Browser, but there are so many different tables to download. I would be very grateful if you could explain an efficient way to download all the positions where a CTCF motif has been described in the Genome Browser databases, along with their signal.


Yours faithfully,
Jenifer Brea
PhD Student - Genomes & Disease Group
Molecular Medicine and Chronic Diseases Research Centre (CIMUS)
University of Santiago de Compostela, Spain

Luis Nassar

Jun 16, 2020, 11:58:44 AM
to brea iglesias jenifer, gen...@soe.ucsc.edu

Hello Jenifer,

Thank you for your interest in the Genome Browser.

As you have said, there are many CTCF tables because of the different cell lines. There is no quick way to download a large number of tables through the Table Browser; each one must be downloaded individually. The following track, however, contains an aggregate of all the TF data (factorbook): http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3

The complete data can be found in the following download link: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/wgEncodeRegTfbsClustered.bed.gz

Using your terminal, you can extract only the CTCF factors, e.g.:

$ zgrep CTCF wgEncodeRegTfbsClustered.bed.gz

Keep in mind that you must run that command in the same directory where you downloaded the file. That query will return 218370 entries. If you would like to extract only the first four fields of the BED file (chrom, start, stop, factor name), you can use the cut command and then save the output to a file with the greater-than (>) symbol:

$ zgrep CTCF wgEncodeRegTfbsClustered.bed.gz | cut -f 1-4 > ctcfMotifs.bed

The result will be a file, ctcfMotifs.bed, with all the CTCF motif coordinates.
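Since the original question also asked for the signal at each site, it may help to keep the score column too. The sketch below is a hypothetical shell helper (`extract_factor` is an illustrative name, not a UCSC tool) that assumes the download is tab-separated BED with the factor name in column 4 and the cluster score in column 5. It compares column 4 exactly instead of searching for a substring, so factor names that merely contain "CTCF" (CTCFL, for example, if present in the file) are left out; the resulting count may therefore differ from the zgrep count.

```shell
# Hypothetical helper: print chrom, start, end, name and score (the first
# five BED columns) for clusters whose factor name in column 4 is exactly
# the given factor (default: CTCF).
# Assumption: tab-separated BED on stdin, cluster score in column 5.
extract_factor() {
  factor="${1:-CTCF}"
  # Exact comparison on column 4, so names that only contain the string
  # (e.g. CTCFL) do not match.
  awk -v f="$factor" 'BEGIN { FS = OFS = "\t" } $4 == f { print $1, $2, $3, $4, $5 }'
}
```

For example, after defining the function in your shell and downloading the file into the current directory:

$ gzip -dc wgEncodeRegTfbsClustered.bed.gz | extract_factor CTCF > ctcfClusters.bed

The output name ctcfClusters.bed is only an example.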

For more information on this file and data, you may refer to the description page.

For the V3/V4 releases, a new track table format, 'factorSource', was used to represent the primary clusters table and download file, wgEncodeRegTfbsClusteredV3. This format consists of the standard BED5 fields (see File Formats), followed by an experiment count field (expCount) and finally two fields containing comma-separated lists. The first list field (expNums) contains numeric identifiers for experiments, keyed to the wgEncodeRegTfbsClusteredInputsV3 table, which includes such information as the experiment's underlying Uniform TFBS table name, the factor targeted, the antibody used, the cell type, the treatment (if any), and the laboratory source. The second list field (expScores) contains the scores for the corresponding experiments. For convenience, the download directory for this track also contains a BED file, wgEncodeRegTfbsClusteredWithCellsV3, that lists each cluster with the cluster score followed by a comma-separated list of cell types.
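To see which experiments support each cluster in the factorSource layout described above, the two comma-separated list fields can be expanded into one line per experiment. The following is a minimal sketch (the helper name `expand_factor_source` is hypothetical) that assumes a tab-separated file with BED5 in columns 1-5, expCount in column 6, expNums in column 7 and expScores in column 8, and it tolerates a trailing comma at the end of each list. It only applies to files in the V3/V4 factorSource format, so it is worth checking the column layout of the file you download first.

```shell
# Hypothetical helper: expand factorSource rows into one line per experiment.
# Assumed columns: 1-5 BED5, 6 expCount, 7 expNums (comma-separated),
# 8 expScores (comma-separated).
expand_factor_source() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    n = split($7, nums, ",")
    split($8, scores, ",")
    for (i = 1; i <= n; i++) {
      # Skip the empty element produced by a trailing comma, if any.
      if (nums[i] == "") continue
      # chrom, start, end, factor name, experiment number, experiment score
      print $1, $2, $3, $4, nums[i], scores[i]
    }
  }'
}
```

For example, assuming the V3 download is named wgEncodeRegTfbsClusteredV3.bed.gz:

$ gzip -dc wgEncodeRegTfbsClusteredV3.bed.gz | expand_factor_source > clusterExperiments.tsv

The experiment numbers in column 5 of the output can then be matched against the wgEncodeRegTfbsClusteredInputsV3 table to recover the cell type, antibody, and other details for each experiment.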

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.

brea iglesias jenifer

Jun 17, 2020, 12:16:58 PM
to Luis Nassar, gen...@soe.ucsc.edu
Thank you, this was very useful to me.

Jenifer

From: Luis Nassar <lrna...@ucsc.edu>
Sent: Wednesday, June 17, 2020 3:58
To: brea iglesias jenifer <jenife...@rai.usc.es>
Cc: gen...@soe.ucsc.edu <gen...@soe.ucsc.edu>
Subject: Re: [genome] Genome Browser query