About ENCODE TFBS peaks


xiehb

Sep 12, 2014, 9:23:40 AM
to genome
Dear UCSC Genome Informatics Group,
   I am a UCSC Genome Browser user from the Kunming Institute of Zoology, Chinese Academy of Sciences. I encountered a problem using the ENCODE annotation of the human genome. Where can I find the data (chromosomal coordinates) for the Factorbook-identified canonical motifs (green highlighted bars) in transcription factor binding sites? For example, the motifs (green) in the RUNX3 and BHLHE40 binding sites. I want to know the coordinates for all annotated motifs in the human genome.



  I cannot find this information in the directory http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/ 

  Your help is really appreciated.

Sincerely yours,

Hai-Bing Xie Ph.D
Kunming Institute of Zoology
CAS

Brian Lee

Sep 12, 2014, 3:16:52 PM
to xiehb, genome
Dear Hai-Bing Xie,

Thank you for using the UCSC Genome Browser and for your question about the chromosomal coordinates of the Factorbook-identified canonical motifs, shown as green highlighted bars in the clustered transcription factor binding sites track.

The Factorbook motif identifications and localizations were provided by the Zlab (http://zlab.umassmed.edu/zlab/) at the UMass Medical School and are available in two tables: factorbookMotifPos, which provides the position of each Factorbook item, and factorbookMotifPwm, which provides the position weight matrix.

These are located in the general hg19 annotation database section of our hgdownload server along with a corresponding .sql file:

You can access these tables via the public MySQL server: http://genome.ucsc.edu/goldenPath/help/mysql.html
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e 'show tables like "factorbook%";' hg19
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e 'select * from factorbookMotifPos;' hg19
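Selecting the whole table returns every motif in the genome, which may be more than is needed. As a minimal sketch, the helper below (a hypothetical name, not part of any UCSC tool) builds a query restricted to one region. It assumes factorbookMotifPos uses the usual BED-style columns (chrom, chromStart, chromEnd, name, score, strand) and that coordinates are treated as 0-based, half-open; please check the table's .sql file before relying on those names.

```shell
# Build a SQL query for factorbookMotifPos rows overlapping a region.
# ASSUMPTION: the column names below follow the usual BED-style layout of
# UCSC tables; verify them against the table's .sql file.
motif_region_sql() {
    chrom=$1 start=$2 end=$3
    # A row overlaps the region if it starts before the region ends
    # and ends after the region starts.
    printf 'select chrom, chromStart, chromEnd, name, score, strand from factorbookMotifPos where chrom="%s" and chromStart < %s and chromEnd > %s;' \
        "$chrom" "$end" "$start"
}

# Usage (needs network access to the public server; output file name is
# only an example):
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N \
#     -e "$(motif_region_sql chr21 33031597 33041570)" hg19 > motifs_sod1.tsv
```

The -N option only suppresses the column-name header so that the output is plain tab-separated rows.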

There are two additional tables, factorbookMotifCanonical and factorbookGeneAlias, that help map the information from the Zlab to the target terms used in the UCSC Genome Browser.

You can alternatively use the hg19 Table Browser to access these tables: http://genome.ucsc.edu/cgi-bin/hgTables
1. Set the "group:" to "All tables" 
2. Set the table to "factorbookMotifPos" 
3. Click "genome" to get the entire table, or click the "define regions" button and enter coordinates of interest, such as "chrX 14000000 150000000".
4. Click "get output". If desired, you could set "output format" to "custom track" and see the results in the browser.

What is displayed in the wgEncodeRegTfbsClustered track is the result of a computational mapping of the factorbookMotifPos items to the clustered TFBS locations filtered for the highest score per cluster. There is not an easy path to obtain these exact mappings, but you can perform similar operations with the Table Browser.

For example, if you were looking at the region around SOD1, chr21:33,031,597-33,041,570, you could enter this as the defined region in the Table Browser (step 3) and then continue as follows:
4. Click the "create" button next to "filter".
5. Set the "score" filter to ">" and enter a desired threshold, such as "2", then click "submit".
6. Click the "create" button next to "intersection".
7. Select "group: Regulation" and "track: Txn Factor ChIP" and "table: wgEncodeRegTfbsClusteredV3" then click "submit".
8. Click "get output". If desired, you could set "output format" to "custom track" and see the results in the browser.
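If you prefer to work on downloaded files, the same filter-and-intersect idea can be sketched offline with awk. This is only an approximation of the Table Browser steps, not the mapping used to build the track. It assumes both inputs are tab-separated with BED-style leading columns (chrom, start, end, name, score, strand); the function and file names are hypothetical.

```shell
# Keep motifs scoring above a threshold that overlap at least one cluster.
# ASSUMPTION: both files are tab-separated with BED-style columns; only the
# first three columns of the cluster file are used.
motifs_in_clusters() {
    motifs=$1 clusters=$2 min_score=$3
    awk -F '\t' -v min="$min_score" '
        # First file (clusters): remember every interval.
        NR == FNR { n++; c[n] = $1; s[n] = $2; e[n] = $3; next }
        # Second file (motifs): apply the score filter, then test overlap.
        $5 + 0 > min + 0 {
            for (i = 1; i <= n; i++)
                if (c[i] == $1 && $2 + 0 < e[i] + 0 && $3 + 0 > s[i] + 0) {
                    print   # print the motif once, at its first overlap
                    break
                }
        }
    ' "$clusters" "$motifs"
}

# Usage (file names are examples only):
# motifs_in_clusters motifs_sod1.bed clusters_sod1.bed 2
```

The loop compares every motif against every cluster, which is fine for a single region but slow genome-wide; a dedicated interval tool would be a better fit for whole-genome files.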

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group