Dear Hai-Bing Xie,
Thank you for using the UCSC Genome Browser and for your question about the chromosomal coordinates for Factorbook-identified canonical motifs, which appear as green highlighted bars in the clustered transcription factor binding sites track.
The Factorbook motif identifications and localizations were provided by the Zlab (http://zlab.umassmed.edu/zlab/) at the UMass Medical School and are available in two tables: factorbookMotifPos, which provides the position of each Factorbook item, and factorbookMotifPwm, which provides the position weight matrix.
These are located in the general hg19 annotation database section of our hgdownload server along with a corresponding .sql file:
There are two additional tables, factorbookMotifCanonical and factorbookGeneAlias, that help map the information from the Zlab to the target terms used in the UCSC Genome Browser.
1. Set the "group:" to "All tables"
2. Set the table to "factorbookMotifPos"
3. Click "genome" to get the entire table, or click the "define regions" button and enter coordinates of interest, such as "chrX 14000000 150000000".
4. Click "get output". If desired, you could set "output format" to "custom track" and see the results in the browser.
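If you download the factorbookMotifPos table instead of using the Table Browser, the region selection in the steps above can be approximated with a short script. The following is only a minimal sketch: the column order (bin, chrom, chromStart, chromEnd, name, score, strand) is an assumption that should be checked against the accompanying .sql file, and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column order for factorbookMotifPos; confirm it against the
# .sql file that accompanies the table before relying on it.
COLUMNS = ["bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand"]


def motifs_in_region(handle, chrom, start, end):
    """Yield rows that overlap chrom:start-end (0-based, half-open)."""
    reader = csv.reader(handle, delimiter="\t")
    for fields in reader:
        # Skip blank lines and any header/comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row_start = int(row["chromStart"])
        row_end = int(row["chromEnd"])
        if row["chrom"] == chrom and row_start < end and row_end > start:
            yield row


# Hypothetical rows, for illustration only (not real Factorbook data).
SAMPLE = (
    "585\tchrX\t14000100\t14000115\tCTCF\t2.5\t+\n"
    "585\tchrX\t160000000\t160000015\tMAX\t1.8\t-\n"
    "585\tchr21\t33032000\t33032012\tSP1\t3.1\t+\n"
)

hits = list(motifs_in_region(io.StringIO(SAMPLE), "chrX", 14000000, 150000000))
for hit in hits:
    print(hit["chrom"], hit["chromStart"], hit["chromEnd"], hit["name"], hit["score"])
```

To run this on a real download, replace the io.StringIO(SAMPLE) handle with an opened copy of the table file (for a gzipped file, gzip.open(path, "rt") from the standard library).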
What is displayed in the wgEncodeRegTfbsClustered track is the result of a computational mapping of the factorbookMotifPos items to the clustered TFBS locations filtered for the highest score per cluster. There is not an easy path to obtain these exact mappings, but you can perform similar operations with the Table Browser.
For example, if you were looking at the region around SOD1, chr21:33,031,597-33,041,570, you could enter this as the defined region in the Table Browser (step 3) and then continue as follows:
4. Click the "create" button next to "filter".
5. Set the "score" filter to ">" and enter a desired threshold, such as "2", then click "submit".
6. Click the "create" button next to "intersection".
7. Select "group: Regulation" and "track: Txn Factor ChIP" and "table: wgEncodeRegTfbsClusteredV3" then click "submit".
8. Click "get output". If desired, you could set "output format" to "custom track" and see the results in the browser.
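The score filter and intersection steps above amount to keeping motif items above a score threshold that overlap at least one clustered TFBS item. The sketch below illustrates that logic only; it assumes an "any overlap" intersection, uses made-up coordinates, and does not reproduce the highest-score-per-cluster filtering used to build the displayed track.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_and_intersect(motifs, clusters, min_score):
    """Keep motifs with score > min_score that overlap any cluster.

    motifs:   iterable of (chrom, start, end, name, score)
    clusters: iterable of (chrom, start, end)
    """
    cluster_list = list(clusters)
    kept = []
    for chrom, start, end, name, score in motifs:
        if score <= min_score:
            continue  # mirrors the "score > threshold" filter
        for c_chrom, c_start, c_end in cluster_list:
            if c_chrom == chrom and overlaps(start, end, c_start, c_end):
                kept.append((chrom, start, end, name, score))
                break  # one overlapping cluster is enough
    return kept


# Hypothetical motif and cluster items near SOD1 (illustrative values only).
example_motifs = [
    ("chr21", 33032000, 33032012, "SP1", 3.1),   # passes filter, overlaps
    ("chr21", 33033000, 33033010, "MAX", 1.5),   # fails score filter
    ("chr21", 33040000, 33040015, "CTCF", 2.8),  # passes filter, no overlap
]
example_clusters = [
    ("chr21", 33031900, 33032300),
    ("chr21", 33032900, 33033200),
]

result = filter_and_intersect(example_motifs, example_clusters, min_score=2)
for item in result:
    print(*item)
```

A simple nested loop is adequate for a region of this size; for a genome-wide run, a sorted sweep or an interval index would scale better.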
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group